Manually create an alert rule

Last updated: 2024.04.08 14:31:12

First published: 2023.02.07 20:20:49

This topic describes how to manually create an alert rule.

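Background

Creating an alert rule manually requires a standard PromQL statement that specifies the monitored object or metric. This section lists the monitoring metrics for which API Gateway supports alerting, together with the PromQL statement for each metric.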

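  • PromQL statements for instance-level monitoring metrics

Note

Each PromQL statement below monitors the metric across all instances under API Gateway. To monitor a specific instance, replace the value ".*" in apig_io_gateway_id=~".*" with the ID of that instance.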

# Average response time (ms)
sum(rate(istio_request_duration_milliseconds_sum{apig_io_gateway_id=~".*"}[1m])) by (apig_io_gateway_id) / sum(rate(istio_request_duration_milliseconds_count{apig_io_gateway_id=~".*"}[1m])) by (apig_io_gateway_id)

# P90 response time (ms)
histogram_quantile(0.90, sum(rate(istio_request_duration_milliseconds_bucket{apig_io_gateway_id=~".*"}[1m])) by (le, apig_io_gateway_id))

# P99 response time (ms)
histogram_quantile(0.99, sum(rate(istio_request_duration_milliseconds_bucket{apig_io_gateway_id=~".*"}[1m])) by (le, apig_io_gateway_id))

# Number of HTTP 4XX errors (over the last minute)
sum(increase(istio_requests_total{apig_io_gateway_id=~".*",response_code=~"4.*"}[1m])) by (apig_io_gateway_id)

# Number of HTTP 5XX errors (over the last minute)
sum(increase(istio_requests_total{apig_io_gateway_id=~".*",response_code=~"5.*"}[1m])) by (apig_io_gateway_id)

# QPS
sum(rate(istio_requests_total{apig_io_gateway_id=~".*"}[1m])) by (apig_io_gateway_id)

# Average HTTP response size (bytes)
sum(rate(istio_response_bytes_sum{apig_io_gateway_id=~".*"}[1m])) by (apig_io_gateway_id) / sum(rate(istio_response_bytes_count{apig_io_gateway_id=~".*"}[1m])) by (apig_io_gateway_id)

# Average HTTP request size (bytes)
sum(rate(istio_request_bytes_sum{apig_io_gateway_id=~".*"}[1m])) by (apig_io_gateway_id) / sum(rate(istio_request_bytes_count{apig_io_gateway_id=~".*"}[1m])) by (apig_io_gateway_id)

# Inbound public network bandwidth
apig_loadbalancer_in_bytes{apig_io_gateway_id=~".*"}

# Outbound public network bandwidth
apig_loadbalancer_out_bytes{apig_io_gateway_id=~".*"}

# New connections
apig_loadbalancer_new_connections{apig_io_gateway_id=~".*"}
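
When rules are needed for many instances, the substitution described in the note for this list can be scripted. The following is a minimal Python sketch under stated assumptions: the helper name `for_instance` is hypothetical, `gcek****` is the masked placeholder ID used in this document rather than a real ID, and the code only rewrites the query text without calling VMP.

```python
# Hypothetical helper: narrows an instance-level query to a single instance.
ALL_INSTANCES = 'apig_io_gateway_id=~".*"'

# The instance-level QPS statement from the list above.
QPS_ALL = (
    'sum(rate(istio_requests_total{apig_io_gateway_id=~".*"}[1m])) '
    'by (apig_io_gateway_id)'
)


def for_instance(query: str, gateway_id: str) -> str:
    """Replace every match-all instance selector with one instance ID."""
    if ALL_INSTANCES not in query:
        raise ValueError("query has no match-all apig_io_gateway_id selector")
    return query.replace(ALL_INSTANCES, f'apig_io_gateway_id=~"{gateway_id}"')


if __name__ == "__main__":
    # "gcek****" is a masked placeholder; substitute a real instance ID.
    print(for_instance(QPS_ALL, "gcek****"))
```

Because `str.replace` rewrites every occurrence, the same helper also works for the ratio-style statements, where the selector appears in both the numerator and the denominator.
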
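
  • PromQL statements for service-level monitoring metrics

Note

Each PromQL statement below monitors the metric across all services of all instances under API Gateway. To monitor a specific service under a specific instance, replace the value ".*" in apig_io_gateway_id=~".*" with the instance ID, and replace the value ".*" in request_service_name=~".*" with [service name].[service ID].

Example: query the QPS of a service (service name app, service ID scer****) under an instance (instance ID gcek****).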

sum(rate(istio_requests_total{request_service_name=~"app.scer****", apig_io_gateway_id=~"gcek****"}[1m])) by (request_service_name)

# Average response time (ms)
sum(rate(istio_request_duration_milliseconds_sum{request_service_name=~".*", apig_io_gateway_id=~".*"}[1m])) by (request_service_name) / sum(rate(istio_request_duration_milliseconds_count{request_service_name=~".*", apig_io_gateway_id=~".*"}[1m])) by (request_service_name)

# P90 response time (ms)
histogram_quantile(0.90, sum(rate(istio_request_duration_milliseconds_bucket{request_service_name=~".*", apig_io_gateway_id=~".*"}[1m])) by (le, request_service_name))

# P99 response time (ms)
histogram_quantile(0.99, sum(rate(istio_request_duration_milliseconds_bucket{request_service_name=~".*", apig_io_gateway_id=~".*"}[1m])) by (le, request_service_name))

# Number of HTTP 4XX errors (over the last minute)
sum(increase(istio_requests_total{request_service_name=~".*",response_code=~"4.*", apig_io_gateway_id=~".*"}[1m])) by (request_service_name)

# Number of HTTP 5XX errors (over the last minute)
sum(increase(istio_requests_total{request_service_name=~".*",response_code=~"5.*", apig_io_gateway_id=~".*"}[1m])) by (request_service_name)

# QPS
sum(rate(istio_requests_total{request_service_name=~".*", apig_io_gateway_id=~".*"}[1m])) by (request_service_name)

# Average HTTP response size (bytes)
sum(rate(istio_response_bytes_sum{request_service_name=~".*", apig_io_gateway_id=~".*"}[1m])) by (request_service_name) / sum(rate(istio_response_bytes_count{request_service_name=~".*", apig_io_gateway_id=~".*"}[1m])) by (request_service_name)

# Average HTTP request size (bytes)
sum(rate(istio_request_bytes_sum{request_service_name=~".*", apig_io_gateway_id=~".*"}[1m])) by (request_service_name) / sum(rate(istio_request_bytes_count{request_service_name=~".*", apig_io_gateway_id=~".*"}[1m])) by (request_service_name)
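
The [service name].[service ID] format for request_service_name is easy to get wrong by hand. The following is a minimal Python sketch that assembles the matchers and reproduces the service-level QPS statement. The function names `service_matchers` and `service_qps` are hypothetical, and the IDs are the masked placeholders used in this document.

```python
def service_matchers(service_name: str, service_id: str, gateway_id: str) -> str:
    """Build the label matchers for one service under one instance."""
    # request_service_name takes the form [service name].[service ID].
    return (
        f'request_service_name=~"{service_name}.{service_id}", '
        f'apig_io_gateway_id=~"{gateway_id}"'
    )


def service_qps(service_name: str, service_id: str, gateway_id: str) -> str:
    """Service-level QPS query, following the QPS statement listed above."""
    matchers = service_matchers(service_name, service_id, gateway_id)
    # Doubled braces emit literal { and } around the matchers.
    return f"sum(rate(istio_requests_total{{{matchers}}}[1m])) by (request_service_name)"


if __name__ == "__main__":
    # Masked placeholders from this document; substitute real values.
    print(service_qps("app", "scer****", "gcek****"))
```

Note that `=~` is a regular-expression matcher in PromQL, so the `.` between the service name and the service ID matches any single character. Prometheus anchors the pattern to the whole label value, which keeps the match narrow in practice.
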

Prerequisites

Procedure

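  1. Log in to the API Gateway console.
  2. In the top navigation bar, select the target region.
  3. In the left navigation pane, choose Instance Management, then click the name of the target instance to open the instance overview page.
  4. Click Monitoring Information to switch to the Monitoring Information tab.
  5. In the Monitoring Information area, click Go to VMP to Manage Alert Notifications to open the Alert Rules page of the VMP console.
  6. Click Create > Create Manually, then configure an alert rule for the metric you want to monitor. For parameter descriptions, see Create an alert rule.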

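Note

When configuring the alert rule, select the VMP workspace that is bound to API Gateway. You can find this workspace in the Monitoring Information area of the API Gateway console.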
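
In standard Prometheus alerting rules, the alert condition is a PromQL expression that ends in a comparison against a threshold. This document does not say whether the VMP form takes the threshold inside the expression or as a separate field, so the following Python sketch is an assumption. The helper name `with_threshold` is hypothetical and the 500 ms threshold is illustrative only.

```python
ALLOWED_OPS = {">", ">=", "<", "<=", "==", "!="}

# The instance-level P99 response time statement from this document.
P99 = (
    "histogram_quantile(0.99, sum(rate(istio_request_duration_milliseconds_bucket"
    '{apig_io_gateway_id=~".*"}[1m])) by (le, apig_io_gateway_id))'
)


def with_threshold(query: str, op: str, value: float) -> str:
    """Wrap a PromQL query in parentheses and append a comparison."""
    if op not in ALLOWED_OPS:
        raise ValueError(f"unsupported comparison operator: {op}")
    # Parentheses keep the comparison separate from any operators in the query.
    return f"({query}) {op} {value}"


if __name__ == "__main__":
    # Illustrative threshold: P99 response time above 500 ms.
    print(with_threshold(P99, ">", 500))
```
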