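This topic describes how to manually create alert rules.
Manually creating an alert rule requires a standard PromQL statement that specifies the monitored object or metric. This section lists the API Gateway metrics that support alerting, along with the PromQL statement for each metric.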
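Note
The PromQL statements below monitor the corresponding metric across all instances under API Gateway. To monitor a specific instance, replace the value in apig_io_gateway_id=~".*" with the ID of that instance.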
# Average response time
sum(rate(istio_request_duration_milliseconds_sum{apig_io_gateway_id=~".*"}[1m])) by (apig_io_gateway_id) / sum(rate(istio_request_duration_milliseconds_count{apig_io_gateway_id=~".*"}[1m])) by (apig_io_gateway_id)

# P90 response time
histogram_quantile(0.90, sum(rate(istio_request_duration_milliseconds_bucket{apig_io_gateway_id=~".*"}[1m])) by (le, apig_io_gateway_id))

# P99 response time
histogram_quantile(0.99, sum(rate(istio_request_duration_milliseconds_bucket{apig_io_gateway_id=~".*"}[1m])) by (le, apig_io_gateway_id))

# HTTP 4XX error count
sum(increase(istio_requests_total{apig_io_gateway_id=~".*",response_code=~"4.*"}[1m])) by (apig_io_gateway_id)

# HTTP 5XX error count
sum(increase(istio_requests_total{apig_io_gateway_id=~".*",response_code=~"5.*"}[1m])) by (apig_io_gateway_id)

# QPS
sum(rate(istio_requests_total{apig_io_gateway_id=~".*"}[1m])) by (apig_io_gateway_id)

# HTTP response message size
sum(rate(istio_response_bytes_sum{apig_io_gateway_id=~".*"}[1m])) by (apig_io_gateway_id) / sum(rate(istio_response_bytes_count{apig_io_gateway_id=~".*"}[1m])) by (apig_io_gateway_id)

# HTTP request message size
sum(rate(istio_request_bytes_sum{apig_io_gateway_id=~".*"}[1m])) by (apig_io_gateway_id) / sum(rate(istio_request_bytes_count{apig_io_gateway_id=~".*"}[1m])) by (apig_io_gateway_id)

# Inbound public network bandwidth
apig_loadbalancer_in_bytes{apig_io_gateway_id=~".*"}

# Outbound public network bandwidth
apig_loadbalancer_out_bytes{apig_io_gateway_id=~".*"}

# New connections
apig_loadbalancer_new_connections{apig_io_gateway_id=~".*"}
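The instance-level substitution described in the note above can be scripted when many rules need to be scoped to one instance. The following is a minimal Python sketch, not part of the product: the helper name scope_to_instance and the ID your-instance-id are hypothetical, and the sketch assumes the instance ID contains no regular-expression metacharacters.

```python
# Matcher that the statements in this section use to select all instances.
ALL_INSTANCES = 'apig_io_gateway_id=~".*"'

# Instance-level QPS statement, copied from the list above.
QPS_ALL = (
    'sum(rate(istio_requests_total{apig_io_gateway_id=~".*"}[1m])) '
    'by (apig_io_gateway_id)'
)


def scope_to_instance(query: str, gateway_id: str) -> str:
    """Return the query with the all-instances matcher narrowed to one instance.

    Hypothetical helper: it performs plain string replacement and does not
    parse PromQL.
    """
    if ALL_INSTANCES not in query:
        raise ValueError("query has no all-instances matcher to replace")
    return query.replace(ALL_INSTANCES, f'apig_io_gateway_id=~"{gateway_id}"')


# "your-instance-id" is a placeholder; use the real instance ID here.
print(scope_to_instance(QPS_ALL, "your-instance-id"))
```

Because the helper replaces every occurrence of the matcher, it also handles the ratio statements (for example, average response time), where the matcher appears in both the numerator and the denominator.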
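Note
The PromQL statements below monitor the corresponding metric across all services of all instances under API Gateway. To monitor a specific service under a specific instance, replace the value in apig_io_gateway_id=~".*" with the instance ID, and replace the value in request_service_name=~".*" with [service name].[service ID].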
Example: query the QPS metric of a service (service name app, service ID scer****) under an instance (instance ID gcek****).
sum(rate(istio_requests_total{request_service_name=~"app.scer****", apig_io_gateway_id=~"gcek****"}[1m])) by (request_service_name)
# Average response time
sum(rate(istio_request_duration_milliseconds_sum{request_service_name=~".*", apig_io_gateway_id=~".*"}[1m])) by (request_service_name) / sum(rate(istio_request_duration_milliseconds_count{request_service_name=~".*", apig_io_gateway_id=~".*"}[1m])) by (request_service_name)

# P90 response time
histogram_quantile(0.90, sum(rate(istio_request_duration_milliseconds_bucket{request_service_name=~".*", apig_io_gateway_id=~".*"}[1m])) by (le, request_service_name))

# P99 response time
histogram_quantile(0.99, sum(rate(istio_request_duration_milliseconds_bucket{request_service_name=~".*", apig_io_gateway_id=~".*"}[1m])) by (le, request_service_name))

# HTTP 4XX error count
sum(increase(istio_requests_total{request_service_name=~".*",response_code=~"4.*", apig_io_gateway_id=~".*"}[1m])) by (request_service_name)

# HTTP 5XX error count
sum(increase(istio_requests_total{request_service_name=~".*",response_code=~"5.*", apig_io_gateway_id=~".*"}[1m])) by (request_service_name)

# QPS
sum(rate(istio_requests_total{request_service_name=~".*", apig_io_gateway_id=~".*"}[1m])) by (request_service_name)

# HTTP response message size
sum(rate(istio_response_bytes_sum{request_service_name=~".*", apig_io_gateway_id=~".*"}[1m])) by (request_service_name) / sum(rate(istio_response_bytes_count{request_service_name=~".*", apig_io_gateway_id=~".*"}[1m])) by (request_service_name)

# HTTP request message size
sum(rate(istio_request_bytes_sum{request_service_name=~".*", apig_io_gateway_id=~".*"}[1m])) by (request_service_name) / sum(rate(istio_request_bytes_count{request_service_name=~".*", apig_io_gateway_id=~".*"}[1m])) by (request_service_name)
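The service-level substitution can be scripted in the same way. The following is a minimal Python sketch under stated assumptions: the helper name scope_to_service is hypothetical, and the sketch assumes that neither the IDs nor the service name contain characters that need escaping. With the masked values from the example in this section, it reproduces the example QPS statement.

```python
# Matchers that the service-level statements use to select everything.
ALL_INSTANCES = 'apig_io_gateway_id=~".*"'
ALL_SERVICES = 'request_service_name=~".*"'

# Service-level QPS statement, copied from the list above.
SERVICE_QPS = (
    'sum(rate(istio_requests_total{request_service_name=~".*", '
    'apig_io_gateway_id=~".*"}[1m])) by (request_service_name)'
)


def scope_to_service(query: str, gateway_id: str,
                     service_name: str, service_id: str) -> str:
    """Narrow a service-level statement to one service under one instance.

    Hypothetical helper: plain string replacement, no PromQL parsing.
    """
    for matcher in (ALL_INSTANCES, ALL_SERVICES):
        if matcher not in query:
            raise ValueError(f"matcher not found in query: {matcher}")
    # The service label value has the form [service name].[service ID].
    service_value = f"{service_name}.{service_id}"
    scoped = query.replace(ALL_SERVICES, f'request_service_name=~"{service_value}"')
    return scoped.replace(ALL_INSTANCES, f'apig_io_gateway_id=~"{gateway_id}"')


# Masked values taken from the example in this section.
print(scope_to_service(SERVICE_QPS, "gcek****", "app", "scer****"))
```

Note that =~ is a regular-expression matcher, so the dot between the service name and the service ID matches any single character; for typical names this still selects only the intended service.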
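Note
When you configure an alert rule, select the VMP workspace that is bound to API Gateway. You can find this information in the Monitoring Information area of the API Gateway console.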