How to configure a standalone Prometheus Server to monitor multiple environments and a Kubernetes cluster

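Hi, I happen to have done exactly this in practice. Using a standalone Prometheus Server to monitor an external K8s cluster is not that complicated: the core is getting authentication and authorization between Prometheus and the K8s API Server right, then configuring the service discovery rules. Here is a step-by-step breakdown: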

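1. First, configure RBAC permissions in the Kubernetes cluster

The external Prometheus obtains monitoring information about cluster resources through the K8s API Server, so it first needs sufficient access permissions. Create an RBAC configuration file (the monitoring namespace must already exist; if it does not, create it with kubectl create namespace monitoring):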

# prometheus-external-rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-external
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-external
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
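# Ingress lives in networking.k8s.io (the extensions/v1beta1 Ingress API was removed in Kubernetes 1.22)
- apiGroups: ["networking.k8s.io"]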
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-external
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-external
subjects:
- kind: ServiceAccount
  name: prometheus-external
  namespace: monitoring

Apply the configuration:
kubectl apply -f prometheus-external-rbac.yaml
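
Before moving on, it can be worth confirming that the binding took effect. The following is a minimal sketch using kubectl auth can-i with impersonation; the function name check_prometheus_rbac is hypothetical, and it assumes your own kubectl user is allowed to impersonate ServiceAccounts (cluster admins usually are).

```shell
# Hypothetical helper: asks the API server whether the prometheus-external
# ServiceAccount may list the resources that Prometheus discovers.
check_prometheus_rbac() {
  sa="system:serviceaccount:monitoring:prometheus-external"
  for resource in nodes services endpoints pods; do
    # "kubectl auth can-i" prints "yes" or "no"; warnings go to stderr
    answer=$(kubectl auth can-i list "$resource" --as="$sa" 2>/dev/null)
    echo "list $resource: ${answer:-unknown}"
  done
}

# Usage (needs cluster access):
#   check_prometheus_rbac
```

Any line that does not end in "yes" suggests the ClusterRole or ClusterRoleBinding was not applied as intended.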

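2. Obtain the access credentials for the K8s API Server

Next, collect the information Prometheus needs to access the API Server:

  • Get the ServiceAccount token
    First look up the corresponding Secret:
    kubectl get secrets -n monitoring | grep prometheus-external
    On Kubernetes 1.24 and later this Secret is no longer created automatically, so the command may return nothing. In that case, first create a Secret of type kubernetes.io/service-account-token annotated with kubernetes.io/service-account.name: prometheus-external.
    Then decode the token:
    kubectl get secret <your-secret-name> -n monitoring -o jsonpath='{.data.token}' | base64 -d
  • Get the API Server address
    kubectl cluster-info | grep 'Kubernetes control plane'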
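
On clusters where no token Secret exists for the ServiceAccount (Kubernetes 1.24 and later), one way to get a long-lived token is to create the Secret yourself and let Kubernetes populate it. A minimal sketch follows; the function name token_secret_manifest and the Secret name suffix -token are assumptions chosen for illustration.

```shell
# Hypothetical helper: prints a Secret manifest that asks Kubernetes to
# populate a long-lived token for the given ServiceAccount.
token_secret_manifest() {
  sa_name="$1"
  namespace="$2"
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: ${sa_name}-token
  namespace: ${namespace}
  annotations:
    kubernetes.io/service-account.name: ${sa_name}
type: kubernetes.io/service-account-token
EOF
}

# Usage (needs cluster access):
#   token_secret_manifest prometheus-external monitoring | kubectl apply -f -
#   kubectl get secret prometheus-external-token -n monitoring \
#     -o jsonpath='{.data.token}' | base64 -d
```

A token created this way does not expire on its own, so treat it like a password and delete the Secret if the token leaks.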

3. Configure the Prometheus scrape rules (prometheus.yml)

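This is the core configuration. It needs to cover automatic discovery for the K8s cluster and also add static targets for the DevOps server and the QA/Prod deployments: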

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  # Monitor Prometheus itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Monitor the DevOps server (static target; assumes node-exporter is already deployed)
  - job_name: 'devops-servers'
    static_configs:
      - targets: ['devops-server-ip:9100']

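  # Auto-discover K8s cluster nodes
  - job_name: 'kubernetes-nodes'
    # Nodes are scraped through the API Server proxy, so the scrape itself needs HTTPS and the token
    scheme: https
    bearer_token: '<ServiceAccount token from step 2>'
    tls_config:
      # In production, replace this with the CA certificate path; do not skip verification
      insecure_skip_verify: true
    kubernetes_sd_configs:
    - role: node
      api_server: 'https://<your-K8s-API-server-address>:6443'
      bearer_token: '<ServiceAccount token from step 2>'
      tls_config:
        insecure_skip_verify: true
    relabel_configs:
    - target_label: __address__
      replacement: '<your-K8s-API-server-address>:6443'
    - source_labels: [__meta_kubernetes_node_name]
      target_label: __metrics_path__
      replacement: /api/v1/nodes/${1}/proxy/metrics
    - source_labels: [__meta_kubernetes_node_name]
      action: replace
      target_label: kubernetes_node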

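  # Auto-discover K8s Pods that carry the monitoring annotations
  # (Pods are scraped directly, so Pod IPs must be routable from the Prometheus host)
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
    - role: pod
      api_server: 'https://<your-K8s-API-server-address>:6443'
      bearer_token: '<ServiceAccount token from step 2>'
      tls_config:
        # In production, replace this with the CA certificate path; do not skip verification
        insecure_skip_verify: true
    relabel_configs:
    # Keep only Pods annotated with prometheus.io/scrape: "true"
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    # Use the prometheus.io/path annotation as the metrics path when it is set
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    # Rewrite the target address to use the port from the prometheus.io/port annotation
    - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
      action: replace
      target_label: __address__
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2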

  # Monitor the QA deployment (use static targets for services outside K8s; services inside K8s are covered by the auto-discovery above)
  - job_name: 'qa-deployment'
    static_configs:
      - targets: ['qa-app-ip:9091']

  # Monitor the Prod deployment
  - job_name: 'prod-deployment'
    static_configs:
      - targets: ['prod-app-ip:9091']

Note: in production, replace insecure_skip_verify in tls_config with ca_file: '/path/to/k8s-ca.crt' so that the API Server certificate is verified against the K8s CA certificate.
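
If you do not already have the cluster CA certificate on the Prometheus host, it is usually stored next to the token in the same service-account token Secret under the ca.crt key. A minimal sketch for extracting it; the function name extract_cluster_ca is hypothetical, and the Secret name is whatever you found or created in step 2.

```shell
# Hypothetical helper: writes the ca.crt stored in a service-account token
# Secret to a local file that tls_config.ca_file can point to.
extract_cluster_ca() {
  secret_name="$1"
  namespace="$2"
  out_file="$3"
  # The dot in the key name must be escaped inside the jsonpath expression
  kubectl get secret "$secret_name" -n "$namespace" \
    -o jsonpath='{.data.ca\.crt}' | base64 -d > "$out_file"
}

# Usage (needs cluster access):
#   extract_cluster_ca <your-secret-name> monitoring /path/to/k8s-ca.crt
```

Copy the resulting file to the Prometheus host and reference its path in ca_file.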

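4. Validate the configuration and restart Prometheus

First check that the configuration file is valid:
promtool check config prometheus.yml
If there are no errors, restart the Prometheus service (taking a systemd-managed service as an example):
systemctl restart prometheus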
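
The two commands above can be chained so that a broken file never reaches the running service. A minimal sketch; the function name safe_restart_prometheus is hypothetical, and it assumes promtool and systemctl are on PATH and that the unit is named prometheus as in the example.

```shell
# Hypothetical helper: restarts Prometheus only if promtool accepts the file.
safe_restart_prometheus() {
  config="$1"
  if promtool check config "$config"; then
    systemctl restart prometheus
  else
    echo "config check failed, not restarting" >&2
    return 1
  fi
}

# Usage:
#   safe_restart_prometheus prometheus.yml
```

After the restart, the Targets page of the Prometheus web UI (http://localhost:9090/targets in this setup) shows whether each job's targets are up.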

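5. Additional notes
  • Make sure the K8s cluster's API Server port (6443 by default) is reachable from the external Prometheus server; open the port in the firewall or security group if necessary
  • For Pods inside K8s that need monitoring, add annotations such as prometheus.io/scrape: "true" and prometheus.io/port: "<your-metrics-port>" so that Prometheus discovers them automatically
  • If you need to monitor the state of K8s resources (such as the status of Deployments and Services), remember to deploy kube-state-metrics inside the K8s cluster and then add it to monitoring through Service-based auto-discovery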
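
For the annotation note above, the annotations have to sit on the Pod template so that newly created Pods carry them. One possible way to add them to an existing Deployment is a merge patch. This is a sketch only; the function name annotation_patch is hypothetical, and the port is whatever your application exposes metrics on.

```shell
# Hypothetical helper: prints a JSON merge patch that adds the scrape
# annotations to a Deployment's Pod template.
annotation_patch() {
  port="$1"
  printf '{"spec":{"template":{"metadata":{"annotations":{"prometheus.io/scrape":"true","prometheus.io/port":"%s"}}}}}\n' "$port"
}

# Usage (needs cluster access; triggers a rollout of the Deployment):
#   kubectl patch deployment <name> -n <namespace> --type merge \
#     -p "$(annotation_patch 8080)"
```

Annotation values must be strings, which is why both "true" and the port are quoted.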

The question behind this content comes from Stack Exchange; asked by anujkum.
