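How to configure a standalone Prometheus Server to monitor multiple environments and a Kubernetes cluster
Hi, I've done exactly this in practice. Monitoring an external K8s cluster from a standalone Prometheus Server is less complicated than it sounds: the core of it is sorting out authentication and authorization between Prometheus and the K8s API Server, then configuring the service discovery rules. Here is the step-by-step breakdown: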
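1. Configure RBAC permissions in the Kubernetes cluster
The external Prometheus reads information about cluster resources through the K8s API Server, so it first needs sufficient access rights. Create an RBAC manifest (it assumes the monitoring namespace already exists):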
    # prometheus-external-rbac.yaml
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: prometheus-external
      namespace: monitoring
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: prometheus-external
    rules:
      - apiGroups: [""]
        resources:
          - nodes
          - nodes/proxy
          - services
          - endpoints
          - pods
        verbs: ["get", "list", "watch"]
      # Ingress lives in networking.k8s.io on current clusters;
      # "extensions" only matters for old API versions.
      - apiGroups: ["extensions", "networking.k8s.io"]
        resources:
          - ingresses
        verbs: ["get", "list", "watch"]
      - nonResourceURLs: ["/metrics"]
        verbs: ["get"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: prometheus-external
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: prometheus-external
    subjects:
      - kind: ServiceAccount
        name: prometheus-external
        namespace: monitoring
Apply the manifest: kubectl apply -f prometheus-external-rbac.yaml
2. Obtain the access credentials for the K8s API Server
Next, collect what Prometheus needs in order to reach the API Server:
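- Get the ServiceAccount token:
    First find the matching secret: kubectl get secrets -n monitoring | grep prometheus-external
    Then decode the token: kubectl get secret <your-secret-name> -n monitoring -o jsonpath='{.data.token}' | base64 -d
    On Kubernetes 1.24 and later, a token secret is no longer created automatically for a ServiceAccount, so the first command may return nothing. In that case either issue a token with kubectl create token prometheus-external -n monitoring (these tokens are time-limited) or create a Secret of type kubernetes.io/service-account-token for the account.
- Get the API Server address:
    kubectl cluster-info | grep 'Kubernetes control plane'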
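Before wiring the token into Prometheus, it can help to confirm that it really grants the access set up in step 1. The following is a minimal Python sketch, not part of the original setup: it sends an authenticated GET to the API Server's /api/v1/nodes endpoint. The function names, the example hostname and the ca_file argument are illustrative assumptions.

```python
import json
import ssl
import urllib.request


def build_nodes_request(api_server: str, token: str) -> urllib.request.Request:
    """Build a GET request for the node list, authenticated with a bearer token."""
    url = api_server.rstrip("/") + "/api/v1/nodes"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def count_nodes(api_server: str, token: str, ca_file: str) -> int:
    """Return how many nodes the token can list.

    urlopen raises urllib.error.HTTPError on 401/403, which indicates a
    bad token or missing RBAC permissions.
    """
    # Verify the API Server certificate against the cluster CA (assumed path).
    context = ssl.create_default_context(cafile=ca_file)
    request = build_nodes_request(api_server, token)
    with urllib.request.urlopen(request, context=context, timeout=10) as response:
        return len(json.load(response)["items"])


if __name__ == "__main__":
    # Hypothetical values; substitute the address and token from this step.
    req = build_nodes_request("https://k8s.example.internal:6443", "<token>")
    print(req.full_url)
```

If count_nodes returns a number, both the network path and the RBAC binding are working; an HTTP 403 usually points back at the ClusterRole or ClusterRoleBinding.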
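3. Configure the Prometheus scrape rules (prometheus.yml)
This is the core configuration. It has to cover automatic discovery inside the K8s cluster and also add static targets for the DevOps servers and the QA/Prod deployments: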
    global:
      scrape_interval: 15s
      evaluation_interval: 15s

    scrape_configs:
      # Prometheus itself
      - job_name: 'prometheus'
        static_configs:
          - targets: ['localhost:9090']

      # DevOps servers (static target; assumes node-exporter is already deployed)
      - job_name: 'devops-servers'
        static_configs:
          - targets: ['devops-server-ip:9100']

      # Auto-discover the K8s cluster nodes
      - job_name: 'kubernetes-nodes'
        kubernetes_sd_configs:
          - role: node  # required: tells Prometheus which object type to discover
            api_server: 'https://<your-K8s-API-Server-address>:6443'
            bearer_token: '<the ServiceAccount token obtained above>'
            tls_config:
              # In production, point to the CA certificate instead of skipping verification
              insecure_skip_verify: true
        relabel_configs:
          - source_labels: [__meta_kubernetes_node_name]
            action: replace
            target_label: kubernetes_node

      # Auto-discover K8s Pods that carry the monitoring annotations
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
          - role: pod  # required
            api_server: 'https://<your-K8s-API-Server-address>:6443'
            bearer_token: '<the ServiceAccount token obtained above>'
            tls_config:
              insecure_skip_verify: true
        relabel_configs:
          # Keep only Pods annotated with prometheus.io/scrape: "true"
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          # Use prometheus.io/path as the metrics path when it is set
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          # Swap the port in __address__ for the one in prometheus.io/port
          - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2

      # QA deployment (use a static target for services outside K8s;
      # services inside K8s are covered by the auto-discovery above)
      - job_name: 'qa-deployment'
        static_configs:
          - targets: ['qa-app-ip:9091']

      # Prod deployment
      - job_name: 'prod-deployment'
        static_configs:
          - targets: ['prod-app-ip:9091']
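Note: in production, replace insecure_skip_verify in tls_config with ca_file: '/path/to/k8s-ca.crt' so that the API Server certificate is verified against the K8s CA. Also keep in mind that the credentials under kubernetes_sd_configs are used only for discovery against the API Server. The scrapes themselves go directly to the discovered node and Pod addresses, so the Prometheus host needs a network route to those IPs, and any endpoint that requires HTTPS or authentication (the kubelet, for example) needs scheme, tls_config and credentials set at the job level as well.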
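The three relabel rules in the kubernetes-pods job are easy to misread, so here is a small Python sketch that mimics what they do to a discovered Pod target. It is only an approximation of Prometheus relabeling (source labels joined with ";", regex anchored at both ends, a missing label treated as an empty string); the function name relabel_pod and the sample addresses are made up for illustration.

```python
import re
from typing import Optional

# Same pattern as in the config: host, optional existing port, ";", annotated port.
ADDRESS_RE = re.compile(r"([^:]+)(?::\d+)?;(\d+)")


def relabel_pod(address: str, annotations: dict) -> Optional[dict]:
    """Return the resulting labels, or None if the target would be dropped."""
    # Rule 1 (action: keep): drop the Pod unless prometheus.io/scrape is "true".
    if annotations.get("prometheus.io/scrape", "") != "true":
        return None

    labels = {"__address__": address, "__metrics_path__": "/metrics"}

    # Rule 2: regex (.+) only matches a non-empty value, so an unset
    # prometheus.io/path leaves the default /metrics in place.
    path = annotations.get("prometheus.io/path", "")
    if path:
        labels["__metrics_path__"] = path

    # Rule 3: join __address__ and the port annotation with ";" and rewrite
    # the address to host:annotated-port when the whole string matches.
    joined = address + ";" + annotations.get("prometheus.io/port", "")
    match = ADDRESS_RE.fullmatch(joined)
    if match:
        labels["__address__"] = match.expand(r"\1:\2")
    return labels


if __name__ == "__main__":
    pod = {"prometheus.io/scrape": "true", "prometheus.io/port": "9102"}
    print(relabel_pod("10.0.0.5:8080", pod))
```

A Pod without the port annotation keeps its discovered address unchanged, because the joined string ends in ";" and the pattern does not match.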
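4. Validate the configuration and restart Prometheus
First check that the configuration file is valid: promtool check config prometheus.yml
If it reports no errors, restart the Prometheus service (assuming it is managed by systemd): systemctl restart prometheus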
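After the restart, a quick way to see whether the new jobs are actually being scraped is to query Prometheus's /api/v1/targets HTTP endpoint and count targets by job and health. The sketch below is one possible way to do that; the function names and the default URL http://localhost:9090 are assumptions, and the payload in the usage example is invented sample data, not real output.

```python
import json
import urllib.request
from collections import Counter


def summarize_targets(payload: dict) -> dict:
    """Count active targets per (job, health) from a /api/v1/targets response."""
    counts = Counter()
    for target in payload["data"]["activeTargets"]:
        counts[(target["labels"]["job"], target["health"])] += 1
    return dict(counts)


def fetch_targets(prometheus_url: str = "http://localhost:9090") -> dict:
    """Fetch the current target list from a running Prometheus (assumed URL)."""
    url = prometheus_url.rstrip("/") + "/api/v1/targets"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    # Invented sample payload, shaped like the real API response.
    sample = {
        "status": "success",
        "data": {
            "activeTargets": [
                {"labels": {"job": "kubernetes-nodes"}, "health": "up"},
                {"labels": {"job": "kubernetes-nodes"}, "health": "down"},
                {"labels": {"job": "devops-servers"}, "health": "up"},
            ]
        },
    }
    for (job, health), count in sorted(summarize_targets(sample).items()):
        print(f"{job}: {count} {health}")
```

Targets that stay "down" for the Kubernetes jobs usually mean discovery works but the Prometheus host cannot reach the node or Pod addresses directly.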
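5. Additional notes
- Make sure the K8s API Server port (6443 by default) is reachable from the external Prometheus server; open it in the firewall or security group if necessary.
- Pods inside K8s that should be monitored need annotations such as prometheus.io/scrape: "true" and prometheus.io/port: "<your metrics port>"; only then will Prometheus discover them automatically.
- If you also want the state of K8s resources (the status of Deployments, Services and so on), deploy kube-state-metrics inside the cluster and bring it into monitoring through Service discovery.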
The question comes from Stack Exchange; it was asked by anujkum.




