You need to enable JavaScript to run this app.
优惠活动
大模型
产品
解决方案
定价
更多
文档控制台
免费开始使用

Kubernetes同一命名空间内服务连接超时问题排查求助

问题:alpha-app连接config-server超时导致探针失败

我部署了两个应用:config-server和作为业务逻辑应用的alpha-app,后者需从前者获取配置数据。两者均运行在本地Kubernetes环境的kubernetes-learning命名空间中,但alpha-app连接config-server时出现连接超时异常,进而导致livenessProbe(存活探针)和readinessProbe(就绪探针)失败。以下是两个应用的YAML配置文件及异常堆栈信息。

config-server.yaml配置

# Config server
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kubernetes-learning-config-server
  namespace: kubernetes-learning
  labels:
    app: kubernetes-learning-config-server
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
  selector:
    matchLabels:
      app: kubernetes-learning-config-server
  template:
    metadata:
      name: kubernetes-learning-config-server
      labels:
        app: kubernetes-learning-config-server
    spec:
      containers:
        - name: kubernetes-learning-config-server
          image: ghcr.io/kubernetes/learning.config-server
          imagePullPolicy: Always
          ports:
            - containerPort: 8888
              protocol: TCP
            - containerPort: 48888
              protocol: TCP
          env:
            - name: BPL_JVM_THREAD_COUNT
              value: "50"
            - name: BPL_DEBUG_ENABLED
              value: "true"
            - name: BPL_DEBUG_PORT
              value: "48888"
            - name: GITHUB_CONFIG_DATA_URL
              value: https://github.com/kubernetes/config-data
            - name: GITHUB_CONFIG_DATA_USERNAME
              value: github_user
            - name: GITHUB_CONFIG_DATA_PERSONAL_ACCESS_TOKEN
              value: github_sampletoken  
          livenessProbe:
            httpGet:
              path: /alpha-app/local
              port: 8888
            initialDelaySeconds: 30
            periodSeconds: 20
            timeoutSeconds: 10
            successThreshold: 1
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /alpha-app/local
              port: 8888
            initialDelaySeconds: 30
            periodSeconds: 20
            timeoutSeconds: 10
            successThreshold: 1
            failureThreshold: 3

      restartPolicy: Always

# Expose Config server
---
apiVersion: v1
kind: Service
metadata:
  name: kubernetes-learning-config-server
  labels:
    app: kubernetes-learning-config-server
spec:
  type: ClusterIP
  selector:
    app: kubernetes-learning-config-server
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8888

alpha-app.yaml配置

# app applications
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kubernetes-learning-app
  namespace: kubernetes-learning
  labels:
    app: kubernetes-learning-app
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
  selector:
    matchLabels:
      app: kubernetes-learning-app
  template:
    metadata:
      name: kubernetes-learning-app
      labels:
        app: kubernetes-learning-app
    spec:
      containers:
        - name: kubernetes-learning-alpha-app
          image: ghcr.io/kubernetes/learning.alpha-app
          imagePullPolicy: Always
          ports:
            - containerPort: 8441
              protocol: TCP
            - containerPort: 48441
              protocol: TCP
          env:
            - name: BPL_JVM_THREAD_COUNT
              value: "50"
            - name: BPL_DEBUG_ENABLED
              value: "true"
            - name: BPL_DEBUG_PORT
              value: "48441"
            - name: SPRING_PROFILES_ACTIVE
              value: kube
            - name: SPRING_CLOUD_CONFIG_FAIL_FAST
              value: "true"
            - name: SPRING_CLOUD_CONFIG_RETRY_INITIAL_INTERVAL
              value: "1000"
            - name: SPRING_CLOUD_CONFIG_RETRY_MAX_INTERVAL
              value: "10000"
            - name: SPRING_CLOUD_CONFIG_RETRY_MULTIPLIER
              value: "2"
            - name: SPRING_CLOUD_CONFIG_RETRY_MAX_ATTEMPTS
              value: "5"
            - name: SPRING_CLOUD_CONFIG_URI
              value: http://kubernetes-learning-config-server:8888
          livenessProbe:
            httpGet:
              path: /info
              port: 8441
            initialDelaySeconds: 60
            timeoutSeconds: 15
            periodSeconds: 30
            successThreshold: 1
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /info
              port: 8441
            initialDelaySeconds: 60
            timeoutSeconds: 15
            periodSeconds: 30
            successThreshold: 1
            failureThreshold: 3

      restartPolicy: Always


# Expose Config server
---
apiVersion: v1
kind: Service
metadata:
  name: kubernetes-learning-app
  labels:
    app: kubernetes-learning-app
spec:
  type: ClusterIP
  selector:
    app: kubernetes-learning-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8441

异常堆栈信息

Caused by: org.springframework.web.client.ResourceAccessException: I/O error on GET request for "http://kubernetes-learning-config-server:8888/alpha-app/kube": Connect timed out
    at org.springframework.web.client.RestTemplate.createResourceAccessException(RestTemplate.java:926) ~[spring-web-6.2.1.jar:6.2.1]
    at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:906) ~[spring-web-6.2.1.jar:6.2.1]
    at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:801) ~[spring-web-6.2.1.jar:6.2.1]
    at org.springframework.web.client.RestTemplate.exchange(RestTemplate.java:683) ~[spring-web-6.2.1.jar:6.2.1]
    at org.springframework.cloud.config.client.ConfigServerConfigDataLoader.getRemoteEnvironment(ConfigServerConfigDataLoader.java:349) ~[spring-cloud-config-client-4.2.0.jar:4.2.0]
    at org.springframework.cloud.config.client.ConfigServerConfigDataLoader.doLoad(ConfigServerConfigDataLoader.java:130) ~[spring-cloud-config-client-4.2.0.jar:4.2.0]
    ... 37 common frames omitted
Caused by: java.net.SocketTimeoutException: Connect timed out
    at java.base/sun.nio.ch.NioSocketImpl.timedFinishConnect(Unknown Source) ~[na:na]
    at java.base/sun.nio.ch.NioSocketImpl.connect(Unknown Source) ~[na:na]
    at java.base/java.net.Socket.connect(Unknown Source) ~[na:na]
    at java.base/sun.net.NetworkClient.doConnect(Unknown Source) ~[na:na]
    at java.base/sun.net.www.http.HttpClient.openServer(Unknown Source) ~[na:na]
    at java.base/sun.net.www.http.HttpClient.openServer(Unknown Source) ~[na:na]
    at java.base/sun.net.www.http.HttpClient.<init>(Unknown Source) ~[na:na]
    at java.base/sun.net.www.http.HttpClient.New(Unknown Source) ~[na:na]
    at java.base/sun.net.www.http.HttpClient.New(Unknown Source) ~[na:na]
    at java.base/sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(Unknown Source) ~[na:na]
    at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect0(Unknown Source) ~[na:na]
    at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect(Unknown Source) ~[na:na]
    at java.base/sun.net.www.protocol.http.HttpURLConnection.connect(Unknown Source) ~[na:na]
    at org.springframework.http.client.SimpleClientHttpRequest.executeInternal(SimpleClientHttpRequest.java:79) ~[spring-web-6.2.1.jar:6.2.1]
    at org.springframework.http.client.AbstractStreamingClientHttpRequest.executeInternal(AbstractStreamingClientHttpRequest.java:71) ~[spring-web-6.2.1.jar:6.2.1]
    at org.springframework.http.client.AbstractClientHttpRequest.execute(AbstractClientHttpRequest.java:81) ~[spring-web-6.2.1.jar:6.2.1]
    at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:900) ~[spring-web-6.2.1.jar:6.2.1]
    ... 41 common frames omitted

问题排查与修复方案

1. 服务端口匹配错误

alpha-app的环境变量SPRING_CLOUD_CONFIG_URI配置的是http://kubernetes-learning-config-server:8888,但config-server的Service暴露的集群访问端口是80,而非容器端口8888。Kubernetes Service的port字段是集群内访问的端口,targetPort才是映射到容器的端口。

修复:将alpha-app的SPRING_CLOUD_CONFIG_URI改为http://kubernetes-learning-config-server:80,或者修改config-server的Service,把port字段设为8888,保持与容器端口一致。

2. config-server探针路径有效性验证

config-server的存活/就绪探针路径是/alpha-app/local,但alpha-app请求的是对应kube环境的/alpha-app/kube路径。需确认config-server是否能正确响应/alpha-app/local路径,若该路径不存在,会导致config-server自身探针失败,无法对外提供服务。

验证方式:进入任意一个config-server容器,执行curl http://localhost:8888/alpha-app/local,检查是否返回正常的配置数据。若路径错误,需调整探针路径为正确的配置端点。

3. 网络连通性手动验证

若上述配置修正后仍有问题,可在alpha-app容器内执行以下命令排查网络:

  • 解析服务域名:nslookup kubernetes-learning-config-server,确认能正常解析到Service的ClusterIP
  • 测试端口连通性:telnet kubernetes-learning-config-server 80(或修改后的端口),确认端口能正常建立连接

4. 启动顺序与重试配置优化

alpha-app的SPRING_CLOUD_CONFIG_RETRY_MAX_ATTEMPTS设为5次,总重试时长约31秒(1+2+4+8+16),而探针初始延迟是60秒,理论上足够,但如果config-server启动过慢,仍可能导致连接超时。可适当调大重试次数(如设为10)或延长探针初始延迟(如设为90秒)。

内容的提问来源于stack exchange,提问作者JMD

火山引擎 最新活动