How do you configure GCP global load balancing for an SFTP service running in cross-region GKE clusters?

阿华AIGC实验室

2026-5-13

SFTP on cross-region GKE clusters: global load balancing + failover

Let's tackle this head-on. You need global load balancing with failover for a TCP-based SFTP service running in two regional GKE clusters. You can't use kubemci, because it only handles HTTP(S) Ingress, and you can't rely on static GCE internal IPs, because the clusters are updated too often for fixed IPs to hold. Here are two workable solutions. The first fits a frequently changing cluster best:

Option 1: Global TCP Proxy Load Balancer + GKE standalone NEGs (strongly recommended)

This approach uses standalone zonal Network Endpoint Groups (NEGs) that GKE creates from an annotation on your SFTP Service. Serverless NEGs are not an option here because they only front Cloud Run, App Engine, Cloud Functions and API Gateway, not GKE. GKE's NEG controller adds and removes Pod endpoints as Pods are created, destroyed, or upgraded. No manual IP management is required, which suits a frequently changing cluster.

Step-by-Step Implementation

Set up your SFTP Service in each GKE cluster
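First, create a ClusterIP Service for your SFTP deployment and annotate it so that GKE creates standalone NEGs for port 22. This requires a VPC-native cluster. ClusterIP is enough because the load balancer sends traffic directly to Pod IPs through the NEGs. NodePort also works but isn't necessary. Make sure the Service listens on SFTP's default port 22: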

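apiVersion: v1
kind: Service
metadata:
  name: sftp-service
  namespace: your-namespace
  annotations:
    # Ask GKE to create a standalone zonal NEG for port 22 in every zone the
    # cluster uses. Pick a distinct name per cluster (for example
    # sftp-neg-europe-west1 in the europe-west1 cluster).
    cloud.google.com/neg: '{"exposed_ports": {"22": {"name": "sftp-neg-us-central1"}}}'
spec:
  selector:
    app: sftp # Match your SFTP deployment's labels
  ports:
    - port: 22
      targetPort: 22
      protocol: TCP
  type: ClusterIP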

Confirm that GKE created the NEGs in each cluster
The GKE NEG controller creates the NEGs from the annotation. gcloud does not. The controller creates one zonal NEG per zone the cluster uses, and each NEG tracks the Pods behind the Service, so you never have to update IP lists manually. Check what it created:

# Run against each cluster's kubectl context. The neg-status annotation lists
# the NEG name and every zone in which GKE created it.
kubectl get service sftp-service -n your-namespace \
  -o jsonpath='{.metadata.annotations.cloud\.google\.com/neg-status}'

# List the zonal NEGs for both clusters (names from the Service annotations)
gcloud compute network-endpoint-groups list \
  --filter="name~^sftp-neg-"

# Show the Pod endpoints currently registered in one zonal NEG
# (us-central1-a is a placeholder; use a zone from neg-status)
gcloud compute network-endpoint-groups list-network-endpoints sftp-neg-us-central1 \
  --zone=us-central1-a

Build a global Backend Service with health checks
Create a global Backend Service that includes the zonal NEGs from both clusters, and add a TCP health check on SFTP port 22. The health check is what enables automatic failover. If one cluster's SFTP Pods stop answering on port 22, the load balancer stops sending new connections there.

# First, create the TCP health check for SFTP:
#   --check-interval       how often each endpoint is probed
#   --timeout              how long to wait for a response (must not exceed the interval)
#   --unhealthy-threshold  consecutive failures before an endpoint is marked unhealthy
#   --healthy-threshold    consecutive successes before it is marked healthy again
gcloud compute health-checks create tcp sftp-tcp-health-check \
  --global \
  --port=22 \
  --check-interval=5s \
  --timeout=5s \
  --unhealthy-threshold=3 \
  --healthy-threshold=2

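# Create the global Backend Service. On a TCP proxy load balancer the
# backend service timeout acts as the idle timeout for proxied connections
# (30s by default), so raise it for long SFTP sessions. 3600s is only an
# illustrative value.
gcloud compute backend-services create sftp-global-backend \
  --global \
  --load-balancing-scheme=EXTERNAL \
  --protocol=TCP \
  --health-checks=sftp-tcp-health-check \
  --timeout=3600s \
  --enable-logging # Optional, but helpful for debugging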

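# Add the zonal NEGs as backends. Use one add-backend call per NEG and zone
# listed in each Service's neg-status annotation (the zones below are
# placeholders). NEG backends on a TCP proxy load balancer use the
# CONNECTION balancing mode. UTILIZATION applies to instance groups.
gcloud compute backend-services add-backend sftp-global-backend \
  --global \
  --network-endpoint-group=sftp-neg-us-central1 \
  --network-endpoint-group-zone=us-central1-a \
  --balancing-mode=CONNECTION \
  --max-connections-per-endpoint=100 # Adjust based on your capacity needs

gcloud compute backend-services add-backend sftp-global-backend \
  --global \
  --network-endpoint-group=sftp-neg-europe-west1 \
  --network-endpoint-group-zone=europe-west1-b \
  --balancing-mode=CONNECTION \
  --max-connections-per-endpoint=100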
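As a rough worked estimate with the health check values above (5 s interval, unhealthy threshold 3, healthy threshold 2), the time it takes to pull a failed cluster out of rotation is approximately the unhealthy threshold times the interval. The time to put a recovered cluster back is approximately the healthy threshold times the interval. This ignores the probe timeout and timing jitter, so treat the numbers as ballpark figures:

```latex
\begin{aligned}
t_{\text{out}}  &\approx \text{unhealthy-threshold} \times \text{check-interval} = 3 \times 5\,\mathrm{s} = 15\,\mathrm{s} \\
t_{\text{back}} &\approx \text{healthy-threshold} \times \text{check-interval} = 2 \times 5\,\mathrm{s} = 10\,\mathrm{s}
\end{aligned}
```

Halving the check interval roughly halves both figures. The cost is twice as many probes against every endpoint.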

Set up the Global TCP Proxy Load Balancer
Reserve a global static IP so your users have a consistent endpoint. Then create a target TCP proxy that points at your Backend Service, and a forwarding rule that ties the IP and port 22 to that proxy:

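# Reserve a global static IP address
gcloud compute addresses create sftp-global-ip --global

# Create the target TCP proxy. It terminates client connections and
# forwards them to the Backend Service.
gcloud compute target-tcp-proxies create sftp-tcp-proxy \
  --backend-service=sftp-global-backend

# Create the forwarding rule that sends port 22 traffic to the proxy
gcloud compute forwarding-rules create sftp-global-forwarding-rule \
  --global \
  --load-balancing-scheme=EXTERNAL \
  --address=sftp-global-ip \
  --ports=22 \
  --target-tcp-proxy=sftp-tcp-proxy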
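Once the forwarding rule exists, you can confirm the whole path works by opening a TCP connection to the global IP and checking that an SSH server answers. SFTP runs over SSH, and an SSH server sends an `SSH-2.0-...` identification line (RFC 4253) as soon as a client connects. The sketch below uses hypothetical helper names. It assumes `bash` (for its `/dev/tcp` pseudo-device) and the coreutils `timeout` command are installed:

```shell
# Hypothetical helpers for a post-deployment smoke test of the SFTP endpoint.

# looks_like_ssh_banner LINE: succeeds if LINE is an SSH identification string
# ("SSH-2.0-..." or the backward-compatible "SSH-1.99-...", per RFC 4253).
looks_like_ssh_banner() {
  case "$1" in
    SSH-2.0-* | SSH-1.99-*) return 0 ;;
    *) return 1 ;;
  esac
}

# probe_sftp_endpoint HOST [PORT]: connects to HOST:PORT (default 22), reads the
# first line the server sends (giving up after 5 seconds) and prints it if it
# is an SSH banner; returns non-zero otherwise.
probe_sftp_endpoint() {
  local host="$1" port="${2:-22}" banner
  # bash's /dev/tcp opens the connection; tr drops the CR of the CRLF ending.
  # A refused or timed-out connection simply leaves banner empty.
  banner=$(timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1" && head -n 1 <&3' \
    "$host" "$port" 2>/dev/null | tr -d '\r')
  if looks_like_ssh_banner "$banner"; then
    printf '%s\n' "$banner"
  else
    printf 'no SSH banner from %s:%s\n' "$host" "$port" >&2
    return 1
  fi
}
```

For example, `probe_sftp_endpoint "$(gcloud compute addresses describe sftp-global-ip --global --format='value(address)')"` should print the banner of one of your SFTP Pods. Running the same check while one cluster's SFTP Deployment is scaled to zero makes a simple failover drill.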

Why this works for you:

Zero manual IP management: NEGs auto-update as your GKE Pods scale or upgrade—no more chasing changing IPs.
Seamless failover: The TCP health check ensures traffic only goes to healthy clusters.
Global performance: the global TCP proxy load balancer sends each user to the closest backend that has healthy endpoints and spare capacity, balancing speed and reliability.

Option 2: Global TCP Proxy Load Balancer + Regional GKE LoadBalancer Services

If your clusters are not VPC-native, you can't use GKE NEGs, because they require VPC-native clusters. This fallback option instead uses a regional GKE LoadBalancer Service in each cluster as the entry point that the global LB forwards to.

Step-by-Step Implementation

Create regional LoadBalancer Services in each cluster
Deploy a LoadBalancer Service for SFTP in each cluster—this gives you a regional external IP for each cluster's SFTP service:

apiVersion: v1
kind: Service
metadata:
  name: sftp-service
  namespace: your-namespace
spec:
  selector:
    app: sftp
  ports:
    - port: 22
      targetPort: 22
      protocol: TCP
  type: LoadBalancer

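Build the global Backend Service with the regional LB IPs
A backend service cannot list raw IP addresses directly, and gcloud has no flag for that. Each regional LoadBalancer IP has to be wrapped in an internet NEG (endpoint type INTERNET_IP_PORT). Before relying on this route, check the internet NEG documentation for your load balancer type. Support for internet NEG backends, and for health checks on them, depends on the NEG scope. Without health checks the global LB cannot fail over on its own.

# Create the global Backend Service. The health check from Option 1 is left
# off here because internet NEG backends may not accept one (see above).
gcloud compute backend-services create sftp-global-backend \
  --global \
  --load-balancing-scheme=EXTERNAL \
  --protocol=TCP

# Wrap the us-central1 LoadBalancer Service IP in a global internet NEG
# (1.2.3.4 is a placeholder for your actual Service IP)
gcloud compute network-endpoint-groups create sftp-inet-neg-us-central1 \
  --global \
  --network-endpoint-type=internet-ip-port \
  --default-port=22

gcloud compute network-endpoint-groups update sftp-inet-neg-us-central1 \
  --global \
  --add-endpoint="ip=1.2.3.4,port=22"

# Attach the NEG to the Backend Service, then repeat the last three commands
# for europe-west1 with its Service IP (placeholder 5.6.7.8)
gcloud compute backend-services add-backend sftp-global-backend \
  --global \
  --global-network-endpoint-group \
  --network-endpoint-group=sftp-inet-neg-us-central1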

Set up the Global TCP Proxy Load Balancer
Same as Option 1: reserve a global static IP, create a target TCP proxy that points at the Backend Service, and create a forwarding rule that sends port 22 traffic to that proxy.

Caveats to note:

Manual IP updates: If your regional LoadBalancer Service is recreated (e.g., during cluster upgrades), its IP might change—you'll need to manually update the Backend Service with the new IP.
Extra overhead: Each cluster requires a regional LB, adding minor cost and management complexity.
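You can at least detect the manual-update caveat automatically. The sketch below assumes hypothetical kube context names and assumes you keep a record of the IP you last registered for each cluster. It reads the Service's current external IP with `kubectl` and compares it with that record. The helper names are illustrative and not part of any GCP tool:

```shell
# current_service_ip CONTEXT NAMESPACE SERVICE: prints the external IP that the
# regional load balancer assigned to the Service (empty while still pending).
current_service_ip() {
  kubectl --context "$1" -n "$2" get service "$3" \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# backend_ip_status REGISTERED_IP CURRENT_IP: prints
#   "ok"      if the global LB still points at the current IP,
#   "update"  if the Service IP changed and the backend must be re-pointed,
#   "pending" if the Service has no external IP yet (or kubectl failed).
backend_ip_status() {
  local registered="$1" current="$2"
  if [ -z "$current" ]; then
    echo "pending"
  elif [ "$current" = "$registered" ]; then
    echo "ok"
  else
    echo "update"
  fi
}

# Example (hypothetical context name, placeholder IP from above):
#   backend_ip_status 1.2.3.4 \
#     "$(current_service_ip gke-us-central1 your-namespace sftp-service)"
```

If you run this from a scheduled job and alert on `update`, an IP change becomes a notification rather than a silent loss of one backend.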

Key Things to Remember

Firewall rules: allow ingress on TCP port 22 from 35.191.0.0/16 and 130.211.0.0/22 to the GKE nodes that run your SFTP Pods. Google Cloud health check probers use these ranges, and so does the proxy load balancer when it opens connections to backends.
Permissions: Ensure your GCP account has roles like roles/compute.loadBalancerAdmin and roles/container.admin to manage these resources.
Health check tuning: Adjust the health check intervals/thresholds to match your tolerance for downtime—shorter intervals mean faster failover, but more frequent checks.
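The firewall requirement can be sketched as follows. The two source ranges are the ones Google Cloud documents for health checks and proxy load balancer traffic. The rule name, network, and node network tag are placeholders. The helper functions let you check whether an address falls in those ranges, for example when reading firewall or flow logs. `create_lb_firewall_rule` is only defined here, not run:

```shell
# Source ranges for Google Cloud health checks and proxy load balancer traffic.
LB_SOURCE_RANGES="35.191.0.0/16 130.211.0.0/22"

# ip_to_int A.B.C.D: prints the address as a 32-bit integer.
ip_to_int() {
  local old_ifs="$IFS"
  IFS=.
  set -- $1          # split the dotted quad into $1..$4
  IFS="$old_ifs"
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# in_cidr IP CIDR: succeeds if IP lies inside CIDR (e.g. 10.0.0.0/8).
in_cidr() {
  local ip net bits mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits="${2#*/}"
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# from_lb_range IP: succeeds if IP belongs to one of LB_SOURCE_RANGES.
from_lb_range() {
  local cidr
  for cidr in $LB_SOURCE_RANGES; do
    in_cidr "$1" "$cidr" && return 0
  done
  return 1
}

# create_lb_firewall_rule NETWORK NODE_TAG: allows those ranges to reach TCP 22
# on the GKE nodes hosting the SFTP Pods (NEG endpoints are Pod IPs on the nodes).
create_lb_firewall_rule() {
  gcloud compute firewall-rules create allow-sftp-lb-and-hc \
    --network="$1" \
    --direction=INGRESS \
    --action=ALLOW \
    --rules=tcp:22 \
    --source-ranges="$(echo $LB_SOURCE_RANGES | tr ' ' ',')" \
    --target-tags="$2"
}
```

The rule targets node tags rather than Pods because container-native NEG endpoints are alias IPs on the nodes' network interfaces, so node-level firewall rules govern them.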

This question comes from Stack Exchange. It was asked by Amit Yadav.