How do I configure GCP global load balancing for SFTP services in cross-region GKE clusters?
You need global load balancing with failover for TCP-based SFTP services across two regional GKE clusters, and you can't use kubemci (it is HTTP-only) or static GCE internal IPs (the clusters are updated too often). Here are two options; the first fits a frequently changing cluster best:
Solution 1: Global TCP Proxy Load Balancer + standalone zonal NEGs (strongly recommended)
This approach uses standalone zonal Network Endpoint Groups (NEGs) that GKE creates for your SFTP Services. Serverless NEGs do not apply here, because they only front Cloud Run, App Engine, and Cloud Functions. GKE keeps each NEG in sync with the Pod endpoints as Pods are created, destroyed, or upgraded, so no manual IP management is required, which suits a frequently changing cluster. The clusters must be VPC-native.
Step-by-Step Implementation
Set up your SFTP Service in each GKE cluster
First, create a ClusterIP Service for your SFTP deployment (this keeps internal traffic secure; NodePort works too but isn't necessary here). Make sure it listens on SFTP's default port 22:

    apiVersion: v1
    kind: Service
    metadata:
      name: sftp-service
      namespace: your-namespace
    spec:
      selector:
        app: sftp  # Match your SFTP deployment's labels
      ports:
        - port: 22
          targetPort: 22
          protocol: TCP
      type: ClusterIP

Create a NEG for each cluster
Annotate each cluster's SFTP Service with cloud.google.com/neg so that GKE creates the NEGs for you. The gcloud compute network-endpoint-groups create command has no flags for pointing a NEG at a GKE Service, so the annotation is the way to link the two. GKE creates one zonal NEG (type GCE_VM_IP_PORT) with the given name in each zone where the cluster has nodes, and it tracks all ready Pods behind the Service, so you never have to update IP lists manually.

    # Run against your us-central1 cluster
    kubectl annotate service sftp-service -n your-namespace \
      cloud.google.com/neg='{"exposed_ports": {"22": {"name": "sftp-neg-us-central1"}}}'

    # Run against your europe-west1 cluster
    kubectl annotate service sftp-service -n your-namespace \
      cloud.google.com/neg='{"exposed_ports": {"22": {"name": "sftp-neg-europe-west1"}}}'

    # List the NEGs and note the zone of each one
    gcloud compute network-endpoint-groups list

Build a global Backend Service with health checks
Create a global Backend Service that includes the NEGs from both clusters, and add a TCP health check on SFTP port 22. The health check is what enables automatic failover: if one cluster's SFTP endpoints stop responding, the load balancer stops sending traffic there.

    # Create the TCP health check for SFTP:
    # probe every 5s, wait up to 5s for a response,
    # mark unhealthy after 3 failures and healthy after 2 successes
    gcloud compute health-checks create tcp sftp-tcp-health-check \
      --port=22 \
      --check-interval=5s \
      --timeout=5s \
      --unhealthy-threshold=3 \
      --healthy-threshold=2

    # Create the global Backend Service (--enable-logging is optional, but helpful for debugging)
    gcloud compute backend-services create sftp-global-backend \
      --global \
      --load-balancing-scheme=EXTERNAL \
      --protocol=TCP \
      --health-checks=sftp-tcp-health-check \
      --enable-logging

    # Add the NEGs as backends; repeat for every zone that has a NEG.
    # The zones below are examples. NEG backends of a TCP proxy use the
    # CONNECTION balancing mode; adjust the limit to your capacity needs.
    gcloud compute backend-services add-backend sftp-global-backend \
      --global \
      --network-endpoint-group=sftp-neg-us-central1 \
      --network-endpoint-group-zone=us-central1-a \
      --balancing-mode=CONNECTION \
      --max-connections-per-endpoint=100

    gcloud compute backend-services add-backend sftp-global-backend \
      --global \
      --network-endpoint-group=sftp-neg-europe-west1 \
      --network-endpoint-group-zone=europe-west1-b \
      --balancing-mode=CONNECTION \
      --max-connections-per-endpoint=100

Set up the Global TCP Proxy Load Balancer
Reserve a global static IP (so your users have a consistent endpoint), create a target TCP proxy that points at your Backend Service, and create a Forwarding Rule that ties the IP to the proxy on port 22:

    # Reserve a global static IP address
    gcloud compute addresses create sftp-global-ip --global

    # Create the target TCP proxy in front of the backend service
    gcloud compute target-tcp-proxies create sftp-tcp-proxy \
      --backend-service=sftp-global-backend

    # Create the forwarding rule that sends port 22 traffic to the proxy
    gcloud compute forwarding-rules create sftp-global-forwarding-rule \
      --global \
      --load-balancing-scheme=EXTERNAL \
      --address=sftp-global-ip \
      --ports=22 \
      --target-tcp-proxy=sftp-tcp-proxy
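Once the forwarding rule exists, it can be worth confirming that an SSH server actually answers on the global IP. A TCP health check only proves that the port accepts connections, whereas an SSH server sends an identification line starting with "SSH-" right after the connection opens. The sketch below checks for that line; the function names `is_ssh_banner` and `probe_sftp` are hypothetical, and `probe_sftp` relies on bash's /dev/tcp feature.

```shell
#!/usr/bin/env bash
# Return success if the given line looks like an SSH identification string
# ("SSH-<protoversion>-<softwareversion>").
is_ssh_banner() {
  case "$1" in
    SSH-2.0-*|SSH-1.99-*) return 0 ;;
    *) return 1 ;;
  esac
}

# Connect to HOST:PORT, read the first line the server sends, and check it.
# bash-only: /dev/tcp/HOST/PORT opens a TCP socket. The read gives up after 5s.
probe_sftp() {
  local host="$1" port="${2:-22}" banner=""
  IFS= read -r -t 5 banner <"/dev/tcp/${host}/${port}" || return 1
  is_ssh_banner "$banner"
}
```

For example, `probe_sftp 203.0.113.10 && echo reachable` would test a placeholder address; substitute the IP you reserved as sftp-global-ip.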
Why this works for you:
- Zero manual IP management: NEGs auto-update as your GKE Pods scale or upgrade, so there are no changing IPs to chase.
- Seamless failover: The TCP health check ensures traffic only goes to healthy clusters.
- Global performance: On Premium Tier, GCP's global TCP proxy routes users to the closest healthy cluster by default, balancing speed and reliability.
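How quickly failover happens depends on the health check settings. As a rough estimate, a backend is marked unhealthy after the check interval multiplied by the unhealthy threshold; real detection time can differ, since Google probes from several locations. A small sketch of the arithmetic (the helper name `detection_seconds` is hypothetical):

```shell
# Rough time in seconds before a failing backend is marked unhealthy:
# check interval multiplied by the unhealthy threshold.
detection_seconds() {
  local interval="$1" unhealthy_threshold="$2"
  echo $(( interval * unhealthy_threshold ))
}

detection_seconds 5 3   # 5s interval, threshold 3 -> prints 15
```

Established SFTP sessions on a failed backend are not migrated; clients have to reconnect, at which point they land on a healthy cluster.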
Solution 2: Global TCP Proxy Load Balancer + Regional GKE LoadBalancer Services
If you're working with an older GKE cluster that doesn't support the NEGs used in Solution 1, this fallback option uses regional GKE LoadBalancer Services as backends for the global LB.
Step-by-Step Implementation
Create regional LoadBalancer Services in each cluster
Deploy a LoadBalancer Service for SFTP in each cluster. This gives you a regional external IP for each cluster's SFTP service:

    apiVersion: v1
    kind: Service
    metadata:
      name: sftp-service
      namespace: your-namespace
    spec:
      selector:
        app: sftp
      ports:
        - port: 22
          targetPort: 22
          protocol: TCP
      type: LoadBalancer

Build the global Backend Service with regional LB IPs
Create the same TCP health check as before, then build a global Backend Service. Be aware that gcloud compute backend-services add-backend accepts only instance groups and network endpoint groups; it has no --ip-addresses flag, so the regional IPs cannot be attached directly. They would first have to be wrapped in a NEG type that the global TCP proxy accepts, so check the current load balancer documentation before choosing this option. The IPs are listed below as placeholders:

    # Reuse the health check from Solution 1
    gcloud compute backend-services create sftp-global-backend \
      --global \
      --load-balancing-scheme=EXTERNAL \
      --protocol=TCP \
      --health-checks=sftp-tcp-health-check

    # Backends to attach (replace with your actual Service IPs):
    #   1.2.3.4   IP of us-central1's LoadBalancer Service
    #   5.6.7.8   IP of europe-west1's LoadBalancer Service

Set up the Global TCP Proxy Load Balancer
Same as Solution 1: reserve a global static IP and create a Forwarding Rule to route port 22 traffic to your Backend Service.
Caveats to note:
- Manual IP updates: If your regional LoadBalancer Service is recreated (e.g., during cluster upgrades), its IP might change, and you'll need to manually update the Backend Service with the new IP.
- Extra overhead: Each cluster requires a regional LB, adding minor cost and management complexity.
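Because a changed LoadBalancer IP silently breaks this setup, a periodic check can at least flag the drift. The sketch below compares the IP recorded at configuration time with the Service's current one; `ip_changed` is a hypothetical helper, and the kubectl lookup is shown as a comment because it needs cluster access.

```shell
# Return success (0) when the Service's current external IP is known and
# differs from the IP recorded when the backend was configured.
ip_changed() {
  local recorded="$1" current="$2"
  [ -n "$current" ] && [ "$recorded" != "$current" ]
}

# Typical use:
# current="$(kubectl get service sftp-service -n your-namespace \
#   -o jsonpath='{.status.loadBalancer.ingress[0].ip}')"
# ip_changed 1.2.3.4 "$current" && echo "LoadBalancer IP changed to $current"
```

An empty current IP (for example while the Service is still provisioning) is treated as "no change" so that the check does not raise false alarms.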
Key Things to Remember
- Firewall rules: Make sure you allow GCP's health check IP ranges to access port 22 on your SFTP Pods/Service, and allow incoming traffic from the global LB to your clusters.
- Permissions: Ensure your GCP account has roles like roles/compute.loadBalancerAdmin and roles/container.admin to manage these resources.
- Health check tuning: Adjust the health check intervals/thresholds to match your tolerance for downtime. Shorter intervals mean faster failover, but more frequent checks.
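For the firewall item, both the health check probes and the proxied client traffic reach your backends from Google's ranges 130.211.0.0/22 and 35.191.0.0/16. The sketch below only prints a matching firewall-rule command so that it can be reviewed before running; the rule name allow-sftp-lb, the helper name `firewall_cmd`, and the network and tag arguments are placeholders.

```shell
# Print (do not run) a gcloud command that admits Google's health check and
# proxy ranges to port 22 on instances carrying the given network tag.
firewall_cmd() {
  local network="$1" target_tag="$2"
  printf 'gcloud compute firewall-rules create allow-sftp-lb --network=%s --direction=INGRESS --allow=tcp:22 --source-ranges=130.211.0.0/22,35.191.0.0/16 --target-tags=%s\n' \
    "$network" "$target_tag"
}
```

Calling `firewall_cmd your-vpc your-gke-node-tag` prints the command; GKE nodes carry an automatically generated network tag, which you can look up on any node VM.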
This question comes from Stack Exchange; asked by Amit Yadav.




