How can CSV files in a GCP Cloud Storage bucket be transferred to an FTP server? Can BigQuery do it?

阿华AIGC实验室

2026-5-7

Transferring CSV Files from GCP Cloud Storage to an FTP Server

Great question! Let’s break down your queries and walk through practical, scalable solutions:

Can BigQuery directly transfer files to an FTP server?

Short answer: No. BigQuery is built as a fully managed data warehouse, and its native export capabilities are limited to:

Exporting query results or table data to Cloud Storage (in formats like CSV, JSON, Parquet)
Saving query results to a local machine through the bq command-line tool (table exports themselves always land in Cloud Storage)
Integrating with other GCP services (like Dataflow for data processing pipelines)

There’s no built-in feature to send data directly from BigQuery to an external FTP server, so an intermediate step is required. Either export the BigQuery data to Cloud Storage first and then transfer it from Cloud Storage to FTP, or, if the CSV files already sit in Cloud Storage, transfer those files directly.
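The first half of that two-step route, exporting a BigQuery table to Cloud Storage as CSV, might look roughly like the sketch below. It assumes the google-cloud-bigquery client library is installed. The function names build_export_uri and export_table_to_gcs, the "export-*.csv" file pattern, and the example table, bucket, and prefix values are all made up for illustration.

```python
def build_export_uri(bucket: str, prefix: str) -> str:
    """Return a wildcard gs:// URI so BigQuery can split large exports into shards."""
    return f"gs://{bucket}/{prefix.strip('/')}/export-*.csv"


def export_table_to_gcs(table_id: str, bucket: str, prefix: str) -> str:
    """Export a BigQuery table to Cloud Storage as CSV and return the destination URI."""
    # Imported lazily so build_export_uri works without the library installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    destination_uri = build_export_uri(bucket, prefix)
    job_config = bigquery.ExtractJobConfig(
        destination_format=bigquery.DestinationFormat.CSV
    )
    # Depending on where the dataset lives, a location= argument may also be needed.
    extract_job = client.extract_table(
        table_id, destination_uri, job_config=job_config
    )
    extract_job.result()  # Block until the export job finishes
    return destination_uri
```

A call such as export_table_to_gcs("my-project.my_dataset.my_table", "your-bucket", "exports/daily") would then leave one or more CSV shards in the bucket, ready for any of the transfer options described next.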

Practical Solutions to Transfer CSV from Cloud Storage to FTP

Below are the most common approaches using GCP’s ecosystem:

1. Cloud Functions (Serverless, Event-Driven)

This is a good fit if you want to automate transfers whenever a new CSV is uploaded to Cloud Storage, or run transfers on a schedule.

Steps:

Create a Cloud Function triggered by:
- Cloud Storage Object Finalize: Runs automatically when a new CSV is uploaded to your bucket
- Cloud Scheduler: Invokes the function over HTTP or Pub/Sub on a fixed schedule (e.g., daily at midnight)
In the function code, use an FTP client library to:
1. Download the CSV file from Cloud Storage to the function’s temporary filesystem (or stream it directly to avoid storing it locally)
2. Upload the file to your FTP server

Example Python Snippet:

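import ftplib
import posixpath

from google.cloud import storage


def transfer_to_ftp(event, context):
    """Triggered by a Cloud Storage finalize event; uploads the new CSV over FTPS."""
    # Ignore objects that are not CSV files
    if not event['name'].lower().endswith(".csv"):
        return

    # Look up the uploaded object in Cloud Storage
    storage_client = storage.Client()
    blob = storage_client.bucket(event['bucket']).blob(event['name'])

    # Object names may contain "/" (e.g. "exports/data.csv"); keep only the file name
    remote_name = posixpath.basename(event['name'])

    # Connect over explicit FTPS; the with-block closes the session even on errors
    with ftplib.FTP_TLS("ftp.your-server.com") as ftp:
        ftp.login("your-username", "your-password")
        ftp.prot_p()  # Encrypt the data connection as well
        ftp.cwd("/target-ftp-directory")

        # Stream the object straight to FTP without saving it to disk
        with blob.open("rb") as gcs_file:
            ftp.storbinary(f"STOR {remote_name}", gcs_file)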

Notes:

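Grant the Cloud Function's service account the roles/storage.objectViewer role on your Cloud Storage bucket
Ensure your FTP server allows inbound connections from the function's egress IP addresses. These are dynamic by default, so route outbound traffic through a Serverless VPC Access connector with Cloud NAT if the server needs a fixed IP to allowlist (or use VPC peering/VPN for private FTP servers)
Use ftplib.FTP_TLS together with prot_p() so that both the control and data connections are encrypted (FTPS)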
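The snippet above hardcodes the FTP host and credentials for brevity. One possible way to keep them out of the source is to read them from environment variables set on the function, as in this sketch. The variable names (FTP_HOST, FTP_USER, FTP_PASSWORD, FTP_DIR) and the helper names are assumptions made up for this example, not anything GCP defines.

```python
import ftplib
import os
import ssl
from dataclasses import dataclass


@dataclass(frozen=True)
class FtpSettings:
    host: str
    user: str
    password: str
    directory: str = "/"


def load_ftp_settings(env=os.environ) -> FtpSettings:
    """Build settings from environment variables (the names are hypothetical)."""
    required = ("FTP_HOST", "FTP_USER", "FTP_PASSWORD")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return FtpSettings(
        host=env["FTP_HOST"],
        user=env["FTP_USER"],
        password=env["FTP_PASSWORD"],
        directory=env.get("FTP_DIR", "/"),
    )


def connect_ftps(settings: FtpSettings) -> ftplib.FTP_TLS:
    """Open an explicit-FTPS session with certificate verification enabled."""
    context = ssl.create_default_context()
    ftp = ftplib.FTP_TLS(settings.host, context=context)
    ftp.login(settings.user, settings.password)
    ftp.prot_p()  # Encrypt the data channel as well as the control channel
    ftp.cwd(settings.directory)
    return ftp
```

Inside the function body, connect_ftps(load_ftp_settings()) would then replace the hardcoded connection lines.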

2. Compute Engine VM (Batch/Heavy Transfers)

If you’re dealing with large files or need full control over the transfer process (like custom retry logic or batch processing), a Compute Engine VM is a reliable choice.

Steps:

Spin up a small VM instance (e.g., e2-micro) using GCP’s default Debian/Ubuntu image
Install necessary tools:
- gsutil (pre-installed on GCP images) for accessing Cloud Storage
- lftp (a robust FTP client) via sudo apt install lftp

Write a shell script to automate transfers:

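#!/bin/bash
# Stop at the first failed command so a failed download or upload is not ignored
set -euo pipefail

# Download all CSVs from GCS to a temp directory
mkdir -p /tmp/ftp-transfers
gsutil cp "gs://your-bucket/path/to/*.csv" /tmp/ftp-transfers/

# Upload to the FTP server over FTPS; cmd:fail-exit makes lftp abort on the first error.
# Add "set ssl:verify-certificate no" only if your server uses a self-signed certificate.
lftp -u your-username,your-password ftps://your-server.com << EOF
set cmd:fail-exit yes
cd /target-ftp-folder
mput /tmp/ftp-transfers/*.csv
quit
EOF

# Clean up temp files (reached only if the upload succeeded)
rm /tmp/ftp-transfers/*.csv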

Use cron to schedule the script (e.g., add 0 0 * * * /path/to/your/script.sh to run daily at midnight)
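If you would rather keep everything in Python, the same batch flow can be sketched as below. It streams each object to the FTP server over a single FTPS session instead of staging files in a temp directory. This assumes the google-cloud-storage library is installed on the VM; csv_names and transfer_prefix are names invented for this example, and all arguments are placeholders.

```python
import ftplib
import posixpath


def csv_names(object_names):
    """Keep only .csv objects, skipping folder placeholders and other file types."""
    return [name for name in object_names if name.lower().endswith(".csv")]


def transfer_prefix(bucket_name, prefix, host, user, password, remote_dir):
    """Stream every CSV under gs://bucket_name/prefix through one FTPS session."""
    # Imported lazily so csv_names works without the library installed.
    from google.cloud import storage

    client = storage.Client()
    blobs = client.list_blobs(bucket_name, prefix=prefix)
    names = csv_names(blob.name for blob in blobs)
    bucket = client.bucket(bucket_name)

    # The with-block closes the FTP session even if an upload fails
    with ftplib.FTP_TLS(host) as ftp:
        ftp.login(user, password)
        ftp.prot_p()  # Encrypt the data connection
        ftp.cwd(remote_dir)
        for name in names:
            with bucket.blob(name).open("rb") as gcs_file:
                # Upload under the bare file name, dropping any "folder" prefix
                ftp.storbinary(f"STOR {posixpath.basename(name)}", gcs_file)
    return names
```

The cron entry would then run this script with the VM's Python interpreter instead of the shell script.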

3. Cloud Run (Long-Running Serverless Tasks)

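Similar to Cloud Functions, but better suited for transfers that exceed Cloud Functions’ maximum execution time (9 minutes for 1st gen functions and for event-driven 2nd gen functions). Cloud Run services accept request timeouts of up to 60 minutes, which makes them a better fit for large file transfers that need more time.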

Steps:

Build a container image that includes gsutil and an FTP client (e.g., lftp)
Deploy the container to Cloud Run and grant its service account the roles/storage.objectViewer role on your Cloud Storage bucket
Trigger the service via HTTP requests or Cloud Scheduler to initiate transfers
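To make the trigger step concrete, here is a minimal sketch of the HTTP entry point such a container could expose, using only the Python standard library. Cloud Run passes the port to listen on in the PORT environment variable. The run_transfer function is a placeholder for your actual GCS-to-FTP logic, and every name in this block is hypothetical.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_transfer() -> int:
    """Placeholder for the real GCS-to-FTP transfer; returns the number of files moved."""
    return 0


class TransferHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        try:
            count = run_transfer()
            status, body = 200, f"transferred {count} file(s)\n"
        except Exception as exc:
            # A non-2xx response lets the caller see that the run failed
            status, body = 500, f"transfer failed: {exc}\n"
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def make_server(port=None) -> HTTPServer:
    """Bind to the port given in PORT (8080 if the variable is unset)."""
    if port is None:
        port = int(os.environ.get("PORT", "8080"))
    return HTTPServer(("0.0.0.0", port), TransferHandler)


def main():
    """Container entry point: serve requests until the instance is stopped."""
    make_server().serve_forever()
```

The container's start command would call main(), and a Cloud Scheduler job could then send a POST request to the service URL on whatever schedule you need.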

Key Considerations

Security: Always use FTPS or SFTP instead of plain FTP to encrypt your data during transfer
Error Handling: Add retry logic and enable Cloud Logging to track and debug transfer failures
Large Files: For files over 1GB, use streaming (like the Python example above) to avoid consuming too much memory
Cost: All three approaches are inexpensive for this kind of workload. Cloud Functions and Cloud Run bill only for execution time, while a Compute Engine VM can be stopped between runs to save costs (its persistent disk is still billed while the VM is stopped)
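For the error-handling point, a retry wrapper with exponential backoff is one common pattern. The sketch below is a generic illustration, not tied to any GCP API: with_retries and the logger name are invented here, and in real code you would probably catch ftplib.all_errors rather than every Exception. On Cloud Functions and Cloud Run, output from Python's logging module written to stderr should show up in Cloud Logging.

```python
import logging
import time

logger = logging.getLogger("gcs_to_ftp")


def with_retries(operation, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call operation(); after a failure wait base_delay * 2**n seconds and retry."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as exc:
            if attempt == attempts:
                logger.error("giving up after %d attempts: %s", attempt, exc)
                raise
            delay = base_delay * 2 ** (attempt - 1)
            logger.warning(
                "attempt %d failed (%s); retrying in %.1fs", attempt, exc, delay
            )
            sleep(delay)
```

Wrapping a single upload, for example with_retries(lambda: upload_one(name)) where upload_one is your own function, keeps the retry policy in one place instead of scattering try/except blocks through the transfer code.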

The question in this article comes from Stack Exchange; it was asked by Tushar Shinde.