How to Programmatically Download Specific Google Drive Files with Python and Get Their File IDs

阿华AIGC实验室 (A-Hua AIGC Lab)

2026-5-7

Batch-downloading Google Drive files listed by path in a CSV

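First, the cause of your "drive_service is not defined" error is clear: the code uses drive_service but never initializes a Google Drive API service instance. In addition, the Google Drive API addresses files by file ID, not by path, so the paths in your CSV cannot be used directly. That leaves two core problems to solve before the batch download itself: initializing the API service and mapping each path to a file ID.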

Here is the complete solution, step by step:

1. Preparation: install dependencies and configure API access

First, install the required Python libraries:

pip install google-api-python-client google-auth-httplib2 google-auth-oauthlib pandas

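Next, create a project in the Google Cloud Console, enable the Google Drive API, and download a credentials.json file (steps: create a project → search for and enable the Drive API → configure the OAuth consent screen → create an OAuth client ID of type "Desktop app" → download the JSON file). Put this file in the same directory as your script.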

2. Initialize the Google Drive service instance

The function below handles authentication and returns a ready-to-use drive_service, which fixes the undefined-name error:

from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
import os
import pickle

# Scope: read-only access is enough, since we only download files
# (if you change SCOPES later, delete token.pickle so you are asked to authorize again)
SCOPES = ['https://www.googleapis.com/auth/drive.readonly']

def get_drive_service():
    creds = None
    # token.pickle stores your authorization; it is created automatically after the first sign-in
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # Handle missing or expired credentials
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)  # replace with the path to your credentials.json
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)

    # Build and return the Drive service instance
    service = build('drive', 'v3', credentials=creds)
    return service

3. Get the Drive file ID from a file path

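Google Drive cannot look up files by path, so we need a function that starts at the My Drive root and walks down the path one segment at a time, querying each folder for the next name, until it reaches the file's ID: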

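def get_file_id(service, file_path):
    # Split the path and drop the Colab mount prefix used in the CSV (/content/drive/)
    path_parts = file_path.split('/')
    if path_parts[:3] == ['', 'content', 'drive']:
        path_parts = path_parts[3:]
    # Colab mounts My Drive as "MyDrive" (older versions: "My Drive"); that folder is 'root', not a child of it
    if path_parts and path_parts[0] in ('MyDrive', 'My Drive'):
        path_parts = path_parts[1:]
    path_parts = [p for p in path_parts if p]  # drop empty segments from '//' or a trailing '/'

    parent_id = 'root'  # ID alias of the My Drive root folder
    for i, part in enumerate(path_parts):
        is_last = (i == len(path_parts) - 1)
        # Escape backslashes and single quotes, which would otherwise break the query string
        safe_name = part.replace('\\', '\\\\').replace("'", "\\'")
        query = f"'{parent_id}' in parents and name='{safe_name}' and trashed=false"
        if not is_last:
            # Every segment except the last must be a folder
            query += " and mimeType='application/vnd.google-apps.folder'"
        # Query the current parent folder for an item with this name
        results = service.files().list(
            q=query,
            fields="files(id, name, mimeType)"
        ).execute(num_retries=5)  # retry transient and rate-limit errors with backoff
        items = results.get('files', [])
        if not items:
            print(f"Warning: path segment '{part}' not found, skipping this file")
            return None
        if is_last:
            # The path must end at a file, not a folder
            if items[0]['mimeType'] == 'application/vnd.google-apps.folder':
                print(f"Warning: '{file_path}' points to a folder, skipping it")
                return None
            return items[0]['id']
        # Descend into the folder (if several share the name, the first match is used)
        parent_id = items[0]['id']
    # Empty path: nothing to resolve
    return None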
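With 100,000 paths, most files share the same few parent folders, so resolving every path from scratch repeats identical files.list queries. A minimal sketch, assuming you memoize each (parent ID, name) lookup for the duration of a run: CachingResolver, make_lister and drive_query_literal are hypothetical names, and resolve() expects the path segments after the /content/drive/MyDrive prefix has already been stripped. Only make_lister touches the real API (files().list with q and fields); the resolver itself works with any function that returns (id, mimeType) pairs, so it can be tried offline.

```python
FOLDER_MIME = 'application/vnd.google-apps.folder'


def drive_query_literal(name):
    """Escape backslashes and single quotes for a quoted value in a Drive `q` string."""
    return name.replace('\\', '\\\\').replace("'", "\\'")


def make_lister(service):
    """Hypothetical adapter: wrap a Drive v3 service as list_children(parent_id, name)."""
    def list_children(parent_id, name):
        q = (f"'{parent_id}' in parents and name='{drive_query_literal(name)}' "
             "and trashed=false")
        resp = service.files().list(q=q, fields="files(id, mimeType)").execute(num_retries=5)
        return [(f['id'], f['mimeType']) for f in resp.get('files', [])]
    return list_children


class CachingResolver:
    """Resolve path segments to a file ID, querying each (parent, name) pair at most once."""

    def __init__(self, list_children):
        self.list_children = list_children
        self.cache = {}  # (parent_id, name) -> (id, mimeType), or None if not found

    def _lookup(self, parent_id, name):
        key = (parent_id, name)
        if key not in self.cache:
            items = self.list_children(parent_id, name)
            # As in get_file_id, take the first match when names are duplicated
            self.cache[key] = items[0] if items else None
        return self.cache[key]

    def resolve(self, parts):
        parent_id = 'root'  # alias for the My Drive root folder
        for i, name in enumerate(parts):
            hit = self._lookup(parent_id, name)
            if hit is None:
                return None
            item_id, mime = hit
            if i == len(parts) - 1:
                # The last segment must be a file, not a folder
                return None if mime == FOLDER_MIME else item_id
            if mime != FOLDER_MIME:
                return None  # a file in the middle of the path: invalid path
            parent_id = item_id
        return None  # empty path
```

Usage would be resolver = CachingResolver(make_lister(drive_service)) once, then resolver.resolve([...segments...]) per CSV row. The cache only lives for the current run, so the number of files.list calls drops to roughly one per distinct folder plus one per file.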

4. Batch-download the files listed in the CSV

Putting the functions above together, read the CSV and download the files one by one:

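import os
import pandas as pd
from googleapiclient.http import MediaIoBaseDownload

def download_file(service, file_id, save_path):
    # Create the target folder (dirname is '' for a bare file name, which makedirs rejects)
    save_dir = os.path.dirname(save_path)
    if save_dir:
        os.makedirs(save_dir, exist_ok=True)
    request = service.files().get_media(fileId=file_id)
    # Stream chunks straight to a temporary file instead of buffering the whole file in memory
    tmp_path = save_path + '.part'
    with open(tmp_path, 'wb') as fh:
        downloader = MediaIoBaseDownload(fh, request)
        done = False
        while not done:
            # num_retries retries transient and rate-limit errors with exponential backoff
            status, done = downloader.next_chunk(num_retries=5)
            print(f"Downloading: {int(status.progress() * 100)}%")
    # Rename only after the download completes, so an interrupted run leaves no truncated file
    os.replace(tmp_path, save_path)
    print(f"Saved file to: {save_path}")

# Main entry point
if __name__ == '__main__':
    # Initialize the Drive service (get_drive_service is defined in step 2)
    drive_service = get_drive_service()
    # Read the CSV (header=None: no header row; per your CSV layout, column 0 holds the file path)
    df = pd.read_csv("/home/ram/Downloads/Data_Science/Kaggle Competition/BBox_List_2017_path_colab.csv", header=None)

    # Iterate over the file path in each row
    for idx, row in df.iterrows():
        file_path = row[0]
        print(f"Processing file: {file_path}")
        # Look up the file ID (get_file_id is defined in step 3)
        file_id = get_file_id(drive_service, file_path)
        if not file_id:
            continue
        # Local save path: the original file name under ./downloads in the current directory
        # (files with the same name in different folders would overwrite each other)
        filename = file_path.split('/')[-1]
        save_path = f"./downloads/{filename}"
        # Download the file
        download_file(drive_service, file_id, save_path)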
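Saving by bare file name means two files with the same name in different Drive folders overwrite each other. If your paths can collide, a hypothetical helper such as local_save_path below (not part of the original answer) mirrors the Drive folder layout under ./downloads instead. It assumes Colab-style paths of the form /content/drive/MyDrive/<folders>/<file>.

```python
import os


def local_save_path(drive_path, root='./downloads'):
    """Map a Colab-style Drive path to a local path that keeps the folder layout.

    Hypothetical helper; assumes paths like /content/drive/MyDrive/<folders>/<file>.
    """
    parts = [p for p in drive_path.split('/') if p]
    # Drop the Colab mount prefix "content/drive"
    if parts[:2] == ['content', 'drive']:
        parts = parts[2:]
    # "MyDrive" (older Colab: "My Drive") is the Drive root, not a real folder
    if parts and parts[0] in ('MyDrive', 'My Drive'):
        parts = parts[1:]
    # Drop '.' and '..' so a path from the CSV cannot escape the download root
    parts = [p for p in parts if p not in ('.', '..')]
    if not parts:
        raise ValueError(f"no file name in path: {drive_path!r}")
    return os.path.join(root, *parts)
```

In the loop, save_path = local_save_path(file_path) would replace the two file-name lines, and checking os.path.exists(save_path) before calling download_file lets a rerun skip files that are already on disk.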

Important notes

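The first time you run the script, a browser window opens so you can sign in to your Google account and grant access. After you authorize, token.pickle is created, and later runs don't ask you to sign in again.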
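Because you have 100,000 files, the script makes a large number of API calls: one files.list call per path segment, plus the download requests for every file. Drive API quotas are enforced mainly as per-minute request limits per project and per user (check the Drive API's Quotas page in the Cloud Console for the current values). Requests over the limit fail with 403 or 429 rate-limit errors; googleapiclient can retry them with exponential backoff via execute(num_retries=...), and you can request a higher quota in the Cloud Console if needed.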
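Test on a small number of files first (for example, the first 10 rows, df.head(10)) and confirm the workflow before running the full batch.
Consider saving the path-to-file-ID mapping to a new CSV. Later downloads can then use the IDs directly instead of resolving every path through the API again, which saves both time and quota.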
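The last note, saving the path-to-ID mapping, can be sketched with the standard csv module. load_id_map, save_id_map and get_id_cached are hypothetical names, and the two-column layout (path, file_id) is an assumption; resolve stands for any function that maps a path to an ID or None, such as get_file_id with the service bound.

```python
import csv
import os


def load_id_map(csv_path):
    """Load a path -> file ID mapping written by save_id_map ({} if the file is missing)."""
    if not os.path.exists(csv_path):
        return {}
    with open(csv_path, newline='', encoding='utf-8') as f:
        return {row['path']: row['file_id'] for row in csv.DictReader(f)}


def save_id_map(csv_path, id_map):
    """Write the mapping as a two-column CSV with a header row (path, file_id)."""
    with open(csv_path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=['path', 'file_id'])
        writer.writeheader()
        for path, file_id in id_map.items():
            writer.writerow({'path': path, 'file_id': file_id})


def get_id_cached(id_map, path, resolve):
    """Return the cached ID for `path`, calling `resolve(path)` only on a cache miss."""
    if path not in id_map:
        file_id = resolve(path)
        if file_id is None:
            return None  # failures are not cached, so they are retried on the next run
        id_map[path] = file_id
    return id_map[path]
```

A possible wiring: id_map = load_id_map('file_ids.csv') before the loop, file_id = get_id_cached(id_map, file_path, lambda p: get_file_id(drive_service, p)) inside it, and save_id_map('file_ids.csv', id_map) at the end (or every few hundred files, so an interrupted run keeps its progress).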

Question source: Stack Exchange, asked by Fasty.