如何用Python的requests库自动下载网站导出的Excel文件？

阿华AIGC实验室

2026-5-11

Got it, let's work through how to download that Excel export properly with Python's requests library. Based on common scenarios for web-based Excel exports, here's a step-by-step approach to fix your issue:

Step 1: Extract Key Details from Your Request Screenshot

First, you’ll need to pull critical info from the browser’s developer tools (which your screenshot should show) that tells requests exactly how to replicate the export request:

Export Request URL: This is not the page URL—it’s the specific endpoint the "Export to Excel" button calls (look for a request with a GET or POST method that returns an Excel file).
Request Headers: Pay close attention to Cookie (for authentication), User-Agent, and Referer—many websites block requests that don’t include these to prevent scraping.
Request Parameters/Form Data: If it’s a POST request, there’s likely form data (like date ranges, filters) that tells the server what data to export. Copy these key-value pairs exactly.
Response Content-Type: Confirm the response is marked as application/vnd.openxmlformats-officedocument.spreadsheetml.sheet (for .xlsx) or application/vnd.ms-excel (for .xls)—this ensures you’re getting a real Excel file, not HTML.

Step 2: Code Templates for GET/POST Requests

Case 1: Export Uses a GET Request

If your screenshot shows the export is triggered by a GET request, use this template:

import requests

# Replace with your actual export URL from the screenshot
export_url = "https://your-website.com/export/excel"

# Copy headers from your browser's request (update these values!)
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
    "Cookie": "session_id=abc123; auth_token=xyz789",
    "Referer": "https://your-website.com/dashboard"
}

# Send request with stream=True to handle large files efficiently
response = requests.get(export_url, headers=headers, stream=True)

# Save the file if request succeeds
if response.status_code == 200:
    # Try to get the filename from the response header (if provided)
    content_disposition = response.headers.get("Content-Disposition")
    filename = content_disposition.split("filename=")[1].strip('"') if content_disposition else "exported_data.xlsx"
    
    with open(filename, "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
    print(f"Success! Saved Excel file as {filename}")
else:
    print(f"Download failed. Status code: {response.status_code}")

Case 2: Export Uses a POST Request (More Common)

Most export features use POST to send filter criteria to the server. Use this template if your screenshot shows a POST request:

import requests

export_url = "https://your-website.com/api/export/excel"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36...",
    "Cookie": "session_id=abc123; auth_token=xyz789",
    "Referer": "https://your-website.com/dashboard",
    "Content-Type": "application/x-www-form-urlencoded"  # Match the Content-Type from your screenshot
}

# Copy form data from your browser's request (update these values!)
form_data = {
    "start_date": "2024-01-01",
    "end_date": "2024-06-30",
    "export_scope": "all_records"
}

response = requests.post(export_url, headers=headers, data=form_data, stream=True)

if response.status_code == 200:
    content_disposition = response.headers.get("Content-Disposition")
    filename = content_disposition.split("filename=")[1].strip('"') if content_disposition else "exported_data.xlsx"
    
    with open(filename, "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
    print(f"Success! Saved Excel file as {filename}")
else:
    print(f"Download failed. Status code: {response.status_code}")
    # Debug: Print response text to see if it's an error page
    # print(response.text)

Common Troubleshooting Tips

Why your initial attempt failed: You were probably fetching the HTML page instead of the dedicated export endpoint. The Excel export is generated dynamically by the server when you trigger the right request—you can’t just save the page as Excel.
403/401 Errors: Double-check your Cookie (it might have expired—re-copy from your browser). Also ensure User-Agent and Referer match exactly what your browser sends.
Corrupted File: If the saved file won’t open, check if the response is actually HTML (print response.text). This usually means your request is missing parameters or authentication.
CSRF Tokens: Some websites require a CSRF token in the request. Look for a <meta name="csrf-token" content="xxxx"> tag on the page, then add "X-CSRFToken": "xxxx" to your headers.

内容的提问来源于stack exchange，提问作者R-A