如何用Python的requests库自动下载网站导出的Excel文件?
Got it, let's work through how to download that Excel export properly with Python's requests library. Based on common scenarios for web-based Excel exports, here's a step-by-step approach to fix your issue:
Step 1: Extract Key Details from Your Request Screenshot
First, you’ll need to pull critical info from the browser’s developer tools (which your screenshot should show) that tells requests exactly how to replicate the export request:
- Export Request URL: This is not the page URL—it’s the specific endpoint the "Export to Excel" button calls (look for a request with a
GETorPOSTmethod that returns an Excel file). - Request Headers: Pay close attention to
Cookie(for authentication),User-Agent, andReferer—many websites block requests that don’t include these to prevent scraping. - Request Parameters/Form Data: If it’s a
POSTrequest, there’s likely form data (like date ranges, filters) that tells the server what data to export. Copy these key-value pairs exactly. - Response Content-Type: Confirm the response is marked as
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet(for .xlsx) orapplication/vnd.ms-excel(for .xls)—this ensures you’re getting a real Excel file, not HTML.
Step 2: Code Templates for GET/POST Requests
Case 1: Export Uses a GET Request
If your screenshot shows the export is triggered by a GET request, use this template:
import requests # Replace with your actual export URL from the screenshot export_url = "https://your-website.com/export/excel" # Copy headers from your browser's request (update these values!) headers = { "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36", "Cookie": "session_id=abc123; auth_token=xyz789", "Referer": "https://your-website.com/dashboard" } # Send request with stream=True to handle large files efficiently response = requests.get(export_url, headers=headers, stream=True) # Save the file if request succeeds if response.status_code == 200: # Try to get the filename from the response header (if provided) content_disposition = response.headers.get("Content-Disposition") filename = content_disposition.split("filename=")[1].strip('"') if content_disposition else "exported_data.xlsx" with open(filename, "wb") as f: for chunk in response.iter_content(chunk_size=8192): f.write(chunk) print(f"Success! Saved Excel file as {filename}") else: print(f"Download failed. Status code: {response.status_code}")
Case 2: Export Uses a POST Request (More Common)
Most export features use POST to send filter criteria to the server. Use this template if your screenshot shows a POST request:
import requests export_url = "https://your-website.com/api/export/excel" headers = { "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36...", "Cookie": "session_id=abc123; auth_token=xyz789", "Referer": "https://your-website.com/dashboard", "Content-Type": "application/x-www-form-urlencoded" # Match the Content-Type from your screenshot } # Copy form data from your browser's request (update these values!) form_data = { "start_date": "2024-01-01", "end_date": "2024-06-30", "export_scope": "all_records" } response = requests.post(export_url, headers=headers, data=form_data, stream=True) if response.status_code == 200: content_disposition = response.headers.get("Content-Disposition") filename = content_disposition.split("filename=")[1].strip('"') if content_disposition else "exported_data.xlsx" with open(filename, "wb") as f: for chunk in response.iter_content(chunk_size=8192): f.write(chunk) print(f"Success! Saved Excel file as {filename}") else: print(f"Download failed. Status code: {response.status_code}") # Debug: Print response text to see if it's an error page # print(response.text)
Common Troubleshooting Tips
- Why your initial attempt failed: You were probably fetching the HTML page instead of the dedicated export endpoint. The Excel export is generated dynamically by the server when you trigger the right request—you can’t just save the page as Excel.
- 403/401 Errors: Double-check your
Cookie(it might have expired—re-copy from your browser). Also ensureUser-AgentandReferermatch exactly what your browser sends. - Corrupted File: If the saved file won’t open, check if the response is actually HTML (print
response.text). This usually means your request is missing parameters or authentication. - CSRF Tokens: Some websites require a CSRF token in the request. Look for a
<meta name="csrf-token" content="xxxx">tag on the page, then add"X-CSRFToken": "xxxx"to your headers.
内容的提问来源于stack exchange,提问作者R-A




