Downloading tar.gz files with requests returns "Access denied": how can I batch-download them with Python without authorization?
Yes, you can automate this with Python. The likely cause is that requests isn't sending the same headers (or possibly cookies) as your Firefox browser, so the server responds with "Access denied". Here's how to fix it:
1. Mimic Firefox's request headers
Many servers reject plain requests calls because they lack the identifying headers of a real browser (by default, requests announces itself with a python-requests User-Agent). Here's how to grab the headers Firefox uses:
- Open Firefox's Developer Tools (press F12), switch to the Network tab.
- Trigger a download of one of your tar.gz files, then find the corresponding request in the network list.
- Click into the request and look for the Request Headers section. Copy key headers like User-Agent, Accept, Accept-Language and, if present, Cookie (this is critical if the site requires a logged-in session).
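If you copy the whole raw header block from Firefox rather than individual values, a small helper can turn it into a dict for requests. This is a minimal sketch: the name `parse_raw_headers` is hypothetical, and dropping Host and Content-Length (so that requests fills them in itself) is an assumption about what you want.

```python
def parse_raw_headers(raw):
    """Turn a raw 'Name: value' header block into a dict for requests."""
    skipped = {"host", "content-length"}  # let requests compute these itself
    headers = {}
    for line in raw.splitlines():
        name, sep, value = line.strip().partition(":")
        name = name.strip()
        # Skip blank lines, the request line ("GET /path HTTP/1.1") and
        # HTTP/2 pseudo-headers such as ":authority", which have no usable name.
        if not sep or not name or " " in name:
            continue
        if name.lower() in skipped:
            continue
        headers[name] = value.strip()
    return headers
```

The resulting dict can be passed as the `headers=` argument of `requests.get()`.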
2. Use requests with the copied headers to download files
Once you have the headers, you can pass them to requests.get() to simulate a browser request. For large files, use streaming to avoid loading the entire file into memory at once.
Here's a reusable function to handle the downloads:
```python
import requests


def download_tar_gz(url, save_path):
    # Replace these with the exact headers from your Firefox request
    browser_headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:117.0) Gecko/20100101 Firefox/117.0",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.5",
        # "br" is left out: requests can only decode Brotli if a brotli package is installed
        "Accept-Encoding": "gzip, deflate",
        "Connection": "keep-alive",
        "Upgrade-Insecure-Requests": "1",
        # Uncomment and add your cookie string if Firefox shows one in headers
        # "Cookie": "your_session_cookie_here"
    }
    try:
        # Stream the download to handle large files efficiently
        with requests.get(url, headers=browser_headers, stream=True, timeout=30) as response:
            response.raise_for_status()  # Raise an error if HTTP status is 4xx/5xx
            with open(save_path, "wb") as output_file:
                for chunk in response.iter_content(chunk_size=8192):
                    output_file.write(chunk)
        print(f"✅ Downloaded successfully: {save_path}")
    except requests.exceptions.RequestException as e:
        print(f"❌ Failed to download {url}: {e}")


# Example: Batch download all your URLs
url_list = [
    "https://your-domain.com/file1.tar.gz",
    "https://your-domain.com/file2.tar.gz",
    # Add all your URLs here
]

for index, url in enumerate(url_list, start=1):
    save_filename = f"downloaded_file_{index}.tar.gz"
    download_tar_gz(url, save_filename)
```
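The batch loop above saves files under generic numbered names. If you would rather keep each file's original name, one option is to derive it from the URL path. The sketch below uses only the standard library; the function name `filename_from_url` and the fallback name are hypothetical choices, not part of the original answer.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url, fallback="download.tar.gz"):
    """Return the last path segment of a URL, ignoring query and fragment."""
    path = unquote(urlsplit(url).path)
    name = posixpath.basename(path)
    # Fall back when the URL ends in "/" or the segment is not a usable filename
    if name in ("", ".", ".."):
        return fallback
    return name
```

In the loop you could then call `download_tar_gz(url, filename_from_url(url))`. Note that two URLs with the same last segment would overwrite each other.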
3. Bonus: Check wget's behavior (if needed)
Since you mentioned that wg (I assume you mean wget) works, you can run wget --debug <your-url> to see exactly which headers it sends. If the Firefox headers don't work for some reason, mirror wget's headers in your requests code instead.
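To reuse wget's headers without retyping them, you can extract them from the debug log. This sketch assumes the log marks the outgoing request with `---request begin---` and `---request end---` lines, as wget 1.x builds with debug support typically do; check your own output, since the format may differ. The function name `headers_from_wget_debug` is hypothetical.

```python
def headers_from_wget_debug(debug_log):
    """Extract the headers of the first request found in a wget --debug log."""
    headers = {}
    inside_request = False
    for line in debug_log.splitlines():
        text = line.strip()
        if text == "---request begin---":  # assumed marker format
            inside_request = True
            continue
        if text == "---request end---":
            break  # stop after the first request; redirects add further ones
        if not inside_request:
            continue
        name, sep, value = text.partition(":")
        # The request line ("GET /path HTTP/1.1") has no header-style name
        if sep and name and " " not in name:
            headers[name] = value.strip()
    return headers
```

The returned dict includes Host as logged by wget; you can drop that entry before passing the rest to `requests.get(url, headers=...)`.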
Key Notes
- Start with headers that match exactly what Firefox sends; on some servers even small differences can trigger access blocks.
- If the site uses session cookies, you can either copy the cookie string from Firefox or use requests.Session() to persist a login session if you need to authenticate programmatically.
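As a rough illustration of the Session approach, the sketch below builds one session that carries the browser headers and cookies on every request, so they don't have to be repeated per download. The helper name `build_session` and the cookie name `sessionid` in the usage note are hypothetical; use whatever names your site actually sets.

```python
import requests


def build_session(headers, cookies=None):
    """Create a Session that sends the given headers and cookies on each request."""
    session = requests.Session()
    session.headers.update(headers)  # merged into the session's default headers
    for name, value in (cookies or {}).items():
        # Without a domain argument the cookie is sent to every host;
        # pass domain="your-domain.com" to cookies.set() to restrict it.
        session.cookies.set(name, value)
    return session
```

Usage would look like `session = build_session(browser_headers, {"sessionid": "..."})` followed by `session.get(url, stream=True, timeout=30)` in place of `requests.get(...)`. Reusing one session also reuses the underlying connection across downloads.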
The question comes from Stack Exchange; asked by KOB.




