Downloading tar.gz files with requests returns "Access denied": how can I batch-download them in Python without authorization?

Yes, you can automate this with Python. The problem is that requests isn't sending the same headers (and possibly cookies) that Firefox sends, so the server rejects the request with "Access denied". Here's how to fix it:

1. Mimic Firefox's request headers

Many servers reject requests made with the library's defaults, because out of the box requests identifies itself with a "python-requests/<version>" User-Agent instead of a browser's headers. Here's how to grab the headers Firefox uses:

  • Open Firefox's Developer Tools (press F12) and switch to the Network tab.
  • Trigger a download of one of your tar.gz files, then find the corresponding request in the network list.
  • Click the request and find the Request Headers section. Copy the key headers such as User-Agent, Accept, Accept-Language, Referer and, if present, Cookie (this is critical if the site requires a logged-in session).
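Typing the headers out by hand is error-prone. If you switch the Request Headers panel to its raw view, you can copy the whole block at once and convert it into a dict. A minimal sketch, assuming one "Name: value" pair per line; parse_raw_headers is a hypothetical helper, not part of requests, and the sample headers are placeholders:

```python
def parse_raw_headers(raw_text):
    """Turn raw request headers copied from the browser into a dict.

    Hypothetical helper: assumes one "Name: value" pair per line.
    """
    headers = {}
    for line in raw_text.splitlines():
        # Split on the first colon only, so values like "rv:117.0" stay intact
        name, sep, value = line.partition(":")
        name = name.strip()
        # Skip blank lines and the request line ("GET /file1.tar.gz HTTP/1.1"):
        # header names never contain spaces
        if not sep or not name or " " in name:
            continue
        headers[name] = value.strip()
    return headers


raw = """GET /file1.tar.gz HTTP/1.1
Host: your-domain.com
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:117.0) Gecko/20100101 Firefox/117.0
Accept-Language: en-US,en;q=0.5
Cookie: session=abc123; theme=dark
"""

headers = parse_raw_headers(raw)
print(headers["Cookie"])  # session=abc123; theme=dark
```

You can pass the result straight to requests.get(url, headers=headers). The Host line is optional, since the underlying HTTP library fills it in from the URL.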

2. Use requests with the copied headers to download files

Once you have the headers, you can pass them to requests.get() to simulate a browser request. For large files, use streaming to avoid loading the entire file into memory at once.

Here's a reusable function to handle the downloads:

import requests

def download_tar_gz(url, save_path):
    # Replace these with the exact headers from your Firefox request
    browser_headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:117.0) Gecko/20100101 Firefox/117.0",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.5",
        # "br" is left out: requests can only decode Brotli responses
        # if the optional brotli package is installed
        "Accept-Encoding": "gzip, deflate",
        "Connection": "keep-alive",
        "Upgrade-Insecure-Requests": "1",
        # Uncomment and paste your cookie string if Firefox shows one in the headers
        # "Cookie": "your_session_cookie_here"
    }

    try:
        # Stream the download so large files are not loaded into memory at once;
        # the timeout keeps one unresponsive URL from stalling the whole batch
        with requests.get(url, headers=browser_headers, stream=True, timeout=30) as response:
            response.raise_for_status()  # Raise an error if the HTTP status is 4xx/5xx

            with open(save_path, "wb") as output_file:
                for chunk in response.iter_content(chunk_size=8192):
                    output_file.write(chunk)

        print(f"✅ Downloaded successfully: {save_path}")
    except requests.exceptions.RequestException as e:
        print(f"❌ Failed to download {url}: {e}")

# Example: Batch download all your URLs
url_list = [
    "https://your-domain.com/file1.tar.gz",
    "https://your-domain.com/file2.tar.gz",
    # Add all your URLs here
]

for index, url in enumerate(url_list, start=1):
    save_filename = f"downloaded_file_{index}.tar.gz"
    download_tar_gz(url, save_filename)
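Some servers may answer a blocked request with an HTML error page and a 200 status, which raise_for_status() won't catch, so the saved "tar.gz" is really HTML. A minimal check you could run on each file afterwards, assuming every download should be a gzip-compressed tar archive; looks_like_tar_gz is a hypothetical helper:

```python
import tarfile

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def looks_like_tar_gz(path):
    """Return True if the file at path is a gzip-compressed tar archive."""
    with open(path, "rb") as f:
        if f.read(2) != GZIP_MAGIC:
            # e.g. an HTML "Access denied" page saved under a .tar.gz name
            return False
    # tarfile.is_tarfile() opens the file with transparent decompression,
    # so this also confirms the gzip data contains a readable tar archive
    return tarfile.is_tarfile(path)
```

Files that fail the check can be deleted and retried, for example after refreshing the cookie you copied from Firefox.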

3. Bonus: Check wget's behavior (if needed)

Since you mentioned that wg (presumably wget) works, you can run wget --debug <your-url> to see exactly which headers it sends. You can then mirror those headers in your requests code if the Firefox headers don't work for some reason.
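To compare against wget's debug output, it also helps to see what requests itself would send. A minimal sketch using requests.Session.prepare_request(), which merges the session's default headers with your own without making any network call; the URL and the shortened User-Agent string are placeholders:

```python
import requests

url = "https://your-domain.com/file1.tar.gz"  # placeholder URL

with requests.Session() as session:
    # No custom headers: requests falls back to its own defaults,
    # including a "python-requests/<version>" User-Agent
    default_req = session.prepare_request(requests.Request("GET", url))
    print(default_req.headers["User-Agent"])

    # Custom headers override the session defaults with the same name
    browser_req = session.prepare_request(
        requests.Request("GET", url, headers={"User-Agent": "Mozilla/5.0 ... Firefox/117.0"})
    )
    for name, value in browser_req.headers.items():
        print(f"{name}: {value}")
```

Host does not appear in this list because the underlying HTTP library adds it when the request is actually sent.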

Key Notes

  • Start by copying the headers exactly as Firefox sends them; even a small difference, such as a missing Referer or Cookie, can be enough to trigger an access block.
  • If the site uses session cookies, you can either copy the cookie string from Firefox, or use requests.Session() to persist a login session if you need to authenticate programmatically.
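For the batch case, a requests.Session sends the same headers on every request, keeps cookies the server sets between downloads, and reuses connections to the same host. A minimal sketch, assuming each URL ends in the file name; filename_from_url and batch_download are hypothetical helpers, and the headers you pass in should be the ones copied from Firefox:

```python
import os
import posixpath
from urllib.parse import unquote, urlparse

import requests


def filename_from_url(url, fallback):
    """Use the last path segment of the URL as the local file name."""
    name = posixpath.basename(unquote(urlparse(url).path))
    return name or fallback


def batch_download(urls, headers, out_dir="downloads"):
    os.makedirs(out_dir, exist_ok=True)
    with requests.Session() as session:
        # Session-level headers are sent with every request; cookies in
        # session.cookies are reused automatically, as long as you don't
        # also pass a raw "Cookie" header
        session.headers.update(headers)
        for index, url in enumerate(urls, start=1):
            save_path = os.path.join(out_dir, filename_from_url(url, f"file_{index}.tar.gz"))
            try:
                with session.get(url, stream=True, timeout=30) as response:
                    response.raise_for_status()
                    with open(save_path, "wb") as output_file:
                        for chunk in response.iter_content(chunk_size=8192):
                            output_file.write(chunk)
                print(f"Downloaded: {save_path}")
            except requests.exceptions.RequestException as e:
                print(f"Failed to download {url}: {e}")
```

Call it as batch_download(url_list, {"User-Agent": "..."}). Browser cookies are best added with session.cookies.set(name, value) inside the function rather than as a raw Cookie header, so they combine with any cookies the server sets later.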

Question sourced from Stack Exchange; asked by KOB.
