Help with urlretrieve 503 and 403 errors when downloading images in Python 3.6
Fixing 403/503 Errors When Downloading Images with urllib/requests in Python 3.6
Hey fellow developer! I see you're hitting 403 Forbidden and 503 Service Unavailable errors while trying to download that image from http://pic.minitoon.net/albums/2819/01-01/01_000.jpg using urllib (and even after switching to requests). Let's break down why this is happening and how to fix it—these issues are super common with anti-scraping measures, so we've got this.
Why You're Seeing These Errors
- 403 Forbidden: Most likely, the server is detecting your request as coming from a script (not a real browser) and blocking it. Servers often check for missing User-Agent headers or non-human request patterns to flag traffic.
- 503 Service Unavailable: This could be temporary server overload, but more often, it's the server throttling repeated suspicious requests. Even if you fixed the 403, your request pattern might still trigger rate limits.
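To see why a plain urllib request is easy to flag, here is a small sketch that prints the User-Agent urllib sends by default and builds a request that overrides it. The helper names `default_urllib_user_agent` and `browser_like_request` are made up for illustration, and nothing here contacts the server.

```python
import urllib.request


def default_urllib_user_agent():
    """Return the User-Agent value a fresh urllib opener sends by default."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) pairs applied to every request
    return dict(opener.addheaders).get("User-agent")


def browser_like_request(url, user_agent):
    """Build (but do not send) a Request that carries a custom User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


print(default_urllib_user_agent())  # something like "Python-urllib/3.x"
req = browser_like_request(
    "http://pic.minitoon.net/albums/2819/01-01/01_000.jpg",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
)
print(req.get_header("User-agent"))  # the header that will replace the default
```

If the first line printed starts with "Python-urllib", that is the value the server sees unless you set your own header.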
Fix 1: Improve urllib Requests with Proper Headers & Retries
Let's update your urllib code to mimic a browser and add retry logic for 503 errors. Here's a working example for Python 3.6:
    import urllib.request
    from urllib.error import HTTPError
    import time

    def download_image_with_urllib(url, save_path, retries=3):
        # Mimic a real browser's request headers
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
            'Accept': 'image/webp,image/apng,image/*,*/*;q=0.8',
            'Accept-Language': 'en-US,en;q=0.9',
            'Referer': 'http://pic.minitoon.net/'  # Add the site's base URL as referer
        }

        req = urllib.request.Request(url, headers=headers)

        for attempt in range(retries):
            try:
                with urllib.request.urlopen(req) as response:
                    with open(save_path, 'wb') as f:
                        f.write(response.read())
                print(f"Image downloaded successfully to {save_path}")
                return
            except HTTPError as e:
                if e.code == 503 and attempt < retries - 1:
                    print(f"503 Error encountered. Retrying in 2 seconds... (Attempt {attempt+1}/{retries})")
                    time.sleep(2)
                else:
                    print(f"Failed to download image: {e}")
                    raise

    # Usage
    download_image_with_urllib(
        'http://pic.minitoon.net/albums/2819/01-01/01_000.jpg',
        'downloaded_image.jpg'
    )
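The fixed 2-second wait can also be swapped for exponential backoff, so each retry waits longer than the last. The following is a minimal stdlib-only sketch of that idea, not part of the original answer: `backoff_delays` and `call_with_backoff` are hypothetical helper names, and the `base` and `cap` values are assumptions you would tune for the site.

```python
import time
from urllib.error import HTTPError


def backoff_delays(retries, base=1.0, cap=30.0):
    """Return the wait in seconds before each retry: base, 2*base, 4*base, ... capped at cap."""
    return [min(cap, base * (2 ** i)) for i in range(retries)]


def call_with_backoff(fetch, retries=3, base=1.0, retry_codes=(503,), sleep=time.sleep):
    """Call fetch(); on a retryable HTTPError, wait and try again, up to `retries` extra times."""
    delays = backoff_delays(retries, base)
    for attempt in range(retries + 1):
        try:
            return fetch()
        except HTTPError as e:
            # Give up on non-retryable codes (e.g. 403) or when retries are used up
            if e.code not in retry_codes or attempt == retries:
                raise
            sleep(delays[attempt])
```

You would pass the actual download as a zero-argument callable, for example `call_with_backoff(lambda: urllib.request.urlopen(req).read())`, where `req` is a Request built with browser-like headers.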
Fix 2: Use Requests with Session & Retry Adapter (More Robust)
Requests is easier to work with for these scenarios. Let's set up a session with persistent headers and an adapter that automatically retries 503 errors—Python 3.6 fully supports this:
    import requests
    from requests.adapters import HTTPAdapter
    from urllib3.util.retry import Retry

    def download_image_with_requests(url, save_path):
        # Create a session to persist headers and connections
        session = requests.Session()

        # Configure retry logic for 503 errors
        retry_strategy = Retry(
            total=3,
            backoff_factor=1,  # Exponential backoff: the wait grows with each retry
            status_forcelist=[503],
        )
        adapter = HTTPAdapter(max_retries=retry_strategy)
        session.mount('http://', adapter)
        session.mount('https://', adapter)

        # Browser-like headers
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
            'Accept': 'image/webp,image/apng,image/*,*/*;q=0.8',
            'Accept-Language': 'en-US,en;q=0.9',
            'Referer': 'http://pic.minitoon.net/'
        }

        try:
            response = session.get(url, headers=headers, stream=True)
            response.raise_for_status()  # Raise exception for HTTP errors

            # Write the body to disk in chunks instead of loading it all into memory
            with open(save_path, 'wb') as f:
                for chunk in response.iter_content(chunk_size=8192):
                    f.write(chunk)
            print(f"Image downloaded successfully to {save_path}")
        except requests.exceptions.RequestException as e:
            print(f"Failed to download image: {e}")
            raise

    # Usage
    download_image_with_requests(
        'http://pic.minitoon.net/albums/2819/01-01/01_000.jpg',
        'downloaded_image.jpg'
    )
Additional Tips
- Check for Cookies: Some sites require cookies to be set (e.g., after visiting the homepage). Use the session object in requests to first visit http://pic.minitoon.net/ to capture cookies before downloading the image.
- Avoid Rate Limiting: If you're downloading multiple images, add small delays between requests (time.sleep(1) or similar) to avoid triggering 503s.
- Proxy Servers: If you're still blocked, the site might be IP-blocking you. Using a proxy could help, but that's more advanced, so try the above steps first.
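The first two tips can be combined in one small helper. This is only a sketch using the standard library's cookie support as a stand-in for a requests session: `build_cookie_opener` and `download_all` are hypothetical names, the short User-Agent string is a placeholder, and whether the site actually sets cookies on its homepage is an assumption you would need to check.

```python
import time
import urllib.request
from http.cookiejar import CookieJar

HOMEPAGE = "http://pic.minitoon.net/"  # base URL mentioned in the tip above


def build_cookie_opener(user_agent="Mozilla/5.0", referer=HOMEPAGE):
    """Return (opener, jar): an opener that stores cookies and sends browser-like headers."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-Agent", user_agent), ("Referer", referer)]
    return opener, jar


def download_all(opener, homepage, jobs, delay=1.0, sleep=time.sleep):
    """Visit the homepage once to collect cookies, then fetch each (url, path) with a pause between."""
    opener.open(homepage, timeout=10).close()
    for index, (url, save_path) in enumerate(jobs):
        if index > 0:
            sleep(delay)  # small pause between images to stay under rate limits
        with opener.open(url, timeout=10) as response:
            data = response.read()
        with open(save_path, "wb") as f:
            f.write(data)
```

A call would look like `opener, jar = build_cookie_opener()` followed by `download_all(opener, HOMEPAGE, [(image_url, 'downloaded_image.jpg')])`. After the homepage visit, `len(jar)` tells you whether the site set any cookies at all.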
This question comes from Stack Exchange; asked by Arun.




