How to fix Google Images Download image-scraping code that stopped working
Hey there, I’ve dealt with this exact problem before! The google_images_download library hasn’t been maintained since around 2020, and Google has changed the structure of its image search results page several times since then, so the library’s old scraping logic no longer matches what Google returns. Here are the most reliable fixes:
1. Switch to an Actively Maintained Alternative (Highly Recommended)
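The best long-term solution is to replace the outdated library with icrawler, a more actively maintained tool that supports Google Image Search as well as other sources such as Bing, Baidu, and Flickr. It’s more flexible, and because it is still maintained, breakages caused by site changes are more likely to get fixed. That said, any tool that scrapes Google’s results page can still break from time to time.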
Steps:
- First install icrawler via pip:
```
pip install icrawler
```

- Rewrite your crawling function like this:

```python
from icrawler.builtin import GoogleImageCrawler


def ImageCrawling(keyword, dir_path):
    # Initialize the crawler; downloaded files are saved under dir_path
    crawler = GoogleImageCrawler(storage={"root_dir": dir_path})
    # Crawl up to 2 images for the keyword
    crawler.crawl(keyword=keyword, max_num=2, file_idx_offset=0)


# Call the function (same as your original)
ImageCrawling('dog', 'C:\\nuguya')
```
This code downloads the images directly into your target folder. If needed, you can also pass a `filters` dict to `crawl()` to narrow the results by image size, type, color, license, or date.
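As a sketch of the filter option, the variant below passes a `filters` dict to `GoogleImageCrawler.crawl()`. The keys `size` and `type` and values such as `'large'` and `'photo'` follow icrawler's documented Google filters; the helper names `build_filters` and `crawl_filtered`, and the lazy import (so the filter-building part can be tried without icrawler installed), are my own hypothetical choices, not part of icrawler.

```python
def build_filters(size=None, image_type=None):
    """Build an icrawler Google filter dict, leaving out unset options.

    Hypothetical helper: icrawler accepts values such as size='large'
    or type='photo' for its Google crawler.
    """
    filters = {}
    if size is not None:
        filters["size"] = size
    if image_type is not None:
        filters["type"] = image_type
    return filters


def crawl_filtered(keyword, dir_path, max_num=2, size=None, image_type=None):
    # Import lazily so build_filters() works even where icrawler isn't installed
    from icrawler.builtin import GoogleImageCrawler

    crawler = GoogleImageCrawler(storage={"root_dir": dir_path})
    # Pass None (icrawler's default) instead of an empty dict when no filters are set
    filters = build_filters(size, image_type) or None
    crawler.crawl(keyword=keyword, filters=filters, max_num=max_num)
```

For example, `crawl_filtered('dog', 'C:\\nuguya', size='large', image_type='photo')` would ask only for large photos.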
2. Try a Forked Version of the Original Library (Temporary Fix)
Some developers have forked google_images_download and patched it to work with newer versions of Google’s image search page. You can install one of these forks directly from GitHub with pip:
```
pip install git+https://github.com/Joeclinton1/google-images-download.git
```
Keep in mind this is only a temporary fix: Google could change its page again at any time, and the fork might not be updated to keep up.
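Assuming the fork keeps the original package’s interface (`google_images_download.googleimagesdownload().download(arguments)`), your existing call style should carry over. The sketch below maps the `ImageCrawling(keyword, dir_path)` call onto the original library’s `keywords`, `limit`, and `output_directory` arguments; `build_arguments` and `download_with_fork` are hypothetical helper names. The return value of `download()` has differed between versions, so it is simply passed through.

```python
def build_arguments(keyword, dir_path, limit=2):
    """Map ImageCrawling(keyword, dir_path) onto the argument dict used by
    google_images_download (hypothetical helper)."""
    return {
        "keywords": keyword,           # comma-separated search terms
        "limit": limit,                # number of images per keyword
        "output_directory": dir_path,  # by default, images go into a sub-folder named after the keyword
    }


def download_with_fork(keyword, dir_path, limit=2):
    # Lazy import: requires the forked package to be installed via pip
    from google_images_download import google_images_download

    downloader = google_images_download.googleimagesdownload()
    return downloader.download(build_arguments(keyword, dir_path, limit))
```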
3. Check for Local Environment Issues
Sometimes sudden failures are caused by changes in your setup:
- Verify your Python version matches the one you used when the code worked (e.g., upgrading from 3.7 to 3.11 might introduce compatibility issues)
- Reinstall the original library to fix possible dependency conflicts:

```
pip uninstall google_images_download
pip install google_images_download
```

- Check if your IP has been temporarily blocked by Google (if you’ve been crawling frequently). If so, try switching networks or routing your requests through a proxy.
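To compare your current setup with the one where the code worked, a quick check like the one below reports the Python version and the installed version of each relevant package, using only the standard library (`importlib.metadata`, Python 3.8+). The package list is an assumption you can adjust. `looks_blocked` is a hypothetical heuristic, assuming Google signals rate limiting with HTTP 429 or an “unusual traffic” page; feed it the status code and body of a response you fetched yourself.

```python
import platform
from importlib import metadata


def environment_report(packages=("google_images_download", "icrawler", "requests")):
    """Return the Python version and each package's installed version
    (None when the package is not installed)."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


def looks_blocked(status_code, body):
    # Heuristic (assumption): HTTP 429 or Google's "unusual traffic" interstitial
    return status_code == 429 or "unusual traffic" in body.lower()


if __name__ == "__main__":
    for name, version in environment_report().items():
        print(f"{name}: {version or 'not installed'}")
```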
4. Use Google’s Official Custom Search API (Most Stable)
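If you need a reliable, long-term solution that won’t get blocked, use Google’s official Custom Search JSON API. This requires a Google Cloud project with an API key, plus a Programmable Search Engine ID with image search enabled. There is a free daily quota (100 queries per day at the time of writing); beyond that, you pay a small fee per query.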
Example Code:
```python
import os

import requests

API_URL = "https://www.googleapis.com/customsearch/v1"


def download_images(keyword, dir_path, api_key, search_engine_id, num_images=2):
    # Create the target directory if it doesn't exist
    os.makedirs(dir_path, exist_ok=True)
    # Let requests URL-encode the query; the API returns at most 10 results per request
    params = {
        "q": keyword,
        "cx": search_engine_id,
        "key": api_key,
        "searchType": "image",
        "num": min(num_images, 10),
    }
    # Fetch image URLs, failing loudly on quota or authentication errors
    response = requests.get(API_URL, params=params, timeout=10)
    response.raise_for_status()
    image_items = response.json().get("items", [])
    # Download each image
    for idx, item in enumerate(image_items):
        img_response = requests.get(item["link"], timeout=10)
        if not img_response.ok:
            continue  # skip images whose host refuses the request
        with open(os.path.join(dir_path, f"{keyword}_{idx + 1}.jpg"), "wb") as img_file:
            img_file.write(img_response.content)


# Replace with your own API key (Google Cloud Console) and search engine ID
# (Programmable Search Engine control panel)
download_images('dog', 'C:\\nuguya', 'YOUR_GOOGLE_API_KEY', 'YOUR_SEARCH_ENGINE_ID')
```
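Because each API call returns at most 10 results, fetching more images means paging with the `start` parameter. The sketch below plans the `(start, num)` pairs, assuming the documented limits that `num` is at most 10, `start` is 1-based, and `start + num` must not exceed 100. It also picks a file extension from each result’s `mime` field instead of always writing `.jpg`, assuming image results carry that field; `plan_requests`, `extension_for`, and the `EXTENSIONS` table are hypothetical names.

```python
def plan_requests(total, page_size=10, max_index=100):
    """Split a request for `total` images into (start, num) pairs.

    Assumption based on the Custom Search JSON API docs: num <= 10,
    start is 1-based, and start + num must not exceed 100.
    """
    pages = []
    start, remaining = 1, total
    while remaining > 0:
        num = min(page_size, remaining, max_index - start)
        if num <= 0:
            break  # the API cannot page any further
        pages.append((start, num))
        start += num
        remaining -= num
    return pages


# Hypothetical table mapping a result's "mime" field to a file extension
EXTENSIONS = {
    "image/jpeg": ".jpg",
    "image/png": ".png",
    "image/gif": ".gif",
    "image/webp": ".webp",
    "image/bmp": ".bmp",
}


def extension_for(item):
    # Fall back to .jpg, as the original example does, when the type is unknown
    return EXTENSIONS.get(item.get("mime", ""), ".jpg")
```

In a download loop, you would send one API request per pair, passing `start` and `num` as query parameters, and name each saved file with `extension_for(item)`.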
The question was originally posted on Stack Exchange by 변구훈.




