Request for help: batch-downloading images from a Wikimedia Commons category and adding metadata descriptions
Hey there! As someone who’s messed around with scraping Wikimedia Commons for personal projects before, I’ve got a solid, beginner-friendly workflow for you. We’ll use Python (free, easy to set up) and a couple simple libraries to pull down all original images from your target category, grab their official descriptions, and inject those into the image’s metadata. Let’s dive in!
First, make sure you have Python installed (grab it from the official site if you don’t—don’t forget to check the "Add Python to PATH" box during setup!). Then open your terminal/command prompt and install these required libraries:
pip install requests piexif
- requests: Handles fetching data from Wikimedia’s API (way more reliable than scraping HTML)
- piexif: Makes editing image metadata (EXIF) straightforward, no fancy image editing skills needed
Copy this code into a new file named commons_downloader.py. I’ve added comments to explain each part, so you can tweak it for your specific category.
    import os
    import time  # Used for the delay between requests

    import piexif
    import requests

    # --------------------------
    # Customize these variables!
    # --------------------------
    TARGET_CATEGORY = "Air Ministry Second World War Official Collection"
    DOWNLOAD_FOLDER = "wikimedia_air_ministry_images"  # Folder where images will save

    API_ENDPOINT = "https://commons.wikimedia.org/w/api.php"
    # Wikimedia asks API clients to identify themselves with a descriptive
    # User-Agent. The value below is a placeholder; put your own contact in it.
    HEADERS = {"User-Agent": "commons_downloader/0.1 (contact: you@example.com)"}

    # Create download folder if it doesn't exist
    os.makedirs(DOWNLOAD_FOLDER, exist_ok=True)


    def get_all_files_in_category(category_name):
        """Fetch every file title in the target Wikimedia Commons category"""
        params = {
            "action": "query",
            "list": "categorymembers",
            "cmtitle": f"Category:{category_name}",
            "cmtype": "file",
            "cmlimit": "max",
            "format": "json",
        }
        titles = []
        while True:
            response = requests.get(API_ENDPOINT, params=params, headers=HEADERS)
            response.raise_for_status()
            data = response.json()
            # Extract just the file titles (e.g., "File:XYZ.jpg")
            titles.extend(item["title"] for item in data["query"]["categorymembers"])
            # A single request only returns one page of results, so keep going
            # for as long as the API hands back continuation parameters
            if "continue" not in data:
                return titles
            params.update(data["continue"])


    def get_file_details(file_title):
        """Grab the original image URL and official description for a single file"""
        params = {
            "action": "query",
            "prop": "imageinfo",
            "iiprop": "url|extmetadata",
            "titles": file_title,
            "format": "json",
        }
        response = requests.get(API_ENDPOINT, params=params, headers=HEADERS)
        response.raise_for_status()
        data = response.json()
        page_data = next(iter(data["query"]["pages"].values()))
        image_info = page_data["imageinfo"][0]
        # Get the original high-res image URL
        original_image_url = image_info["url"]
        # Pull the description (fallback to file name if no description exists)
        description = (
            image_info.get("extmetadata", {})
            .get("Description", {})
            .get("value", file_title.replace("File:", ""))
        )
        return original_image_url, description


    def download_image_and_add_metadata(image_url, description, save_path):
        """Download the image and inject the description into its EXIF metadata"""
        # Download the image in chunks (better for large files)
        response = requests.get(image_url, stream=True, headers=HEADERS)
        response.raise_for_status()
        with open(save_path, "wb") as file:
            for chunk in response.iter_content(chunk_size=8192):
                file.write(chunk)

        # Load existing EXIF data (or create empty if none exists).
        # piexif mainly targets JPEG; for formats it cannot handle (PNG, for
        # example) this step raises and the main loop reports the error,
        # while the downloaded file itself stays on disk.
        exif_data = piexif.load(save_path)
        # XPTitle and XPComment hold UTF-16LE text ending in a null terminator
        encoded_description = description.encode("utf-16-le") + b"\x00\x00"
        exif_data["0th"][piexif.ImageIFD.XPTitle] = encoded_description
        exif_data["0th"][piexif.ImageIFD.XPComment] = encoded_description
        # Save the updated metadata back to the image
        exif_bytes = piexif.dump(exif_data)
        piexif.insert(exif_bytes, save_path)


    # Main workflow to run everything
    if __name__ == "__main__":
        print(f"Fetching files from category: {TARGET_CATEGORY}")
        file_titles = get_all_files_in_category(TARGET_CATEGORY)
        print(f"Found {len(file_titles)} files to process")
        for index, file_title in enumerate(file_titles, 1):
            print(f"\nProcessing file {index}/{len(file_titles)}: {file_title}")
            try:
                image_url, description = get_file_details(file_title)
                # Extract the actual file name from the URL
                file_name = image_url.split("/")[-1]
                save_location = os.path.join(DOWNLOAD_FOLDER, file_name)
                download_image_and_add_metadata(image_url, description, save_location)
                print(f"Successfully saved: {save_location}")
                # Pause 1 second between files to avoid hitting Wikimedia's rate limits
                time.sleep(1)
            except Exception as error:
                print(f"Failed to process {file_title}: {error}")
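One thing worth knowing about the description lookup: as far as I can tell, the Description value in extmetadata usually comes back as HTML (links, spans, entities) rather than plain text, so that markup would end up in your metadata as-is. Here's a minimal standard-library sketch for flattening it to plain text before writing it. The name strip_html is my own helper for illustration, not part of requests or piexif.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(markup):
    """Return the visible text of an HTML fragment with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()  # Flush any text the parser is still buffering
    return " ".join("".join(parser.parts).split())
```

You could call it on the value returned by get_file_details, for example description = strip_html(description), before passing it to download_image_and_add_metadata.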
- Open your terminal/command prompt
- Navigate to the folder where you saved commons_downloader.py
- Run this command:
python commons_downloader.py
The script will:
- Fetch all files in your target category
- For each file, grab the original high-res image URL and its official description
- Download the image to your specified folder
- Write the description into the image’s EXIF XPTitle and XPComment fields (Windows shows these as "Title" and "Comments" in a file’s Properties; other viewers may ignore them)
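For that last step, it may help to see what actually goes into those two fields. The XPTitle and XPComment tags are, as I understand it, Windows-specific and store text as UTF-16LE bytes ending in a null terminator. Here's a small standard-library sketch of that encoding and its inverse. The names encode_xp_field and decode_xp_field are made up for illustration, and the null-terminator convention is my assumption about what Windows expects.

```python
def encode_xp_field(text):
    """Encode text for an XP* EXIF tag (assumed: UTF-16LE plus a null terminator)."""
    return text.encode("utf-16-le") + b"\x00\x00"


def decode_xp_field(raw):
    """Turn a raw tag value (bytes or a tuple of ints) back into a str."""
    text = bytes(raw).decode("utf-16-le")
    return text.rstrip("\x00")  # Drop the trailing terminator, if present
```

The decoder accepts a tuple of ints as well as bytes because EXIF libraries sometimes hand byte-typed tags back in that form.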
- If you get a "rate limit exceeded" error, increase the delay in the time.sleep(1) line to slow down requests.
- Some images might not have a formal description. In that case the script uses the file name as a fallback.
- The XPTitle and XPComment fields we’re using are Windows-specific, so other photo viewers may not display them. piexif only writes EXIF tags; if you need to target IPTC or XMP fields, you’ll need a different tool (ExifTool is one option).
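If a fixed delay still isn't enough, one common approach is to retry a failed request with a growing pause between attempts. This is only a sketch: with_retries is a hypothetical helper, and the attempt count and delays are arbitrary starting points rather than anything Wikimedia prescribes.

```python
import time


def with_retries(action, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call action() until it succeeds, doubling the pause after each failure."""
    for attempt in range(attempts):
        try:
            return action()
        except Exception:
            if attempt == attempts - 1:
                raise  # Out of attempts: let the caller see the error
            # Wait base_delay, then 2x, then 4x, and so on
            sleep(base_delay * 2 ** attempt)
```

In the script you could wrap an individual call, for example with_retries(lambda: get_file_details(file_title)). The sleep parameter is there only so the pauses can be swapped out when testing.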
The question comes from Stack Exchange; asked by PJF.




