How do I create a crawler that calls the API to fetch vehicle data for different locations on caramigo.eu?
Hey there! Let's figure out how to build a crawler for this scenario and directly interact with the API to fetch vehicle data for different locations on caramigo.eu. Here's a step-by-step breakdown:
First, you need to understand exactly what the API expects when you send a request. Open your browser's DevTools (F12), go to the Network tab, then perform a search on the homepage like you normally would. Look for the request to https://www.caramigo.eu/services/car and check:
- Request Method: Is it a GET or POST? (From your example URL, it's likely a GET, since the parameters are in the query string.)
- Query Parameters: Note all required parameters (like address, date_debut, date_fin; there might be others like distance or lang depending on the site).
- Request Headers: Pay attention to headers like User-Agent, Cookie, and any X-CSRF-Token; many sites use these to block automated requests.
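Once you have copied a request URL out of DevTools, it helps to confirm that your script will send the same query string. Here is a minimal sketch of that check using only the standard library. The helper names build_search_url and query_matches are my own, and the parameter names are just the ones from your example URL, so adjust them to whatever the Network tab actually shows.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint taken from the question; everything else here is illustrative.
ENDPOINT = "https://www.caramigo.eu/services/car"


def build_search_url(params):
    """Return the full GET URL that a client would send for these parameters."""
    return f"{ENDPOINT}?{urlencode(params)}"


def query_matches(captured_url, params):
    """Check that a URL copied from DevTools carries exactly these parameters.

    parse_qs decodes both '+' and '%20' as spaces, so differences in how the
    browser and the script encode the address do not cause false mismatches.
    """
    captured = parse_qs(urlsplit(captured_url).query)
    expected = {key: [value] for key, value in params.items()}
    return captured == expected


params = {
    "address": "Belgique, Wallonie, Liège, 4000, Liège",
    "date_debut": "22-03-2019",
    "date_fin": "23-03-2019",
}
url = build_search_url(params)
```

If query_matches returns False for a URL you captured in the browser, the site is probably sending an extra parameter that you still need to add.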
Using Python (with the requests library, which is perfect for this), you can replicate the request directly to the API endpoint instead of going through the homepage search. Here's a basic example:
    import json

    import requests

    # Create a session to persist cookies (important for maintaining a valid session)
    session = requests.Session()

    # First, visit the homepage to get necessary cookies/tokens
    session.get("https://www.caramigo.eu")

    # Define your target parameters (customize these for different locations/dates)
    request_params = {
        "address": "Belgique, Wallonie, Liège, 4000, Liège",
        "date_debut": "22-03-2019",
        "date_fin": "23-03-2019",
        # Add any other required parameters you found in DevTools here
    }

    # Send the request to the API
    response = session.get(
        "https://www.caramigo.eu/services/car",
        params=request_params,
        headers={
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36"
        },
    )

    # Parse and use the JSON data
    if response.status_code == 200:
        vehicle_data = response.json()
        # Process the data (e.g., save to a file, extract specific fields)
        with open("vehicle_data.json", "w", encoding="utf-8") as f:
            json.dump(vehicle_data, f, ensure_ascii=False, indent=2)
        print("Data fetched successfully!")
    else:
        print(f"Request failed: Status code {response.status_code}")
The key here is that you don't need to simulate the homepage search every time—just update the address parameter in your request to target different locations. Make sure the address format matches what the site expects (you can copy valid addresses from successful searches in the browser).
For example, to fetch data for Brussels:
    request_params["address"] = "Belgique, Bruxelles, Bruxelles, 1000, Bruxelles"
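Since a malformed date or address is an easy way to get an empty or failed response, you may want to validate the parameters before sending anything. The sketch below assumes the DD-MM-YYYY format seen in the example dates. build_params is a hypothetical helper of mine, not something the site provides.

```python
from datetime import datetime

# Assumed from the example values "22-03-2019" / "23-03-2019".
DATE_FORMAT = "%d-%m-%Y"


def build_params(address, start_date, end_date):
    """Build the query parameters, rejecting dates the API is unlikely to accept."""
    # strptime raises ValueError if a date does not match DATE_FORMAT.
    start = datetime.strptime(start_date, DATE_FORMAT)
    end = datetime.strptime(end_date, DATE_FORMAT)
    if end < start:
        raise ValueError(f"date_fin {end_date} is before date_debut {start_date}")
    if not address.strip():
        raise ValueError("address must not be empty")
    return {
        "address": address.strip(),
        "date_debut": start_date,
        "date_fin": end_date,
    }


brussels_params = build_params(
    "Belgique, Bruxelles, Bruxelles, 1000, Bruxelles", "22-03-2019", "23-03-2019"
)
```

The resulting dictionary can be passed straight to the params argument of session.get.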
- Rate Limiting: Add delays between requests (using time.sleep(2) for example) to avoid overwhelming the server and getting your IP blocked.
- Session Persistence: Using a requests.Session() ensures cookies are retained, which helps avoid being flagged as an automated bot.
- Check Robots.txt: Always review https://www.caramigo.eu/robots.txt to make sure scraping the API is allowed.
- Dynamic Tokens: If the site uses a CSRF token, extract it from the homepage HTML (using BeautifulSoup for example) and include it in your request headers or parameters.
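The robots.txt and token points above can be sketched roughly as follows. To keep the example dependency-free I'm using the standard library's urllib.robotparser and html.parser rather than BeautifulSoup. The meta tag name csrf-token is purely an assumption on my part (I don't know whether or where caramigo.eu exposes a token), and both helper names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, user_agent="*"):
    """Check a URL against robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class _MetaTokenParser(HTMLParser):
    """Collect the content attribute of the first <meta name=...> that matches."""

    def __init__(self, meta_name):
        super().__init__()
        self.meta_name = meta_name
        self.token = None

    def handle_starttag(self, tag, attrs):
        if tag == "meta" and self.token is None:
            attr_map = dict(attrs)
            if attr_map.get("name") == self.meta_name:
                self.token = attr_map.get("content")


def extract_csrf_token(html, meta_name="csrf-token"):
    """Return the token from the page HTML, or None if no such meta tag exists.

    The default meta_name is a guess; inspect the real homepage to find out
    how (and whether) the site publishes a token.
    """
    parser = _MetaTokenParser(meta_name)
    parser.feed(html)
    return parser.token
```

In practice you would pass in the text of session.get("https://www.caramigo.eu/robots.txt") and of the homepage response, and then send any token you find in whichever header or parameter DevTools shows the browser using.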
To scrape data for multiple locations and date ranges, you can loop through a list of parameters:
    import json
    import time

    import requests

    # List of locations to scrape
    locations = [
        "Belgique, Wallonie, Liège, 4000, Liège",
        "Belgique, Bruxelles, Bruxelles, 1000, Bruxelles",
        "Belgique, Flandre, Anvers, 2000, Anvers",
    ]

    # List of date ranges
    date_ranges = [("22-03-2019", "23-03-2019"), ("25-03-2019", "26-03-2019")]

    session = requests.Session()
    session.get("https://www.caramigo.eu")

    for location in locations:
        for start_date, end_date in date_ranges:
            request_params = {
                "address": location,
                "date_debut": start_date,
                "date_fin": end_date,
            }
            response = session.get("https://www.caramigo.eu/services/car", params=request_params)
            if response.ok:
                filename = f"vehicles_{location.replace(',', '_')}_{start_date}.json"
                with open(filename, "w", encoding="utf-8") as f:
                    json.dump(response.json(), f, ensure_ascii=False, indent=2)
                print(f"Saved data to {filename}")
            else:
                print(f"Failed to fetch data for {location} ({start_date} to {end_date})")
            # Add a delay to respect rate limits
            time.sleep(2)
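One small wrinkle with the loop above: replacing only the commas leaves spaces in the filenames. If that bothers you, here is a possible cleanup, again just a sketch with helper names of my own choosing. It also flattens the two nested loops into a single generator with itertools.product.

```python
import re
from itertools import product


def safe_filename(location, start_date):
    """Collapse every run of non-word characters in the address into one underscore."""
    slug = re.sub(r"[^\w-]+", "_", location).strip("_")
    return f"vehicles_{slug}_{start_date}.json"


def search_jobs(locations, date_ranges):
    """Yield one parameter dict per (location, date range) combination."""
    for location, (start_date, end_date) in product(locations, date_ranges):
        yield {
            "address": location,
            "date_debut": start_date,
            "date_fin": end_date,
        }
```

You would then loop over search_jobs(locations, date_ranges), send each dict as params, and keep the time.sleep(2) between requests.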
That's it! By directly interacting with the API endpoint and modifying parameters, you can efficiently fetch data for any location and date range you need.
The question comes from Stack Exchange; asked by M. Coppee.




