
How do I build a crawler that calls the API to fetch vehicle data for different locations on caramigo.eu?

Hey there! Let's figure out how to build a crawler for this scenario and directly interact with the API to fetch vehicle data for different locations on caramigo.eu. Here's a step-by-step breakdown:

Step 1: Inspect the API Request Details

First, you need to understand exactly what the API expects when you send a request. Open your browser's DevTools (F12), go to the Network tab, then perform a search on the homepage like you normally would. Look for the request to https://www.caramigo.eu/services/car and check:

  • Request Method: Is it a GET or POST? (From your example URL, it's likely a GET since parameters are in the query string)
  • Query Parameters: Note all required parameters (like address, date_debut, date_fin—there might be others like distance or lang depending on the site)
  • Request Headers: Pay attention to headers like User-Agent, Cookie, and any X-CSRF-Token—many sites use these to block automated requests.
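As a rough illustration of the checklist above, a request URL copied from the Network tab can be split into its endpoint and query parameters with Python's standard library. The URL below is a made-up example assembled from the parameter names mentioned in this answer, not a captured request, so treat it as an assumption.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical URL, as it might be copied from the DevTools Network tab.
copied_url = (
    "https://www.caramigo.eu/services/car"
    "?address=Belgique%2C+Wallonie%2C+Li%C3%A8ge%2C+4000%2C+Li%C3%A8ge"
    "&date_debut=22-03-2019&date_fin=23-03-2019"
)


def split_request_url(url):
    """Return (endpoint, params) for a URL that carries a query string."""
    parts = urlsplit(url)
    endpoint = f"{parts.scheme}://{parts.netloc}{parts.path}"
    # parse_qsl decodes percent-escapes and '+' back into plain text
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    return endpoint, params


endpoint, params = split_request_url(copied_url)
print(endpoint)
for name, value in params.items():
    print(f"{name} = {value}")
```

The resulting dictionary can be passed as-is to the params argument of requests, which re-encodes the values for you.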

Step 2: Replicate the Request in Your Crawler

Using Python (with the requests library, which is perfect for this), you can replicate the request directly to the API endpoint instead of going through the homepage search. Here's a basic example:

import requests
import json

# Create a session to persist cookies (important for maintaining a valid session)
session = requests.Session()

# First, visit the homepage to get necessary cookies/tokens
session.get("https://www.caramigo.eu")

# Define your target parameters (customize these for different locations/dates)
request_params = {
    "address": "Belgique, Wallonie, Liège, 4000, Liège",
    "date_debut": "22-03-2019",
    "date_fin": "23-03-2019"
    # Add any other required parameters you found in DevTools here
}

# Send the request to the API
response = session.get(
    "https://www.caramigo.eu/services/car",
    params=request_params,
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36"
    },
    timeout=10,  # fail fast instead of hanging on a stalled connection
)

# Parse and use the JSON data
if response.status_code == 200:
    vehicle_data = response.json()
    # Process the data (e.g., save to a file, extract specific fields)
    with open("vehicle_data.json", "w", encoding="utf-8") as f:
        json.dump(vehicle_data, f, ensure_ascii=False, indent=2)
    print("Data fetched successfully!")
else:
    print(f"Request failed: Status code {response.status_code}")
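A status code of 200 does not guarantee a JSON body, since a block page or login page can come back with the same status. The sketch below is one possible way to guard the parsing step. It assumes the endpoint labels JSON responses with a JSON Content-Type, which you should confirm in DevTools. The function name and the sample payload are hypothetical.

```python
import json


def parse_vehicle_payload(status_code, content_type, body_text):
    """Return the decoded JSON payload, or None if the response is unusable."""
    if status_code != 200:
        return None
    # An HTML error or login page can still arrive with status 200
    if "json" not in (content_type or "").lower():
        return None
    try:
        return json.loads(body_text)
    except json.JSONDecodeError:
        return None


# Made-up bodies, only to show the three outcomes
print(parse_vehicle_payload(200, "application/json; charset=utf-8", '[{"id": 1}]'))
print(parse_vehicle_payload(200, "text/html", "<html>Access denied</html>"))
print(parse_vehicle_payload(200, "application/json", "not json"))
```

With requests, the three arguments would be response.status_code, response.headers.get("Content-Type"), and response.text.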

Step 3: Modify API Parameters for Different Locations

The key here is that you don't need to simulate the homepage search every time—just update the address parameter in your request to target different locations. Make sure the address format matches what the site expects (you can copy valid addresses from successful searches in the browser).

For example, to fetch data for Brussels:

request_params["address"] = "Belgique, Bruxelles, Bruxelles, 1000, Bruxelles"
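If you change locations and dates often, a small helper can keep the parameter format consistent. This is only a sketch: build_params is a hypothetical name, and the DD-MM-YYYY format is inferred from the dates used in the examples in this answer rather than from any API documentation.

```python
from datetime import date


def build_params(address, start, end):
    """Build the query parameters for one location and date range."""
    if end < start:
        raise ValueError("end date must not be before start date")
    return {
        "address": address,
        # DD-MM-YYYY, as in the example dates used in this answer
        "date_debut": start.strftime("%d-%m-%Y"),
        "date_fin": end.strftime("%d-%m-%Y"),
    }


brussels_params = build_params(
    "Belgique, Bruxelles, Bruxelles, 1000, Bruxelles",
    date(2019, 3, 22),
    date(2019, 3, 23),
)
print(brussels_params)
```

Working with date objects instead of raw strings also catches typos such as a 31st of February before any request is sent.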

Step 4: Handle Edge Cases & Ethical Scraping

  • Rate Limiting: Add a delay between requests (time.sleep(2), for example) so you don't overload the server or get your IP blocked.
  • Session Persistence: Reusing one requests.Session() keeps cookies across requests, so every call carries the same session state a browser would send.
  • Check robots.txt: Review https://www.caramigo.eu/robots.txt before crawling to see which paths the site asks crawlers to avoid.
  • Dynamic Tokens: If the site uses a CSRF token, extract it from the homepage HTML (with BeautifulSoup, for example) and include it in your request headers or parameters.
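The robots.txt check can be automated with the standard library's urllib.robotparser. The sketch below parses a made-up robots.txt so it runs offline. The rules and the "my-crawler" user agent are placeholders, not the site's actual file.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, url):
    """Check a URL against robots.txt rules supplied as text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Made-up rules for illustration only; fetch the real file before crawling.
sample_robots = """\
User-agent: *
Disallow: /admin/
"""

api_ok = is_allowed(sample_robots, "my-crawler", "https://www.caramigo.eu/services/car")
admin_ok = is_allowed(sample_robots, "my-crawler", "https://www.caramigo.eu/admin/users")
print(api_ok, admin_ok)
```

Against the live site you would instead call parser.set_url("https://www.caramigo.eu/robots.txt") followed by parser.read(), then can_fetch() as above.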

Step 5: Scale for Multiple Locations/Dates

To scrape data for multiple locations and date ranges, you can loop through a list of parameters:

import json
import time

import requests

# List of locations to scrape
locations = [
    "Belgique, Wallonie, Liège, 4000, Liège",
    "Belgique, Bruxelles, Bruxelles, 1000, Bruxelles",
    "Belgique, Flandre, Anvers, 2000, Anvers"
]

# List of date ranges
date_ranges = [("22-03-2019", "23-03-2019"), ("25-03-2019", "26-03-2019")]

session = requests.Session()
session.get("https://www.caramigo.eu")

for location in locations:
    for start_date, end_date in date_ranges:
        request_params = {
            "address": location,
            "date_debut": start_date,
            "date_fin": end_date
        }
        
        response = session.get("https://www.caramigo.eu/services/car", params=request_params)
        
        if response.ok:
            filename = f"vehicles_{location.replace(',', '_')}_{start_date}.json"
            with open(filename, "w", encoding="utf-8") as f:
                json.dump(response.json(), f, ensure_ascii=False, indent=2)
            print(f"Saved data to {filename}")
        else:
            print(f"Failed to fetch data for {location} ({start_date} to {end_date})")
        
        # Add a delay to respect rate limits
        time.sleep(2)
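The filenames built in the loop keep the spaces and accented characters from the address and include only the start date. If that causes trouble on your filesystem, one option is to normalize the name first. The helpers below are a hypothetical sketch using only the standard library.

```python
import re
import unicodedata


def slugify(text):
    """Reduce text to lowercase ASCII letters and digits joined by underscores."""
    # Decompose accented letters, then drop the combining marks
    ascii_text = (
        unicodedata.normalize("NFKD", text)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "_", ascii_text.lower()).strip("_")


def output_filename(location, start_date, end_date):
    """Build a filename that identifies both the location and the date range."""
    return f"vehicles_{slugify(location)}_{start_date}_{end_date}.json"


name = output_filename("Belgique, Wallonie, Liège, 4000, Liège", "22-03-2019", "23-03-2019")
print(name)
```

Including the end date keeps two ranges that share a start date from overwriting each other.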

That's it! By directly interacting with the API endpoint and modifying parameters, you can efficiently fetch data for any location and date range you need.

This question originally appeared on Stack Exchange; asked by M. Coppee.