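Development Requirements: Downloading and Cleaning Weather Data, Storing It as CSV, and Processing Precipitation Data for Four U.S. Cities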

Ahua AIGC Lab

2026-5-22

Let’s walk through how to solve this task, from fetching and cleaning the weather data to generating the 48 monthly files (four stations × 12 months of 2017) and calculating the average precipitation. I’ll use Python, since it’s the go-to language for data tasks like this.

Step 1: Set Up Your Tools

First, make sure you have the necessary libraries installed. We’ll use requests to pull data from the API, pandas for data cleaning and analysis, and os for file management. The os module ships with Python, so only requests and pandas need installing via pip if you haven’t already:

pip install requests pandas

Step 2: Load Your Station IDs

Start by reading the stations.csv file to get the four city station IDs. Adjust the column name if your CSV uses something other than station_id:

import os

import pandas as pd
import requests

# Load station IDs from the CSV
stations_df = pd.read_csv("stations.csv")
station_ids = stations_df["station_id"].tolist()  # Update column name if needed
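
If the IDs carry leading zeros or the column name differs from what you expect, a slightly more defensive loader can help. The following is a minimal sketch rather than part of the original approach: load_station_ids is a hypothetical helper name, and the sample IDs in the demo are made-up placeholders, not real station identifiers.

```python
import io

import pandas as pd


def load_station_ids(source, column="station_id"):
    """Return the unique, non-empty station IDs found in `column` as strings."""
    # dtype=str keeps IDs such as "00123" from losing their leading zeros
    df = pd.read_csv(source, dtype=str)
    if column not in df.columns:
        raise KeyError(f"Column {column!r} not found; available: {list(df.columns)}")
    ids = df[column].dropna().str.strip()
    ids = ids[ids != ""]
    # drop_duplicates keeps the first occurrence, so file order is preserved
    return ids.drop_duplicates().tolist()


# Demo with placeholder IDs (hypothetical, for illustration only)
sample = io.StringIO(
    "station_id,city\nSTATION_A,City A\nSTATION_B,City B\nSTATION_A,City A\n"
)
print(load_station_ids(sample))
```

In the real script you would call load_station_ids("stations.csv") and, since the task covers four cities, check that the returned list has four entries.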

Step 3: Fetch & Clean Monthly 2017 Data

Next, we’ll loop through each station and each month of 2017, fetch the data, clean it, and save it as a separate CSV. You’ll need to replace the base_url with your actual weather data API’s URL template (adjust the placeholders to match how the API accepts station ID, year, and month):

# Create a folder to store the monthly files (avoids clutter)
os.makedirs("monthly_weather_data", exist_ok=True)

# Replace this with your actual API URL template
base_url = "https://your-weather-api-url.com/stations/{station_id}/data?year={year}&month={month}"

for station_id in station_ids:
    for month in range(1, 13):
        # Format month as two digits (e.g., 01 for January)
        month_str = f"{month:02d}"
        # Build the full request URL
        request_url = base_url.format(station_id=station_id, year=2017, month=month_str)
        
        try:
            # Fetch the data from the API (the timeout keeps a stalled request from hanging forever)
            response = requests.get(request_url, timeout=30)
            response.raise_for_status()  # Raise an error if the request fails
            
            # Parse the data into a DataFrame (adjust this based on your API's output format)
            # Example for JSON data:
            data = response.json()
            df = pd.DataFrame(data["observations"])  # Update to match your API's structure
            
            # --- Data Cleaning Steps (customize these to your data!) ---
            # 1. Keep only the columns we need (date and precipitation)
            df = df[["date", "precipitation"]].copy()  # Replace with your actual column names

            # 2. Convert date to datetime format (critical for sorting/filtering)
            df["date"] = pd.to_datetime(df["date"])

            # 3. Force precipitation to numeric; non-numeric strings become NaN
            df["precipitation"] = pd.to_numeric(df["precipitation"], errors="coerce")

            # 4. Drop rows with missing or unparseable precipitation values
            df = df.dropna(subset=["precipitation"])
            
            # --- Save the cleaned data as a CSV ---
            output_filename = f"monthly_weather_data/{station_id}_2017_{month_str}.csv"
            df.to_csv(output_filename, index=False)
            print(f"Successfully saved: {output_filename}")
            
        except Exception as e:
            # Log the failure and move on to the next month
            print(f"Failed to process {station_id} - {month_str}/2017: {e}")

Quick Notes on Data Cleaning:

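If your API returns CSV data instead of JSON, replace the parsing step with df = pd.read_csv(io.StringIO(response.text)) and add import io at the top; passing request_url to pd.read_csv would download the data a second time and skip the raise_for_status check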
Adjust column names to match what your data uses (e.g., precip instead of precipitation, timestamp instead of date)
Add extra steps if needed: filter out invalid values (like negative precipitation), convert units (inches to mm), or handle timezone differences
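
The notes above can be sketched as one reusable cleaning function. This is an illustration under assumptions, not a definitive implementation: parse_csv_text and clean_precipitation are hypothetical helper names, the column names are the same placeholders used in Step 3, and the inches_to_mm flag only makes sense if your source actually reports inches.

```python
import io

import pandas as pd

MM_PER_INCH = 25.4


def parse_csv_text(text):
    """Parse CSV text (for example response.text) into a DataFrame."""
    return pd.read_csv(io.StringIO(text))


def clean_precipitation(df, date_col="date", precip_col="precipitation",
                        inches_to_mm=False):
    """Return a cleaned copy holding only valid date/precipitation rows."""
    out = df[[date_col, precip_col]].copy()
    # Unparseable dates and non-numeric readings become NaT/NaN
    out[date_col] = pd.to_datetime(out[date_col], errors="coerce")
    out[precip_col] = pd.to_numeric(out[precip_col], errors="coerce")
    out = out.dropna(subset=[date_col, precip_col])
    # Negative precipitation is physically impossible, so treat it as invalid
    out = out.loc[out[precip_col] >= 0].copy()
    if inches_to_mm:
        out[precip_col] = out[precip_col] * MM_PER_INCH
    return out.reset_index(drop=True)
```

Inside the Step 3 loop, df = clean_precipitation(parse_csv_text(response.text)) would then stand in for the individual cleaning steps when the API returns CSV.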

Step 4: Calculate Monthly Average Precipitation

Once you have all 48 monthly files, calculating the averages is straightforward. We’ll loop through each file, compute the mean precipitation, and save the results in a summary CSV:

# Store average precipitation data in a list
avg_precip_results = []

for station_id in station_ids:
    for month in range(1, 13):
        month_str = f"{month:02d}"
        file_path = f"monthly_weather_data/{station_id}_2017_{month_str}.csv"
        
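        # Skip months whose download failed in Step 3 (no file was written for them)
        if not os.path.exists(file_path):
            print(f"Skipping missing file: {file_path}")
            continue

        # Read the cleaned monthly data
        df = pd.read_csv(file_path)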
        
        # Calculate the monthly average
        monthly_avg = df["precipitation"].mean()
        
        # Add to our results list
        avg_precip_results.append({
            "station_id": station_id,
            "year": 2017,
            "month": month,
            "average_precipitation": round(monthly_avg, 2)  # Round to 2 decimal places for readability
        })

# Convert results to a DataFrame and save
avg_precip_df = pd.DataFrame(avg_precip_results)
avg_precip_df.to_csv("2017_monthly_avg_precipitation.csv", index=False)
print("Monthly average precipitation results saved to 2017_monthly_avg_precipitation.csv")
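
As a cross-check, the same summary can be built from whatever files are actually on disk, which also shows which station-months are missing. The sketch below assumes the {station_id}_2017_{MM}.csv naming used in Step 3 and a precipitation column; summarize_monthly_averages and missing_station_months are hypothetical helper names.

```python
import glob
import os

import pandas as pd

SUMMARY_COLUMNS = ["station_id", "year", "month", "average_precipitation"]


def summarize_monthly_averages(data_dir, year=2017):
    """Average the precipitation column of every <station>_<year>_<MM>.csv in data_dir."""
    rows = []
    pattern = os.path.join(data_dir, f"*_{year}_??.csv")
    for path in sorted(glob.glob(pattern)):
        stem = os.path.splitext(os.path.basename(path))[0]
        # Split from the right so station IDs containing "_" stay intact
        station_id, _, month_str = stem.rsplit("_", 2)
        df = pd.read_csv(path)
        values = pd.to_numeric(df["precipitation"], errors="coerce")
        rows.append({
            "station_id": station_id,
            "year": year,
            "month": int(month_str),
            "average_precipitation": round(values.mean(), 2),
        })
    return pd.DataFrame(rows, columns=SUMMARY_COLUMNS)


def missing_station_months(summary, station_ids):
    """List the (station_id, month) pairs that have no row in the summary."""
    present = set(zip(summary["station_id"].astype(str).tolist(),
                      summary["month"].astype(int).tolist()))
    return [(str(s), m) for s in station_ids for m in range(1, 13)
            if (str(s), m) not in present]
```

Calling summarize_monthly_averages("monthly_weather_data") and then missing_station_months(summary, station_ids) should return an empty list once all 48 files exist.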

Troubleshooting Tips

API Rate Limits: If you hit API request limits, add a small delay between requests with time.sleep(1) (don’t forget to import time)
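Missing Data: Some months might have no data. The try-except block in Step 3 logs the error and moves on, so no file is written for that month; make sure Step 4 does not try to read a file that does not exist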
Data Types: Double-check that precipitation values are numeric; if you see errors, adjust the pd.to_numeric step to handle edge cases
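
For the rate-limit tip, a fixed time.sleep(1) can be combined with a few retries. The following is a rough sketch under assumptions: fetch_with_retry is a hypothetical helper, the retry count, delay and 30-second timeout are arbitrary starting values, and your API may document its own limits that should take precedence.

```python
import time

import requests


def fetch_with_retry(url, get=requests.get, retries=3, delay_seconds=1.0,
                     sleep=time.sleep):
    """Call get(url) up to `retries` times, pausing between failed attempts."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            response = get(url, timeout=30)
            response.raise_for_status()  # HTTP errors such as 429 count as failures
            return response
        except requests.RequestException as error:
            last_error = error
            if attempt < retries:
                # Wait a little longer after each failed attempt
                sleep(delay_seconds * attempt)
    # Every attempt failed: surface the last error to the caller
    raise last_error
```

In the Step 3 loop, response = fetch_with_retry(request_url) could replace the direct requests.get call; the get and sleep parameters exist only so the helper can be exercised without a live API.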

The question comes from Stack Exchange; asked by Ma_