在Jupyter Notebook中导入多CSV文件,提取Market Cap列并重命名为文件名
Let's break down exactly how to import, clean, and merge those 100 CSV files into a single, usable DataFrame. Here's a step-by-step approach with code:
Step 1: Prepare Your Environment & File Setup
First, make sure you've downloaded all the CSV files from the dataset and saved them in a dedicated folder (e.g., crypto_market_data) on your local machine. Then, install/import the required libraries:
# Install pandas if you haven't already (run once) !pip install pandas # Import necessary libraries import pandas as pd import os
Step 2: Define Your Data Folder Path
Set the path to the folder where you stored the CSV files. For example:
# Replace this with your actual folder path data_folder = "./crypto_market_data"
Step 3: Loop Through Files & Build Merged DataFrame
We'll iterate over each CSV, extract only the Market Cap column (with timestamp as index), rename the column to the crypto's filename, and merge everything together:
# Initialize an empty DataFrame to hold all merged data merged_market_cap = pd.DataFrame() # Loop through each file in the data folder for filename in os.listdir(data_folder): # Only process CSV files if filename.endswith(".csv"): # Full path to the current file file_path = os.path.join(data_folder, filename) # Read the CSV, set the timestamp column as index (adjust 'Date' if your timestamp column has a different name) df = pd.read_csv(file_path, index_col="Date", parse_dates=True) # Keep only the 'Market Cap' column df = df[["Market Cap"]] # Rename the column to the crypto name (remove .csv from filename) crypto_name = filename.replace(".csv", "") df.rename(columns={"Market Cap": crypto_name}, inplace=True) # Merge with the main DataFrame (use outer join to preserve all timestamps across cryptos) if merged_market_cap.empty: merged_market_cap = df else: merged_market_cap = merged_market_cap.join(df, how="outer")
Step 4: (Optional) Clean Up Missing Values
Depending on the data, you might have missing entries where a crypto didn't exist on a particular date. You can handle this with methods like forward fill or drop missing values:
# Example: Forward fill missing values (carry over the last known market cap) merged_market_cap.fillna(method="ffill", inplace=True) # Or drop rows with any missing values (use cautiously, as this removes data) # merged_market_cap.dropna(inplace=True)
Key Notes & Troubleshooting
- Timestamp Column Name: If your CSV files use a different name for the timestamp column (e.g., "Timestamp" instead of "Date"), update the
index_colparameter inpd.read_csv()accordingly. - Filename Cleanup: If filenames have extra characters (like hyphens or spaces), you can adjust the
crypto_nameline to clean them up (e.g.,crypto_name = filename.replace(".csv", "").replace("-", "_")). - Memory Considerations: 100 files might take up some memory—if you run into issues, consider processing batches or filtering dates before merging.
内容的提问来源于stack exchange,提问作者Juan Rodriguez




