如何用Pandas读取含嵌套列的Excel文件并解决列名异常问题
Hey there, let’s figure out why you’re getting those annoying Unnamed: column names and fix it, plus walk through how to convert your cleaned data to JSON afterward.
一、Fixing the Unnamed Column Issue
The most common reason you’re seeing Unnamed: columns is that your Excel’s header isn’t a single, clean row—maybe it’s split across multiple rows, uses merged cells, or skiprows=6 isn’t targeting the right starting point for your actual column names. Here are a few ways to fix this:
1. First, map out your Excel’s structure
Before diving into fixing, let’s see exactly what’s in those top rows. Skip setting headers entirely to inspect the raw data:
import pandas as pd # Read the entire file without setting headers raw_df = pd.read_excel('filename.xlsx') # Print the first 10 rows to spot where your actual column names live print(raw_df.head(10))
Once you find which rows make up your header (e.g., rows 7 and 8 if you skipped 6, which are index 6 and 7 in zero-based numbering), use the header parameter to specify that range:
# Use a list to tell Pandas which rows form the header df = pd.read_excel('filename.xlsx', header=[6, 7]) # If you end up with multi-level headers, you can merge them into single names df.columns = ['_'.join(str(col).strip() for col in col_tuple) for col_tuple in df.columns] # Clean up any leftover "Unnamed" parts df.columns = [col.replace('_Unnamed', '') for col in df.columns]
2. Manually define column names
If your header is super messy (like scattered across non-consecutive rows), skip past all header rows and set the column names yourself:
# Skip all header rows (adjust skiprows to start at your first data row) df = pd.read_excel('filename.xlsx', skiprows=7) # Create a list of your actual column names, in the same order as the Excel columns custom_cols = ['Customer ID', 'Order Date', 'Product Name', ...] # Replace with your real names df.columns = custom_cols
3. Handle merged cells with openpyxl
Merged cells in Excel often throw Pandas off. Use the openpyxl engine (install it first with pip install openpyxl) to read the file, then fill in the missing column names by forward-filling:
df = pd.read_excel('filename.xlsx', skiprows=6, engine='openpyxl') # Fill Unnamed columns with the last non-empty column name to the left df.columns = df.columns.fillna(method='ffill')
二、Converting Your Data to JSON
Once you’ve got a clean DataFrame with all the right column names, converting to JSON is straightforward. Here are the most useful formats:
1. Column-oriented JSON (default)
This format groups data by column names, great if you want to keep the structure close to your DataFrame:
# indent=4 makes the JSON human-readable df.to_json('output_columns.json', orient='columns', indent=4)
2. Record-oriented JSON
Each row becomes a separate JSON object—perfect for APIs or datasets where you want individual entries:
df.to_json('output_records.json', orient='records', indent=4)
3. Index-included JSON
If you need to keep the DataFrame’s index as part of the JSON output:
df.to_json('output_with_index.json', orient='index', indent=4)
内容的提问来源于stack exchange,提问作者Hamad




