How do I combine two rows of an Excel sheet into a header? Problem loading a poorly formatted file into pandas
Your column names are split across two rows of the sheet, so they have to be merged before pandas can use them as a header. Here's a step-by-step solution tailored to your code:
Step 1: Extract the two header rows
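First, read only the two rows that contain the split column names. Since you're skipping 35 rows, these are the next two rows after that. Pass header=None so pandas keeps both rows as data; with the default header=0, the first of them would be consumed as column labels and the frame would instead hold the second header row plus the first data row.

    import os
    import pandas as pd

    os.chdir(r'xxx')
    file = 'problem.xls'
    xl = pd.ExcelFile(file)

    # Read only the two header rows; header=None keeps both as data rows
    header_df = xl.parse('WEEKLY NUMBERS', skiprows=35, header=None, nrows=2)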
Step 2: Merge the rows into proper column names
Next, we'll combine the values from each column's two cells to create a single, clean column name. We'll handle cases where one cell might be empty (so we don't end up with extra spaces):
    new_columns = []
    for col in header_df.columns:
        # Get values from both rows, replacing NaNs with empty strings
        row1_val = str(header_df[col].iloc[0]).strip() if pd.notna(header_df[col].iloc[0]) else ""
        row2_val = str(header_df[col].iloc[1]).strip() if pd.notna(header_df[col].iloc[1]) else ""
        # Combine the two parts, trimming any extra spaces
        combined_name = f"{row1_val} {row2_val}".strip()
        new_columns.append(combined_name)
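To see what this merge produces without opening the workbook, here is a minimal sketch that applies the same logic to a made-up two-row header. The sample values are invented for illustration, and `merge_header_rows` is a hypothetical helper name, not a pandas function.

```python
import pandas as pd


def merge_header_rows(header_df):
    """Join the first two rows of header_df column by column."""
    merged = []
    for col in header_df.columns:
        # Keep only the non-missing cells, as stripped strings
        parts = [
            str(value).strip()
            for value in header_df[col].iloc[:2]
            if pd.notna(value)
        ]
        # Skip cells that were blank after stripping, then join with one space
        merged.append(" ".join(part for part in parts if part))
    return merged


# Made-up header rows, shaped like two rows read without a header (assumption)
sample = pd.DataFrame(
    [["Week", "Units", "Total"],
     ["Ending", "Sold", None]]
)
new_columns = merge_header_rows(sample)
print(new_columns)  # ['Week Ending', 'Units Sold', 'Total']
```

Wrapping the loop in a function makes it easy to try on a few hand-written headers before pointing it at the real sheet.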
Step 3: Load the actual data with the merged headers
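Now load the main data, skipping the first 37 rows (35 irrelevant rows + 2 header rows) and passing the merged column names. header=None matters here as well: without it, pandas treats the first data row as a header row and replaces it with names, so that row silently disappears from the result.

    # Load the data, skipping the header rows we already processed;
    # header=None keeps the first data row from being read as a header
    df = xl.parse('WEEKLY NUMBERS', skiprows=37, header=None, names=new_columns)

    # Optional: check the result to confirm the headers are correct
    print(df.columns)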
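As an alternative to reading the header rows separately, pandas can parse both rows as a two-level header when given `header=[0, 1]`, after which the levels can be flattened into single strings. The sketch below shows only the flattening step on an in-memory stand-in, since the real workbook isn't available here. It assumes blank header cells appear as placeholder labels beginning with `Unnamed:`, which is how pandas usually names them; `flatten_columns` is a hypothetical helper name.

```python
import pandas as pd


def flatten_columns(columns):
    """Join the levels of a two-level header into single strings."""
    flat = []
    for levels in columns:
        # Drop the placeholder labels pandas assigns to blank header cells
        parts = [
            str(level).strip()
            for level in levels
            if not str(level).startswith("Unnamed:")
        ]
        flat.append(" ".join(parts))
    return flat


# Stand-in for the columns returned by parsing with header=[0, 1];
# the labels and the data row are invented for illustration
columns = pd.MultiIndex.from_tuples(
    [("Week", "Ending"), ("Units", "Sold"), ("Total", "Unnamed: 2_level_1")]
)
df = pd.DataFrame([["2020-01-05", 12, 30]], columns=columns)
df.columns = flatten_columns(df.columns)
print(list(df.columns))  # ['Week Ending', 'Units Sold', 'Total']
```

Whether this works on the actual file depends on how the header cells are laid out, so inspect the parsed column labels before relying on it.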
Bonus: Handling edge cases
If a column has a name in only one of the two rows (e.g., row 1 has "Total" and row 2 is empty), the code uses just the non-empty value, so the name carries no stray leading or trailing space. If both cells are blank, the name comes out as an empty string, so check for that before relying on the labels.
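Two related cases are worth guarding against: merged names that come out blank, and two columns that end up with the same name, which makes selecting a column by label ambiguous. The following is a minimal sketch of one way to tidy the list before passing it as `names`. The helper name `make_unique` and the `column_N` / `_N` naming scheme are assumptions chosen for illustration, and the sketch does not check whether a generated name collides with a name already in the list.

```python
from collections import Counter


def make_unique(names):
    """Return names with blanks filled in and repeats numbered."""
    seen = Counter()
    unique = []
    for position, name in enumerate(names):
        # Fall back to a positional label when both header cells were blank
        base = name if name else f"column_{position}"
        seen[base] += 1
        # The first occurrence keeps its name; later ones get a numeric suffix
        unique.append(base if seen[base] == 1 else f"{base}_{seen[base]}")
    return unique


print(make_unique(["Total", "", "Total", "Units Sold"]))
# ['Total', 'column_1', 'Total_2', 'Units Sold']
```

Calling `new_columns = make_unique(new_columns)` just before loading the data would apply it to the merged names.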
This question comes from Stack Exchange; asked by jules325.




