Reading an Excel file with Pandas: start reading from the row after the row that contains a given value
The problem with hardcoding skiprows is obvious: if the "Start" row moves, your code breaks. Instead, first read the whole sheet, find the row that contains "Start", and then take the data from the row right after it. Here's a reliable way to do this:
Step 1: Read the entire sheet into a temporary DataFrame
First, we read every row (no skipping) so we can search for "Start". We use header=None because the initial rows aren't our actual data headers:
import pandas as pd

# Use a raw string for the Windows file path to avoid backslash issues
temp_df = pd.read_excel(r'C:\Users\MyFolder\MyFile.xlsx', sheet_name='Sheet1', header=None)
Step 2: Locate the "Start" row
Next, find the index of the first row that contains "Start". If you know "Start" is always in Column A (the first column), use this:
# Find all rows where Column A has "Start"
start_matches = temp_df[temp_df[0] == 'Start']
if start_matches.empty:
    raise ValueError("Error: Could not find the 'Start' row in the Excel file")

# Get the index of the first matching row
start_row_idx = start_matches.index[0]
If "Start" could be in any column (not just Column A), replace the above check with this to search all columns:
start_matches = temp_df[temp_df.isin(['Start']).any(axis=1)]
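To see what the all-column search does, here is a small sketch on an in-memory DataFrame that stands in for the sheet. The sample values and the names sample, row_has_start and first_start_idx are made up for illustration; they are not part of the original file.

```python
import pandas as pd

# Hypothetical stand-in for a sheet that was read with header=None.
sample = pd.DataFrame([
    ["Report", None, None],
    [None, "Start", None],
    ["a", 1, 2],
    ["b", 3, 4],
])

# isin() flags every cell equal to "Start";
# any(axis=1) collapses those flags to one True/False per row.
row_has_start = sample.isin(["Start"]).any(axis=1)
matches = sample[row_has_start]

# Index label of the first row that contains "Start" in any column.
first_start_idx = matches.index[0]
print(first_start_idx)
```

Note that isin() only matches cells that equal "Start" exactly, so a cell such as " Start" or "start" would not be found by this check.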
Step 3: Extract the actual data
Now slice the temporary DataFrame to get everything after the "Start" row, and reset the index for cleanliness:
# Take all rows starting immediately after "Start"
df = temp_df.iloc[start_row_idx + 1:].reset_index(drop=True)

# Optional: If your data should have column headers (e.g., the first row of the original sheet), uncomment below:
# df.columns = temp_df.iloc[0]
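If you need this in more than one place, the three steps can be wrapped in a small helper. The following is only a sketch: the function name rows_after_marker and its first_row_is_header option are hypothetical, and it assumes the header row (if any) sits directly below the "Start" row, which may not match your sheet. It takes a DataFrame that was already read with header=None, so it can be tried without an Excel file.

```python
import pandas as pd


def rows_after_marker(raw, marker="Start", first_row_is_header=False):
    """Return the rows of `raw` that follow the first row containing `marker`."""
    # One boolean per row: does any cell in that row equal the marker?
    hits = raw.isin([marker]).any(axis=1)
    if not hits.any():
        raise ValueError(f"Could not find a row containing {marker!r}")

    # Position (not label) of the first matching row, so iloc is always valid.
    start_pos = int(hits.to_numpy().argmax())
    data = raw.iloc[start_pos + 1:].reset_index(drop=True)

    # Assumption: the row right after the marker holds the column names.
    if first_row_is_header and not data.empty:
        data.columns = data.iloc[0].tolist()
        data = data.iloc[1:].reset_index(drop=True)
    return data


# Made-up sample standing in for temp_df from Step 1.
raw = pd.DataFrame([
    ["Notes", None],
    ["Start", None],
    ["name", "value"],
    ["a", 1],
    ["b", 2],
])
result = rows_after_marker(raw, first_row_is_header=True)
print(result)
```

With a real file you would pass the temp_df from Step 1 instead of the sample frame. Because the whole sheet was read with header=None, the resulting columns will usually have the generic object dtype, so you may want to convert them afterwards.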
This approach adapts automatically to wherever the "Start" row is placed, so your code keeps working when the Excel file's layout shifts slightly.
The question comes from Stack Exchange; asked by FaCoffee.




