How do I skip the first two rows when reading multiple files into a Pandas DataFrame?
Hey there! The problem is that pd.read_csv(f) returns a Pandas DataFrame, not a line-by-line iterator like an open file handle, so calling next() on it fails. That explains the error you saw.
Good news: you don't need to split up your existing code at all! Pandas' read_csv has a built-in parameter called skiprows that lets you skip the first N rows of each CSV file directly. Here's how to modify your existing one-liner generator:
Modified Code
    import pandas as pd
    import glob
    import os

    path = r'D:\E\Traficc\migration\Zambia-Mining\DATA\24monthimport'  # use your path
    all_files = glob.glob(os.path.join(path, "*.csv"))

    # Add skiprows=2 to read_csv to ignore the first two rows of each file
    df_from_each_file = (pd.read_csv(f, skiprows=2) for f in all_files)
    data = pd.concat(df_from_each_file, ignore_index=True)

    print(data.tail())
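If you want to try the pattern without touching your real data, here is a minimal sketch that wraps it in a function and runs it against two throwaway files. The helper name load_csvs, the n_skip parameter, the file names, and the file contents are all made up for illustration. Sorting the file list is an extra choice of mine to make the row order predictable, since glob does not guarantee any order.

```python
import glob
import os
import tempfile

import pandas as pd


def load_csvs(folder, n_skip=2):
    """Concatenate every *.csv in `folder`, skipping the first `n_skip` lines of each.

    Hypothetical helper, not part of pandas.
    """
    # sorted() is an assumption added here so the output order is stable
    files = sorted(glob.glob(os.path.join(folder, "*.csv")))
    frames = (pd.read_csv(f, skiprows=n_skip) for f in files)
    return pd.concat(frames, ignore_index=True)


# Build two small throwaway files: two junk lines, a header, then data.
with tempfile.TemporaryDirectory() as tmp:
    for name, rows in [("a.csv", "x,1\ny,2\n"), ("b.csv", "z,3\n")]:
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write("junk 1\njunk 2\nname,value\n" + rows)
    combined = load_csvs(tmp)

print(combined)
```

Note that pd.concat raises a ValueError if the folder contains no matching files, so you may want to check the file list first.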
How It Works
- The skiprows=2 argument tells pd.read_csv to skip the first 2 lines of every CSV file it reads. The first line after the skipped ones is then read as the header row. This is simpler and cheaper than trimming the DataFrame after loading it, since the skipped lines are never parsed into the DataFrame at all.
- If you ever need to skip specific rows (not just the first N), you can pass a list of zero-based row indices to skiprows (e.g., skiprows=[0, 1] does the same thing as skiprows=2 here).
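To check the two forms of skiprows described above, here is a small sketch that reads the same in-memory CSV both ways. The RAW text and the load helper are invented for the example, and io.StringIO stands in for a real file.

```python
import io

import pandas as pd

# Hypothetical file contents: two junk lines, then the header, then data.
RAW = "report generated by some tool\nsource: example\nname,value\na,1\nb,2\n"


def load(skiprows):
    # io.StringIO lets read_csv treat the string like an open file
    return pd.read_csv(io.StringIO(RAW), skiprows=skiprows)


by_count = load(2)       # skip the first two lines
by_index = load([0, 1])  # skip lines 0 and 1 explicitly

print(by_count.equals(by_index))
print(list(by_count.columns))
```

The list form becomes useful when the lines to drop are not contiguous, for example skiprows=[0, 2].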
Why Your Previous Attempt Failed
When you tried next(df), you were treating a DataFrame as if it were a file handle, but a DataFrame is not an iterator over lines. The proper way to control which rows are loaded from a CSV is to use read_csv parameters like skiprows, header, or skipfooter.
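A short sketch of the difference, using made-up in-memory data: next() fails on the DataFrame with a TypeError, but it works on a file-like object, which you can then hand to read_csv. I am assuming here that read_csv continues from the handle's current position, which is how it behaves with io.StringIO and ordinary text files.

```python
import io

import pandas as pd

# Hypothetical contents: two junk lines, then the header, then data.
RAW = "junk line 1\njunk line 2\nname,value\na,1\nb,2\n"

df = pd.read_csv(io.StringIO(RAW), skiprows=2)

# A DataFrame is not an iterator, so next() raises TypeError.
try:
    next(df)
    raised_type_error = False
except TypeError:
    raised_type_error = True

# next() does work on a file-like object: consume two lines, then parse the rest.
handle = io.StringIO(RAW)
next(handle)
next(handle)
from_handle = pd.read_csv(handle)

print(raised_type_error)
print(from_handle.equals(df))
```

Both routes give the same DataFrame, but skiprows keeps everything inside the single read_csv call, so it fits your generator expression without extra code.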
The question originally comes from Stack Exchange, asked by Thelurker Lurker.




