Python Pandas: ValueError When Forward-Filling Multiple Columns After groupby, and How to Fix It
I see you're trying to forward-fill missing values for var1 and var2 in a pandas DataFrame, grouped by date and building, while leaving var3 and var4 untouched. Your single-column approach works, but the multi-column attempt throws a ValueError. Let's break down what's happening and how to fix it.
Your Original Code & Problem
First, here's your sample DataFrame for reference:
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'date': ['2019-01-01', '2019-01-01', '2019-01-01', '2019-01-01',
             '2019-02-01', '2019-02-01', '2019-02-01', '2019-02-01'],
    'building': ['a', 'a', 'b', 'b', 'a', 'a', 'b', 'b'],
    'var1': [1.5, np.nan, 2.1, 2.2, 1.2, 1.3, 2.4, np.nan],
    'var2': [100, 110, 105, np.nan, 102, np.nan, 103, 107],
    'var3': [10, 11, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
    'var4': [1, 2, 3, 4, 5, 6, 7, 8]
})
Working Single-Column Approach
This works because you're assigning back to a single column, and the grouped ffill() returns a Series that aligns perfectly with the original column:
df['var1'] = df.groupby(['date', 'building'])['var1'].ffill()
df['var2'] = df.groupby(['date', 'building'])['var2'].ffill()
Broken Multi-Column Code
This throws ValueError: Columns must be same length as key:
df[['var1', 'var2']] = df.groupby(['date', 'building'])[['var1', 'var2']].ffill()
Why the Error Happens
The message "Columns must be same length as key" means the DataFrame on the right-hand side has a different number of columns than the two names on the left-hand side. The most likely cause is your pandas version: in some older releases, calling ffill() directly on a multi-column groupby selection returned the grouping columns (date and building) alongside var1 and var2, so the result had four columns instead of two. The single-column version avoids this because it returns a Series, which has no extra columns to mismatch. In more recent pandas releases the direct call should return only the selected columns, so the same assignment may work there without changes.
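To confirm whether this is what happens on your installation, you can inspect the grouped result before assigning it. The following is a minimal diagnostic sketch that reuses the sample data from above; the names filled and result_cols are only illustrative, and the output depends on your pandas version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'date': ['2019-01-01', '2019-01-01', '2019-01-01', '2019-01-01',
             '2019-02-01', '2019-02-01', '2019-02-01', '2019-02-01'],
    'building': ['a', 'a', 'b', 'b', 'a', 'a', 'b', 'b'],
    'var1': [1.5, np.nan, 2.1, 2.2, 1.2, 1.3, 2.4, np.nan],
    'var2': [100, 110, 105, np.nan, 102, np.nan, 103, 107],
    'var3': [10, 11, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
    'var4': [1, 2, 3, 4, 5, 6, 7, 8]
})

# Compute the grouped forward-fill without assigning it back yet.
filled = df.groupby(['date', 'building'])[['var1', 'var2']].ffill()

# If more than two column names are printed, the assignment to
# df[['var1', 'var2']] cannot work: two keys, but more than two columns.
print(pd.__version__)
print(list(filled.columns))

# Selecting the target columns explicitly guarantees exactly two columns,
# whatever else the result contains.
result_cols = filled[['var1', 'var2']]
```

If the printed list contains date and building, assigning result_cols instead of filled is another way around the error, although transform() is usually the tidier option.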
The Fix: Use transform()
Instead of calling ffill() directly on the grouped DataFrame, use transform(). A transform result keeps the index of the original DataFrame and contains only the columns you selected, so it can be assigned straight back to var1 and var2.
Solution Code
# Forward-fill var1 and var2 within each (date, building) group
df[['var1', 'var2']] = df.groupby(['date', 'building'])[['var1', 'var2']].transform('ffill')
Or, if you prefer using a lambda for clarity:
df[['var1', 'var2']] = df.groupby(['date', 'building'])[['var1', 'var2']].transform(lambda x: x.ffill())
Verification
After running this code:
var1 will have the missing value in row 1 filled with 1.5 (from the 2019-01-01 + a group), and row 7 filled with 2.4 (from the 2019-02-01 + b group).
var2 will have the missing value in row 3 filled with 105 (from the 2019-01-01 + b group), and row 5 filled with 102 (from the 2019-02-01 + a group).
var3 and var4 remain completely unchanged, just as you wanted.
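If you would rather check these results programmatically than by eye, a short script along the following lines should do it. This is a sketch built on the sample data from the question; the variable original is just an illustrative copy used for comparison.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'date': ['2019-01-01', '2019-01-01', '2019-01-01', '2019-01-01',
             '2019-02-01', '2019-02-01', '2019-02-01', '2019-02-01'],
    'building': ['a', 'a', 'b', 'b', 'a', 'a', 'b', 'b'],
    'var1': [1.5, np.nan, 2.1, 2.2, 1.2, 1.3, 2.4, np.nan],
    'var2': [100, 110, 105, np.nan, 102, np.nan, 103, 107],
    'var3': [10, 11, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
    'var4': [1, 2, 3, 4, 5, 6, 7, 8]
})

# Keep an untouched copy so var3 and var4 can be compared afterwards.
original = df.copy()

# Forward-fill var1 and var2 within each (date, building) group.
df[['var1', 'var2']] = (
    df.groupby(['date', 'building'])[['var1', 'var2']].transform('ffill')
)

# The four previously missing cells now hold the preceding value of their group.
assert df.loc[1, 'var1'] == 1.5
assert df.loc[7, 'var1'] == 2.4
assert df.loc[3, 'var2'] == 105
assert df.loc[5, 'var2'] == 102

# var3 and var4 were not part of the selection, so they are unchanged.
assert df['var3'].equals(original['var3'])
assert df['var4'].equals(original['var4'])
```

Note that forward-filling only copies values from earlier rows of the same group, so a group whose first row is missing would keep that NaN. That case does not occur in this sample.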
This question originally comes from Stack Exchange; it was asked by Gaurav Bansal.




