
Python Pandas: Error When Forward-Filling Multiple Columns with ffill After groupby, and How to Fix It

Fixing "ValueError: Columns must be same length as key" when filling multiple columns with ffill() after groupby in pandas

I see you're trying to forward-fill missing values for var1 and var2 in a pandas DataFrame, grouped by date and building, while leaving var3 and var4 untouched. Your single-column approach works, but the multi-column attempt throws a ValueError. Let's break down what's happening and how to fix it.

Your Original Code & Problem

First, here's your sample DataFrame for reference:

import pandas as pd
import numpy as np

df = pd.DataFrame({ 
    'date': ['2019-01-01','2019-01-01','2019-01-01','2019-01-01','2019-02-01','2019-02-01','2019-02-01','2019-02-01'], 
    'building': ['a', 'a', 'b', 'b', 'a', 'a', 'b', 'b'], 
    'var1': [1.5, np.nan, 2.1, 2.2, 1.2, 1.3, 2.4, np.nan], 
    'var2': [100, 110, 105, np.nan, 102, np.nan, 103, 107], 
    'var3': [10, 11, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan], 
    'var4': [1, 2, 3, 4, 5, 6, 7, 8] 
})

Working Single-Column Approach

This works because you're assigning back to a single column, and the grouped ffill() returns a Series that aligns perfectly with the original column:

df['var1'] = df.groupby(['date', 'building'])['var1'].ffill()
df['var2'] = df.groupby(['date', 'building'])['var2'].ffill()

Broken Multi-Column Code

This throws ValueError: Columns must be same length as key:

df[['var1', 'var2']] = df.groupby(['date', 'building'])[['var1', 'var2']].ffill()

Why the Error Happens

The error message means the right-hand side of the assignment does not have the same number of columns as the two keys on the left (var1 and var2). The most likely cause is your pandas version: in some older releases, calling ffill() directly on a grouped DataFrame returned the grouping columns (date and building) alongside the filled columns, so the result had four columns instead of two. Recent pandas releases return only the selected columns, in which case the same line should run without error.
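A quick way to see what the grouped ffill() returns on your installation is to inspect its columns before assigning. The sketch below is a minimal diagnostic, not part of the original answer: it uses a trimmed copy of the sample data, and the names cols and filled are introduced here for illustration. Selecting the target columns explicitly from the result should make the assignment safe whether or not the grouping columns come back.

```python
import numpy as np
import pandas as pd

# Trimmed copy of the sample data (var3 and var4 omitted for brevity)
df = pd.DataFrame({
    'date': ['2019-01-01'] * 4 + ['2019-02-01'] * 4,
    'building': ['a', 'a', 'b', 'b'] * 2,
    'var1': [1.5, np.nan, 2.1, 2.2, 1.2, 1.3, 2.4, np.nan],
    'var2': [100, 110, 105, np.nan, 102, np.nan, 103, 107],
})

cols = ['var1', 'var2']  # hypothetical helper name for the target columns
filled = df.groupby(['date', 'building'])[cols].ffill()

# If this lists more than the two target columns, assigning `filled`
# directly to df[cols] raises "Columns must be same length as key".
print(filled.columns.tolist())

# Picking the target columns explicitly keeps both sides the same width.
df[cols] = filled[cols]
```

If the printed list contains only var1 and var2, the direct multi-column assignment should already work on that pandas version.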

The Fix: Use transform()

Instead of calling ffill() directly on the grouped DataFrame, use transform(). It returns only the selected columns, indexed like the original DataFrame, so the result can be assigned straight back to your target columns.

Solution Code

# Forward-fill var1 and var2 within each (date, building) group
df[['var1', 'var2']] = df.groupby(['date', 'building'])[['var1', 'var2']].transform('ffill')

Or, if you prefer using a lambda for clarity:

df[['var1', 'var2']] = df.groupby(['date', 'building'])[['var1', 'var2']].transform(lambda x: x.ffill())

Verification

After running this code:

  • var1 will have the missing value at index 1 filled with 1.5 (from the 2019-01-01 + a group), and the one at index 7 filled with 2.4 (from the 2019-02-01 + b group)
  • var2 will have the missing value at index 3 filled with 105 (from the 2019-01-01 + b group), and the one at index 5 filled with 102 (from the 2019-02-01 + a group)
  • var3 and var4 remain completely unchanged, just as you wanted.
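The checks above can also be written as assertions. This is a sketch that rebuilds the sample DataFrame so it runs on its own. The names cols and original are introduced here for illustration, and the copy is kept only to confirm that var3 and var4 are untouched.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'date': ['2019-01-01'] * 4 + ['2019-02-01'] * 4,
    'building': ['a', 'a', 'b', 'b'] * 2,
    'var1': [1.5, np.nan, 2.1, 2.2, 1.2, 1.3, 2.4, np.nan],
    'var2': [100, 110, 105, np.nan, 102, np.nan, 103, 107],
    'var3': [10, 11, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
    'var4': [1, 2, 3, 4, 5, 6, 7, 8],
})
original = df.copy()  # snapshot for comparing the untouched columns

cols = ['var1', 'var2']
df[cols] = df.groupby(['date', 'building'])[cols].transform('ffill')

# Filled values come from the previous row of the same (date, building) group
assert df.loc[1, 'var1'] == 1.5
assert df.loc[7, 'var1'] == 2.4
assert df.loc[3, 'var2'] == 105
assert df.loc[5, 'var2'] == 102

# var3 and var4 are unchanged; equals() treats NaNs in the same position as equal
assert df[['var3', 'var4']].equals(original[['var3', 'var4']])
```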

The question originates from Stack Exchange; it was asked by Gaurav Bansal.
