How can I use Pandas' rolling to calculate the true rolling average of a DataFrame?
Calculating True Rolling Average for All Values in a Pandas DataFrame
Alright, let's break down how to compute the true rolling average across all values in your DataFrame using Pandas' rolling functionality. I'll use a concrete example to make this clear.
Step 1: Example DataFrame
First, let's define the sample DataFrame we're working with:
```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'a001': [1, np.nan, np.nan, np.nan, np.nan, 2, np.nan],
    'a002': [1, 7, np.nan, 3, np.nan, 2, 6]
})
```
Which looks like this when printed:
|  | a001 | a002 |
|---|---|---|
| 0 | 1.0 | 1.0 |
| 1 | NaN | 7.0 |
| 2 | NaN | NaN |
| 3 | NaN | 3.0 |
| 4 | NaN | NaN |
| 5 | 2.0 | 2.0 |
| 6 | NaN | 6.0 |
Step 2: Core Calculation Logic
To get the true rolling average (with a window size of 2 rows), we need to:
- First, calculate the average of the valid (non-`NaN`) values in each row.
- Then, apply a rolling-window average to those row-wise means.
Here's the code to do this:
```python
# Row-wise mean (NaN skipped), then a 2-row rolling mean of those row means
df['rolling_mean'] = df.mean(axis=1).rolling(window=2, min_periods=1).mean()
```
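To inspect the intermediate values, the one-liner above can be split into its two steps. This is a minimal sketch; the variable names `row_means` and `rolling_mean` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a001': [1, np.nan, np.nan, np.nan, np.nan, 2, np.nan],
    'a002': [1, 7, np.nan, 3, np.nan, 2, 6],
})

# Step A: mean of the non-NaN values in each row (an all-NaN row gives NaN)
row_means = df.mean(axis=1)
print(row_means.tolist())
# [1.0, 7.0, nan, 3.0, nan, 2.0, 6.0]

# Step B: average the row means over a 2-row window;
# min_periods=1 keeps a result whenever at least one row mean is non-NaN
rolling_mean = row_means.rolling(window=2, min_periods=1).mean()
print(rolling_mean.tolist())
# [1.0, 4.0, 7.0, 3.0, 3.0, 2.0, 4.0]
```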
Step 3: Result
After running the code, your DataFrame will look like this:
|  | a001 | a002 | rolling_mean |
|---|---|---|---|
| 0 | 1.0 | 1.0 | 1.0 |
| 1 | NaN | 7.0 | 4.0 |
| 2 | NaN | NaN | 7.0 |
| 3 | NaN | 3.0 | 3.0 |
| 4 | NaN | NaN | 3.0 |
| 5 | 2.0 | 2.0 | 2.0 |
| 6 | NaN | 6.0 | 4.0 |
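The result above averages row means, so every row carries equal weight no matter how many non-`NaN` values it holds. In the window covering rows 0 and 1, for example, the row means 1.0 and 7.0 average to 4.0, while the three individual values 1, 1 and 7 average to 3.0. If "true" is meant to weight every individual value equally, one possible sketch (an alternative technique, not part of the original answer; `row_sums`, `row_counts` and `pooled_mean` are illustrative names) divides a rolling sum of the values by a rolling count of the non-`NaN` values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a001': [1, np.nan, np.nan, np.nan, np.nan, 2, np.nan],
    'a002': [1, 7, np.nan, 3, np.nan, 2, 6],
})

# Per-row sum of the non-NaN values (an all-NaN row sums to 0.0)
row_sums = df.sum(axis=1)
# Per-row number of non-NaN values (an all-NaN row counts 0)
row_counts = df.count(axis=1)

# Pooled mean: total of all values in the window / number of values in it.
# A window containing no values at all would give 0/0, i.e. NaN.
window = 2
pooled_mean = (
    row_sums.rolling(window=window, min_periods=1).sum()
    / row_counts.rolling(window=window, min_periods=1).sum()
)
print(pooled_mean.round(3).tolist())
# [1.0, 3.0, 7.0, 3.0, 3.0, 2.0, 3.333]
```

The two approaches agree whenever every row in a window has the same number of non-`NaN` values; they differ only when the counts are uneven, as in rows 0-1 and 5-6 here.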
Let's Break Down the Code
- `df.mean(axis=1)`: computes the average of each row, skipping `NaN` values by default (`skipna=True`). A row that is entirely `NaN` returns `NaN`.
- `.rolling(window=2, min_periods=1)`: sets up a rolling window of 2 rows. `min_periods=1` means a window still produces a result as long as it contains at least one non-`NaN` row mean. Without it, `min_periods` defaults to the window size, so the first row and every window that includes an all-`NaN` row would come out as `NaN`.
- `.mean()`: averages the non-`NaN` values within each rolling window.
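To see why `min_periods=1` matters, here is a small sketch comparing it with the default, where `min_periods` equals the window size (2 here), so a window needs two non-`NaN` row means before it yields a value. The row means are hard-coded from the example DataFrame; `strict` and `lenient` are illustrative names.

```python
import numpy as np
import pandas as pd

# Row means of the example DataFrame (NaN where a row has no values)
row_means = pd.Series([1.0, 7.0, np.nan, 3.0, np.nan, 2.0, 6.0])

# Default: min_periods equals the window size, so an incomplete window (row 0)
# or a window containing a NaN row mean yields NaN
strict = row_means.rolling(window=2).mean()

# min_periods=1: one non-NaN row mean in the window is enough
lenient = row_means.rolling(window=2, min_periods=1).mean()

print(pd.DataFrame({'row_mean': row_means, 'default': strict, 'min_periods=1': lenient}))
```

With the default, only rows 1 and 6 get a value (4.0 each); with `min_periods=1`, every row does.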
The question was originally posted on Stack Exchange by Joe.




