Generating 15-minute OHLCV bars with Pandas: how to forward-fill instead of zero-fill when aggregating with sum?
The Problem
I'm working on converting tick-level price/volume/time data into 15-minute OHLCV bars. My raw data looks like this:
unix_timestamp             price    amount
2018-01-05 12:33:52  15861.00000  0.194755
2018-01-05 12:33:52  15860.00000  0.050000
2018-01-05 12:33:53  15860.00000  0.100000
...
I used this code to generate the bars and fill missing values:
ohlcv = (
    data.resample(minutes)
    .agg({"price": "ohlc", "amount": "sum"})
    .rename(columns={"amount": "volume"})
    .ffill()
)
But here's the issue: for periods with no trades (like 2018-01-05 13:30:00 in my sample output), the volume field gets set to 0.000000 instead of forward-filling the previous period's volume value (which is what happens with the OHLC fields).
Why This Happens
The root cause is how pandas handles sum during resampling: when there's no data in a period, pandas' default sum returns 0 instead of NaN. Since ffill() only acts on missing values (NaN), those 0s get left untouched.
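To see both halves of that explanation in isolation, here is a minimal sketch. The empty Series is a stand-in (an assumption for illustration) for a 15-minute window that contains no trades, and the numbers are made up.

```python
import numpy as np
import pandas as pd

# An empty Series stands in for a 15-minute window with no trades.
empty = pd.Series([], dtype="float64")

default_sum = empty.sum()            # min_count defaults to 0, so this is 0.0
strict_sum = empty.sum(min_count=1)  # fewer than 1 valid value, so this is NaN

# ffill() only replaces NaN, so a 0 between two values is left untouched.
with_zero = pd.Series([5.0, 0.0, 7.0]).ffill()    # 5.0, 0.0, 7.0
with_nan = pd.Series([5.0, np.nan, 7.0]).ffill()  # 5.0, 5.0, 7.0

print(default_sum, strict_sum)
print(with_zero.tolist(), with_nan.tolist())
```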
The Fix
We need to make sure empty periods return NaN for volume first, then run ffill() to propagate the last valid volume value. Here are two clean ways to do this:
Option 1: Use sum(min_count=1)
Series.sum() takes a min_count parameter: if a period contains fewer than min_count non-NaN values, the result is NaN instead of 0. With min_count=1, a period with no trades yields NaN, which is exactly what ffill() needs:
ohlcv = (
    data.resample("15min")  # "15T" is deprecated in recent pandas
    .agg({
        "price": "ohlc",
        "amount": lambda x: x.sum(min_count=1),  # NaN if no trades
    })
    .rename(columns={"amount": "volume"})
    .ffill()
)
Option 2: Use numpy's sum with a check
Alternatively, you can use np.sum and explicitly check if there are any values in the period:
import numpy as np

ohlcv = (
    data.resample("15min")  # "15T" is deprecated in recent pandas
    .agg({
        "price": "ohlc",
        # Sum only when the period has rows; otherwise NaN
        "amount": lambda x: np.sum(x) if len(x) > 0 else np.nan,
    })
    .rename(columns={"amount": "volume"})
    .ffill()
)
What This Does
After making this change, empty periods will have NaN in the volume column. Then ffill() will replace those NaNs with the last valid volume value, matching the behavior you see with the OHLC fields.
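As a rough end-to-end check, the sketch below runs the same idea on three made-up ticks with a 30-minute gap between them. The tick values and the names `ticks`, `bars`, `volume_default` and `volume_fixed` are assumptions for illustration, not the asker's data. It also swaps the dict-based agg for per-column resampling with the resampler's own sum(min_count=1), which avoids the lambda and yields flat column names.

```python
import pandas as pd

# Hypothetical ticks: two trades in the 13:00 window, none in the
# 13:15 and 13:30 windows, and one trade in the 13:45 window.
index = pd.to_datetime(
    ["2018-01-05 13:01:00", "2018-01-05 13:10:00", "2018-01-05 13:50:00"]
)
ticks = pd.DataFrame(
    {"price": [100.0, 101.0, 99.0], "amount": [1.0, 2.0, 0.5]},
    index=index,
)
ticks.index.name = "unix_timestamp"

bars = ticks.resample("15min")

# Default sum: empty windows become 0, so ffill() has nothing to fill.
volume_default = bars["amount"].sum().ffill()

# min_count=1: empty windows become NaN, which ffill() can then replace.
volume_fixed = bars["amount"].sum(min_count=1).rename("volume")

ohlcv = pd.concat([bars["price"].ohlc(), volume_fixed], axis=1).ffill()

print(volume_default.tolist())   # [3.0, 0.0, 0.0, 0.5]
print(ohlcv["volume"].tolist())  # [3.0, 3.0, 3.0, 0.5]
print(ohlcv)
```

With these inputs, the two empty windows carry forward both the last close (101.0) and the last volume (3.0).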
Example Result
The problematic row 2018-01-05 13:30:00 will now have a volume of 347.864213 (carried over from the 13:15 period) instead of 0.
This question originally comes from Stack Exchange; asked by funnyleh32.




