
Python beginner asking for help: using Pandas to compute averages over 2 months of data at a 10-minute frequency

Building a 10-Minute Average Baseline for Your 2-Month Transaction Data

Hey there! You've already got the basics down: setting a datetime index, grouping transactions into 10-minute windows, and retaining the Respuesta, OperationId, and SucursalId columns. So let's build the average baseline from your 2 months of daily appended data. Here's a step-by-step breakdown tailored to your needs:

Step 1: Combine Your Daily Appended Data

First, we need to bring all your daily datasets into a single DataFrame. If your data is stored in separate files (e.g., CSV files named like transactions_2024-01-01.csv), use this code to concatenate them:

import pandas as pd
import glob

# Grab all daily data files (adjust the file pattern to match your naming)
daily_files = glob.glob('transactions_*.csv')

# Combine all files into one DataFrame
full_dataset = pd.concat([pd.read_csv(file) for file in daily_files], ignore_index=True)

# Ensure your timestamp column is converted to datetime and set as the index
# Replace 'timestamp_column' with your actual time column name
full_dataset['timestamp_column'] = pd.to_datetime(full_dataset['timestamp_column'])
full_dataset.set_index('timestamp_column', inplace=True)
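
If the daily files can arrive out of order, it may help to wrap the loading in a small helper. This is only a sketch: load_daily_files is a made-up helper name (not part of pandas), and it assumes the time column is literally called timestamp_column, like the placeholder above.

```python
import glob

import pandas as pd


def load_daily_files(pattern, time_col="timestamp_column"):
    """Read every CSV matching `pattern` into one time-indexed DataFrame.

    `load_daily_files` and the default `time_col` are illustrative assumptions.
    """
    files = sorted(glob.glob(pattern))  # sorted so the load order is deterministic
    if not files:
        raise FileNotFoundError(f"no files match {pattern!r}")
    # parse_dates converts the time column to datetime while reading
    frames = [pd.read_csv(path, parse_dates=[time_col]) for path in files]
    combined = pd.concat(frames, ignore_index=True)
    # A sorted DatetimeIndex keeps later time-based operations predictable
    return combined.set_index(time_col).sort_index()
```

You would call it as full_dataset = load_daily_files('transactions_*.csv'), adjusting the pattern and column name to your files.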

Step 2: Choose Your Baseline Type

There are two common types of 10-minute baselines. Pick the one that fits your use case:

Option 1: Global 10-Minute Window Averages

This calculates the average for every unique 10-minute interval across your entire 2-month dataset (e.g., average values for 2024-01-01 09:00-09:10, 2024-01-01 09:10-09:20, etc.):

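# Resample to 10-minute intervals and compute mean for your target columns
global_baseline = full_dataset[['Respuesta', 'OperationId', 'SucursalId']].resample('10min').mean()
  • resample('10min') groups your data into 10-minute chunks (the older alias '10T' is deprecated as of pandas 2.2)
  • mean() computes the average of each selected column for each chunk (if OperationId and SucursalId are identifiers rather than measurements, their averages are not meaningful, so you may want to average Respuesta only)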
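
To see what the resampling does, here is a tiny worked example on made-up data (the toy values and timestamps are invented for illustration, not taken from your dataset):

```python
import pandas as pd

# Three invented transactions: two in the 09:00 window, one in the 09:10 window
toy = pd.DataFrame(
    {"Respuesta": [1.0, 3.0, 10.0]},
    index=pd.to_datetime(
        ["2024-01-01 09:01", "2024-01-01 09:08", "2024-01-01 09:12"]
    ),
)

toy_baseline = toy.resample("10min").mean()
print(toy_baseline)
# The 09:00 window averages 1.0 and 3.0 to 2.0; the 09:10 window holds only 10.0
```

Each row of the result is labelled with the start of its 10-minute window.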

Option 2: Time-of-Day 10-Minute Baselines

If you want a baseline that represents the average value for the same 10-minute window every day (e.g., average of all 09:00-09:10 transactions across all 2 months), use this approach:

# Floor each timestamp to the start of its 10-minute window, then keep only the time of day
# (strftime('%H:%M') on the raw timestamps would group by individual minute instead)
full_dataset['time_window'] = full_dataset.index.floor('10min').strftime('%H:%M')

# Group by the 10-minute time window and compute the average
time_of_day_baseline = full_dataset.groupby('time_window')[['Respuesta', 'OperationId', 'SucursalId']].mean()

# Optional: Sort the baseline by time for readability
# (zero-padded 'HH:MM' strings already sort chronologically, so the index can stay as strings)
time_of_day_baseline.sort_index(inplace=True)

This gives you a reusable baseline that you can compare against new daily data (e.g., check if today’s 09:00-09:10 Respuesta values are above or below the 2-month average for that window).
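
One subtlety worth checking on toy data: averaging all transactions in a time-of-day window is not the same as averaging each day's window mean, because busy days weigh more in the first version. The sketch below uses invented values and floors each timestamp to its 10-minute window so the keys come out as 09:00, 09:10, and so on.

```python
import pandas as pd

toy = pd.DataFrame(
    {"Respuesta": [2.0, 4.0, 6.0, 20.0]},
    index=pd.to_datetime([
        "2024-01-01 09:03", "2024-01-01 09:07",  # day 1, 09:00 window
        "2024-01-02 09:01",                      # day 2, 09:00 window
        "2024-01-02 09:15",                      # day 2, 09:10 window
    ]),
)

# Version A: pool every transaction that falls in the same time-of-day window
toy["time_window"] = toy.index.floor("10min").strftime("%H:%M")
toy_tod = toy.groupby("time_window")["Respuesta"].mean()
# 09:00 -> (2 + 4 + 6) / 3 = 4.0, 09:10 -> 20.0

# Version B: average per day and window first, then average those daily means
per_window = toy["Respuesta"].resample("10min").mean().dropna()
mean_of_days = per_window.groupby(per_window.index.strftime("%H:%M")).mean()
# 09:00 -> (3.0 + 6.0) / 2 = 4.5, 09:10 -> 20.0
```

Which version is the right baseline depends on whether you want every transaction or every day to count equally.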

Step 3: Handle Edge Cases

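  • Missing Values: mean() already skips NaN values in each column. If you would rather discard any transaction that is missing one of the target fields, drop those rows before computing averages:
    full_dataset = full_dataset.dropna(subset=['Respuesta', 'OperationId', 'SucursalId'])
    
  • Irregular Timestamps: Your transactions don't need to align with 10-minute boundaries; resample() assigns each one to the window it falls in. Windows that contain no transactions still appear in the result, with NaN as their mean. Depending on your data, you can carry the previous value forward with .ffill() or substitute a constant with .fillna(0). (fillna(method='ffill') is deprecated in recent pandas versions.)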
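
Here is a small sketch of how empty windows show up and what the two fill strategies do. The data is invented, and adding a count column next to the mean is just one possible way to tell "no transactions" apart from a real value.

```python
import pandas as pd

# Two invented transactions with an empty 09:10 window between them
toy = pd.DataFrame(
    {"Respuesta": [5.0, 7.0]},
    index=pd.to_datetime(["2024-01-01 09:02", "2024-01-01 09:25"]),
)

# Compute mean and count together; the empty window gets count 0 and a NaN mean
windows = toy["Respuesta"].resample("10min").agg(["mean", "count"])

filled_forward = windows["mean"].ffill()   # reuse the previous window's mean
filled_zero = windows["mean"].fillna(0)    # treat an empty window as zero
```

Filling with 0 pulls any later average down, so it only makes sense when an empty window really does mean a value of zero.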

Step 4: Use the Baseline with New Data

To compare new daily transactions against your baseline (using the time-of-day example):

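# Load and prepare new daily data
new_daily_data = pd.read_csv('new_transactions.csv')
new_daily_data['timestamp_column'] = pd.to_datetime(new_daily_data['timestamp_column'])
new_daily_data.set_index('timestamp_column', inplace=True)
# Floor to the 10-minute window so the keys match the baseline's windows
new_daily_data['time_window'] = new_daily_data.index.floor('10min').strftime('%H:%M')

# Key the baseline by 'HH:MM' strings (works whether its index holds strings or time objects)
baseline = time_of_day_baseline.copy()
baseline.index = [str(t)[:5] for t in baseline.index]

# Merge the baseline onto the new data; how='left' keeps rows with no matching window
new_data_with_baseline = new_daily_data.merge(
    baseline,
    left_on='time_window',
    right_index=True,
    how='left',
    suffixes=('_actual', '_baseline')
)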
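
Once actual and baseline values sit side by side, a percentage difference per window is one simple way to read them. The sketch below is self-contained and uses invented numbers; the baseline frame, the new_day frame, and the pct_diff column name are all illustrative assumptions.

```python
import pandas as pd

# Invented baseline keyed by time-of-day window
baseline = pd.DataFrame(
    {"Respuesta": [4.0, 20.0]},
    index=pd.Index(["09:00", "09:10"], name="time_window"),
)

# Invented transactions for a new day
new_day = pd.DataFrame(
    {"Respuesta": [5.0, 7.0, 10.0]},
    index=pd.to_datetime(
        ["2024-03-01 09:02", "2024-03-01 09:06", "2024-03-01 09:11"]
    ),
)
new_day["time_window"] = new_day.index.floor("10min").strftime("%H:%M")

# Average the new day per window, then line it up with the baseline
today = new_day.groupby("time_window")[["Respuesta"]].mean()
comparison = today.join(baseline, lsuffix="_actual", rsuffix="_baseline")
comparison["pct_diff"] = (
    (comparison["Respuesta_actual"] - comparison["Respuesta_baseline"])
    / comparison["Respuesta_baseline"]
    * 100
)
# 09:00 is 50% above its baseline, 09:10 is 50% below
```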

Let me know if you need to adjust this for specific edge cases (like splitting baseline by SucursalId or handling weekend vs weekday differences)!
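
As a possible starting point for those two variations, here is a sketch that splits the time-of-day baseline by SucursalId and by weekday versus weekend. The data is invented, and is_weekend is a hypothetical helper column.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "Respuesta": [2.0, 4.0, 10.0, 30.0],
        "SucursalId": [1, 1, 2, 1],
    },
    index=pd.to_datetime([
        "2024-01-05 09:03",  # Friday
        "2024-01-05 09:07",  # Friday
        "2024-01-05 09:04",  # Friday
        "2024-01-06 09:02",  # Saturday
    ]),
)
toy["time_window"] = toy.index.floor("10min").strftime("%H:%M")
toy["is_weekend"] = toy.index.dayofweek >= 5  # Monday=0 ... Sunday=6

# One baseline value per branch, day type, and 10-minute window
split_baseline = (
    toy.groupby(["SucursalId", "is_weekend", "time_window"])["Respuesta"].mean()
)
# Branch 1 on weekdays at 09:00 -> 3.0; branch 1 on weekends at 09:00 -> 30.0
```

The finer the split, the fewer transactions feed each average, so it is worth checking the group sizes before trusting a split baseline.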

The question comes from Stack Exchange; asked by jufegome.
