You need to enable JavaScript to run this app.
最新活动
大模型
产品
解决方案
定价
生态与合作
支持与服务
开发者
了解我们

在Pandas DataFrame中按日期生成客户商品历史列(同日期互斥)

Solution for Adding Item History Excluding Same-Day Purchases

Got it, let's fix this problem properly! The key issue with your original approach is that it uses row indices to build the history, which includes same-day items since they're consecutive rows. We need to group by date first within each customer to ensure same-day purchases don't appear in each other's history.

Step-by-Step Solution

First, let's make sure our date column is in a format we can compare, then we'll build the history based on dates instead of row positions:

import pandas as pd
import numpy as np

# Your sample data
df = pd.DataFrame({
    'Customer':['Bert', 'Bert', 'Bert', 'Bert', 'Bert', 'Ernie', 'Ernie', 'Ernie', 'Ernie', 'Steven', 'Steven'], 
    'Date':['01/01/2019', '15/01/2019', '20/01/2019', '20/01/2019', '22/01/2019', '01/01/2019', '15/01/2019', '20/01/2019', '22/01/2019', '01/01/2019' ,'15/01/2019'], 
    'Item':['Bread', 'Cheese', 'Apples', 'Pears', 'Toothbrush', 'Toys', 'Shellfish', 'Dog', 'Yoghurt', 'Toilet', 'Dominos']
})

# 1. Convert date strings to datetime objects (critical for proper ordering)
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')

# 2. Sort by Customer and Date to ensure chronological order
df = df.sort_values(['Customer', 'Date']).reset_index(drop=True)

# 3. Define a function to build item history per customer
def build_customer_history(group):
    # Group the customer's purchases by date to get all items per day
    daily_items = group.groupby('Date')['Item'].apply(list).reset_index()
    
    # Cumulate all previous days' items, shifting by 1 to exclude the current day
    daily_items['Item History'] = daily_items['Item'].cumsum().shift(1)
    
    # Replace empty histories with NaN (instead of empty lists)
    daily_items['Item History'] = daily_items['Item History'].apply(
        lambda x: x if isinstance(x, list) and len(x) > 0 else np.nan
    )
    
    # Merge the daily history back to the original group rows
    return group.merge(daily_items[['Date', 'Item History']], on='Date', how='left')

# 4. Apply the function to each customer group
df = df.groupby('Customer').apply(build_customer_history).reset_index(drop=True)

# Let's check the result
print(df[['Customer', 'Date', 'Item', 'Item History']])

What This Does

  • Date Conversion: Ensures we can correctly compare dates to order purchases chronologically.
  • Sorting: Makes sure each customer's records are in the right time order before building history.
  • Daily Grouping: By first grouping purchases per date, we ensure all same-day rows share the same history (which doesn't include their own day's items).
  • Cumulative Shift: Using cumsum() accumulates all previous days' items, and shift(1) drops the current day's items from the history.
  • NaN Handling: Replaces empty histories (for a customer's first purchase) with NaN instead of empty lists, matching your original approach.

Sample Output Excerpt

For Bert's 20/01/2019 purchases, you'll see:

CustomerDateItemItem History
Bert2019-01-20Apples['Bread', 'Cheese']
Bert2019-01-20Pears['Bread', 'Cheese']
Bert2019-01-22Toothbrush['Bread', 'Cheese', 'Apples', 'Pears']

Perfect—same-day items don't appear in each other's history, and later dates include all prior purchases.

内容的提问来源于stack exchange,提问作者Dan Scally

火山引擎 最新活动