You need to enable JavaScript to run this app.
最新活动
大模型
产品
解决方案
定价
生态与合作
支持与服务
开发者
了解我们

如何在Pandas中用多列作为索引通过pivot重塑DataFrame

Fixing Pandas Pivot with Multiple Index Columns & Handling Duplicates

Hey there, let's break down how to resolve your pivot problems step by step:

Why Your Previous Attempts Failed

  • When you used df.pivot(index='A', columns='C', values='E'), you only specified A as the index, so columns B and D got dropped—pivot only retains the index and the pivoted columns by default.
  • When you tried df.pivot(index=['A','B','D'], columns='C', values='E'), it likely threw an error because your actual dataset has duplicate combinations of (A, B, D, C)—the pivot() method requires unique index-column pairs to work without aggregation.

Solution 1: No Duplicate Index-Column Pairs

If each combination of (A, B, D, C) in your data is unique (like your sample data), you can directly use pivot() with multiple index columns. Here's how:

import pandas as pd

# Your sample DataFrame
data = [
    ['V1', 'B1', 'Clearing', 'C1', 1538884.46],
    ['V1', 'B1', 'CustomerPayment_Difference', 'C1', 13537679.70],
    ['V1', 'B1', 'Invoice', 'C1', -15771005.81],
    ['V1', 'B1', 'PaymentDifference', 'C1', 0.00],
    ['V2', 'B2', 'Clearing', 'C2', 104457.22],
    ['V2', 'B2', 'Invoice', 'C2', -400073.56],
    ['V2', 'B2', 'Payment', 'C2', 297856.45],
    ['V3', 'B3', 'Clearing', 'C3', 1989462.95],
    ['V3', 'B3', 'CreditMemo', 'C3', 538.95],
    ['V3', 'B3', 'CustomerPayment_Difference', 'C3', 2112329.00],
    ['V3', 'B3', 'Invoice', 'C3', -4066485.69],
    ['V4', 'B4', 'Clearing', 'C4', -123946.13],
    ['V4', 'B4', 'CreditMemo', 'C4', 127624.66],
    ['V4', 'B4', 'Accounting', 'C4', 424774.52],
    ['V4', 'B4', 'Invoice', 'C4', -40446521.41],
    ['V4', 'B4', 'Payment', 'C4', 44441419.95]
]
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D', 'E'])

# Pivot with multiple index columns
pivoted_df = df.pivot(index=['A', 'B', 'D'], columns='C', values='E').reset_index()

# Clean up the column names (remove the 'C' label from the column level)
pivoted_df.columns.name = None

print(pivoted_df)

This will retain columns A, B, D, and pivot all unique values from C into separate columns with corresponding E values.

Solution 2: Handling Duplicate Index-Column Pairs

If your dataset has duplicate (A, B, D, C) combinations (which causes pivot() to fail), use pivot_table() instead. This allows you to specify an aggregation function to combine duplicate values:

# Simulate duplicate data (adding a copy of the first row)
df_with_duplicates = df.append(df.iloc[0], ignore_index=True)

# Use pivot_table with aggregation (sum, mean, first, etc.)
pivoted_table_df = df_with_duplicates.pivot_table(
    index=['A', 'B', 'D'],
    columns='C',
    values='E',
    aggfunc='sum'  # Choose the right aggregation for your use case
).reset_index()

# Clean up column names
pivoted_table_df.columns.name = None

print(pivoted_table_df)

Common aggregation functions you can use:

  • 'sum': Sum all duplicate values
  • 'mean': Calculate the average
  • 'first': Keep the first occurrence
  • 'last': Keep the last occurrence

Key Takeaways

  • To retain columns A, B, D, pass them as a list to the index parameter in pivot() or pivot_table().
  • Use pivot() when you have unique index-column pairs; use pivot_table() when duplicates exist and you need to aggregate values.
  • Always use reset_index() to convert the multi-index into regular columns, and columns.name = None to clean up the column labels.

内容的提问来源于stack exchange,提问作者Mohamed Thasin ah

火山引擎 最新活动