如何在Pandas中用多列作为索引通过pivot重塑DataFrame
Fixing Pandas Pivot with Multiple Index Columns & Handling Duplicates
Hey there, let's break down how to resolve your pivot problems step by step:
Why Your Previous Attempts Failed
- When you used
df.pivot(index='A', columns='C', values='E'), you only specifiedAas the index, so columnsBandDgot dropped—pivot only retains the index and the pivoted columns by default. - When you tried
df.pivot(index=['A','B','D'], columns='C', values='E'), it likely threw an error because your actual dataset has duplicate combinations of(A, B, D, C)—thepivot()method requires unique index-column pairs to work without aggregation.
Solution 1: No Duplicate Index-Column Pairs
If each combination of (A, B, D, C) in your data is unique (like your sample data), you can directly use pivot() with multiple index columns. Here's how:
import pandas as pd # Your sample DataFrame data = [ ['V1', 'B1', 'Clearing', 'C1', 1538884.46], ['V1', 'B1', 'CustomerPayment_Difference', 'C1', 13537679.70], ['V1', 'B1', 'Invoice', 'C1', -15771005.81], ['V1', 'B1', 'PaymentDifference', 'C1', 0.00], ['V2', 'B2', 'Clearing', 'C2', 104457.22], ['V2', 'B2', 'Invoice', 'C2', -400073.56], ['V2', 'B2', 'Payment', 'C2', 297856.45], ['V3', 'B3', 'Clearing', 'C3', 1989462.95], ['V3', 'B3', 'CreditMemo', 'C3', 538.95], ['V3', 'B3', 'CustomerPayment_Difference', 'C3', 2112329.00], ['V3', 'B3', 'Invoice', 'C3', -4066485.69], ['V4', 'B4', 'Clearing', 'C4', -123946.13], ['V4', 'B4', 'CreditMemo', 'C4', 127624.66], ['V4', 'B4', 'Accounting', 'C4', 424774.52], ['V4', 'B4', 'Invoice', 'C4', -40446521.41], ['V4', 'B4', 'Payment', 'C4', 44441419.95] ] df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D', 'E']) # Pivot with multiple index columns pivoted_df = df.pivot(index=['A', 'B', 'D'], columns='C', values='E').reset_index() # Clean up the column names (remove the 'C' label from the column level) pivoted_df.columns.name = None print(pivoted_df)
This will retain columns A, B, D, and pivot all unique values from C into separate columns with corresponding E values.
Solution 2: Handling Duplicate Index-Column Pairs
If your dataset has duplicate (A, B, D, C) combinations (which causes pivot() to fail), use pivot_table() instead. This allows you to specify an aggregation function to combine duplicate values:
# Simulate duplicate data (adding a copy of the first row) df_with_duplicates = df.append(df.iloc[0], ignore_index=True) # Use pivot_table with aggregation (sum, mean, first, etc.) pivoted_table_df = df_with_duplicates.pivot_table( index=['A', 'B', 'D'], columns='C', values='E', aggfunc='sum' # Choose the right aggregation for your use case ).reset_index() # Clean up column names pivoted_table_df.columns.name = None print(pivoted_table_df)
Common aggregation functions you can use:
'sum': Sum all duplicate values'mean': Calculate the average'first': Keep the first occurrence'last': Keep the last occurrence
Key Takeaways
- To retain columns
A,B,D, pass them as a list to theindexparameter inpivot()orpivot_table(). - Use
pivot()when you have unique index-column pairs; usepivot_table()when duplicates exist and you need to aggregate values. - Always use
reset_index()to convert the multi-index into regular columns, andcolumns.name = Noneto clean up the column labels.
内容的提问来源于stack exchange,提问作者Mohamed Thasin ah




