How do I convert a DataFrame to wide format? Implementing a function that displays it in a custom number of side-by-side columns
Hey there! Let’s break down your two pandas-related questions with practical examples and explanations.
To reshape a long-format DataFrame into wide format, the most straightforward pandas tools are pivot() and pivot_table(). Here’s how to use them:
Using pivot() (for unique index-column pairs)
This works best when you don’t have duplicate entries for the combination of your chosen index and columns. Let’s walk through an example:
import pandas as pd

# Sample long-format DataFrame
long_df = pd.DataFrame({
    "user_id": [1, 1, 2, 2],
    "metric": ["height", "weight", "height", "weight"],
    "value": [175, 70, 160, 55]
})

# Convert to wide format
wide_df = long_df.pivot(
    index="user_id",   # Column to use as rows in the wide format
    columns="metric",  # Column whose values become new columns
    values="value"     # Values to fill in the new wide columns
).reset_index()

# Clean up the column name hierarchy
wide_df.columns.name = None
The result will have user_id as a regular column, with height and weight as separate columns holding the corresponding values.
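As a quick sanity check, the sketch below rebuilds the same sample data and confirms the layout of the wide result. It uses only the pandas calls already shown above, and the variable names simply mirror that example.

```python
import pandas as pd

# Same sample long-format data as in the pivot() example
long_df = pd.DataFrame({
    "user_id": [1, 1, 2, 2],
    "metric": ["height", "weight", "height", "weight"],
    "value": [175, 70, 160, 55],
})

wide_df = long_df.pivot(index="user_id", columns="metric", values="value").reset_index()
wide_df.columns.name = None

# One row per user, one column per metric
print(wide_df.columns.tolist())  # ['user_id', 'height', 'weight']
print(wide_df.shape)             # (2, 3)
```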
Using pivot_table() (for handling duplicates)
If you have repeated entries for the same index-column pair (e.g., a user’s height recorded multiple times), pivot() will raise a ValueError because it cannot fit two values into one cell. Instead, use pivot_table() with an aggregation function to combine those duplicates:
# Long-format DataFrame with duplicate entries
long_df_duplicates = pd.DataFrame({
    "user_id": [1, 1, 1, 2],
    "metric": ["height", "height", "weight", "height"],
    "value": [175, 176, 70, 160]
})

# Convert to wide format with aggregation
wide_df_agg = long_df_duplicates.pivot_table(
    index="user_id",
    columns="metric",
    values="value",
    aggfunc="mean"  # Use mean to combine duplicates; can also use sum, first, etc.
).reset_index()
wide_df_agg.columns.name = None
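To make the difference concrete, here is a minimal sketch on the same duplicate data. It shows pivot() refusing to reshape and pivot_table() averaging the two height readings for user 1. The flag name pivot_failed is only there for illustration.

```python
import pandas as pd

long_df_duplicates = pd.DataFrame({
    "user_id": [1, 1, 1, 2],
    "metric": ["height", "height", "weight", "height"],
    "value": [175, 176, 70, 160],
})

# pivot() cannot place two values in the same cell, so it raises ValueError
try:
    long_df_duplicates.pivot(index="user_id", columns="metric", values="value")
    pivot_failed = False
except ValueError as exc:
    pivot_failed = True
    print(f"pivot() failed: {exc}")

# pivot_table() aggregates the duplicates instead
wide_df_agg = long_df_duplicates.pivot_table(
    index="user_id", columns="metric", values="value", aggfunc="mean"
).reset_index()
wide_df_agg.columns.name = None
print(wide_df_agg)
```

User 1 ends up with a height of 175.5 (the mean of 175 and 176), and user 2, who has no weight entry, gets NaN in the weight column.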
For the second question: to lay out a long DataFrame as a specified number of side-by-side blocks, we’ll write a function that calculates the number of rows per segment, slices the original DataFrame into consecutive row segments, then concatenates the segments horizontally. Here’s how:
Step-by-Step Function Implementation
def split_df_to_columns(df, n_cols):
    total_rows = len(df)
    # Calculate base rows per segment, plus extra rows for the first few
    # segments if there's a remainder
    base_rows = total_rows // n_cols
    extra_rows = total_rows % n_cols

    split_segments = []
    start_idx = 0
    for segment_num in range(n_cols):
        # Determine how many rows this segment should have
        end_idx = start_idx + base_rows + (1 if segment_num < extra_rows else 0)
        # Extract the segment and reset its index to avoid alignment issues
        segment = df.iloc[start_idx:end_idx].reset_index(drop=True)
        # Rename columns to distinguish segments (optional but helpful)
        segment.columns = [f"{col}_part{segment_num + 1}" for col in segment.columns]
        split_segments.append(segment)
        start_idx = end_idx

    # Concatenate all segments horizontally
    return pd.concat(split_segments, axis=1)
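The row arithmetic can be checked in isolation. The sketch below uses a hypothetical helper, segment_sizes, which is not part of the function above; it only reproduces the same base-plus-remainder calculation in plain Python.

```python
def segment_sizes(total_rows, n_cols):
    """Return how many rows each segment receives (hypothetical helper)."""
    base_rows, extra_rows = divmod(total_rows, n_cols)
    # The first `extra_rows` segments each take one additional row
    return [base_rows + (1 if i < extra_rows else 0) for i in range(n_cols)]

print(segment_sizes(20, 3))  # [7, 7, 6]
print(segment_sizes(10, 4))  # [3, 3, 2, 2]
print(segment_sizes(2, 3))   # [1, 1, 0]
```

Note that when n_cols exceeds the number of rows, the trailing segments are empty, and n_cols=0 would raise a ZeroDivisionError, so you may want to validate n_cols before calling the function.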
Example Usage
Let’s test this with a sample large DataFrame:
# Create a sample DataFrame with 20 rows
large_df = pd.DataFrame({
    "col1": range(1, 21),
    "col2": [f"value_{x}" for x in range(1, 21)]
})

# Split into 3 side-by-side columns
split_result = split_df_to_columns(large_df, n_cols=3)
print(split_result)
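This will split the 20-row DataFrame into 3 segments: the first two segments have 7 rows each, and the third has 6 rows, all placed side by side. The result therefore has 7 rows and 6 columns (col1_part1, col2_part1, and so on), and the last row of the third segment is filled with NaN because that segment is one row shorter.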
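If you want to confirm that layout yourself, the sketch below repeats the function in condensed form so the snippet runs on its own, then inspects the shape and the padding in the last row. The condensed body is meant to behave the same as the version above; it is not a different method.

```python
import pandas as pd

def split_df_to_columns(df, n_cols):
    # Condensed copy of the function above, repeated so this snippet is self-contained
    base_rows, extra_rows = divmod(len(df), n_cols)
    segments, start_idx = [], 0
    for segment_num in range(n_cols):
        end_idx = start_idx + base_rows + (1 if segment_num < extra_rows else 0)
        segment = df.iloc[start_idx:end_idx].reset_index(drop=True)
        segment.columns = [f"{col}_part{segment_num + 1}" for col in segment.columns]
        segments.append(segment)
        start_idx = end_idx
    return pd.concat(segments, axis=1)

large_df = pd.DataFrame({
    "col1": range(1, 21),
    "col2": [f"value_{x}" for x in range(1, 21)],
})
split_result = split_df_to_columns(large_df, n_cols=3)

print(split_result.shape)  # (7, 6)
# The shorter third segment has no value in the last row
print(split_result.iloc[-1][["col1_part3", "col2_part3"]].isna().all())  # True
```

One side effect worth knowing: because of the NaN padding, pandas upcasts col1_part3 from integer to float.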
The question originally comes from Stack Exchange; asked by Lorenzo Benassi.




