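How do I convert a DataFrame to wide format, and how do I write a function that displays it side by side in a custom number of columns?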

Hey there! Let’s break down your two pandas-related questions with practical examples and explanations.

1. Converting a DataFrame from Long to Wide Format

In pandas, the most straightforward tools for this task are pivot() and pivot_table(). Here’s how to use them:

Using pivot() (for unique index-column pairs)

This works best when you don’t have duplicate entries for the combination of your chosen index and columns. Let’s walk through an example:

import pandas as pd

# Sample long-format DataFrame
long_df = pd.DataFrame({
    "user_id": [1, 1, 2, 2],
    "metric": ["height", "weight", "height", "weight"],
    "value": [175, 70, 160, 55]
})

# Convert to wide format
wide_df = long_df.pivot(
    index="user_id",    # Column to use as rows in the wide format
    columns="metric",   # Column whose values become new columns
    values="value"      # Values to fill in the new wide columns
).reset_index()

# Remove the leftover "metric" label from the columns axis
wide_df.columns.name = None

The result will have user_id as a regular column, with height and weight as separate columns holding the corresponding values.
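To confirm that shape, here is a small self-contained check. It rebuilds the same sample data so it can run on its own; the printed column order assumes pandas sorts the new columns alphabetically by metric name, which is its usual behavior for pivot().

```python
import pandas as pd

# Same sample data as in the example above
long_df = pd.DataFrame({
    "user_id": [1, 1, 2, 2],
    "metric": ["height", "weight", "height", "weight"],
    "value": [175, 70, 160, 55]
})

wide_df = long_df.pivot(
    index="user_id", columns="metric", values="value"
).reset_index()
wide_df.columns.name = None

# One row per user, one column per metric
print(wide_df.columns.tolist())  # ['user_id', 'height', 'weight']
print(wide_df.to_dict("records"))
```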

Using pivot_table() (for handling duplicates)

If you have repeated entries for the same index-column pair (e.g., a user’s height recorded multiple times), pivot() raises a ValueError because it cannot decide which value to keep. Instead, use pivot_table() with an aggregation function to combine those duplicates:

# Long-format DataFrame with duplicate entries
long_df_duplicates = pd.DataFrame({
    "user_id": [1, 1, 1, 2],
    "metric": ["height", "height", "weight", "height"],
    "value": [175, 176, 70, 160]
})

# Convert to wide format with aggregation
wide_df_agg = long_df_duplicates.pivot_table(
    index="user_id",
    columns="metric",
    values="value",
    aggfunc="mean"  # Use mean to combine duplicates; can also use sum, first, etc.
).reset_index()

wide_df_agg.columns.name = None
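
To see the difference between the two methods on the duplicate data, the sketch below runs both against the same sample frame. The `pivot_failed` flag is only an illustrative helper, not part of pandas.

```python
import pandas as pd

long_df_duplicates = pd.DataFrame({
    "user_id": [1, 1, 1, 2],
    "metric": ["height", "height", "weight", "height"],
    "value": [175, 176, 70, 160]
})

# pivot() refuses to reshape because (user_id=1, metric="height") appears twice
try:
    long_df_duplicates.pivot(index="user_id", columns="metric", values="value")
    pivot_failed = False
except ValueError as exc:
    pivot_failed = True
    print(f"pivot() failed: {exc}")

# pivot_table() averages the two height readings for user 1
wide_df_agg = long_df_duplicates.pivot_table(
    index="user_id", columns="metric", values="value", aggfunc="mean"
).reset_index()
wide_df_agg.columns.name = None
print(wide_df_agg)
```

Note that user 2 has no weight entry, so that cell comes out as NaN, and the value columns become floats.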

2. Splitting a Large DataFrame into a Specified Number of Columns for Side-by-Side Display

To split a DataFrame into a specified number of columns (side-by-side), we’ll write a function that calculates the correct number of rows per segment, splits the original DataFrame, then concatenates the segments horizontally. Here’s how:

Step-by-Step Function Implementation

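def split_df_to_columns(df, n_cols):
    # Guard against zero or negative segment counts (would otherwise divide by zero)
    if n_cols < 1:
        raise ValueError("n_cols must be a positive integer")
    total_rows = len(df)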
    # Calculate base rows per segment, plus extra rows for the first few segments if there's a remainder
    base_rows = total_rows // n_cols
    extra_rows = total_rows % n_cols
    
    split_segments = []
    start_idx = 0
    
    for segment_num in range(n_cols):
        # Determine how many rows this segment should have
        end_idx = start_idx + base_rows + (1 if segment_num < extra_rows else 0)
        # Extract the segment and reset its index to avoid alignment issues
        segment = df.iloc[start_idx:end_idx].reset_index(drop=True)
        # Rename columns to distinguish segments (optional but helpful)
        segment.columns = [f"{col}_part{segment_num + 1}" for col in segment.columns]
        split_segments.append(segment)
        
        start_idx = end_idx
    
    # Concatenate all segments horizontally
    return pd.concat(split_segments, axis=1)

Example Usage

Let’s test this with a sample large DataFrame:

# Create a sample DataFrame with 20 rows
large_df = pd.DataFrame({
    "col1": range(1, 21),
    "col2": [f"value_{x}" for x in range(1, 21)]
})

# Split into 3 side-by-side columns
split_result = split_df_to_columns(large_df, n_cols=3)
print(split_result)

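This will split the 20-row DataFrame into 3 segments: the first two segments have 7 rows each, and the third has 6 rows, all displayed side-by-side as columns. Because the third segment is one row shorter, pd.concat fills its last row with NaN, which also turns its integer column into floats.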
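
If NumPy is available, the same row distribution can be had more compactly. This is only a sketch of an alternative, not a replacement for the function above: `split_df_to_columns_np` is a hypothetical name, and it relies on np.array_split giving the extra rows to the first chunks, just like the manual arithmetic does.

```python
import numpy as np
import pandas as pd

def split_df_to_columns_np(df, n_cols):
    # Split the row positions (not the DataFrame itself) into n_cols chunks
    position_chunks = np.array_split(np.arange(len(df)), n_cols)
    segments = []
    for part_num, positions in enumerate(position_chunks, start=1):
        # Take the rows for this chunk and restart the index at 0 so the
        # segments line up row by row when concatenated
        segment = df.iloc[positions].reset_index(drop=True)
        segments.append(segment.add_suffix(f"_part{part_num}"))
    return pd.concat(segments, axis=1)

large_df = pd.DataFrame({
    "col1": range(1, 21),
    "col2": [f"value_{x}" for x in range(1, 21)]
})

split_result = split_df_to_columns_np(large_df, n_cols=3)
print(split_result.shape)  # (7, 6)
# The shorter third segment is padded with NaN in its last row
print(split_result[["col1_part3", "col2_part3"]].tail(2))
```

For display purposes, something like split_result.fillna("") can hide the padding, at the cost of turning the affected columns into object dtype.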


The question comes from Stack Exchange; it was asked by Lorenzo Benassi.
