编写Python函数:生成DataFrame列和并对比两DataFrame差异
Solution: Compare DataFrame Column Sums with Expected Totals
Got it, let's build a practical pandas-based function that checks off both your requirements—calculating column sums and validating them against your expected totals. I'll break this down step by step so it's easy to follow and adapt.
Step 1: Calculate Column Sums & Generate df1
First, we'll take your input DataFrame, compute sums for each numeric column, and append a "Total" row to create df1. Here's the code with comments to explain each part:
import pandas as pd def calculate_and_validate_totals(input_df, df2): # Step 1: Compute column sums and build df1 with Total row # Calculate sums for all numeric columns (skips non-numeric like 'name') column_sums = input_df.sum(numeric_only=True) # Add the 'name' label for our Total row column_sums['name'] = 'Total' # Append the Total row to the original DataFrame to get df1 df1 = input_df.append(column_sums.to_frame().T, ignore_index=True) # Step 2: Compare df1's Total row with df2's True_Total row # Extract the two rows we need for comparison total_row = df1[df1['name'] == 'Total'].squeeze() true_total_row = df2[df2['name'] == 'True_Total'].squeeze() # Define which columns to compare (exclude the 'name' column since it's just a label) compare_cols = [col for col in input_df.columns if col != 'name'] # Track match status, differences, and mismatched columns match_status = {} difference_details = [] mismatched_cols = [] for col in compare_cols: calc_total = total_row[col] expected_total = true_total_row[col] if calc_total == expected_total: match_status[col] = "✅ Matched" else: match_status[col] = "❌ Mismatched" diff = calc_total - expected_total difference_details.append(f"Column '{col}': Calculated ({calc_total}) vs Expected ({expected_total}) | Difference: {diff}") mismatched_cols.append(col) # Print the results in a readable format print("=== Match Status Summary ===") for col, status in match_status.items(): print(f"{col}: {status}") if difference_details: print("\n=== Difference Breakdown ===") for detail in difference_details: print(detail) print("\n=== Mismatched Columns ===") print(f"Columns with inconsistent totals: {', '.join(mismatched_cols)}") else: print("\n🎉 All columns match the expected True_Total values!") # Return df1 in case you need to use it for further analysis return df1
How to Test the Function
Let's use your example data to verify everything works as expected:
# Example input DataFrame (matches your structure with T, N1, N2 rows) input_data = { 'name': ['T', 'N1', 'N2'], 'name1': [100, 10, 134], # Sum: 244 'name2': [50, 22, 50], # Sum: 122 'name3': [80, 62, 40], # Sum: 182 'name4': [30, 29, 20], # Sum: 79 'name5': [70, 49, 50] # Sum: 169 } input_df = pd.DataFrame(input_data) # Example df2 with True_Total row df2_data = { 'name': ['True_Total'], 'name1': [244], 'name2': [122], 'name3': [182], 'name4': [79], 'name5': [169] } df2 = pd.DataFrame(df2_data) # Run the function df1_result = calculate_and_validate_totals(input_df, df2)
Expected Output
When you run the test code, you'll see:
=== Match Status Summary === name1: ✅ Matched name2: ✅ Matched name3: ✅ Matched name4: ✅ Matched name5: ✅ Matched 🎉 All columns match the expected True_Total values!
If you tweak one of the expected values (e.g., set name1 to 245 in df2), the output will flag the mismatch clearly:
=== Match Status Summary === name1: ❌ Mismatched name2: ✅ Matched name3: ✅ Matched name4: ✅ Matched name5: ✅ Matched === Difference Breakdown === Column 'name1': Calculated (244) vs Expected (245) | Difference: -1 === Mismatched Columns === Columns with inconsistent totals: name1
Quick Notes
- The
numeric_only=Trueflag ensures we don't try to sum non-numeric columns like 'name'. squeeze()converts single-row DataFrames into Series, making column-wise comparison simpler.- You can adjust the output formatting or return results as variables instead of printing if you need to integrate this into a larger script.
内容的提问来源于stack exchange,提问作者Richard21




