How do I make pandas to_excel() keep duplicate column names instead of automatically renaming them?
Alright, let's tackle this problem head-on. The .1, .2, etc. suffixes you're seeing are pandas' duplicate-column mangling. That renaming is applied when the file is read with pd.read_excel(), and mangle_dupe_cols=False, the parameter that looks like it should switch it off, isn't supported: pandas versions that still accept it raise an error saying so. So the workaround has two steps: recover the original headers yourself, then write the file in a way that leaves them untouched.
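To make the renaming concrete, here is a simplified sketch of the suffix scheme described above. The function name mangle_dupe_names is made up for illustration; this is not pandas' actual implementation, and it ignores edge cases such as a column that is already called 'price.1'.

```python
from collections import Counter


def mangle_dupe_names(names):
    """Rename repeated names as name, name.1, name.2, ...

    Illustration only: a simplified model of the renaming, not pandas' code.
    """
    seen = Counter()
    result = []
    for name in names:
        count = seen[name]
        # The first occurrence keeps its name; later ones get a numeric suffix.
        result.append(name if count == 0 else f"{name}.{count}")
        seen[name] += 1
    return result


headers = ["id", "price", "price", "qty", "price"]
print(mangle_dupe_names(headers))
# ['id', 'price', 'price.1', 'qty', 'price.2']
```

If your DataFrame's columns look like the output here, the headers were already renamed when the file was read, which is what Step 1 undoes.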
Step 1: Make sure your DataFrame keeps the original duplicate column names
First, double-check that when you read old_wb.xlsx, you're actually retaining the duplicate column names. By default, pd.read_excel() will mangle duplicates on read, so you'll need to extract the headers manually first (using openpyxl) before loading the data:
from openpyxl import load_workbook
import pandas as pd

# Load the original workbook to grab the true headers (after removing merged cells)
wb = load_workbook('old_wb.xlsx')
ws = wb.active

# Adjust the row number here to match where your cleaned headers live
# (openpyxl uses 1-based indexing)
original_headers = [cell.value for cell in ws[1]]

# Read the data without letting pandas auto-handle headers
df1 = pd.read_excel('old_wb.xlsx', header=None, skiprows=1)

# Assign the original duplicate headers to your DataFrame
df1.columns = original_headers
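Before assigning the headers, it can help to confirm that the row you extracted really has one name per data column and to see which names repeat. The sketch below uses only the standard library; duplicated_names and check_headers are hypothetical helper names, and the header values are made up.

```python
from collections import Counter


def duplicated_names(headers):
    """Return the header names that occur more than once, in first-seen order."""
    counts = Counter(headers)
    return [name for name in dict.fromkeys(headers) if counts[name] > 1]


def check_headers(headers, n_columns):
    """Raise ValueError if the header row cannot label n_columns columns."""
    if len(headers) != n_columns:
        raise ValueError(
            f"got {len(headers)} headers for {n_columns} data columns"
        )
    return duplicated_names(headers)


# Example with made-up header values
headers = ["Name", "Score", "Score", "Notes"]
print(check_headers(headers, n_columns=4))
# ['Score']
```

In the Step 1 code this would be called as check_headers(original_headers, df1.shape[1]) just before the df1.columns assignment, so a wrong header row number shows up as a clear error rather than a length mismatch from pandas.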
Step 2: Write the DataFrame to Excel while preserving duplicates
To keep full control over the header row and make sure nothing renames it on the way out, we'll bypass df.to_excel() and write directly to the Excel file using either xlsxwriter or openpyxl. Here are both options:
Option 1: Using xlsxwriter
import pandas as pd
from xlsxwriter import Workbook

# Create a new workbook and worksheet
with Workbook('new_wb.xlsx') as workbook:
    worksheet = workbook.add_worksheet('Sheet1')

    # Write headers exactly as they are (duplicates included) in row 0
    for col_idx, col_name in enumerate(df1.columns):
        worksheet.write(0, col_idx, col_name)

    # Write the data starting at row 1 (xlsxwriter rows and columns are 0-based,
    # so row 1 is the second row of the sheet, just below the headers)
    for row_idx, row_data in enumerate(df1.values, start=1):
        for col_idx, value in enumerate(row_data):
            if pd.isna(value):
                continue  # leave missing values blank; xlsxwriter rejects NaN by default
            worksheet.write(row_idx, col_idx, value)
Option 2: Using openpyxl
If you prefer openpyxl, this approach works just as well:
import pandas as pd
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws.title = 'Sheet1'

# Write the duplicate headers first
ws.append(df1.columns.tolist())

# Write each row of data, turning missing values (NaN/NaT) into empty cells
for row in df1.values.tolist():
    ws.append([None if pd.isna(value) else value for value in row])

wb.save('new_wb.xlsx')
Why this works
By writing directly with these Excel libraries, we take full control over what gets written to the file. We skip pandas' automatic column name mangling entirely, so your original duplicate column names stay exactly as they are.
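To confirm the result, the header row can be read back with openpyxl and compared with what you intended to write. This is a minimal sketch: read_header_row is a hypothetical helper, and the example writes a tiny throwaway workbook with made-up values to a temporary directory so it runs on its own.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def read_header_row(path, row=1):
    """Return the cell values of one row (1-based) from the active sheet."""
    ws = load_workbook(path).active
    return [cell.value for cell in ws[row]]


expected = ["id", "price", "price"]  # made-up headers with a duplicate

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "check.xlsx")

    # Write a small workbook the same way as the openpyxl option
    wb = Workbook()
    wb.active.append(expected)
    wb.active.append([1, 9.5, 10.0])
    wb.save(path)

    written = read_header_row(path)

print(written == expected)
```

For the real file, the equivalent check would be read_header_row('new_wb.xlsx') == df1.columns.tolist().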
This question comes from Stack Exchange; asked by Ricky Aguilar.




