
How can I make pandas to_excel() keep duplicate column names instead of automatically renaming them?

Alright, let's tackle this problem head-on. The .1, .2 suffixes come from pandas mangling duplicate column names, and that happens when the file is read: pd.read_excel() renames repeated headers by default, and mangle_dupe_cols=False was never actually supported as a way to switch this off (passing it raised an error saying so). Here's how you can get around this and preserve your original duplicate column names:

Step 1: Make sure your DataFrame keeps the original duplicate column names

First, double-check that when you read old_wb.xlsx, you're actually retaining the duplicate column names. By default, pd.read_excel() will mangle duplicates on read, so you'll need to extract the headers manually first (using openpyxl) before loading the data:

from openpyxl import load_workbook
import pandas as pd

# Load the original workbook to grab the true headers (after removing merged cells)
wb = load_workbook('old_wb.xlsx')
ws = wb.active

# Adjust the row number here to match where your cleaned headers live (openpyxl uses 1-based indexing)
original_headers = [cell.value for cell in ws[1]]

# Read the data without letting pandas auto-handle headers
df1 = pd.read_excel('old_wb.xlsx', header=None, skiprows=1)

# Assign the original duplicate headers to your DataFrame
df1.columns = original_headers
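
To see the renaming in isolation, here is a minimal sketch that uses pd.read_csv on an in-memory string as a stand-in for pd.read_excel. That substitution is an assumption made only so the example runs without an Excel file; both readers de-duplicate header names the same way by default. The sample data and the names raw, mangled, true_headers and df_demo are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical sample data with a repeated "B" header
raw = "A,B,B\n1,2,3\n4,5,6\n"

# Default behaviour: the second "B" is renamed on read
mangled = pd.read_csv(io.StringIO(raw))
print(list(mangled.columns))  # ['A', 'B', 'B.1']

# Same idea as Step 1: take the header row verbatim,
# then read the data without letting pandas interpret a header
true_headers = raw.splitlines()[0].split(",")
df_demo = pd.read_csv(io.StringIO(raw), header=None, skiprows=1)
df_demo.columns = true_headers
print(list(df_demo.columns))  # ['A', 'B', 'B']
```

Assigning to .columns does not trigger any renaming, which is why Step 1 restores the duplicates after the read rather than trying to prevent the mangling during it.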

Step 2: Write the DataFrame to Excel while preserving duplicates

With the duplicate headers back on the DataFrame, one way to be certain they reach the file unchanged is to bypass pandas' Excel writer and write the cells directly with either xlsxwriter or openpyxl. Here are both options:

Option 1: Using xlsxwriter

from xlsxwriter import Workbook

# Create a new workbook and worksheet
with Workbook('new_wb.xlsx') as workbook:
    worksheet = workbook.add_worksheet('Sheet1')
    
    # Write headers exactly as they are (duplicates included)
    for col_idx, col_name in enumerate(df1.columns):
        worksheet.write(0, col_idx, col_name)
    
    # Write each row of data starting from row 1 (xlsxwriter rows are 0-based; row 0 holds the headers)
    for row_idx, row_data in enumerate(df1.values, start=1):
        for col_idx, value in enumerate(row_data):
            worksheet.write(row_idx, col_idx, value)
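
One thing to watch for with this option: as far as I know, xlsxwriter's write() refuses NaN values unless the workbook is created with the nan_inf_to_errors option, and data read from Excel often contains NaN for empty cells. A minimal sketch of one way around that, assuming you're happy to leave those cells empty, is to turn missing values into None before writing. The helper names to_cell and rows_for_writing are hypothetical, not part of pandas or xlsxwriter, and the sample frame is made up.

```python
import pandas as pd


def to_cell(value):
    """Return None for missing scalars (None, NaN, NaT), else the value unchanged."""
    return None if pd.isna(value) else value


def rows_for_writing(df):
    """Yield each row of df as a list of cell values, with missing values as None."""
    # itertuples walks the columns by position, so duplicate column names are fine
    for row in df.itertuples(index=False, name=None):
        yield [to_cell(value) for value in row]


# Hypothetical frame with duplicate headers and a couple of missing cells
sample = pd.DataFrame(
    [[1, float("nan"), "x"], [2, 3.5, None]],
    columns=["A", "B", "B"],
)
for row in rows_for_writing(sample):
    print(row)
```

You would then loop over rows_for_writing(df1) in place of df1.values in the loop above; to my knowledge xlsxwriter's write() treats None as a blank cell.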

Option 2: Using openpyxl

If you prefer openpyxl, this approach works just as well:

from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws.title = 'Sheet1'

# Write the duplicate headers first
ws.append(df1.columns.tolist())

# Write each row of data
for row in df1.values.tolist():
    ws.append(row)

wb.save('new_wb.xlsx')

Why this works

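By writing directly with these Excel libraries, we take full control over what gets written to the file. pandas never touches the header row, so your original duplicate column names stay exactly as they are. Keep in mind that if you open new_wb.xlsx with pd.read_excel() again, the .1, .2 suffixes will reappear in the DataFrame, because pandas mangles duplicates on read by default; the file itself still holds the duplicate names.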
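
A quick way to confirm the headers really are unchanged is to read the header row back with openpyxl instead of pandas. The sketch below is self-contained: the headers and rows are made-up stand-ins for df1, and it saves to an in-memory buffer rather than new_wb.xlsx, both assumptions made only to keep the example runnable on its own.

```python
import io

from openpyxl import Workbook, load_workbook

# Hypothetical header row and data, standing in for df1
headers = ["A", "B", "B"]
rows = [[1, 2, 3], [4, 5, 6]]

# Write the sheet the same way as Option 2
wb = Workbook()
ws = wb.active
ws.append(headers)
for row in rows:
    ws.append(row)

# Save to an in-memory buffer so the sketch leaves no file behind
buffer = io.BytesIO()
wb.save(buffer)
buffer.seek(0)

# Read the first row back as raw cell values, with no pandas involved
check_ws = load_workbook(buffer).active
written_headers = [cell.value for cell in check_ws[1]]
print(written_headers)  # ['A', 'B', 'B']
```

For a real file, pass the path 'new_wb.xlsx' to load_workbook instead of the buffer.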

This question originally comes from Stack Exchange; it was asked by Ricky Aguilar.
