如何将XLSX文件转换为保留精确结构的TXT文件并处理合并单元格
Got it, let’s break down exactly what’s happening and how to fix this:
Why You’re Seeing "Part of a Merged Cell" Errors
Most XLSX parsing libraries (like openpyxl) store merged cell content only in the top-left cell of the merged range. All other cells in the merge are marked as "part of a merged cell"—so when you loop through every cell blindly, you’re pulling those placeholder messages instead of handling merged cells properly.
How to Preserve the Original XLSX Structure in TXT
The goal is to:
- Map all merged cell ranges and their content
- Replicate the row/column layout, including empty cells and aligned columns
- Avoid repeating merged cell content or placeholder messages
Here’s a practical, tested solution using openpyxl (one of the most reliable libraries for XLSX manipulation):
from openpyxl import load_workbook def convert_xlsx_to_structured_txt(input_xlsx, output_txt): # Load the workbook (use data_only=True to get cell values, not formulas) wb = load_workbook(input_xlsx, data_only=True) ws = wb.active # Replace with ws = wb["YourSheetName"] if using a specific sheet # Step 1: Record all merged cell ranges and their content merged_cell_map = {} for merged_range in ws.merged_cells.ranges: # Grab the content from the top-left cell of the merge top_left = ws.cell(row=merged_range.min_row, column=merged_range.min_col) merge_content = str(top_left.value) if top_left.value is not None else "" # Mark all other cells in the merge range to avoid duplicate content for row in range(merged_range.min_row, merged_range.max_row + 1): for col in range(merged_range.min_col, merged_range.max_col + 1): if row != merged_range.min_row or col != merged_range.min_col: merged_cell_map[(row, col)] = merge_content # Step 2: Calculate column widths to keep content aligned (mimic Excel's layout) column_widths = [] for col in ws.columns: max_col_width = 0 for cell in col: if cell.value: cell_length = len(str(cell.value)) if cell_length > max_col_width: max_col_width = cell_length # Add a small buffer to prevent content from overlapping column_widths.append(max_col_width + 2) # Step 3: Write structured content to TXT with open(output_txt, "w", encoding="utf-8") as txt_file: for row_num in range(1, ws.max_row + 1): row_parts = [] for col_num in range(1, ws.max_column + 1): # Handle merged cells: skip content for non-top-left cells in the merge if (row_num, col_num) in merged_cell_map: cell_content = "" else: cell = ws.cell(row=row_num, column=col_num) cell_content = str(cell.value) if cell.value is not None else "" # Align content to match column width (use ljust for left-align, rjust for right) aligned_content = cell_content.ljust(column_widths[col_num - 1]) row_parts.append(aligned_content) # Join all columns for the row and write to file txt_file.write("".join(row_parts) + "\n") # Example usage convert_xlsx_to_structured_txt("your_input_file.xlsx", "structured_output.txt")
Key Details to Adjust for Your File
- Sheet Selection: If your data isn’t in the active sheet, replace
wb.activewithwb["YourSheetName"] - Alignment: Use
rjust()instead ofljust()if your original Excel columns are right-aligned - Cell Content Handling: If you need to preserve line breaks within cells, replace
str(cell.value)withstr(cell.value).replace("\n", " ")(or keep the\nif you want line breaks in the TXT)
If You’re Using Pandas Instead
If you prefer pandas, you can load the workbook with merged cells preserved and then export with structured formatting:
import pandas as pd df = pd.read_excel("your_input_file.xlsx", header=None, merge_cells=True) # Fill merged cell gaps with empty strings to avoid duplicate content df = df.fillna("") # Export with aligned columns using tabs df.to_csv("structured_output.txt", sep="\t", index=False, header=False)
Note: Pandas uses tabs for separation, so the structure will rely on tab stops in your text viewer.
内容的提问来源于stack exchange,提问作者Jeet




