How to Select Specific Cells in an XLSX File, and How to Efficiently Sum the Corresponding Numeric Cells Across Badly Formatted Excel Files in Python

阿华AIGC实验室

2026-4-30

Hey there, let's work through this problem. Those finicky Excel files that break when you import them into Python are the worst, but there's a clear path to fixing the imports and getting your cell sums done efficiently. Here's how to tackle it step by step:

1. Fixing Messy Data Imports First

The root issue is likely hidden formatting (like text-formatted numbers, merged cells, or wonky headers) throwing off your imports. Try these tweaks:

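For .xlsx files, use the openpyxl engine. It has been pandas' default reader for .xlsx since version 1.2, and naming it explicitly keeps older pandas versions (which defaulted to xlrd) on the same engine. Pass header=None so pandas doesn't turn the first row into column labels; that keeps cell positions intact: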
```
import pandas as pd
df = pd.read_excel("your_tricky_file.xlsx", engine="openpyxl", header=None)
```
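If cells hold numbers stored as text, force-convert them to numeric values. errors="coerce" turns anything it can't parse (including text with hidden characters or thousands separators) into NaN, and fillna(0) then replaces those NaNs with 0. Drop .fillna(0) if you'd rather keep NaN so bad cells stay visible: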
```
df = df.apply(pd.to_numeric, errors="coerce").fillna(0)
```
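errors="coerce" only rescues values that already look like numbers: text such as "1,234" won't parse, and a figure padded with a non-breaking space (U+00A0) may not either, so both would end up as 0. A minimal sketch of a pre-cleaning pass, assuming those two kinds of debris are what your files contain and that commas are thousands separators rather than decimal commas. `clean_numeric` and `_strip_debris` are hypothetical names, not pandas functions:

```python
import pandas as pd


def _strip_debris(value):
    """Remove non-breaking spaces, thousands commas and padding from strings;
    leave numbers, NaN and None untouched."""
    if isinstance(value, str):
        return value.replace("\u00a0", "").replace(",", "").strip()
    return value


def clean_numeric(df, fill_value=0):
    """Hypothetical helper: strip text debris, coerce to numbers, fill gaps."""
    stripped = df.apply(lambda col: col.map(_strip_debris))
    numeric = stripped.apply(pd.to_numeric, errors="coerce")
    return numeric.fillna(fill_value)


# Demo: a small frame with the kinds of values a messy export produces
raw = pd.DataFrame({0: ["1,234", "\u00a056", "n/a"], 1: [7, None, "3.5"]})
result = clean_numeric(raw)
# column 0 -> 1234.0, 56.0, 0.0; column 1 -> 7.0, 0.0, 3.5
```

In practice you would call it on the frame you just read, e.g. df = clean_numeric(pd.read_excel(path, engine="openpyxl", header=None)). Anything still unparseable after cleaning (like "n/a") falls back to fill_value.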
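For merged cells, openpyxl keeps the value only in the top-left cell of the merged range: if E16:E18 are merged, ws["E16"] returns the merged value, but ws["E17"] and ws["E18"] return None. Hidden rows and columns are read like any others, so they don't shift cell positions.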
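If the cell you need might sit anywhere inside a merged block, one option is to find the range it belongs to and read that range's top-left cell instead. A minimal sketch; `merged_value` is a hypothetical helper, while `ws.merged_cells.ranges` and each range's `min_row`/`max_row`/`min_col`/`max_col` bounds are regular openpyxl attributes:

```python
from openpyxl import Workbook


def merged_value(ws, coord):
    """Hypothetical helper: read a cell, redirecting any cell inside a merged
    range to the range's top-left cell, where openpyxl stores the value."""
    cell = ws[coord]
    for merged_range in ws.merged_cells.ranges:
        if (merged_range.min_row <= cell.row <= merged_range.max_row
                and merged_range.min_col <= cell.column <= merged_range.max_col):
            return ws.cell(row=merged_range.min_row, column=merged_range.min_col).value
    return cell.value


# Demo: merge E16:E18 in an in-memory workbook
wb = Workbook()
ws = wb.active
ws["E16"] = 42
ws.merge_cells("E16:E18")

print(ws["E17"].value)          # None (E17 is a MergedCell placeholder)
print(merged_value(ws, "E17"))  # 42
```

The scan over all merged ranges is linear per lookup, which is fine for a handful of target cells per file.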

2. How to Select Specific Excel Cells (Like E16)

Two reliable methods, depending on your workflow:

openpyxl (exact Excel cell matching)
This is perfect if you need to target cells by their exact Excel coordinates (E16, B9, etc.):

```
from openpyxl import load_workbook

wb = load_workbook("your_file.xlsx", data_only=True)  # data_only=True returns the values Excel last calculated, not the formulas
ws = wb.active  # Or specify a sheet by name: wb["SalesReport"]
cell_value = ws["E16"].value
# Handle empty cells by setting them to 0
cell_value = cell_value if cell_value is not None else 0
```
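One caveat worth checking: openpyxl never calculates formulas, so data_only=True can only return the result Excel cached the last time it saved the file. A workbook produced by a script and never re-saved in Excel has no cached results, so its formula cells read back as None, which the fallback above quietly turns into 0. The sketch below reproduces this with a throwaway file in a temporary directory (file name and cell contents are made up for the demo):

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "formula_demo.xlsx")  # throwaway demo file

    wb = Workbook()
    ws = wb.active
    ws["A1"] = 2
    ws["A2"] = 3
    ws["E16"] = "=A1+A2"  # saved as a formula; openpyxl stores no result
    wb.save(path)

    # Default mode: formula cells come back as their formula text
    formula_text = load_workbook(path).active["E16"].value
    # data_only=True: only a cached result is returned, and there is none
    cached_value = load_workbook(path, data_only=True).active["E16"].value

print(formula_text)  # =A1+A2
print(cached_value)  # None
```

If your source files were generated by another program, opening and saving them once in Excel (or LibreOffice) fills in the cached values.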

Pandas (for data frame workflows)
Remember that pandas uses 0-indexed rows and columns, so with header=None (the first sheet row becomes row 0) Excel's E16 translates to row index 15, column index 4:

```
import pandas as pd
df = pd.read_excel("your_file.xlsx", engine="openpyxl", header=None)
cell_value = df.iloc[15, 4]  # 15 = 16th row, 4 = 5th column (E)
cell_value = cell_value if pd.notna(cell_value) else 0
```
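If your target list is written in Excel notation, converting each reference to iloc positions by hand invites off-by-one mistakes. A small stdlib sketch; `a1_to_iloc` is a hypothetical name (openpyxl ships a related helper, openpyxl.utils.cell.coordinate_to_tuple, which returns 1-based (row, column) numbers instead):

```python
import re


def a1_to_iloc(coord):
    """Hypothetical helper: 'E16' -> (15, 4), i.e. 0-based (row, column)
    positions for DataFrame.iloc on a sheet read with header=None."""
    match = re.fullmatch(r"([A-Z]+)([1-9][0-9]*)", coord.replace("$", "").upper())
    if match is None:
        raise ValueError(f"Not an A1-style cell reference: {coord!r}")
    letters, row_digits = match.groups()
    column = 0
    for letter in letters:
        # Column letters are base 26 with A=1 ... Z=26, so AA is column 27
        column = column * 26 + (ord(letter) - ord("A") + 1)
    return int(row_digits) - 1, column - 1


print(a1_to_iloc("E16"))    # (15, 4)
print(a1_to_iloc("$AA$1"))  # (0, 26)
```

Usage: row, col = a1_to_iloc("E16"), then value = df.iloc[row, col].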

3. Efficiently Sum Specific Cells Across All Files

Combine the above to batch-process all your files and build your summary:

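```
import os
from openpyxl import load_workbook, Workbook

# Set your folder path here
folder_path = "path/to/your/excel_files"
# Track sums for each target cell (add more cells to this list as needed)
target_cells = ["E16", "B9", "C22", "G30"]
cell_sums = {cell: 0 for cell in target_cells}


def to_number(value):
    """Turn a cell value into a number; empty or unreadable cells count as 0."""
    if isinstance(value, bool):
        return 0  # don't sum TRUE/FALSE cells
    if isinstance(value, (int, float)):
        return value
    if isinstance(value, str):
        # Numbers stored as text, e.g. "1,250" or " 42 "
        try:
            return float(value.replace(",", "").strip())
        except ValueError:
            return 0
    return 0  # None (empty), dates and other types


# Loop through all .xlsx files in the folder
for filename in os.listdir(folder_path):
    # Match the extension case-insensitively and skip Excel's "~$" lock files
    if filename.lower().endswith(".xlsx") and not filename.startswith("~$"):
        file_path = os.path.join(folder_path, filename)
        wb = load_workbook(file_path, data_only=True)
        ws = wb.active

        # Add each cell's value to the total
        for cell in target_cells:
            cell_sums[cell] += to_number(ws[cell].value)
        wb.close()

# Create a new Excel file for the summary
summary_wb = Workbook()
summary_ws = summary_wb.active

# Write totals to their corresponding cells
for cell, total in cell_sums.items():
    summary_ws[cell] = total

# Save the final summary
summary_wb.save("excel_summary.xlsx")
```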
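If you'd rather know which files are badly formatted than have unreadable cells count as 0 or stop the run, a variant of the loop can collect them in a list. A minimal sketch under the same assumptions as above (values on the active sheet unless a sheet name is given, commas as thousands separators); `parse_number` and `sum_cells` are hypothetical names, and the demo builds three throwaway workbooks in a temporary directory:

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def parse_number(value):
    """Hypothetical helper: float for numbers or numeric text, 0.0 for empty
    cells, None for anything that can't be read as a number."""
    if value is None or value == "":
        return 0.0
    if isinstance(value, bool):  # bool is a subclass of int; don't sum TRUE/FALSE
        return None
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        # Assumes "," is a thousands separator, not a decimal comma
        text = value.replace("\u00a0", "").replace(",", "").strip()
        try:
            return float(text)
        except ValueError:
            return None
    return None  # dates, times and other types are reported, not summed


def sum_cells(paths, target_cells, sheet_name=None):
    """Hypothetical helper: total target_cells across workbooks and collect
    (file name, cell, raw value) for every value that was not summed."""
    totals = {cell: 0.0 for cell in target_cells}
    problems = []
    for path in paths:
        wb = load_workbook(path, data_only=True)
        ws = wb[sheet_name] if sheet_name else wb.active
        for cell in target_cells:
            raw = ws[cell].value
            number = parse_number(raw)
            if number is None:
                problems.append((os.path.basename(path), cell, raw))
            else:
                totals[cell] += number
        wb.close()
    return totals, problems


# Demo with three throwaway workbooks, one of them badly formatted
with tempfile.TemporaryDirectory() as tmp:
    demo_paths = []
    for name, e16 in [("a.xlsx", 10), ("b.xlsx", "1,250"), ("c.xlsx", "n/a")]:
        wb = Workbook()
        wb.active["E16"] = e16
        wb.active["B9"] = 2
        demo_paths.append(os.path.join(tmp, name))
        wb.save(demo_paths[-1])
    totals, problems = sum_cells(demo_paths, ["E16", "B9"])

print(totals)    # {'E16': 1260.0, 'B9': 6.0}
print(problems)  # [('c.xlsx', 'E16', 'n/a')]
```

The problems list can be printed or written to a second sheet of the summary workbook so the offending files are easy to fix at the source.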

Quick Extra Tips

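If you have old .xls files, openpyxl can't read them; use pandas with the xlrd engine instead (pd.read_excel(path, engine="xlrd")). Current xlrd releases (2.0 and later) still read .xls; they only dropped .xlsx support, so there's no need to pin xlrd 1.2.0 for .xls files.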
To build the file list from a pattern instead of checking each name, use glob: from glob import glob; files = glob(os.path.join(folder_path, "*.xlsx")). The pattern also matches Excel's "~$" lock files, so filter those out.
If files use non-standard sheet names, replace wb.active with wb["YourSheetName"] to target the right sheet every time.
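Python's glob matching is case-sensitive on Linux and macOS, so "*.xlsx" misses a file named REPORT.XLSX, and it still picks up "~$" lock files, which aren't real workbooks and make load_workbook fail. A small pathlib alternative, assuming you only want files directly inside the folder (no subfolders); `list_excel_files` is a hypothetical name:

```python
import tempfile
from pathlib import Path


def list_excel_files(folder, suffixes=(".xlsx",)):
    """Hypothetical helper: workbook files directly inside `folder`, sorted by
    name, matching the extension case-insensitively and skipping "~$" lock files."""
    wanted = {suffix.lower() for suffix in suffixes}
    return sorted(
        path
        for path in Path(folder).iterdir()
        if path.is_file()
        and path.suffix.lower() in wanted
        and not path.name.startswith("~$")
    )


# Demo: empty placeholder files are enough, since nothing is opened here
with tempfile.TemporaryDirectory() as tmp:
    for name in ["b.xlsx", "A.XLSX", "~$b.xlsx", "notes.txt"]:
        (Path(tmp) / name).touch()
    demo_names = [path.name for path in list_excel_files(tmp)]

print(demo_names)  # ['A.XLSX', 'b.xlsx']
```

Pass str(path) for each result to load_workbook in the summation loop.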

The question was originally asked on Stack Exchange by RandallCloud.