如何用Python读取Excel合并单元格的属性值(解决Ref和Amount属性值无法获取的问题)
如何用Python读取Excel合并单元格的属性值(解决Ref和Amount属性值无法获取的问题)
我看了你的问题、代码和Excel的情况,马上就找到问题根源了——你没考虑到属性名本身是跨列合并的情况!比如"Ref"占了A、B两列,你原来的代码找到"Ref"所在的A列后,直接去取B列的值,但实际对应的值在C列,自然拿不到正确结果。下面给你详细分析和修改后的解决方案:
问题核心原因
- 对于"Ref"、"Amount"这类跨列合并的属性名,它们占了多列,对应的值在合并单元格最后一列的下一列,但你原代码直接用
col_idx +1取值列,只适用于属性名占1列的情况。 - 你写的
get_merged_cell_value函数是用来获取合并单元格的值,但没用到关键的地方——判断属性名本身的合并范围,导致值的列数计算错误。
修改后的完整代码
import os import openpyxl from openpyxl.utils import column_index_from_string file_path = r"D:\file\input\example.xlsx" if os.path.exists(file_path): print("File exists!") else: print("File not found! Check the path.") exit() target_sheet = "Output Approval Templete" # Define the properties to extract properties = [ "Approval Memo of", "Name of the Applicant", "Name of Territory", "Total Family Expenses", "Ref", "Amount", "Total Amount" ] # 新增:获取单元格所在的合并范围(如果是合并单元格) def get_cell_merged_range(sheet, row, col): for merged_range in sheet.merged_cells.ranges: min_row, min_col, max_row, max_col = merged_range.bounds if min_row <= row <= max_row and min_col <= col <= max_col: return (min_row, min_col, max_row, max_col) return None # 不是合并单元格 # Function to get the actual value from a cell (including merged cells) def get_merged_cell_value(sheet, row, col): merged_range = get_cell_merged_range(sheet, row, col) if merged_range: min_row, min_col, _, _ = merged_range return sheet.cell(min_row, min_col).value return sheet.cell(row, col).value # Function to format numeric values properly def format_value(value): if isinstance(value, float) and value > 1e10: # Large numbers like NID return str(int(value)) # Convert to integer and string to avoid scientific notation elif isinstance(value, (int, float)): # General number formatting return str(value) elif value is None: return "N/A" # Handle missing values return str(value).strip() try: # Load the workbook wb = openpyxl.load_workbook(file_path, data_only=True) if target_sheet not in wb.sheetnames: print(f"Sheet '{target_sheet}' not found in the file.") else: ws = wb[target_sheet] extracted_data = {} # Iterate over rows to extract data for row in ws.iter_rows(): for cell in row: # 跳过空值或非字符串单元格,减少无效判断 if not cell.value or not isinstance(cell.value, str): continue prop_name = cell.value.strip() if prop_name in properties: row_idx = cell.row col_idx = cell.column # 检查当前属性名是否在合并单元格中 merged_range = get_cell_merged_range(ws, row_idx, col_idx) if merged_range: # 合并单元格:取合并范围的最后一列,+1得到值的列 _, _, _, max_col = merged_range value_col = max_col + 1 else: # 非合并单元格:取当前列的下一列 value_col = col_idx + 1 # 确保值的列在工作表范围内 if value_col <= ws.max_column: value = get_merged_cell_value(ws, row_idx, value_col) extracted_data[prop_name] = format_value(value) # Print extracted values for key, value in extracted_data.items(): print(f"{key} : {value}") except Exception as e: print(f"Error loading workbook: {e}")
关键修改点说明
- 新增
get_cell_merged_range函数:用来判断属性名所在单元格是否属于合并范围,返回合并区域的起止行/列,这样就能知道属性名占了多少列。 - 优化值列的计算逻辑:
- 如果属性名在合并单元格里,取合并范围的最后一列
max_col,值的列就是max_col +1(比如"Ref"占A、B列,max_col=2,值列就是3即C列) - 如果不是合并单元格,保持原逻辑取当前列+1
- 如果属性名在合并单元格里,取合并范围的最后一列
- 跳过无效单元格:新增判断跳过空值或非字符串单元格,减少不必要的循环判断
你运行修改后的代码,就能正确获取到"Ref"和"Amount"的对应值了!
备注:内容来源于stack exchange,提问作者Enamul Haque




