如何优化基于openpyxl的Python办公自动化脚本以提升执行速度?
Hey there! Looking at your code, the slowdown is almost certainly coming from repeated single-cell operations and redundant attribute access—openpyxl isn’t optimized for handling hundreds of individual cell reads/writes and style copies efficiently. Let’s walk through the most impactful fixes to get this running faster:
1. Cache Your Worksheet Objects (Quick Win!)
Right now, every loop iteration calls source_wb['MASTER FILE'] to fetch the worksheet. This forces openpyxl to re-locate the sheet every time, which adds up fast. Just store the source worksheet in a variable once at the start:
source_ws = source_wb['MASTER FILE']
Now you can use source_ws in your loops instead, cutting out that repeated lookup overhead.
2. Use Read-Only Mode for the Source Workbook
Openpyxl’s default load mode loads every detail of every cell, which is slow for large files. Since you’re only reading from the source workbook, use read_only=True to load it in a lightweight, fast-reading mode:
source_wb = load_workbook("GUARANTEE MASTER FILE (1).xlsx", read_only=True)
This skips loading unnecessary background data and speeds up cell access drastically.
3. Batch Process Rows Instead of Individual Cells
Nested loops over every single cell are inefficient. Instead, use iter_rows() to fetch entire rows at once from the source sheet, then process each row’s cells in one go. This reduces the number of low-level operations openpyxl has to perform:
# Iterate over rows 2 to 299 (matches your original range(2, 300)) for row_idx, source_row in enumerate(source_ws.iter_rows(min_row=2, max_row=299, min_col=1, max_col=299), start=2): target_row_idx = row_idx + 72 # Now loop through each cell in the row for col_idx, source_cell in enumerate(source_row, start=1): if col_idx < 15 or (col_idx > 17 and col_idx != 20): target_cell = ws.cell(row=target_row_idx, column=col_idx) target_cell.value = source_cell.value # Copy styles (we'll optimize this next) target_cell.border = copy(source_cell.border) target_cell.fill = copy(source_cell.fill) target_cell.alignment = copy(source_cell.alignment) target_cell.protection = copy(source_cell.protection) target_cell.font = copy(source_cell.font) target_cell.number_format = source_cell.number_format # No need to copy strings!
Note: number_format is a string, so you can assign it directly instead of using copy()—saves a tiny bit of extra work.
4. Cache Reused Styles to Avoid Duplicate Copies
If multiple cells share the same style (which is common in Excel files), you’re wasting time copying the same style object over and over. Add a cache to store styles you’ve already copied, then reuse them:
# Create a cache to store unique styles style_cache = {} for row_idx, source_row in enumerate(source_ws.iter_rows(min_row=2, max_row=299, min_col=1, max_col=299), start=2): target_row_idx = row_idx + 72 for col_idx, source_cell in enumerate(source_row, start=1): if col_idx < 15 or (col_idx > 17 and col_idx != 20): target_cell = ws.cell(row=target_row_idx, column=col_idx) target_cell.value = source_cell.value # Create a unique key for the cell's style style_key = ( source_cell.border, source_cell.fill, source_cell.alignment, source_cell.protection, source_cell.font, source_cell.number_format ) # Only copy the style if we haven't seen this key before if style_key not in style_cache: style_cache[style_key] = { 'border': copy(source_cell.border), 'fill': copy(source_cell.fill), 'alignment': copy(source_cell.alignment), 'protection': copy(source_cell.protection), 'font': copy(source_cell.font), 'number_format': source_cell.number_format } # Apply the cached style to the target cell style = style_cache[style_key] target_cell.border = style['border'] target_cell.fill = style['fill'] target_cell.alignment = style['alignment'] target_cell.protection = style['protection'] target_cell.font = style['font'] target_cell.number_format = style['number_format']
This cuts down on the number of copy() calls, which are surprisingly expensive in openpyxl.
5. Optional: Skip Non-Essential Styles
If your use case doesn’t require every style detail (like borders or fill colors), skip copying those. Just transfer the value and number format—this will make the code run even faster. Only copy styles that are actually necessary for your task.
Optimized Full Code
Here’s all the fixes combined into one script:
import time from openpyxl import load_workbook from copy import copy start_time = time.time() # Load source in read-only mode for speed source_wb = load_workbook("GUARANTEE MASTER FILE (1).xlsx", read_only=True) target_wb = load_workbook(r"C:\Users\User1\OneDrive\TEST\GUARANTEE MASTER FILE (2).xlsx") # Cache worksheets to avoid repeated lookups source_ws = source_wb['MASTER FILE'] target_ws = target_wb['MASTER FILE'] # Cache styles to avoid duplicate copies style_cache = {} # Batch process rows for row_idx, source_row in enumerate(source_ws.iter_rows(min_row=2, max_row=299, min_col=1, max_col=299), start=2): target_row_idx = row_idx + 72 for col_idx, source_cell in enumerate(source_row, start=1): # Skip columns you don't need to process if col_idx < 15 or (col_idx > 17 and col_idx != 20): target_cell = target_ws.cell(row=target_row_idx, column=col_idx) target_cell.value = source_cell.value # Reuse cached styles when possible style_key = ( source_cell.border, source_cell.fill, source_cell.alignment, source_cell.protection, source_cell.font, source_cell.number_format ) if style_key not in style_cache: style_cache[style_key] = { 'border': copy(source_cell.border), 'fill': copy(source_cell.fill), 'alignment': copy(source_cell.alignment), 'protection': copy(source_cell.protection), 'font': copy(source_cell.font), 'number_format': source_cell.number_format } style = style_cache[style_key] target_cell.border = style['border'] target_cell.fill = style['fill'] target_cell.alignment = style['alignment'] target_cell.protection = style['protection'] target_cell.font = style['font'] target_cell.number_format = style['number_format'] target_wb.save(r"C:\Users\User1\OneDrive\TEST\GUARANTEE MASTER FILE (2).xlsx") print("--- %s seconds ---" % (time.time() - start_time))
These changes should give you a noticeable speed boost—especially the read-only mode and batch row processing. Let me know if you still see slowdowns!
内容的提问来源于stack exchange,提问作者cwammad




