You need to enable JavaScript to run this app.
最新活动
大模型
产品
解决方案
定价
生态与合作
支持与服务
开发者
了解我们

如何优化基于openpyxl的Python办公自动化脚本以提升执行速度?

Speed Up Your Openpyxl Excel Automation Code

Hey there! Looking at your code, the slowdown is almost certainly coming from repeated single-cell operations and redundant attribute access—openpyxl isn’t optimized for handling hundreds of individual cell reads/writes and style copies efficiently. Let’s walk through the most impactful fixes to get this running faster:

1. Cache Your Worksheet Objects (Quick Win!)

Right now, every loop iteration calls source_wb['MASTER FILE'] to fetch the worksheet. This forces openpyxl to re-locate the sheet every time, which adds up fast. Just store the source worksheet in a variable once at the start:

source_ws = source_wb['MASTER FILE']

Now you can use source_ws in your loops instead, cutting out that repeated lookup overhead.

2. Use Read-Only Mode for the Source Workbook

Openpyxl’s default load mode loads every detail of every cell, which is slow for large files. Since you’re only reading from the source workbook, use read_only=True to load it in a lightweight, fast-reading mode:

source_wb = load_workbook("GUARANTEE MASTER FILE (1).xlsx", read_only=True)

This skips loading unnecessary background data and speeds up cell access drastically.

3. Batch Process Rows Instead of Individual Cells

Nested loops over every single cell are inefficient. Instead, use iter_rows() to fetch entire rows at once from the source sheet, then process each row’s cells in one go. This reduces the number of low-level operations openpyxl has to perform:

# Iterate over rows 2 to 299 (matches your original range(2, 300))
for row_idx, source_row in enumerate(source_ws.iter_rows(min_row=2, max_row=299, min_col=1, max_col=299), start=2):
    target_row_idx = row_idx + 72
    # Now loop through each cell in the row
    for col_idx, source_cell in enumerate(source_row, start=1):
        if col_idx < 15 or (col_idx > 17 and col_idx != 20):
            target_cell = ws.cell(row=target_row_idx, column=col_idx)
            target_cell.value = source_cell.value
            # Copy styles (we'll optimize this next)
            target_cell.border = copy(source_cell.border)
            target_cell.fill = copy(source_cell.fill)
            target_cell.alignment = copy(source_cell.alignment)
            target_cell.protection = copy(source_cell.protection)
            target_cell.font = copy(source_cell.font)
            target_cell.number_format = source_cell.number_format  # No need to copy strings!

Note: number_format is a string, so you can assign it directly instead of using copy()—saves a tiny bit of extra work.

4. Cache Reused Styles to Avoid Duplicate Copies

If multiple cells share the same style (which is common in Excel files), you’re wasting time copying the same style object over and over. Add a cache to store styles you’ve already copied, then reuse them:

# Create a cache to store unique styles
style_cache = {}

for row_idx, source_row in enumerate(source_ws.iter_rows(min_row=2, max_row=299, min_col=1, max_col=299), start=2):
    target_row_idx = row_idx + 72
    for col_idx, source_cell in enumerate(source_row, start=1):
        if col_idx < 15 or (col_idx > 17 and col_idx != 20):
            target_cell = ws.cell(row=target_row_idx, column=col_idx)
            target_cell.value = source_cell.value
            
            # Create a unique key for the cell's style
            style_key = (
                source_cell.border, source_cell.fill, source_cell.alignment,
                source_cell.protection, source_cell.font, source_cell.number_format
            )
            
            # Only copy the style if we haven't seen this key before
            if style_key not in style_cache:
                style_cache[style_key] = {
                    'border': copy(source_cell.border),
                    'fill': copy(source_cell.fill),
                    'alignment': copy(source_cell.alignment),
                    'protection': copy(source_cell.protection),
                    'font': copy(source_cell.font),
                    'number_format': source_cell.number_format
                }
            
            # Apply the cached style to the target cell
            style = style_cache[style_key]
            target_cell.border = style['border']
            target_cell.fill = style['fill']
            target_cell.alignment = style['alignment']
            target_cell.protection = style['protection']
            target_cell.font = style['font']
            target_cell.number_format = style['number_format']

This cuts down on the number of copy() calls, which are surprisingly expensive in openpyxl.

5. Optional: Skip Non-Essential Styles

If your use case doesn’t require every style detail (like borders or fill colors), skip copying those. Just transfer the value and number format—this will make the code run even faster. Only copy styles that are actually necessary for your task.

Optimized Full Code

Here’s all the fixes combined into one script:

import time
from openpyxl import load_workbook
from copy import copy

start_time = time.time()

# Load source in read-only mode for speed
source_wb = load_workbook("GUARANTEE MASTER FILE (1).xlsx", read_only=True)
target_wb = load_workbook(r"C:\Users\User1\OneDrive\TEST\GUARANTEE MASTER FILE (2).xlsx")

# Cache worksheets to avoid repeated lookups
source_ws = source_wb['MASTER FILE']
target_ws = target_wb['MASTER FILE']

# Cache styles to avoid duplicate copies
style_cache = {}

# Batch process rows
for row_idx, source_row in enumerate(source_ws.iter_rows(min_row=2, max_row=299, min_col=1, max_col=299), start=2):
    target_row_idx = row_idx + 72
    for col_idx, source_cell in enumerate(source_row, start=1):
        # Skip columns you don't need to process
        if col_idx < 15 or (col_idx > 17 and col_idx != 20):
            target_cell = target_ws.cell(row=target_row_idx, column=col_idx)
            target_cell.value = source_cell.value
            
            # Reuse cached styles when possible
            style_key = (
                source_cell.border, source_cell.fill, source_cell.alignment,
                source_cell.protection, source_cell.font, source_cell.number_format
            )
            
            if style_key not in style_cache:
                style_cache[style_key] = {
                    'border': copy(source_cell.border),
                    'fill': copy(source_cell.fill),
                    'alignment': copy(source_cell.alignment),
                    'protection': copy(source_cell.protection),
                    'font': copy(source_cell.font),
                    'number_format': source_cell.number_format
                }
            
            style = style_cache[style_key]
            target_cell.border = style['border']
            target_cell.fill = style['fill']
            target_cell.alignment = style['alignment']
            target_cell.protection = style['protection']
            target_cell.font = style['font']
            target_cell.number_format = style['number_format']

target_wb.save(r"C:\Users\User1\OneDrive\TEST\GUARANTEE MASTER FILE (2).xlsx")

print("--- %s seconds ---" % (time.time() - start_time))

These changes should give you a noticeable speed boost—especially the read-only mode and batch row processing. Let me know if you still see slowdowns!

内容的提问来源于stack exchange,提问作者cwammad

火山引擎 最新活动