You need to enable JavaScript to run this app.
最新活动
大模型
产品
解决方案
定价
生态与合作
支持与服务
开发者
了解我们

如何使用Python的openpyxl遍历Excel指定列?含重复列名问题

Hey there! Let's solve your Excel automation questions step by step

You're already off to a solid start with importing those libraries—let's break down how to iterate through column J and handle those tricky duplicate column names like "Sample".

1. Iterating through Column J with openpyxl

Your existing code loads the workbook correctly; now we just need to target column J and loop through its values. A quick reminder: openpyxl uses 1-based indexing, so column J is the 10th column. Here's the full code to make this work:

from openpyxl import load_workbook

# Load your Excel workbook
wb = load_workbook('workbook.xlsx')

# Grab the active worksheet (or specify a sheet by name: wb['YourSheetName'])
ws = wb.active

# Loop through all cells in column J (min_col and max_col both set to 10)
print("All values in Column J:")
for col_cells in ws.iter_cols(min_col=10, max_col=10, values_only=True):
    # iter_cols returns tuples, so we loop through each value inside the tuple
    for value in col_cells:
        if value is not None:  # Optional: skip empty cells if needed
            print(value)

The values_only=True parameter lets you pull cell values directly instead of cell objects, which is exactly what you need for displaying content.

2. Handling Duplicate Column Names

Duplicate column names can mess with data access, but we can fix this by either making names unique or targeting all columns with the original duplicate name. Here are two approaches depending on whether you prefer pandas or sticking with openpyxl:

Option A: Using Pandas (great for data manipulation)

If you're using pandas to load your data, deduplicating column names is straightforward:

import pandas as pd

# Load the Excel file, assuming the first row is your column headers
df = pd.read_excel('workbook.xlsx', header=0)

# Check which column names are duplicated
duplicate_names = df.columns[df.columns.duplicated()].unique()
print(f"Found duplicate column names: {list(duplicate_names)}")

# Automatically add suffixes to duplicates (e.g., Sample, Sample_1, Sample_2)
df.columns = pd.io.parsers.base_parser.ParserBase({'usecols': None})._maybe_dedup_names(df.columns)

# Now you can access all original "Sample" columns using the new unique names
sample_columns = [col for col in df.columns if 'Sample' in col]
print("\nValues from all Sample columns:")
print(df[sample_columns])

Option B: Using Openpyxl (direct workbook manipulation)

If you want to work directly with openpyxl, we can either rename duplicate headers or collect all columns with the original duplicate name:

from openpyxl import load_workbook

wb = load_workbook('workbook.xlsx')
ws = wb.active

# Get the original column names from the first row
original_col_names = [cell.value for cell in ws[1]]

# Create unique names by adding suffixes to duplicates
unique_col_names = []
name_counter = {}
for name in original_col_names:
    if name in name_counter:
        name_counter[name] += 1
        unique_col_names.append(f"{name}_{name_counter[name]}")
    else:
        name_counter[name] = 0
        unique_col_names.append(name)

# Print the deduplicated names
print("Deduplicated column names:")
print(unique_col_names)

# To get all values from columns originally named "Sample"
sample_col_indices = [i+1 for i, name in enumerate(original_col_names) if name == 'Sample']

print("\nValues from all original 'Sample' columns:")
for idx in sample_col_indices:
    print(f"\nColumn {idx} (Original 'Sample'):")
    # Skip the header row by starting from row 2
    for cell in ws.iter_cols(min_col=idx, max_col=idx, values_only=True):
        for value in cell[1:]:
            if value is not None:
                print(value)

Hope this clears things up for you! You're doing awesome as a Python beginner—keep experimenting and asking questions.

内容的提问来源于stack exchange,提问作者Ramya

火山引擎 最新活动