如何使用Python的openpyxl遍历Excel指定列?含重复列名问题
You're already off to a solid start with importing those libraries—let's break down how to iterate through column J and handle those tricky duplicate column names like "Sample".
1. Iterating through Column J with openpyxl
Your existing code loads the workbook correctly; now we just need to target column J and loop through its values. A quick reminder: openpyxl uses 1-based indexing, so column J is the 10th column. Here's the full code to make this work:
from openpyxl import load_workbook # Load your Excel workbook wb = load_workbook('workbook.xlsx') # Grab the active worksheet (or specify a sheet by name: wb['YourSheetName']) ws = wb.active # Loop through all cells in column J (min_col and max_col both set to 10) print("All values in Column J:") for col_cells in ws.iter_cols(min_col=10, max_col=10, values_only=True): # iter_cols returns tuples, so we loop through each value inside the tuple for value in col_cells: if value is not None: # Optional: skip empty cells if needed print(value)
The values_only=True parameter lets you pull cell values directly instead of cell objects, which is exactly what you need for displaying content.
2. Handling Duplicate Column Names
Duplicate column names can mess with data access, but we can fix this by either making names unique or targeting all columns with the original duplicate name. Here are two approaches depending on whether you prefer pandas or sticking with openpyxl:
Option A: Using Pandas (great for data manipulation)
If you're using pandas to load your data, deduplicating column names is straightforward:
import pandas as pd # Load the Excel file, assuming the first row is your column headers df = pd.read_excel('workbook.xlsx', header=0) # Check which column names are duplicated duplicate_names = df.columns[df.columns.duplicated()].unique() print(f"Found duplicate column names: {list(duplicate_names)}") # Automatically add suffixes to duplicates (e.g., Sample, Sample_1, Sample_2) df.columns = pd.io.parsers.base_parser.ParserBase({'usecols': None})._maybe_dedup_names(df.columns) # Now you can access all original "Sample" columns using the new unique names sample_columns = [col for col in df.columns if 'Sample' in col] print("\nValues from all Sample columns:") print(df[sample_columns])
Option B: Using Openpyxl (direct workbook manipulation)
If you want to work directly with openpyxl, we can either rename duplicate headers or collect all columns with the original duplicate name:
from openpyxl import load_workbook wb = load_workbook('workbook.xlsx') ws = wb.active # Get the original column names from the first row original_col_names = [cell.value for cell in ws[1]] # Create unique names by adding suffixes to duplicates unique_col_names = [] name_counter = {} for name in original_col_names: if name in name_counter: name_counter[name] += 1 unique_col_names.append(f"{name}_{name_counter[name]}") else: name_counter[name] = 0 unique_col_names.append(name) # Print the deduplicated names print("Deduplicated column names:") print(unique_col_names) # To get all values from columns originally named "Sample" sample_col_indices = [i+1 for i, name in enumerate(original_col_names) if name == 'Sample'] print("\nValues from all original 'Sample' columns:") for idx in sample_col_indices: print(f"\nColumn {idx} (Original 'Sample'):") # Skip the header row by starting from row 2 for cell in ws.iter_cols(min_col=idx, max_col=idx, values_only=True): for value in cell[1:]: if value is not None: print(value)
Hope this clears things up for you! You're doing awesome as a Python beginner—keep experimenting and asking questions.
内容的提问来源于stack exchange,提问作者Ramya




