How do I split sentences in an Excel sheet with Python? My existing code handles only a single sentence and needs adapting

Fixing Excel Content Splitting in Python

Hey there! Let's get your Excel content split properly. Right now, your code opens the Excel file but never reads or processes the data inside it. Instead, it tests the split function on a hardcoded string s, which is why it can't handle your spreadsheet's content. Also, the truncated output (['Dead poet soci...']) looks like a display quirk in how long values get printed, not a problem with the split logic itself.

Here are two straightforward solutions to process your Excel file, depending on whether you prefer using pandas (simpler for table data) or sticking with xlrd:

Solution 1: Using pandas

Pandas makes it super easy to manipulate Excel data. This example splits the comma-separated values in a target column into individual rows, while keeping the other columns from your original sheet intact:

import pandas as pd

# 1. Read your Excel file into a DataFrame
df = pd.read_excel("sample_docu5.xlsx")

# 2. Replace "Content" with the actual name of your column containing comma-separated values
#    Split each cell, stack the pieces into rows, and strip extra spaces
#    (dropna() discards the padding that shorter rows get from expand=True)
split_series = (
    df["Content"]
    .str.split(",", expand=True)
    .stack()
    .dropna()
    .reset_index(level=1, drop=True)
    .str.strip()
)

# 3. Merge the split values back with the original data (dropping the old unsplit column)
result_df = df.drop("Content", axis=1).join(split_series.rename("Split_Content"))

# 4. Save the result to a new Excel file
result_df.to_excel("split_output.xlsx", index=False)

How this works:

  • str.split(",", expand=True) splits each cell's content into separate columns
  • .stack() converts those columns into rows, pairing each split value with the original row's other data
  • .str.strip() removes any extra spaces around split values (like the space after each comma)
  • The final join combines the split values back with your original data, and we save it to a new file.
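
The steps above can be tried on a small in-memory table before touching a real workbook. This is a minimal sketch: the ID column and the sample values are made up for illustration, and the explicit dropna() is an added safeguard so that padding from shorter rows is discarded regardless of pandas version.

```python
import pandas as pd

# Hypothetical sample data standing in for the spreadsheet
df = pd.DataFrame({
    "ID": [1, 2],
    "Content": ["apple, banana", "cherry"],
})

# Split on commas into columns, then stack those columns into rows
split_series = (
    df["Content"]
    .str.split(",", expand=True)
    .stack()
    .dropna()                          # discard padding from shorter rows
    .reset_index(level=1, drop=True)   # keep only the original row index
    .str.strip()                       # remove the space after each comma
)

# Join on the original row index; rows with several items are repeated
result_df = df.drop("Content", axis=1).join(split_series.rename("Split_Content"))
result_df = result_df.reset_index(drop=True)
print(result_df)
```

The row with two items appears twice in the result, once per item, with its ID repeated alongside each one.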

If you want to split values into separate columns instead of rows, just use the expanded split directly:

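# Split into columns, strip stray spaces, then name the columns Item_1, Item_2, ...
split_columns_df = df["Content"].str.split(",", expand=True).apply(lambda col: col.str.strip())
split_columns_df = split_columns_df.rename(columns=lambda x: f"Item_{x + 1}")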
final_df = pd.concat([df, split_columns_df], axis=1)
final_df.to_excel("split_columns_output.xlsx", index=False)
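
To see what the column layout looks like, here is a small sketch on made-up data (the ID column and the values are assumptions for illustration). It also strips the space left after each comma, which a plain split would keep.

```python
import pandas as pd

# Hypothetical sample data standing in for the spreadsheet
df = pd.DataFrame({
    "ID": [1, 2],
    "Content": ["apple, banana", "cherry"],
})

# One new column per item; shorter rows are padded with missing values
split_columns_df = (
    df["Content"]
    .str.split(",", expand=True)
    .apply(lambda col: col.str.strip())
    .rename(columns=lambda x: f"Item_{x + 1}")
)

# Place the new columns next to the original ones
final_df = pd.concat([df, split_columns_df], axis=1)
print(final_df)
```

Rows with fewer items than the longest row end up with missing values in the trailing Item columns, so this layout works best when the item counts are similar.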

Solution 2: Using xlrd (Lower-Level Control)

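If you want to stick with xlrd for reading, you can iterate through each row manually and split the content. Note that xlrd 2.0 and later reads only the legacy .xls format, so opening an .xlsx file as shown requires an older xlrd release (1.2.0) or a different reader such as openpyxl: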

import xlrd
import pandas as pd

# 1. Open the Excel file and select the first sheet
wb = xlrd.open_workbook("sample_docu5.xlsx")
sheet = wb.sheet_by_index(0)

# 2. Store split data in a list
split_data = []

# 3. Iterate through every row in the sheet
for row_idx in range(sheet.nrows):
    row_values = sheet.row_values(row_idx)
    # Replace index 0 with the column index that has your comma-separated content
    target_content = row_values[0]
    
    if isinstance(target_content, str) and target_content.strip():  # Skip empty or non-text cells
        # Split content and remove extra spaces
        split_items = [item.strip() for item in target_content.split(",")]
        # For each split item, create a new row with the original row's other data
        for item in split_items:
            new_row = row_values.copy()
            new_row[0] = item
            split_data.append(new_row)

# 4. Write the split data to a new Excel file
pd.DataFrame(split_data).to_excel("split_output.xlsx", index=False, header=False)

Just adjust the column index (both row_values[0] and new_row[0]) to match the column in your Excel file that holds the content you want to split.
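
The row-expansion loop can also be pulled out into a helper that works on plain lists, which makes it easy to check without a workbook. This is only a sketch: expand_rows, its parameters, and the sample rows are hypothetical names and data, and skipping non-text cells is an assumption about what you want for numeric cells.

```python
from typing import Any, List


def expand_rows(rows: List[List[Any]], col_idx: int = 0, sep: str = ",") -> List[List[Any]]:
    """Return one output row per separated item found in column col_idx."""
    expanded = []
    for row in rows:
        cell = row[col_idx]
        # Skip empty cells and non-text cells (xlrd returns numbers as floats)
        if not isinstance(cell, str) or not cell.strip():
            continue
        for item in cell.split(sep):
            new_row = list(row)              # copy so the source row is not mutated
            new_row[col_idx] = item.strip()  # replace the cell with a single item
            expanded.append(new_row)
    return expanded


# Hypothetical rows, shaped like the lists that sheet.row_values() returns
sample_rows = [
    ["red, green", 1.0],
    ["", 2.0],
    ["blue", 3.0],
]
expanded_rows = expand_rows(sample_rows)
for row in expanded_rows:
    print(row)
```

With a real sheet, the rows argument could be built as [sheet.row_values(i) for i in range(sheet.nrows)], and col_idx set to the column that holds the comma-separated text.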

This question comes from Stack Exchange; asked by Sanchari Ghosh.