How do I extract specific text after a given word in a Pandas DataFrame in Python? Plus: fixing a TypeError when extracting the contents of a text field

阿华AIGC实验室

2026-4-27

Hey there! Let's tackle your two Pandas questions one by one:

1. Extracting text after a specific word in a Pandas DataFrame

The most reliable way to do this is using regular expressions with str.extract, which lets you target the exact word and capture everything that follows it. Here's a concrete example:

Suppose you have a DataFrame with a column containing strings like "text: your desired content" and you want to pull out everything after "text:":

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'content': [
        "text: extract this part",
        "random leading text text: another example",
        "no matching keyword here"
    ]
})

# Extract text after "text:" (skips any whitespace right after the keyword; rows without a match become NaN)
df['extracted_text'] = df['content'].str.extract(r'text:\s*(.*)', expand=False)

Breakdown of the regex:

text:\s* matches the exact keyword "text:" plus any whitespace that follows it
(.*) captures everything from that point to the end of the line (the parentheses create the capture group that str.extract returns; by default . does not match newlines)
expand=False ensures we get a Series back, which fits perfectly into a new DataFrame column
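
To make the breakdown concrete, here is a small sketch that rebuilds the sample DataFrame from above and shows what the pattern returns for each row, and how expand changes the return type. The names as_series and as_frame are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'content': [
        "text: extract this part",
        "random leading text text: another example",
        "no matching keyword here",
    ]
})

pattern = r'text:\s*(.*)'

# With a single capture group, expand=False returns a Series
as_series = df['content'].str.extract(pattern, expand=False)

# expand=True (the default) returns a DataFrame with one column per group
as_frame = df['content'].str.extract(pattern, expand=True)

print(as_series.tolist())
print(type(as_frame).__name__)
```

The third row contains no "text:" keyword, so its result is NaN rather than an empty string.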

If you need to target a different word, just replace "text:" in the regex with your desired keyword (e.g., r'keyword:\s*(.*)').
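
If the keyword might contain characters that mean something special in a regex (parentheses, dots, question marks), one option is to escape it first. The sketch below wraps that in a helper; extract_after and the sample strings are hypothetical and not part of the original question.

```python
import re
import pandas as pd

def extract_after(series, keyword):
    """Return the text that follows the first occurrence of `keyword` in each string."""
    # re.escape keeps characters such as '(' or '.' in the keyword literal
    pattern = re.escape(keyword) + r'\s*(.*)'
    return series.str.extract(pattern, expand=False)

s = pd.Series(["price (USD): 42", "note: n/a", "nothing here"])
result = extract_after(s, "price (USD):")
print(result.tolist())
```

Without re.escape, the parentheses in "price (USD):" would be read as a capture group and the pattern would no longer match the literal text.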

2. Fixing the "TypeError: 'float' object is not subscriptable" error

Your error happens because some entries in the fields.description.content column are missing (NaN). Pandas represents NaN as a float, and a float does not support dictionary-style indexing such as x['text'].
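
Here is a minimal way to reproduce the problem, assuming the column holds dictionaries with a 'text' key plus at least one missing value. The sample Series is made up for illustration.

```python
import pandas as pd

# One dictionary and one missing value, similar to a partly empty column
col = pd.Series([{'text': 'hello'}, float('nan')])

try:
    col.apply(lambda x: x['text'])
except TypeError as exc:
    message = str(exc)

print(message)
```

The lambda works for the dictionary in the first row and fails as soon as it reaches the NaN in the second.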

Here are two simple fixes:

Option 1: Use `str.get()` (cleanest approach)

Pandas has a built-in str.get() accessor method that looks up a key in each dictionary in the column and leaves missing values as NaN instead of raising an error:

issues_df['new_column'] = issues_df['fields.description.content'].str.get('text')
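
A quick way to check this on your side is to try it on a tiny frame first. The data below is invented; only the column name comes from your question.

```python
import pandas as pd

issues_df = pd.DataFrame({
    'fields.description.content': [
        {'text': 'first description'},
        float('nan'),                    # missing value
        {'text': 'second description'},
    ]
})

# Look up the 'text' key in each dictionary; the NaN row stays NaN
issues_df['new_column'] = issues_df['fields.description.content'].str.get('text')
print(issues_df['new_column'].tolist())
```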

Option 2: Add a type check in your lambda

If you prefer using apply, add a check to make sure x is a dictionary before accessing its 'text' key:

import pandas as pd

issues_df['new_column'] = issues_df['fields.description.content'].apply(
    lambda x: x['text'] if isinstance(x, dict) else pd.NA
)
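
One caveat with the lambda above: x['text'] raises a KeyError if a dictionary happens to lack a 'text' key. If that can occur in your data (an assumption on my part, since I can't see it), a slightly more defensive sketch uses dict.get with a default. The sample rows are made up.

```python
import pandas as pd

issues_df = pd.DataFrame({
    'fields.description.content': [
        {'text': 'has text'},
        {'type': 'paragraph'},   # dictionary without a 'text' key
        float('nan'),            # missing value
    ]
})

# dict.get returns the default instead of raising when the key is absent
issues_df['new_column'] = issues_df['fields.description.content'].apply(
    lambda x: x.get('text', pd.NA) if isinstance(x, dict) else pd.NA
)
print(issues_df['new_column'].tolist())
```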

Both approaches handle missing values without raising an error. One small difference: Option 1 leaves missing rows as NaN, while Option 2 as written fills them with pd.NA. The first option is more concise and idiomatic for Pandas, so it's my go-to recommendation.

The question was originally asked on Stack Exchange by Junior P.