Pandas: why a substring cannot be found in a list of long strings, and how to fix it

阿华AIGC实验室

2026-5-13

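Root cause

The problem lies in pandas.DataFrame.to_string(). When a cell holds a long string, to_string() can truncate whatever exceeds the column-width limit and put ... in its place. In pandas releases before 1.0.0 the method followed the display.max_colwidth option, which defaults to 50 characters. A long string that originally contained VALUE may therefore have had that part cut off and replaced by ... in the rendered text, which is why the substring cannot be found in the list you get after splitting.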
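The truncation can be reproduced on purpose. The following is a minimal sketch, assuming pandas 1.0.0 or later so that to_string() accepts max_colwidth; the column name text and the sample string are made up for illustration, and the narrow width of 20 is chosen only to force the cut-off.

```python
import pandas as pd

# Hypothetical one-cell frame: 'VALUE' sits well past the 20th character.
long_text = "a" * 30 + "VALUE" + "b" * 30
demo = pd.DataFrame({"text": [long_text]})

# Force a narrow column so the cell is cut off and ends in '...'.
truncated = demo.to_string(header=False, index=False, max_colwidth=20)

# No width limit: the cell is rendered in full.
full = demo.to_string(header=False, index=False, max_colwidth=None)

print("VALUE" in truncated)  # the substring was cut away
print("VALUE" in full)       # the substring survives
```

Any search that runs on the rendered text only sees what survived the formatting step, so the result depends on display settings rather than on the data.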

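Solutions

Here are two workable fixes. The second is the recommended one because it fits how pandas is meant to be used.

Method 1: pass a `to_string()` argument that turns off string truncation

When calling to_string(), set max_colwidth=None (supported in pandas 1.0.0 and later) so that pandas does not truncate long strings.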

import pandas as pd

data2 = {'spike-2': ["yesno yesno yesno yes no yesnoyesno yesnoyesno yesnoyesno", "chairchairchair chairchairchair chairchairchair chairchairchair "], 'hey spke': ["maybe maybe maybe", "yes nyes no ye...VALUE...syes no yesyes no yesyes no yesyes no yeso yes"], 'no': ["yes no yesyes no yes yesyes yesyes yes no yes yes no yesaaaaa...VALUE...govora","yesno"]}
df2 = pd.DataFrame(data2)

# Remove the column-width limit so long strings are not truncated
new_data2 = df2.to_string(header=False, index=False, index_names=False, max_colwidth=None).split('\n')

for i in new_data2:
    if 'VALUE' in i:
        print('found!')

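Method 2: search with pandas' native string methods (recommended)

There is no need to convert the whole DataFrame to a string before searching. Calling str.contains() on the data itself is efficient and cannot run into truncation. Note that str.contains() treats its pattern as a regular expression by default, so pass regex=False if the substring contains regex metacharacters.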

import pandas as pd

data2 = {'spike-2': ["yesno yesno yesno yes no yesnoyesno yesnoyesno yesnoyesno", "chairchairchair chairchairchair chairchairchair chairchairchair "], 'hey spke': ["maybe maybe maybe", "yes nyes no ye...VALUE...syes no yesyes no yesyes no yesyes no yeso yes"], 'no': ["yes no yesyes no yes yesyes yesyes yes no yes yes no yesaaaaa...VALUE...govora","yesno"]}
df2 = pd.DataFrame(data2)

# Go through every column and check whether any cell contains 'VALUE'
for column in df2.columns:
    # Select the rows that contain the target substring; na=False treats missing values as non-matches
    matched_rows = df2[df2[column].str.contains('VALUE', na=False)]
    if not matched_rows.empty:
        print('found!')
        # Uncomment the line below to see the matching rows
        # print("Matching rows:\n", matched_rows)

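The advantage of this approach is that it works directly on the DataFrame's original data, avoids the extra cost of a string conversion, and pinpoints the exact rows and columns that contain the substring, which makes follow-up processing easier.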
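To make the "exact rows and columns" point concrete, here is a small sketch that returns the location of every matching cell. It is not the str.contains() loop from Method 2: it swaps in a plain Python membership test per cell so that numeric columns and missing values are skipped instead of raising, since the .str accessor only works on string data. The helper name locate_substring and the sample frame are hypothetical.

```python
import pandas as pd

def locate_substring(df, needle):
    """Return (row label, column name) pairs for cells containing needle."""
    hits = []
    for column in df.columns:
        for row_label, cell in df[column].items():
            # Skip numbers, None and NaN: only real strings are searched.
            if isinstance(cell, str) and needle in cell:
                hits.append((row_label, column))
    return hits

# Hypothetical sample with a string column, a column with a gap, and numbers.
sample = pd.DataFrame({
    "a": ["no match here", "xxVALUExx"],
    "b": ["VALUE first", None],
    "n": [1, 2],
})
print(locate_substring(sample, "VALUE"))
```

The cell-by-cell loop is slower than the vectorized str.contains() on large frames, so it is best suited to small data or to mixed-type frames where filtering columns by dtype is inconvenient.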

The question comes from Stack Exchange; it was asked by taga.