How to remove the signature after the em dash in a text file (keeping the quote body)

Ahua AIGC Lab

2026-4-2

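It looks like you have already removed the % separators, so stripping the signatures is the easy part, because their format is so regular: each one starts with an em dash (―), followed by a username, a comma and a space, then a three-letter month abbreviation and a four-digit year. (Strictly speaking, the character in your sample is U+2015, the horizontal bar, which looks almost identical to the em dash U+2014; the pattern has to use whichever one your file really contains.) A regular expression can match the whole signature precisely and replace it with nothing, so the quote body is never touched.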

Approach: match the signature format precisely with a regex

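Your signatures follow a very clear pattern: ―username, Mon YYYY. So we can write a regex that matches only text in this format and replace it with an empty string. Even if the body contained an em dash (your examples don't), it would not be removed unless it was followed by this exact signature format, which meets your requirement of deleting only the signature and leaving the body alone.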
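
To make the pattern concrete, here is a minimal sketch that spells out each piece with re.VERBOSE and checks that a dash inside the body survives. The sample sentence, the username someuser, and the name CREDIT are made up for illustration; the dash is written as \u2015 on the assumption that your file uses the same character as the sample data.

```python
import re

# Illustrative pattern, written with re.VERBOSE so each piece can be commented.
# \u2015 is the horizontal bar that opens each signature in the sample data.
CREDIT = re.compile(
    r"""
    [ \t]*          # spaces or tabs before the dash
    \u2015          # the dash that opens the signature
    \w+             # username (letters, digits, underscore)
    ,[ ]            # comma and one space
    [A-Z][a-z]{2}   # three-letter month abbreviation
    [ ]\d{4}        # one space and a four-digit year
    """,
    re.VERBOSE,
)

# Hypothetical quote whose body also contains a dash.
body_with_dash = "Wait \u2015 really? I said so. \u2015someuser, Jan 2017"
cleaned = CREDIT.sub("", body_with_dash)
print(cleaned)
```

The first dash is followed by a space instead of a username, so the pattern cannot match there and only the trailing signature is removed.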

Python implementation

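Suppose you have either already split the original text into a list of individual quotes with split('%'), or you still have one block of processed text. Both cases are covered: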

Case 1: processing the list of quotes after split

import re

# Assume this is the list of quotes you got from split('%') (the % is already gone)
quotes = [
    "Car horns should only be allowed to be in pitches C, E, and G, so whenever two people honk at the same time it will be in harmony and traffic jams will sound like symphonies. ―bringbackseymour, Mar 2016 ",
    "The best item to protect you from sasquatch attacks is a camera. ―papertank17, Aug 2015 ",
    "Im convinced most of the adults who told me wiki is unreliable, now use viral facebook posts for most of their news sources. ―Sharplynormal, Dec 2015 ",
    "When drone technology becomes cheap enough, hands-free umbrellas are gonna be the shit. ―TremendoSlap, Oct 2016 ",
    "I mostly use my drivers license to buy stuff that impairs my ability to drive. ―mozezus, Jun 2016 "
]

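# First drop any empty strings that split may have produced (e.g. from a leading or trailing %)
quotes = [q.strip() for q in quotes if q.strip()]

# Regex that matches the signature, including the spaces on either side of it,
# so no stray space is left at the end of the quote
credit_pattern = re.compile(r'[ \t]*―\w+, [A-Z][a-z]{2} \d{4}[ \t]*')

# Remove the signature from every quote
clean_quotes = [credit_pattern.sub('', q) for q in quotes]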

# Print the results to check
for idx, quote in enumerate(clean_quotes, 1):
    print(f"Quote {idx}: {quote}")
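
Because each item in the list is a single quote, the signature can only sit at the very end. A slightly stricter variant, sketched here under that assumption, anchors the pattern to the end of the string with \Z so that nothing in the middle of a quote can match. The names TRAILING_CREDIT and strip_trailing_credit are illustrative, not part of your existing code.

```python
import re

# Same signature shape, but anchored to the end of the string with \Z.
# \u2015 is the horizontal bar used in the sample data.
TRAILING_CREDIT = re.compile(r'\s*\u2015\w+, [A-Z][a-z]{2} \d{4}\s*\Z')

def strip_trailing_credit(quote: str) -> str:
    """Remove a signature only when it is the last thing in the quote."""
    return TRAILING_CREDIT.sub('', quote)

sample = "The best item to protect you from sasquatch attacks is a camera. \u2015papertank17, Aug 2015 "
print(strip_trailing_credit(sample))
```

With this variant, text that merely looks like a signature but is followed by more words is left unchanged.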

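Case 2: processing one continuous block of text

If your text has not been split into a list yet and is still one big string, just run the regex substitution over the whole text: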

import re

# Assume this is your full text with the % already removed
raw_text = """Car horns should only be allowed to be in pitches C, E, and G, so whenever two people honk at the same time it will be in harmony and traffic jams will sound like symphonies. ―bringbackseymour, Mar 2016
The best item to protect you from sasquatch attacks is a camera. ―papertank17, Aug 2015
Im convinced most of the adults who told me wiki is unreliable, now use viral facebook posts for most of their news sources. ―Sharplynormal, Dec 2015
When drone technology becomes cheap enough, hands-free umbrellas are gonna be the shit. ―TremendoSlap, Oct 2016
I mostly use my drivers license to buy stuff that impairs my ability to drive. ―mozezus, Jun 2016"""

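# Same signature regex, but with [ \t]* instead of \s* around it:
# \s* would also swallow the newline after each signature and merge all quotes into one line
credit_pattern = re.compile(r'[ \t]*―\w+, [A-Z][a-z]{2} \d{4}[ \t]*')
clean_text = credit_pattern.sub('', raw_text)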

print(clean_text)
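
Putting the pieces together, the following is a sketch of the whole flow starting from a raw string that still contains the % separators. It assumes quotes are separated by % as described above; the function name clean_quotes_from_raw and the two sample quotes are hypothetical. The character class accepts both U+2014 and U+2015, which is an assumption added here because the two dashes are easy to confuse and a file may contain either.

```python
import re

# Accept either dash (U+2014 em dash or U+2015 horizontal bar) before the username.
CREDIT = re.compile(r'\s*[\u2014\u2015]\w+, [A-Z][a-z]{2} \d{4}\s*\Z')

def clean_quotes_from_raw(raw: str) -> list[str]:
    """Split on %, drop empty chunks, and strip the trailing signature from each quote."""
    cleaned = []
    for chunk in raw.split('%'):
        chunk = chunk.strip()
        if not chunk:
            continue  # empty piece produced by a leading or trailing %
        cleaned.append(CREDIT.sub('', chunk))
    return cleaned

# Made-up input in the same layout as the sample data.
raw = "%\nFirst quote. \u2015user_one, Mar 2016\n%\nSecond quote. \u2014usertwo, Aug 2015\n%\n"
result = clean_quotes_from_raw(raw)
for q in result:
    print(q)
```

Note that \w+ covers letters, digits, and underscores only; if some usernames contain other characters such as hyphens, the username part of the pattern would need to be widened accordingly.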