Python: Replacing Non-Printable Characters Also Removes the Closing Square Bracket When Writing to a File

Why Your Regex Is Replacing the '5'

First off, let’s break down what’s happening here. Your code uses re.sub("[^[:print:]]", '?', line) to swap non-printable characters for a question mark, but it unexpectedly replaces the '5' in your string, along with the closing bracket that follows it. Here are the likely causes and fixes:

1. Rule Out a Non-ASCII '5'

One thing to rule out first: the character you see as '5' in your line variable may not be the regular ASCII digit (ordinal value 53). It could be a Unicode character that looks identical, such as the full-width digit U+FF15 (ordinal 65301), which is not part of the ASCII printable set.

To confirm this, run this quick check to inspect each character’s ordinal value:

for char in line:
    print(f"Character: '{char}' | Ordinal: {ord(char)}")

If the '5' shows an ordinal other than 53, that’s your culprit. If it shows 53, the character is an ordinary digit and the cause lies in the pattern itself.
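For look-alike characters, the Unicode name is often easier to read than the bare ordinal. The following is a minimal sketch using the standard unicodedata module; the helper name describe_chars is made up for illustration.

```python
import unicodedata

def describe_chars(text):
    """Return (char, code point, Unicode name) for every character in text.

    describe_chars is a hypothetical helper name, not part of any library.
    """
    return [(ch, ord(ch), unicodedata.name(ch, "<unnamed>")) for ch in text]

# Compare an ASCII '5' with the full-width look-alike U+FF15.
for ch, code, name in describe_chars("5\uFF15"):
    print(f"{ch!r}: U+{code:04X} ({code}) {name}")
```

The ASCII digit is reported as DIGIT FIVE (53), the full-width one as FULLWIDTH DIGIT FIVE (65301).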

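2. Python’s re Module Does Not Support [:print:]

Python’s built-in re module has no POSIX character classes such as [:print:]. It reads "[^[:print:]]" as two pieces: the negated set [^[:print:], meaning any character other than '[', ':', 'p', 'r', 'i', 'n' or 't', followed by a literal ']'. In 'Testing offset [4,-5]' the only place that matches is "5]", so both characters are replaced by a single '?' and you get 'Testing offset [4,-?'. Python 3.7 and later also emit a FutureWarning ("Possible nested set") for this pattern.

Fix 1: Spell Out the Printable ASCII Range

The re.UNICODE (or re.U) flag does not help here: it is already the default for str patterns in Python 3, and it does not add POSIX classes. Instead, write out the range that [:print:] stands for in ASCII, space (0x20) through tilde (0x7E):

import re
line = 'Testing offset [4,-5]'
# Replace every character outside printable ASCII (0x20-0x7E) with '?'
fixed_line = re.sub(r'[^\x20-\x7E]', '?', line)
print(fixed_line)  # Testing offset [4,-5]

If line still carries the trailing '\n' from reading a file, add \n to the set or strip the line ending first, otherwise it is replaced as well.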
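A quick way to check what the original pattern really does is to run it next to an explicit printable-ASCII range. The sketch below assumes the sample line from the question; the variable names are illustrative, and the warning filter is only there because recent Python versions flag the pattern as a possible nested set.

```python
import re
import warnings

line = "Testing offset [4,-5]"

with warnings.catch_warnings():
    # Silence the "Possible nested set" warning that this pattern triggers.
    warnings.simplefilter("ignore")
    # re has no POSIX classes: this is the negated set [^[:print:] plus a literal ']'.
    broken = re.sub("[^[:print:]]", "?", line)
    # re.UNICODE is already the default for str patterns, so nothing changes.
    broken_unicode = re.sub("[^[:print:]]", "?", line, flags=re.UNICODE)

# Explicit range: anything outside space (0x20) through tilde (0x7E).
explicit = re.sub(r"[^\x20-\x7E]", "?", line)

print(broken)          # Testing offset [4,-?
print(broken_unicode)  # Testing offset [4,-?
print(explicit)        # Testing offset [4,-5]
```

The original pattern turns the trailing "5]" into a single '?', with or without re.UNICODE, while the explicit range leaves the line unchanged.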

Fix 2: Target Only Actual Control Characters

If you want to replace only true non-printable control characters (not Unicode printable ones), use a regex that explicitly matches ASCII control characters:

fixed_line = re.sub(r'[\x00-\x1F\x7F]', '?', line)

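This swaps only the ASCII control characters (0-31 and 127) and leaves every other character, including non-ASCII Unicode ones, intact. Note that this range includes tab, newline and carriage return, so strip the line ending first if it should survive.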
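If tabs and line endings should be kept, the control-character set can be narrowed. This is a sketch under that assumption; CONTROL_CHARS and mask_control_chars are illustrative names, not part of the original code.

```python
import re

# ASCII control characters except tab (\x09), newline (\x0A) and
# carriage return (\x0D), plus DEL (\x7F).
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]")

def mask_control_chars(text, replacement="?"):
    """Return text with the control characters above replaced."""
    return CONTROL_CHARS.sub(replacement, text)

print(repr(mask_control_chars("Testing\x00 offset\t[4,-5]\n")))
# 'Testing? offset\t[4,-5]\n'
```

Compiling the pattern once is a small convenience when the same substitution runs on every line of a file.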

3. Don’t Forget to Assign the Result!

A quick sanity check: re.sub returns a new string (it doesn’t modify the original line in place). Make sure you’re assigning the result to a variable, like:

line = re.sub(r'[\x00-\x1F\x7F]', '?', line)  # Overwrite the original line

Or store it in a new variable like fixed_line as shown above.
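Since the goal is to write the cleaned text to a file, here is a rough sketch of how the reassignment fits into a read-and-write loop. It uses in-memory io.StringIO objects in place of real files, and the names clean_stream and NON_PRINTABLE_ASCII are assumptions for illustration. The pattern keeps printable ASCII (0x20-0x7E) and the newline.

```python
import io
import re

# Anything that is neither printable ASCII (0x20-0x7E) nor a newline.
NON_PRINTABLE_ASCII = re.compile(r"[^\x20-\x7E\n]")

def clean_stream(src, dst, replacement="?"):
    """Copy src to dst line by line, masking characters matched above."""
    for line in src:
        # sub() returns a new string; the returned value is what gets written.
        cleaned = NON_PRINTABLE_ASCII.sub(replacement, line)
        dst.write(cleaned)

# StringIO stands in for files opened with open(...).
src = io.StringIO("Testing offset [4,-5]\nbad\x07byte\n")
dst = io.StringIO()
clean_stream(src, dst)
print(repr(dst.getvalue()))  # 'Testing offset [4,-5]\nbad?byte\n'
```

With real files, the same function should work with objects returned by open() in text mode.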


The question in this post originates from Stack Exchange; asked by Deepanshu Arora.
