Do Python raw strings disable the \w/\d metacharacters? Explaining the difference between two re.findall calls
Hey there! Let's break down these two common questions about Python raw strings and regex metacharacters—they’re easy to mix up when you’re just starting out, so let’s make it clear.
1. Do raw strings disable regex metacharacters such as \w and \d, the same way they disable escape sequences such as \n?
Short answer: Absolutely not.
Raw strings (the ones prefixed with r) only affect how Python’s string parser handles backslashes—they don’t change how the regex engine interprets the string. Let’s break down the two types of "escaping" here to avoid confusion:
- Python string-level escaping: Sequences like \n (newline), \t (tab), or \" (double quote) are defined by Python. When you don’t use a raw string, Python converts these sequences into their corresponding special characters.
- Regex engine-level escaping: Sequences like \d (match digits) or \w (match letters/numbers/underscores) are defined by the regex engine. The backslash here is for the regex engine, not Python.
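Since \d and \w aren’t recognized Python escape sequences, Python leaves the backslash in place. Whether you use a raw string or not, the exact sequence \d or \w reaches the regex engine, which then treats it as a metacharacter as usual. One caveat: recent Python versions warn about unrecognized escapes such as \d in non-raw strings (a DeprecationWarning since 3.6, a SyntaxWarning since 3.12), although the resulting string is still the same.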
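If you want to check this yourself, here is a minimal sketch that compares the two literals directly. The variable names plain and raw are just illustrative, and on Python 3.12 or later the non-raw line will print a SyntaxWarning while still producing the same string.

```python
# Non-raw and raw literals for the same regex pattern.
# Note: Python 3.12+ prints a SyntaxWarning for "\d" in a non-raw literal,
# but the string it produces is unchanged.
plain = "\d+"
raw = r"\d+"

print(plain == raw)   # True: both hold backslash, d, plus
print(list(plain))    # ['\\', 'd', '+']

# Contrast with a sequence Python does recognize:
print(len("\n"), len(r"\n"))   # 1 2 (one newline vs. backslash + n)
```

So the r prefix only changes the result for sequences Python itself would otherwise translate, such as \n.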
2. Why do re.findall(r"\d+","i am aged 35") and re.findall("\d+","i am aged 35") produce the same result?
This ties directly into the first explanation. Let’s walk through each case:
- When you write "\d+": Python checks the string for valid escape sequences. Since \d isn’t one of them, Python leaves it as-is, passing the string "\d+" (characters \, d, +) straight to the regex engine.
- When you write r"\d+": The r prefix tells Python to ignore all escape processing. So Python also passes the exact same string "\d+" to the regex engine.
Since the regex engine gets identical input in both cases, it produces the same result—matching the digits 35 in your example.
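Running both calls side by side makes this concrete. This is only a small sketch using the sentence from the question; the names text, from_plain, and from_raw are made up for the example.

```python
import re

text = "i am aged 35"

# Both patterns reach the regex engine as backslash, d, plus.
from_plain = re.findall("\d+", text)   # warns on Python 3.12+, still works
from_raw = re.findall(r"\d+", text)

print(from_plain, from_raw)        # ['35'] ['35']
print(re.findall(r"\w+", text))    # ['i', 'am', 'aged', '35']
```

The raw form is still the safer habit, since it behaves the same for every regex escape, not just \d and \w.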
When do raw strings matter for regex?
Raw strings shine when you need to match literal backslashes or avoid Python accidentally interpreting your regex escape as a string escape. For example:
- To match a literal backslash in a string like "a\b":
  - Without raw strings: You’d need to write re.findall("\\\\", "a\\b") (four backslashes total: Python converts each pair into one backslash, leaving two, and the regex engine needs those two to match one literal backslash).
  - With raw strings: Just write re.findall(r"\\", "a\\b"), which is cleaner and easier to read.
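The backslash-matching case can be sketched like this. The names text, no_raw, and with_raw are illustrative only.

```python
import re

text = "a\\b"        # three characters: a, backslash, b
print(len(text))     # 3

# Four backslashes in a normal literal become two, which the regex
# engine reads as "one literal backslash".
no_raw = re.findall("\\\\", text)

# A raw literal needs only the two backslashes the regex engine wants.
with_raw = re.findall(r"\\", text)

print(no_raw == with_raw)   # True
print(len(with_raw))        # 1 (the single backslash in text)
```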
Another example: If you wanted to match the literal sequence \n (not a newline character), raw strings let you write r"\\n" instead of the messy "\\\\n".
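Here is a rough sketch of the literal \n case, plus one extra illustration that is not from the question: the word-boundary escape \b, which Python reads as a backspace character unless the pattern is a raw string. The sample strings are invented for the demo.

```python
import re

text = r"line1\nline2"   # contains backslash + n, not a real newline

# Match the two-character sequence backslash, n.
print(re.findall(r"\\n", text))                               # ['\\n']
print(re.findall("\\\\n", text) == re.findall(r"\\n", text))  # True

# Pitfall: "\b" is a valid Python escape (backspace), so the regex
# engine never sees a word boundary unless the pattern is raw.
sentence = "cat concat cat"
print(re.findall(r"\bcat\b", sentence))   # ['cat', 'cat']
print(re.findall("\bcat\b", sentence))    # []
```

The last line returns an empty list silently, which is why defaulting to raw strings for regex patterns tends to save debugging time.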
This question comes from Stack Exchange, where it was asked by Pankaj Kulkarni.




