
Question: Differences Between Simple and Full Case Folding in Python's regex Module, with Examples

Simple vs. Full Case Folding in the regex Module

Great question! Let's break down the key differences between the two modes, then walk through concrete examples that show how V0 (simple folding) and V1 (full folding) behave.

What's the Difference?

First, let's clarify what each folding type actually does:

  • Simple Case Folding (V0 mode): This follows Unicode's simple case-folding rules, which are strictly one-to-one character mappings. Think of it as the basic case conversion you're used to, where É and é (or И and и) are treated as the same letter. A character never changes length when folded, and this is the mode the regex module documents as compatible with Python's built-in re module.
  • Full Case Folding (V1 mode): This uses Unicode's full case-folding rules, which add the mappings where one character folds to a sequence of several characters (for example ß to ss, or the ligature ﬁ to fi). As a result, a single character in the pattern can match several characters in the text, and vice versa. In the regex module this corresponds to the FULLCASE flag, which V1 turns on by default and which only has an effect together with IGNORECASE.
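To make the one-to-many idea concrete, here is a minimal sketch that uses only the standard library. It assumes that Python's str.casefold(), which applies Unicode full case folding, is a fair stand-in for what V1 does to each character; the helper name expanding_folds is made up for illustration and is not part of the regex module.

```python
def expanding_folds(chars):
    """Return {char: folded} for characters whose full case folding
    is longer than one character."""
    return {c: c.casefold() for c in chars if len(c.casefold()) > 1}


# ß and the ﬁ ligature expand under full folding; é and и do not.
samples = ["ß", "\N{LATIN SMALL LIGATURE FI}", "é", "и"]
result = expanding_folds(samples)
print(result)  # {'ß': 'ss', 'ﬁ': 'fi'}
```

Characters that show up in this dictionary are the ones worth trying when you want to see V0 and V1 disagree.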

Example 1: German Sharp S (ß)

The German ß is a perfect example of where the modes diverge:

  • In full folding (V1), ß expands to ss when folded. So a case-insensitive regex will match ß, ss, and SS.
  • In simple folding (V0), ß folds only to itself because simple rules never change a character's length, so it won't match ss or SS.

Test this in code:

import regex

# V0 mode: No match for SS against ß
print(regex.search("(?V0i)ß", "SS"))  # Output: None

# V1 mode: Matches SS against ß
print(regex.search("(?V1i)ß", "SS"))  # Output: <regex.Match object; span=(0, 2), match='SS'>
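If you want to see why V1 reports a two-character span for a one-character pattern, the following sketch imitates the idea with the standard library only. It is an illustration under assumptions, not how the regex module is implemented: it folds both sides with str.casefold() and maps the hit back to positions in the original text. The function name find_full_fold is hypothetical.

```python
def find_full_fold(needle, haystack):
    """Return (start, end) in haystack of the first match of needle
    under full case folding, or None if there is no match."""
    folded_needle = needle.casefold()
    if not folded_needle:
        return None
    # Fold the haystack one character at a time and remember which
    # original index each folded character came from.
    folded_chars = []
    origin = []
    for index, char in enumerate(haystack):
        for folded in char.casefold():
            folded_chars.append(folded)
            origin.append(index)
    pos = "".join(folded_chars).find(folded_needle)
    if pos == -1:
        return None
    last = pos + len(folded_needle) - 1
    # Limitation of this sketch: a hit that starts or ends inside an
    # expanded character is reported as covering the whole character.
    return origin[pos], origin[last] + 1


print(find_full_fold("ß", "SS"))               # (0, 2)
print(find_full_fold("strasse", "Die Straße"))  # (4, 10)
print(find_full_fold("ß", "S"))                # None
```

The first call mirrors the V1 result above: the single ß in the pattern folds to ss, which lines up with both characters of SS.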

Example 2: Greek Final Sigma (ς)

Greek has two lowercase forms of sigma: σ (used at the start or in the middle of a word) and ς (used at the end). It looks like a natural candidate for a V0/V1 split, but it isn't one.

  • In Unicode's CaseFolding.txt, Σ and ς both fold to σ under the common (C) rules, which are shared by simple and full folding.
  • Because that mapping is one-to-one, simple folding (V0) already covers it, so a case-insensitive Σ is expected to match both σ and ς in either mode.

Test code:

import regex

# V0 mode: expected to match, since Σ and ς both simple-fold to σ
print(regex.search("(?V0i)Σ", "ς"))

# V1 mode: expected to match for the same reason
print(regex.search("(?V1i)Σ", "ς"))  # Output: <regex.Match object; span=(0, 1), match='ς'>
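As a cross-check from the standard library, str.casefold() (Unicode full case folding) shows how each sigma variant folds. This inspects Python's own string method, not the regex module's matcher, so treat it as supporting evidence about the Unicode data rather than a test of V0 or V1.

```python
# Fold each sigma variant and record the result.
sigma_folds = {char: char.casefold() for char in "Σσς"}

for char, folded in sigma_folds.items():
    print(f"U+{ord(char):04X} {char} -> U+{ord(folded):04X} {folded}")

# All three variants share one single-character folded form.
print(len(set(sigma_folds.values())))  # 1
```

Every folded form here is a single character, which is why this mapping does not need full folding.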

Why Your Test Examples Didn't Show a Difference

In your tests with é and E, neither mode matches because neither simple nor full case folding maps é to E. Folding only equates é with É; the accent makes it a different letter, not a different case. To see V0 and V1 diverge, you need a character whose full folding expands to several characters, such as ß.
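A quick standard-library check illustrates the point. It assumes str.casefold() as a proxy for Unicode case folding and uses unicodedata only to show what é is made of; none of this goes through the regex module.

```python
import unicodedata

e_acute = "\N{LATIN SMALL LETTER E WITH ACUTE}"          # é
cap_e_acute = "\N{LATIN CAPITAL LETTER E WITH ACUTE}"    # É

# Case folding equates é with É, but not with plain E.
same_as_capital = e_acute.casefold() == cap_e_acute.casefold()
same_as_plain = e_acute.casefold() == "E".casefold()
print(same_as_capital, same_as_plain)  # True False

# Canonical decomposition shows the remaining difference is the accent:
# é is e followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = unicodedata.normalize("NFD", e_acute)
print([f"U+{ord(c):04X}" for c in decomposed])  # ['U+0065', 'U+0301']
```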


The question in this content comes from Stack Exchange; it was asked by iBug.
