Regex for extracting a Russian passport number after the keyword паспорт
Solution to Extract Russian Passport Numbers After "паспорт"
Got it, let's tackle this regex problem step by step. Your goal is to capture the 4-digit and 6-digit parts of a Russian passport number that follows the keyword паспорт, regardless of whether the parts are separated by spaces, a hyphen, or a hyphen with spaces around it.
The Correct Regex
Here's a regex that covers all your specified cases:
паспорт\s+(\d{4})\s*[-]?\s*(\d{6})
Breakdown of the Regex
Let's break down each component to explain how it works:
- паспорт: Matches the keyword literally (add the /i modifier if you need case-insensitive matching for variations like Паспорт)
- \s+: Matches one or more whitespace characters right after the keyword (covers the gap between "паспорт" and the first set of digits)
- (\d{4}): Captures the first group of 4 digits (this will be your first expected result)
- \s*[-]?\s*: Handles all possible separators:
  - \s*: Matches 0 or more whitespace characters (covers spaces before/after the hyphen)
  - [-]?: Makes the hyphen optional (so it works with or without the hyphen)
- (\d{6}): Captures the second group of 6 digits (your second expected result)
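As a rough illustration, the same pattern can be written in Python's verbose mode, with each component commented inline. The name ANNOTATED_RE and the sample sentence are made up for this sketch, the language is an assumption (the question does not say which regex engine is in use), and the hyphen is written as -? instead of [-]?, which is equivalent.

```python
import re

# Hypothetical name; the pattern mirrors the breakdown above.
ANNOTATED_RE = re.compile(
    r"""
    паспорт     # the literal keyword
    \s+         # one or more whitespace characters after the keyword
    (\d{4})     # group 1: the 4-digit part
    \s*-?\s*    # optional hyphen with optional whitespace on either side
    (\d{6})     # group 2: the 6-digit part
    """,
    re.VERBOSE,
)

# Made-up sample text with extra words before the keyword.
match = ANNOTATED_RE.search("выдан паспорт 5715 - 424141")
if match:
    print(match.group(1), match.group(2))
```

Because search() is used rather than a full-string match, the keyword may appear anywhere in the text.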
Testing Against Your Examples
Let's verify this regex against your sample inputs:
- Input: паспорт 5715 424141 → Captured groups: 5715 (group 1), 424141 (group 2)
- Input: паспорт 5715-424141 → Captured groups: 5715 (group 1), 424141 (group 2)
- Input: паспорт 5715 - 424141 → Captured groups: 5715 (group 1), 424141 (group 2)
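If you want to reproduce this check yourself, here is a minimal sketch in Python (an assumption, since the question does not name a language). The helper extract_passport and the re.IGNORECASE flag are additions for illustration and are not part of the regex above.

```python
import re

# Pattern from the answer; IGNORECASE is an optional extra so "Паспорт" also matches.
PASSPORT_RE = re.compile(r"паспорт\s+(\d{4})\s*[-]?\s*(\d{6})", re.IGNORECASE)


def extract_passport(text):
    """Return (four_digits, six_digits) for the first match, or None."""
    match = PASSPORT_RE.search(text)
    return match.groups() if match else None


samples = [
    "паспорт 5715 424141",
    "паспорт 5715-424141",
    "паспорт 5715 - 424141",
]
for sample in samples:
    print(sample, "->", extract_passport(sample))
```

Each sample should yield the pair ('5715', '424141'). Note that the pattern does not check what follows the sixth digit, so a longer run of digits would still match on its first six.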
Why Your Original Regex Failed
Your initial regex ^(\d{4})\ (\d{6})$ had two main issues:
- It uses ^ and $, which force a full-line match, so it won't work if the text includes the "паспорт" keyword before the digits.
- It only accounts for a single space as a separator, completely ignoring hyphens or hyphens with surrounding spaces.
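Both issues are easy to see in a quick experiment. This is a small sketch in Python (again an assumed language); ORIGINAL_RE is just a label for the regex quoted from the question.

```python
import re

# The regex from the question, unchanged.
ORIGINAL_RE = re.compile(r"^(\d{4})\ (\d{6})$")

# Works only when the string is exactly "4 digits, one space, 6 digits".
bare = ORIGINAL_RE.search("5715 424141")

# Fails when the keyword precedes the digits, because ^ anchors to the start.
with_keyword = ORIGINAL_RE.search("паспорт 5715 424141")

# Fails when the separator is a hyphen, because only a single space is allowed.
with_hyphen = ORIGINAL_RE.search("5715-424141")

print(bare is not None, with_keyword is not None, with_hyphen is not None)
```

Only the first input matches; the other two calls return None.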
The question comes from Stack Exchange; it was asked by Alex Nikitin.




