How do you build a regular expression to extract the digits inside a <Guid> tag?

阿华AIGC实验室

2026-5-20

Extract Numeric Part from Tag Content

Got it, let's tackle this. You already have a regex that grabs the full content inside the <Guid> tags, so the remaining step is to target only the digits (1234 in your example). Here are two simple approaches:

Approach 1: One-step Regex (Direct Extraction)

You can modify your existing regex to focus specifically on the numeric portion within the tags. Use this pattern:

(?<=<Guid>)\D*(\d+)\D*(?=</Guid>)

Breakdown:

(?<=<Guid>): Positive lookbehind to position us right after the opening <Guid> tag
\D*: Matches any non-digit characters (like "Abc" in your example) that come before the numbers
(\d+): Capturing group that grabs one or more consecutive digits (this is the value you want to extract)
\D*: Matches any remaining non-digit characters after the numbers (if there were any)
(?=</Guid>): Positive lookahead to position us right before the closing </Guid> tag

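When you run this against <Guid>Abc1234</Guid>, the overall match is Abc1234 (the lookarounds consume nothing, so the tags themselves are excluded), and the first capturing group holds exactly 1234. Read the value from group 1, not from the full match.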
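As a rough illustration, here is how the pattern might be used in Python. The choice of language and the helper name extract_guid_number are assumptions made for this sketch; the original question does not say which regex engine is in use, and some engines do not support lookbehind.

```python
import re

# Pattern from Approach 1. Group 1 holds the digits.
GUID_DIGITS = re.compile(r"(?<=<Guid>)\D*(\d+)\D*(?=</Guid>)")

def extract_guid_number(text):
    """Return the digit run inside <Guid>...</Guid>, or None if there is no match.

    Hypothetical helper name, used only for this example.
    """
    match = GUID_DIGITS.search(text)
    return match.group(1) if match else None

print(extract_guid_number("<Guid>Abc1234</Guid>"))  # prints 1234
```

The helper returns None instead of raising when the tag holds no digits, which keeps the calling code simple.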

Approach 2: Two-step Extraction (If You Already Have the Tag Content)

If you're already extracting the full tag content (like "Abc1234") using your original regex, you can run a second regex against that string to pull out the numbers:

\d+

This simple pattern matches a run of one or more consecutive digits. For "Abc1234", the first (and only) match is 1234.
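A minimal Python sketch of the two-step version follows. The first-step pattern <Guid>(.*?)</Guid> is an assumption standing in for your original regex, which is not shown here, and both function names are hypothetical.

```python
import re

def extract_tag_content(text):
    """Step 1: return the text between <Guid> and </Guid>, or None.

    The pattern here is an assumed stand-in for the asker's original regex.
    """
    match = re.search(r"<Guid>(.*?)</Guid>", text)
    return match.group(1) if match else None

def first_digit_run(content):
    """Step 2: return the first run of consecutive digits, or None."""
    match = re.search(r"\d+", content)
    return match.group(0) if match else None

content = extract_tag_content("<Guid>Abc1234</Guid>")
print(content)                   # prints Abc1234
print(first_digit_run(content))  # prints 1234
```

Splitting the work this way keeps each pattern trivial to read, at the cost of a second pass over a short string.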

Bonus: Handle Multiple Numeric Segments (If Needed)

If your tag content ever has digits scattered through it (e.g., <Guid>123Abc456Xyz789</Guid>) and you want to combine them into a single string, match every occurrence of \d in the tag content and concatenate the matches. Most programming languages have built-in methods that make this easy.
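In Python, for instance, this could look like the sketch below. It assumes the tag content has already been extracted (as in Approach 2), and the function name all_digits is made up for the example.

```python
import re

def all_digits(content):
    """Concatenate every digit found in the content into one string."""
    # re.findall returns each single-digit match as its own list element.
    return "".join(re.findall(r"\d", content))

print(all_digits("123Abc456Xyz789"))  # prints 123456789
```

If the content has no digits at all, the result is an empty string rather than an error.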

The question in this post comes from Stack Exchange; it was asked by Cristi Er.