求助:SQL中提取多组<start>与<end>标记间文本的实现方法
Got it, let's tackle this problem where you need to pull out all content between multiple <start>/<end> pairs—even when there are stray <end> tags cluttering up the string. Your original code works perfectly for a single match, but we need a scalable approach that handles any number of valid tag pairs.
Here's a solution using a recursive CTE (Common Table Expression) that will do exactly what you need:
DECLARE @text AS VARCHAR(MAX) = 'this <end><end>is for first <start> one, two, three <end>.this is for second <start> four, five, six<end>.'; DECLARE @pretext AS NVARCHAR(100) = '<start>'; DECLARE @posttext AS NVARCHAR(100) = '<end>'; WITH RecursiveMatches AS ( -- Anchor member: Grab the first valid match and remaining text SELECT SUBSTRING(@text, CHARINDEX(@pretext, @text) + LEN(@pretext), CHARINDEX(@posttext, @text, CHARINDEX(@pretext, @text)) - (CHARINDEX(@pretext, @text) + LEN(@pretext))) AS ExtractedText, SUBSTRING(@text, CHARINDEX(@posttext, @text, CHARINDEX(@pretext, @text)) + LEN(@posttext), LEN(@text)) AS RemainingText WHERE CHARINDEX(@pretext, @text) > 0 AND CHARINDEX(@posttext, @text, CHARINDEX(@pretext, @text)) > 0 UNION ALL -- Recursive member: Keep finding matches in the leftover text SELECT SUBSTRING(rm.RemainingText, CHARINDEX(@pretext, rm.RemainingText) + LEN(@pretext), CHARINDEX(@posttext, rm.RemainingText, CHARINDEX(@pretext, rm.RemainingText)) - (CHARINDEX(@pretext, rm.RemainingText) + LEN(@pretext))) AS ExtractedText, SUBSTRING(rm.RemainingText, CHARINDEX(@posttext, rm.RemainingText, CHARINDEX(@pretext, rm.RemainingText)) + LEN(@posttext), LEN(rm.RemainingText)) AS RemainingText FROM RecursiveMatches rm WHERE CHARINDEX(@pretext, rm.RemainingText) > 0 AND CHARINDEX(@posttext, rm.RemainingText, CHARINDEX(@pretext, rm.RemainingText)) > 0 ) -- Return all cleaned-up extracted segments SELECT LTRIM(RTRIM(ExtractedText)) AS TextBetweenTags FROM RecursiveMatches;
How This Works:
- Anchor Member: Starts by locating the first
<start>tag, then finds the next<end>tag that comes after it (ignoring any stray<end>tags before the first<start>). It extracts the content between them and saves the rest of the string after the matched<end>tag. - Recursive Member: Repeats the matching process on the remaining text from the previous step, hunting down the next valid
<start>/<end>pair each time. - Final Select: Returns all extracted segments, with optional trimming to clean up extra whitespace (like the leading space in your first example).
Key Details:
- Handles any number of valid tag pairs—it will keep extracting until there are no more
<start>tags followed by a corresponding<end>tag. - Stray
<end>tags (with no preceding<start>) are completely ignored, which aligns with your requirement. - If a
<start>tag doesn't have an<end>tag after it, that partial segment won't be extracted (to avoid returning incomplete or invalid content).
Example Output:
For your test string, this query will return two rows:
one, two, three four, five, six
内容的提问来源于stack exchange,提问作者Nate Pet




