求助：SQL中提取多组<start>与<end>标记间文本的实现方法

阿华AIGC实验室

2026-4-28

Extract Multiple Text Segments Between and Tags in SQL Server

Got it, let's tackle this problem where you need to pull out all content between multiple <start>/<end> pairs—even when there are stray <end> tags cluttering up the string. Your original code works perfectly for a single match, but we need a scalable approach that handles any number of valid tag pairs.

Here's a solution using a recursive CTE (Common Table Expression) that will do exactly what you need:

DECLARE @text AS VARCHAR(MAX) = 'this <end><end>is for first <start> one, two, three <end>.this is for second <start> four, five, six<end>.';
DECLARE @pretext AS NVARCHAR(100) = '<start>';
DECLARE @posttext AS NVARCHAR(100) = '<end>';

WITH RecursiveMatches AS (
    -- Anchor member: Grab the first valid match and remaining text
    SELECT 
        SUBSTRING(@text, CHARINDEX(@pretext, @text) + LEN(@pretext), 
                  CHARINDEX(@posttext, @text, CHARINDEX(@pretext, @text)) - (CHARINDEX(@pretext, @text) + LEN(@pretext))) AS ExtractedText,
        SUBSTRING(@text, CHARINDEX(@posttext, @text, CHARINDEX(@pretext, @text)) + LEN(@posttext), LEN(@text)) AS RemainingText
    WHERE CHARINDEX(@pretext, @text) > 0 
      AND CHARINDEX(@posttext, @text, CHARINDEX(@pretext, @text)) > 0

    UNION ALL

    -- Recursive member: Keep finding matches in the leftover text
    SELECT 
        SUBSTRING(rm.RemainingText, CHARINDEX(@pretext, rm.RemainingText) + LEN(@pretext), 
                  CHARINDEX(@posttext, rm.RemainingText, CHARINDEX(@pretext, rm.RemainingText)) - (CHARINDEX(@pretext, rm.RemainingText) + LEN(@pretext))) AS ExtractedText,
        SUBSTRING(rm.RemainingText, CHARINDEX(@posttext, rm.RemainingText, CHARINDEX(@pretext, rm.RemainingText)) + LEN(@posttext), LEN(rm.RemainingText)) AS RemainingText
    FROM RecursiveMatches rm
    WHERE CHARINDEX(@pretext, rm.RemainingText) > 0 
      AND CHARINDEX(@posttext, rm.RemainingText, CHARINDEX(@pretext, rm.RemainingText)) > 0
)

-- Return all cleaned-up extracted segments
SELECT LTRIM(RTRIM(ExtractedText)) AS TextBetweenTags
FROM RecursiveMatches;

How This Works:

Anchor Member: Starts by locating the first <start> tag, then finds the next <end> tag that comes after it (ignoring any stray <end> tags before the first <start>). It extracts the content between them and saves the rest of the string after the matched <end> tag.
Recursive Member: Repeats the matching process on the remaining text from the previous step, hunting down the next valid <start>/<end> pair each time.
Final Select: Returns all extracted segments, with optional trimming to clean up extra whitespace (like the leading space in your first example).

Key Details:

Handles any number of valid tag pairs—it will keep extracting until there are no more <start> tags followed by a corresponding <end> tag.
Stray <end> tags (with no preceding <start>) are completely ignored, which aligns with your requirement.
If a <start> tag doesn't have an <end> tag after it, that partial segment won't be extracted (to avoid returning incomplete or invalid content).