You need to enable JavaScript to run this app.
最新活动
大模型
产品
解决方案
定价
生态与合作
支持与服务
开发者
了解我们

SQL Server递归CTE查询超时问题排查与优化求助

问题:SQL Server 2012递归CTE超时排查与查询优化

我需要在SQL Server 2012版本中构建查询,每日返回CATEGORY_3(别名DM)的结果。业务表NOTES仅在列变更时新增行,无每日数据,部分列值为空。使用日期表生成结果行数,但递归CTE执行10分钟超时,预期仅4000行。现有SQL代码如下:

WITH DAILY_DRIVER AS (
    SELECT
        DD1.DRIVER,
        DD1.Date,
        DN.CATEGORY_3 AS DM,
        ROW_NUMBER() OVER (PARTITION BY DD1.DRIVER ORDER BY DD1.Date) AS RowNum
    FROM (
        SELECT DISTINCT(SDN.DRIVER_ID) AS DRIVER, DD.Date
        FROM NOTES SDN
        JOIN DATES DD ON DD.Date BETWEEN '2022-01-01' AND '2022-11-28'
    ) DD1
    LEFT JOIN NOTES DN 
        ON DD1.DRIVER = DN.DRIVER_ID AND DD1.Date = CAST(DN.NOTES_DATE AS DATE)
),

INITIAL_DM AS (
    SELECT
        D.DRIVER,
        D.DM
    FROM (
        SELECT 
            SDN.DRIVER_ID AS DRIVER,
            SDN.CATEGORY_3 AS DM,
            MAX(SDN.NOTES_DATE) AS NDATE, 
            ROW_NUMBER() OVER (PARTITION BY SDN.DRIVER_ID ORDER BY SDN.NOTES_DATE DESC) AS RowNum
        FROM NOTES SDN
        WHERE SDN.CATEGORY_3 IS NOT NULL
          AND SDN.NOTES_DATE < '2021-01-01'
          
        GROUP BY 
        SDN.DRIVER_ID
        , SDN.CATEGORY_3
        , SDN.NOTES_DATE
    ) D
    JOIN DRIVER P ON D.DRIVER = P.DRIVER_ID
    WHERE RowNum = 1 AND (P.TERMINATION_DATE IS NULL OR P.TERMINATION_DATE > '2021-12-31')
),

SOMETHING (DRIVER, Date, DM, RowNum) AS (
    SELECT
        DD.DRIVER,
        DD.Date,
        ISNULL(DD.DM, SD.DM) AS DM,
        DD.RowNum
    FROM DAILY_DRIVER DD
    RIGHT JOIN INITIAL_DM SD ON DD.DRIVER = SD.DRIVER
    WHERE DD.RowNum = 1

    UNION ALL

    SELECT
        DD.DRIVER,
        DD.Date,
        ISNULL(DD.DM, S.DM) AS DM,
        DD.RowNum
    FROM DAILY_DRIVER AS DD
    INNER JOIN SOMETHING AS S 
        ON DD.DRIVER = S.DRIVER AND DD.RowNum = S.RowNum + 1
    WHERE DD.RowNum <= DATEDIFF(d,'2022-01-01','2021-11-28') +1
)

SELECT DRIVER, Date, DM
FROM SOMETHING
ORDER BY DRIVER, Date;

NOTES表样例数据

DRIVER_IDNOTES_DATENOTES_IDCATEGORY_1CATEGORY_2CATEGORY_3CATEGORY_4
ADAVIC2/8/20226838nullnullFORDONTRAINEE
ADAVIC3/15/20226920nullnullFORDONREGION03
ADAVIC4/11/20226985nullnullnullnull
ADAVIC4/2/20227009nullnullANDBRIREGION03
ADAVIC4/26/20227050nullnullnullnull
ALLZAN5/3/20227059nullnullHILJASTRAINEE
ADAVIC5/2/20227078nullnullPARJAMREGION03
ADAVIC5/12/20227099TERMREVIEWnullnull
ALLZAN6/9/20227187nullnullnullOTR
AMUJOH7/6/20227219nullnullPARJAMREGION03
AMUJOH7/13/20227232nullnullnullnull
ALLZAN8/15/20227308nullnullnullnull
ALLZAN11/9/20227500TERMREVIEWnullnull

预期查询结果

DRIVERDATEDM
ADAVIC4/1/2022FORDON
ADAVIC4/2/2022ANDBRI
ADAVIC4/3/2022ANDBRI
ADAVIC4/4/2022ANDBRI
Continued
ADAVIC5/1/2022ANDBRI
ADAVIC5/2/2022PARJAM
ADAVIC5/3/2022PARJAM

超时原因排查

  • 递归CTE逐行迭代低效:递归CTE是逐行循环关联DAILY_DRIVER表,4000行结果需要执行4000次关联查询,开销极大。
  • DAILY_DRIVER数据冗余:该CTE是司机与日期的笛卡尔积,再左关联NOTES表,无索引支撑时每次递归都要全表扫描。
  • 终止条件逻辑错误DATEDIFF(d,'2022-01-01','2021-11-28')返回负数,导致DD.RowNum <= 负数+1的条件无效,无法正确终止递归,甚至可能引发无限循环。

优化方案(非递归实现)

这类"填充每日快照"的场景更适合用窗口函数+日期表关联的方式,避免逐行递归:

-- 1. 提取有效的DM变更记录(过滤CATEGORY_3为空的无效行)
WITH DriverDMChanges AS (
    SELECT 
        DRIVER_ID,
        CAST(NOTES_DATE AS DATE) AS ChangeDate,
        CATEGORY_3 AS DM,
        -- 获取每条变更的下一次变更日期,确定当前DM的生效区间
        LEAD(CAST(NOTES_DATE AS DATE), 1, '2022-11-29') OVER (PARTITION BY DRIVER_ID ORDER BY NOTES_DATE) AS NextChangeDate
    FROM NOTES
    WHERE CATEGORY_3 IS NOT NULL
),
-- 2. 获取司机2022年初始DM(2022-01-01前最后一次有效变更)
InitialDM AS (
    SELECT 
        DRIVER_ID,
        CATEGORY_3 AS DM
    FROM (
        SELECT 
            DRIVER_ID,
            CATEGORY_3,
            ROW_NUMBER() OVER (PARTITION BY DRIVER_ID ORDER BY NOTES_DATE DESC) AS rn
        FROM NOTES
        WHERE CATEGORY_3 IS NOT NULL
          AND NOTES_DATE < '2022-01-01'
    ) t
    WHERE rn = 1
),
-- 3. 生成所有司机的日期范围
DriverDates AS (
    SELECT 
        drivers.DRIVER_ID AS DRIVER,
        dd.Date
    FROM DATES dd
    CROSS JOIN (
        SELECT DISTINCT DRIVER_ID FROM NOTES
        UNION
        SELECT DRIVER_ID FROM InitialDM
    ) drivers
    WHERE dd.Date BETWEEN '2022-01-01' AND '2022-11-28'
)
-- 4. 填充每日DM值:优先取变更记录,无变更则用初始DM,空值向前填充
SELECT 
    DRIVER,
    Date,
    LAST_VALUE(COALESCE(dc.DM, i.DM)) OVER (
        PARTITION BY DRIVER 
        ORDER BY Date 
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS DM
FROM DriverDates dd
LEFT JOIN DriverDMChanges dc 
    ON dd.DRIVER = dc.DRIVER_ID 
    AND dd.Date >= dc.ChangeDate 
    AND dd.Date < dc.NextChangeDate
LEFT JOIN InitialDM i 
    ON dd.DRIVER = i.DRIVER_ID
ORDER BY DRIVER, Date;

额外优化建议

  • 添加复合索引:在NOTES表上创建索引,加速变更记录的查询:
    CREATE NONCLUSTERED INDEX IX_NOTES_DRIVER_DATE_CAT3 ON NOTES (DRIVER_ID, NOTES_DATE) INCLUDE (CATEGORY_3);
    
  • 过滤无效数据:提前排除NOTES表中CATEGORY_3为空的行,减少处理数据量。
  • 验证日期表:确保DATES表包含目标日期范围内的所有日期,避免结果遗漏。

内容的提问来源于stack exchange,提问作者Matt Nevland

火山引擎 最新活动