You need to enable JavaScript to run this app.
最新活动
大模型
产品
解决方案
定价
生态与合作
支持与服务
开发者
了解我们

SQL逗号分隔列条目统计:电影演员参演影片数统计需求

嘿Tony,这个需求很常见,核心思路就是先把逗号分隔的cast字段拆成单独的演员行,再分组统计每个演员的出现次数。我给你整理了几种主流数据库的实现方案,你按需取用:

MySQL 实现方案

MySQL 8.0+ 支持递归CTE,用这个来拆分字符串很方便:

WITH RECURSIVE split_cast AS (
    -- 初始行:取每个电影的ID和第一个演员
    SELECT 
        id,
        TRIM(SUBSTRING_INDEX(cast, ',', 1)) AS actor,
        TRIM(SUBSTRING(cast, LOCATE(',', cast) + 1)) AS remaining_cast
    FROM movies
    WHERE cast IS NOT NULL AND cast != ''
    UNION ALL
    -- 递归拆分剩余的演员
    SELECT 
        id,
        TRIM(SUBSTRING_INDEX(remaining_cast, ',', 1)) AS actor,
        TRIM(SUBSTRING(remaining_cast, LOCATE(',', remaining_cast) + 1)) AS remaining_cast
    FROM split_cast
    WHERE remaining_cast IS NOT NULL AND remaining_cast != ''
)
-- 统计每个演员的参演数量
SELECT actor, COUNT(DISTINCT id) AS movie_count
FROM split_cast
GROUP BY actor
ORDER BY movie_count DESC;

如果你的MySQL版本低于8.0,也可以用数字辅助表来拆分——先创建一个包含连续数字的表(比如numbers,数字从1到你能想到的最大演员数),然后用:

SELECT 
    TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(m.cast, ',', n.num), ',', -1)) AS actor,
    COUNT(DISTINCT m.id) AS movie_count
FROM movies m
JOIN numbers n ON n.num <= LENGTH(m.cast) - LENGTH(REPLACE(m.cast, ',', '')) + 1
WHERE m.cast IS NOT NULL AND m.cast != ''
GROUP BY actor
ORDER BY movie_count DESC;
PostgreSQL 实现方案

PostgreSQL有现成的string_to_arrayunnest函数,实现起来更简洁:

SELECT 
    TRIM(actor) AS actor,
    COUNT(DISTINCT id) AS movie_count
FROM movies,
     unnest(string_to_array(cast, ',')) AS actor
WHERE cast IS NOT NULL AND cast != ''
GROUP BY TRIM(actor)
ORDER BY movie_count DESC;

这里string_to_arraycast转成数组,unnest把数组拆成多行,之后直接分组统计就行。

SQLite 实现方案

SQLite没有内置的字符串拆分函数,同样可以用递归CTE来处理:

WITH RECURSIVE split_cast(id, actor, remaining) AS (
    SELECT 
        id,
        TRIM(SUBSTR(cast, 1, INSTR(cast || ',', ',') - 1)),
        TRIM(SUBSTR(cast, INSTR(cast || ',', ',') + 1))
    FROM movies
    WHERE cast IS NOT NULL AND cast != ''
    UNION ALL
    SELECT 
        id,
        TRIM(SUBSTR(remaining, 1, INSTR(remaining || ',', ',') - 1)),
        TRIM(SUBSTR(remaining, INSTR(remaining || ',', ',') + 1))
    FROM split_cast
    WHERE remaining != ''
)
SELECT actor, COUNT(DISTINCT id) AS movie_count
FROM split_cast
GROUP BY actor
ORDER BY movie_count DESC;

这里通过INSTR找到逗号的位置,逐步拆分剩余的字符串,直到拆分完为止。

额外提醒:

  • TRIM处理演员名前后的空格,避免因为空格导致同一个演员被统计成多个条目
  • COUNT(DISTINCT id)避免同一部电影里演员重复出现(如果cast字段有重复的话)

内容的提问来源于stack exchange,提问作者Tony Springs

火山引擎 最新活动