SQL逗号分隔列条目统计:电影演员参演影片数统计需求
嘿Tony,这个需求很常见,核心思路就是先把逗号分隔的cast字段拆成单独的演员行,再分组统计每个演员的出现次数。我给你整理了几种主流数据库的实现方案,你按需取用:
MySQL 实现方案
MySQL 8.0+ 支持递归CTE,用这个来拆分字符串很方便:
WITH RECURSIVE split_cast AS ( -- 初始行:取每个电影的ID和第一个演员 SELECT id, TRIM(SUBSTRING_INDEX(cast, ',', 1)) AS actor, TRIM(SUBSTRING(cast, LOCATE(',', cast) + 1)) AS remaining_cast FROM movies WHERE cast IS NOT NULL AND cast != '' UNION ALL -- 递归拆分剩余的演员 SELECT id, TRIM(SUBSTRING_INDEX(remaining_cast, ',', 1)) AS actor, TRIM(SUBSTRING(remaining_cast, LOCATE(',', remaining_cast) + 1)) AS remaining_cast FROM split_cast WHERE remaining_cast IS NOT NULL AND remaining_cast != '' ) -- 统计每个演员的参演数量 SELECT actor, COUNT(DISTINCT id) AS movie_count FROM split_cast GROUP BY actor ORDER BY movie_count DESC;
如果你的MySQL版本低于8.0,也可以用数字辅助表来拆分——先创建一个包含连续数字的表(比如numbers,数字从1到你能想到的最大演员数),然后用:
SELECT TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(m.cast, ',', n.num), ',', -1)) AS actor, COUNT(DISTINCT m.id) AS movie_count FROM movies m JOIN numbers n ON n.num <= LENGTH(m.cast) - LENGTH(REPLACE(m.cast, ',', '')) + 1 WHERE m.cast IS NOT NULL AND m.cast != '' GROUP BY actor ORDER BY movie_count DESC;
PostgreSQL 实现方案
PostgreSQL有现成的string_to_array和unnest函数,实现起来更简洁:
SELECT TRIM(actor) AS actor, COUNT(DISTINCT id) AS movie_count FROM movies, unnest(string_to_array(cast, ',')) AS actor WHERE cast IS NOT NULL AND cast != '' GROUP BY TRIM(actor) ORDER BY movie_count DESC;
这里string_to_array把cast转成数组,unnest把数组拆成多行,之后直接分组统计就行。
SQLite 实现方案
SQLite没有内置的字符串拆分函数,同样可以用递归CTE来处理:
WITH RECURSIVE split_cast(id, actor, remaining) AS ( SELECT id, TRIM(SUBSTR(cast, 1, INSTR(cast || ',', ',') - 1)), TRIM(SUBSTR(cast, INSTR(cast || ',', ',') + 1)) FROM movies WHERE cast IS NOT NULL AND cast != '' UNION ALL SELECT id, TRIM(SUBSTR(remaining, 1, INSTR(remaining || ',', ',') - 1)), TRIM(SUBSTR(remaining, INSTR(remaining || ',', ',') + 1)) FROM split_cast WHERE remaining != '' ) SELECT actor, COUNT(DISTINCT id) AS movie_count FROM split_cast GROUP BY actor ORDER BY movie_count DESC;
这里通过INSTR找到逗号的位置,逐步拆分剩余的字符串,直到拆分完为止。
额外提醒:
- 用
TRIM处理演员名前后的空格,避免因为空格导致同一个演员被统计成多个条目 - 用
COUNT(DISTINCT id)避免同一部电影里演员重复出现(如果cast字段有重复的话)
内容的提问来源于stack exchange,提问作者Tony Springs




