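How do I group by date range? Grouping patient data by the id, date, and treatment fields
Grouping by date range: implementation approaches
It looks like you need to group patients' treatment data by date range while making sure every patient ID is covered, even when a patient has no treatment record in a particular range. Below I give concrete implementations in SQL, which is the most common way to handle this kind of grouping.
1. Define the date ranges first
Start by deciding which kind of date range you want. Two kinds are common: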
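- Fixed custom ranges: predefined quarters, half-years, or a specific period (for example 2023-01-01 to 2023-03-31)
- Dynamic period ranges: dates are split automatically by week, month, or calendar quarter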
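To make the difference between the two kinds of range concrete, here is a minimal Python sketch. The `FIXED_RANGES` list and the helper names `fixed_label`, `month_label`, and `week_start` are hypothetical, chosen only for illustration. A fixed range is looked up in a predefined list, while a dynamic period is computed from the date itself.

```python
from datetime import date, timedelta

# Hypothetical fixed ranges, inclusive at both ends (same quarters as used in the SQL).
FIXED_RANGES = [
    ("2023Q1", date(2023, 1, 1), date(2023, 3, 31)),
    ("2023Q2", date(2023, 4, 1), date(2023, 6, 30)),
    ("2023Q3", date(2023, 7, 1), date(2023, 9, 30)),
]

def fixed_label(d):
    """Return the name of the predefined range containing d, or None if none matches."""
    for name, start, end in FIXED_RANGES:
        if start <= d <= end:
            return name
    return None

def month_label(d):
    """Dynamic period: the bucket is derived from the date, e.g. '2023-02'."""
    return d.strftime("%Y-%m")

def week_start(d):
    """Dynamic period: the Monday of the week that contains d."""
    return d - timedelta(days=d.weekday())
```

With fixed ranges a date can fall outside every bucket (here, anything after 2023-09-30), whereas a dynamic period always yields a bucket.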
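2. Grouping by fixed custom date ranges
Suppose we split the dates into three fixed quarterly ranges and want every patient ID to appear in the result. One way to do it: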
    -- Step 1: define every date range to bucket by
    WITH date_ranges AS (
        SELECT '2023Q1' AS range_name, '2023-01-01' AS start_date, '2023-03-31' AS end_date
        UNION ALL
        SELECT '2023Q2' AS range_name, '2023-04-01' AS start_date, '2023-06-30' AS end_date
        UNION ALL
        SELECT '2023Q3' AS range_name, '2023-07-01' AS start_date, '2023-09-30' AS end_date
    ),
    -- Step 2: collect every distinct patient ID so that no patient is dropped
    all_patients AS (
        SELECT DISTINCT id AS patient_id
        FROM your_table_name
    )
    -- Step 3: cross join to build every (patient, range) pair,
    -- then left join the original data to count treatments
    SELECT
        ap.patient_id,
        dr.range_name,
        -- Count the occurrences of each treatment type
        COUNT(CASE WHEN t.treatment = 'Cold' THEN 1 END) AS cold_count,
        COUNT(CASE WHEN t.treatment = 'fever' THEN 1 END) AS fever_count,
        COUNT(CASE WHEN t.treatment = 'cholera' THEN 1 END) AS cholera_count,
        -- Or list the treatment types the patient received in this range
        -- (GROUP_CONCAT is MySQL syntax; PostgreSQL has STRING_AGG instead)
        GROUP_CONCAT(DISTINCT t.treatment SEPARATOR ', ') AS treatments_received
    FROM all_patients ap
    CROSS JOIN date_ranges dr
    LEFT JOIN your_table_name t
        ON ap.patient_id = t.id
        AND t.date BETWEEN dr.start_date AND dr.end_date
    GROUP BY ap.patient_id, dr.range_name
    ORDER BY ap.patient_id, dr.range_name;
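The output of this query covers every patient. When a patient has no treatment record in a given range, the counts for that row are 0 and treatments_received is NULL. Note that BETWEEN is inclusive at both ends, so the query assumes `date` holds a plain date; if the column also stores a time of day, records later on the last day of a range would be missed.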
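The pattern can be tried end to end with a small, self-contained sketch. It runs a reduced version of the query (two ranges, a single total count) against an in-memory SQLite database from Python. SQLite and the sample rows are assumptions made only so the example is runnable, not part of the original setup; the table and column names follow the query above.

```python
import sqlite3

# In-memory database with made-up sample rows (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE your_table_name (id INTEGER, date TEXT, treatment TEXT);
INSERT INTO your_table_name VALUES
  (1, '2023-01-10', 'Cold'),
  (1, '2023-02-05', 'fever'),
  (1, '2023-05-20', 'cholera'),
  (2, '2023-03-31', 'Cold');
""")

query = """
WITH date_ranges AS (
    SELECT '2023Q1' AS range_name, '2023-01-01' AS start_date, '2023-03-31' AS end_date
    UNION ALL
    SELECT '2023Q2', '2023-04-01', '2023-06-30'
),
all_patients AS (
    SELECT DISTINCT id AS patient_id FROM your_table_name
)
SELECT ap.patient_id, dr.range_name, COUNT(t.id) AS treatment_count
FROM all_patients ap
CROSS JOIN date_ranges dr            -- every (patient, range) pair
LEFT JOIN your_table_name t          -- keeps pairs that have no records
       ON t.id = ap.patient_id
      AND t.date BETWEEN dr.start_date AND dr.end_date
GROUP BY ap.patient_id, dr.range_name
ORDER BY ap.patient_id, dr.range_name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Patient 2 has no record in 2023Q2 but still appears, with a count of 0, because the range condition sits in the ON clause of the LEFT JOIN rather than in a WHERE clause.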
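3. Grouping by dynamic periods
To group automatically by a dynamic period (monthly or weekly, say), the syntax differs slightly between databases. Here are two common examples:
MySQL: group by month (all patients included)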
    -- CTEs (WITH ...) require MySQL 8.0 or later
    WITH all_months AS (
        -- Collect every month that occurs in the data
        SELECT DISTINCT DATE_FORMAT(date, '%Y-%m') AS month_range
        FROM your_table_name
    ),
    all_patients AS (
        SELECT DISTINCT id AS patient_id
        FROM your_table_name
    )
    SELECT
        ap.patient_id,
        am.month_range,
        COUNT(CASE WHEN t.treatment = 'Cold' THEN 1 END) AS cold_count,
        COUNT(CASE WHEN t.treatment = 'fever' THEN 1 END) AS fever_count,
        COUNT(CASE WHEN t.treatment = 'cholera' THEN 1 END) AS cholera_count
    FROM all_patients ap
    CROSS JOIN all_months am
    LEFT JOIN your_table_name t
        ON ap.patient_id = t.id
        AND DATE_FORMAT(t.date, '%Y-%m') = am.month_range
    GROUP BY ap.patient_id, am.month_range
    ORDER BY ap.patient_id, am.month_range;
PostgreSQL: group by week (all patients included)
    WITH all_weeks AS (
        -- DATE_TRUNC('week', ...) returns the Monday of the week
        SELECT DISTINCT DATE_TRUNC('week', date)::DATE AS week_start
        FROM your_table_name
    ),
    all_patients AS (
        SELECT DISTINCT id AS patient_id
        FROM your_table_name
    )
    SELECT
        ap.patient_id,
        CONCAT('Week starting ', aw.week_start) AS week_range,
        COUNT(CASE WHEN t.treatment = 'Cold' THEN 1 END) AS cold_count,
        COUNT(CASE WHEN t.treatment = 'fever' THEN 1 END) AS fever_count,
        COUNT(CASE WHEN t.treatment = 'cholera' THEN 1 END) AS cholera_count
    FROM all_patients ap
    CROSS JOIN all_weeks aw
    LEFT JOIN your_table_name t
        ON ap.patient_id = t.id
        AND DATE_TRUNC('week', t.date)::DATE = aw.week_start
    GROUP BY ap.patient_id, aw.week_start
    ORDER BY ap.patient_id, aw.week_start;
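Both queries above take their periods from the data itself, so a month or week in which no patient has any record will not appear in the result at all. If gap-free periods matter, one option is to generate the calendar instead. The sketch below does this for months with a recursive CTE, run against an in-memory SQLite database from Python. SQLite, the sample rows, and the hard-coded 2023-01 to 2023-03 window are assumptions for illustration only; MySQL 8.0 and PostgreSQL also support recursive CTEs, though their date functions differ.

```python
import sqlite3

# In-memory database with made-up sample rows; February has no records at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE your_table_name (id INTEGER, date TEXT, treatment TEXT);
INSERT INTO your_table_name VALUES
  (1, '2023-01-10', 'Cold'),
  (2, '2023-03-02', 'fever');
""")

query = """
WITH RECURSIVE all_months(month_start) AS (
    SELECT '2023-01-01'                       -- first month of the window
    UNION ALL
    SELECT date(month_start, '+1 month')      -- SQLite date arithmetic
    FROM all_months
    WHERE month_start < '2023-03-01'          -- stop after the last month
),
all_patients AS (
    SELECT DISTINCT id AS patient_id FROM your_table_name
)
SELECT ap.patient_id,
       substr(am.month_start, 1, 7) AS month_range,
       COUNT(t.id) AS treatment_count
FROM all_patients ap
CROSS JOIN all_months am
LEFT JOIN your_table_name t
       ON t.id = ap.patient_id
      AND substr(t.date, 1, 7) = substr(am.month_start, 1, 7)
GROUP BY ap.patient_id, am.month_start
ORDER BY ap.patient_id, am.month_start
"""
monthly = conn.execute(query).fetchall()
for row in monthly:
    print(row)
```

Here 2023-02 shows up for both patients with a count of 0, even though the table contains no February rows.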
4. Sample output
Taking the fixed-quarter query as an example, the output looks something like this:
| patient_id | range_name | cold_count | fever_count | cholera_count | treatments_received |
|---|---|---|---|---|---|
| 1 | 2023Q1 | 2 | 1 | 0 | Cold, fever |
| 1 | 2023Q2 | 0 | 0 | 1 | cholera |
| 1 | 2023Q3 | 0 | 0 | 0 | NULL |
| 2 | 2023Q1 | 1 | 0 | 0 | Cold |
| ... | ... | ... | ... | ... | ... |
If what you need is to list each patient's treatment records under each date range (rather than counts), just adjust the query:
    WITH date_ranges AS (
        SELECT '2023Q1' AS range_name, '2023-01-01' AS start_date, '2023-03-31' AS end_date
        UNION ALL
        SELECT '2023Q2' AS range_name, '2023-04-01' AS start_date, '2023-06-30' AS end_date
    )
    SELECT
        dr.range_name,
        t.id AS patient_id,
        t.date,
        t.treatment
    FROM date_ranges dr
    -- LEFT JOIN keeps a range even when it contains no records
    LEFT JOIN your_table_name t
        ON t.date BETWEEN dr.start_date AND dr.end_date
    -- t.date added to the sort so each patient's records come out in date order
    ORDER BY dr.range_name, t.id, t.date;
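To see what the listing variant returns for a range without any records, here is a small sketch against an in-memory SQLite database from Python. As before, SQLite and the sample rows are assumptions used only to make the example runnable. Note that, unlike the counting queries, this variant does not produce a row for every patient in every range.

```python
import sqlite3

# In-memory database with made-up sample rows; nothing falls in 2023Q2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE your_table_name (id INTEGER, date TEXT, treatment TEXT);
INSERT INTO your_table_name VALUES
  (1, '2023-01-10', 'Cold'),
  (2, '2023-02-01', 'fever');
""")

query = """
WITH date_ranges AS (
    SELECT '2023Q1' AS range_name, '2023-01-01' AS start_date, '2023-03-31' AS end_date
    UNION ALL
    SELECT '2023Q2', '2023-04-01', '2023-06-30'
)
SELECT dr.range_name, t.id AS patient_id, t.date, t.treatment
FROM date_ranges dr
LEFT JOIN your_table_name t
       ON t.date BETWEEN dr.start_date AND dr.end_date
ORDER BY dr.range_name, t.id, t.date
"""
listing = conn.execute(query).fetchall()
for row in listing:
    print(row)
```

The empty range 2023Q2 is kept as a single row whose patient columns are all NULL (None in Python).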
The question comes from Stack Exchange; it was asked by Cur123.




