BigQuery新增列:按ID和月份计算交易滚动求和
按用户和月份计算交易滚动累计和
看起来你需要给每个用户按月份计算交易金额的滚动累计和,这在SQL里用窗口函数就能轻松搞定。先明确需求:我们要按id分组,再按time对应的月份划分,然后在每个分组内按时间顺序累加transaction的值。
原始数据
| id | transaction | time |
|---|---|---|
| 1 | 10 | 01.01.2018 |
| 1 | 20 | 10.01.2018 |
| 2 | 20 | 05.01.2018 |
| 2 | 30 | 15.01.2018 |
| 2 | 5 | 03.02.2018 |
预期结果
| id | transaction | time | monthly_running_total |
|---|---|---|---|
| 1 | 10 | 01.01.2018 | 10 |
| 1 | 20 | 10.01.2018 | 30 |
| 2 | 20 | 05.01.2018 | 20 |
| 2 | 30 | 15.01.2018 | 50 |
| 2 | 5 | 03.02.2018 | 5 |
解决方案(分数据库)
核心思路是使用SUM()窗口函数,通过PARTITION BY指定分组维度(用户id+月份),ORDER BY确保组内按时间排序,从而计算滚动累计和。
1. PostgreSQL
如果你的time字段是字符串类型,需要先转换为日期格式:
SELECT id, transaction, time, SUM(transaction) OVER ( PARTITION BY id, DATE_TRUNC('month', TO_DATE(time, 'DD.MM.YYYY')) ORDER BY TO_DATE(time, 'DD.MM.YYYY') ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS monthly_running_total FROM your_table_name ORDER BY id, TO_DATE(time, 'DD.MM.YYYY');
如果time已经是DATE类型,直接用DATE_TRUNC('month', time)即可。
2. MySQL 8.0+
MySQL 8.0及以上版本支持窗口函数,同样需要先处理日期转换:
SELECT id, transaction, time, SUM(transaction) OVER ( PARTITION BY id, DATE_FORMAT(STR_TO_DATE(time, '%d.%m.%Y'), '%Y-%m') ORDER BY STR_TO_DATE(time, '%d.%m.%Y') ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS monthly_running_total FROM your_table_name ORDER BY id, STR_TO_DATE(time, '%d.%m.%Y');
3. SQL Server
SQL Server中可以用DATEFROMPARTS生成月份的起始日期来分组:
SELECT id, transaction, time, SUM(transaction) OVER ( PARTITION BY id, DATEFROMPARTS(YEAR(CAST(time AS DATE)), MONTH(CAST(time AS DATE)), 1) ORDER BY CAST(time AS DATE) ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS monthly_running_total FROM your_table_name ORDER BY id, CAST(time AS DATE);
关键说明
PARTITION BY id, [month]:将数据按用户和月份分割成独立的计算组,每个组内的累计和互不干扰。ORDER BY [date]:确保每个组内的记录按时间先后排序,保证滚动累计的顺序正确(如果直接用字符串排序会出错,必须转换为日期类型)。ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW:明确指定累计范围是从组内第一条记录到当前记录,部分数据库默认就是这个范围,但显式写出会让逻辑更清晰。
内容的提问来源于stack exchange,提问作者fsociety




