最近更新时间:2024.02.05 16:08:27
首次发布时间:2024.02.05 16:08:27
物化视图的本质就是类似一种触发器,当源表有数据写入,会触发视图执行定义的 SQL,写入另外一张表。
目前在 ByteHouse 根据物化视图的用途分为如下使用场景:
下面以一个行为分析系统的事件表来说明上述视图的使用方法。
--创建数据库 create database mv; --数据源表 CREATE TABLE mv.events( app_id UInt32, server_time UInt64, event_name String, uid UInt64, cost UInt64, duration UInt64, event_date Date ) ENGINE = CnchMergeTree PARTITION BY toDate(event_date) ORDER BY (app_id, uid, event_name);
--users维表 CREATE TABLE mv.users ( uid UInt64, params String ) ENGINE = CnchMergeTree ORDER BY uid;
Aggregate聚合视图是物化视图最为常用的一种场景,基于特定的聚合查询对源数据抽取存为物化视图,由于聚合查询已经聚合为中间数据状态,查询视图会减少聚合计算,提高查询性能,后续的查询能命中视图,引擎对原始查询进行改写,直接查询聚合视图表。下面看具体的场景
--视图目标表 CREATE TABLE mv.events_aggregation ( app_id UInt32, event_name String, event_date Date, sum_cost AggregateFunction(sum, UInt64), max_duration AggregateFunction(max, UInt64) ) ENGINE = CnchAggregatingMergeTree() PARTITION BY toDate(event_date) ORDER BY (app_id, event_name, event_date); --视图定义 CREATE MATERIALIZED VIEW mv.events_aggregate_view to mv.events_aggregation (app_id UInt32, event_name String, event_date Date, sum_cost AggregateFunction(sum, UInt64), max_duration AggregateFunction(max, UInt64)) AS SELECT app_id, event_name, event_date, sumState(cost) AS sum_cost, maxState(duration) AS max_duration FROM mv.events GROUP BY app_id, event_name, event_date;
---明细表 insert into table mv.events(app_id, server_time, event_name, uid, cost, duration, event_date) values (1, 1642149961, 'show', 121245, 3454, 64, '2022-06-14'); insert into table mv.events(app_id, server_time, event_name, uid, cost, duration, event_date) values (2, 1642149961 , 'send', 2345, 476, 64, '2022-06-14'); insert into table mv.events(app_id, server_time, event_name, uid, cost, duration, event_date) values (3, 1642150683, 'show', 544545, 87, 5434, '2022-06-14'); insert into table mv.events(app_id, server_time, event_name, uid, cost, duration, event_date) values (3, 1642150683, 'show', 544545, 930, 232, '2022-06-14'); insert into table mv.events(app_id, server_time, event_name, uid, cost, duration, event_date) values (4, 1642150683, 'slide', 234545, 123, 98, '2022-06-14'); insert into table mv.events(app_id, server_time, event_name, uid, cost, duration, event_date) values (5, 1642150683, 'click', 131312, 2644, 26, '2022-06-14'); ---维表 insert into table mv.users(uid, params) values (544545, 'male 35 from beijing'); insert into table mv.users(uid, params) values (121245, 'female 20 from nanjing');
物化视图查询有两种方式
SELECT app_id, event_name, event_date, sumMerge(sum_cost) AS sum_cost FROM mv.events_aggregation WHERE (toString(app_id) = '3') AND (event_name = 'show') AND (toDate(event_date) = '2022-06-14') GROUP BY app_id, event_name, event_date
set enable_optimizer = 1; set enable_materialized_view_rewrite = 1; SELECT app_id, event_name, event_date, sum(cost) AS sum_cost FROM mv.events WHERE (toString(app_id) = '3') AND (event_name = 'show') AND (toDate(event_date) = '2022-06-14') GROUP BY app_id, event_name, event_date settings enable_optimizer = 1,enable_materialized_view_rewrite = 1;
可以通过explain SQL的方式获取查询计划,如果计划中存在note: Materialized Views is applied for 1 times,说明命中了物化视图,可以对比一下如下命中视图和关闭视图改写执行计划的差别。
命中视图:
关闭视图改写:
在业务频繁迭代的场景,经常需要使用非主键过滤条件进行查询,但是主表的主键顺序又不能修改,基于这种需求,可以定义物化视图来修改主键顺序,根据业务需求裁切部分列或者根据某些条件过滤数据,来产出视图。
--视图目标表 CREATE TABLE mv.events_normal ( app_id UInt32, event_name String, event_date Date, uid UInt64, cost UInt64 ) ENGINE = CnchMergeTree() PARTITION BY toDate(event_date) ORDER BY (uid, event_name); --视图定义 CREATE MATERIALIZED VIEW mv.events_normal_view to mv.events_normal (app_id UInt32, event_name String, event_date Date, uid UInt64, cost UInt64) AS SELECT app_id, event_name, event_date, uid, cost FROM mv.events;
--- 视图刷新 refresh materialized view mv.events_normal_view partition '2022-06-14'
set enable_optimizer = 1; set enable_materialized_view_rewrite = 1; SELECT uid, sum(cost) FROM mv.events WHERE uid = 544545 GROUP BY uid
优化器会评估读取代价,选择读取代价最小的视图进行匹配。
实时消费以Kafka的消费为例,视图的SQL定义需要从consumer获取数据,可以全部获取作为明细表,可以进行聚合,过滤,投影等操作,可以是并联视图,或者串联视图。
--实时消费consumer表定义 CREATE TABLE mv.events_consumer ( app_id UInt32, server_time UInt64, event_name String, uid UInt64, cost UInt64, duration UInt64, event_date Date ) ENGINE = CnchKafka() SETTINGS kafka_cluster = 'bmq_data', kafka_topic_list = 'ch_qa_cnch_staging_yg', kafka_group_name = 'events_consumer_group', kafka_format = 'JSONEachRow', kafka_row_delimiter = '\n', kafka_num_consumers = 5, kafka_max_block_size = 65536; ---明细表视图定义 CREATE MATERIALIZED VIEW mv.events_real_all_view to mv.events ( app_id UInt32, server_time UInt64, event_name String, uid UInt64, cost UInt64, duration UInt64, event_date Date ) AS SELECT * FROM mv.events_consumer; ---聚合表视图定义 CREATE MATERIALIZED VIEW mv.events_real_aggregate_view to mv.events_aggregation (app_id UInt32, event_name String, event_date Date, sum_cost AggregateFunction(sum, UInt64), max_duration AggregateFunction(max, UInt64)) AS SELECT app_id, event_name, event_date, sumState(cost) AS sum_cost, maxState(duration) AS max_duration FROM mv.events_consumer GROUP BY app_id, event_name, event_date; --normal表实时消费表 CREATE MATERIALIZED VIEW mv.events_real_normal_view to mv.events_normal (app_id UInt32, event_name String, event_date Date, sum_cost AggregateFunction(sum, UInt64), max_duration AggregateFunction(max, UInt64)) AS SELECT app_id, event_name, event_date, uid, cost FROM mv.events_consumer where uid = 5434;
对于一些ETL的场景,单纯的是为对数据进行加工产生物化视图,单表的跟上面的视图定义类似,下面介绍多表的视图定义,目前bytehouse仅支持简单的inner join,subquery场景,后续会支持更加复杂的多表SQL场景。
--Join视图目标表 CREATE TABLE mv.events_join ( app_id UInt32, uid UInt64, cost UInt64, event_date Date, params String ) ENGINE = CnchMergeTree() PARTITION BY toDate(event_date) ORDER BY (uid, app_id); --join视图定义 CREATE MATERIALIZED VIEW mv.events_join_view to mv.events_join (app_id UInt32, uid UInt64, cost UInt64, event_date Date, params String ) AS SELECT app_id, uid, cost, event_date, params FROM mv.events as v inner join mv.users as u on v.uid = u.uid; --subset视图目标表 CREATE TABLE mv.events_subset ( app_id UInt32, uid UInt64, cost UInt64, event_date Date ) ENGINE = CnchMergeTree() PARTITION BY toDate(event_date) ORDER BY (uid, app_id); --subset视图定义 CREATE MATERIALIZED VIEW mv.events_extract_subset_view to mv.events_subset (app_id UInt32, uid UInt64, cost UInt64, event_date Date ) AS SELECT app_id, uid, cost, event_date FROM mv.events where uid in (select uid from mv.users);
-- 刷新join视图 refresh materialized view mv.events_join_view partition '2022-06-14'; -- 刷新subquery视图 refresh materialized view mv.events_extract_subset_view partition '2022-06-14';