ByteHouse 支持通过 EXPLAIN 语句为 SQL 语句生成对应的执行计划,帮助您了解 ByteHouse 如何执行查询,便于您针对性优化 SQL 语句。本文以 TPCH-Q1 中的 SQL 为例,演示了如何使用不同类型的 EXPLAIN 语句。
在 ByteHouse 中,优化器(Query Optimizer)通过规则优化(Rule-Based Optimization,RBO)、代价优化(Cost-Based Optimization,CBO)、原生分布式计划协同、及高阶能力(动态过滤下推/物化视图改写),实现全链路查询优化。开启优化器后,执行 EXPLAIN 语句将为 SQL 语句生成对应的执行计划,展示执行计划的详细信息,并生成更高效的执行计划,可帮助开发人员快速定位 SQL 性能瓶颈,可针对性优化索引设计、表达式复杂度或分布式执行策略,确保以最小资源开销快速返回结果。
ByteHouse 当前支持了以下类型的 EXPLAIN 语句:
类型 | 定义 |
|---|---|
EXPLAIN | 用于展示逻辑执行计划。 |
用于展示分布式执行计划。 | |
用于展示带有运行时信息的执行计划,可细分为以下类型:
| |
转储查询的抽象语法树(AST),支持所有类型的查询,不仅仅是 | |
返回经过语法优化后的查询。 | |
该语句用于展示 SQL Analyzer 阶段解析的一些信息,解析的信息主要包括,库名,表明,列名,函数名,settings 等信息。 |
EXPLAIN [DISTRIBUTED | ANALYZE | AST | SYNTAX | METADATA ] [setting = value, ...] [ SELECT ... | tableFunction(...) [COLUMNS (...)] [ORDER BY ...] [PARTITION BY ...] [PRIMARY KEY] [SAMPLE BY ...] [TTL ...] ] [FORMAT ...]
参数 | 是否必填 | 说明 |
|---|---|---|
DISTRIBUTED | ANALYZE | AST |
setting = value | 否 | 设置自定义查询执行参数。 |
SELECT | 二选一 | 与 tableFunction(...) 二选一。设置为需要分析的 SELECT 查询语句。 |
tableFunction(...) | 二选一 | 与 SELECT 二选一。设置为需要分析的表函数。 |
COLUMNS (...) | 否 | 定义表函数需要返回的列名和类型。 |
ORDER BY | 否 | 定义表函数的排序键。 |
PARTITION BY | 否 | 定义表函数的数据分区规则。 |
PRIMARY KEY | 否 | 设置表函数的索引主键。 |
SAMPLE BY | 否 | 声明表函数的采样表达式。 |
TTL | 否 | 配置表函数的数据过期策略。 |
FORMAT | 否 | 定义将结果转换成指定输出格式。 |
EXPLAIN json=0 SELECT l_returnflag, l_linestatus, sum(l_quantity) AS sum_qty, sum(l_extendedprice) AS sum_base_price, sum(l_extendedprice * (1 - l_discount)) AS sum_disc_price, sum((l_extendedprice * (1 - l_discount)) * (1 + l_tax)) AS sum_charge, avg(l_quantity) AS avg_qty, avg(l_extendedprice) AS avg_price, avg(l_discount) AS avg_disc, count(*) AS count_order FROM lineitem WHERE l_shipdate <= (toDate('1998-12-01') - toIntervalDay('90')) GROUP BY l_returnflag, l_linestatus ORDER BY l_returnflag ASC, l_linestatus ASC
输出结果
该语句用于展示分布式执行计划。
EXPLAIN DISTRIBUTED <SQL>;
EXPLAIN DISTRIBUTED SELECT l_returnflag, l_linestatus, sum(l_quantity) AS sum_qty, sum(l_extendedprice) AS sum_base_price, sum(l_extendedprice * (1 - l_discount)) AS sum_disc_price, sum((l_extendedprice * (1 - l_discount)) * (1 + l_tax)) AS sum_charge, avg(l_quantity) AS avg_qty, avg(l_extendedprice) AS avg_price, avg(l_discount) AS avg_disc, count(*) AS count_order FROM lineitem WHERE l_shipdate <= (toDate('1998-12-01') - toIntervalDay('90')) GROUP BY l_returnflag, l_linestatus ORDER BY l_returnflag ASC, l_linestatus ASC
输出结果
EXPLAIN ANALYZE 用于展示带有运行时信息的执行计划,也支持通过 EXPLAIN ANALYZE DISTRIBUTED 展示带有运行时信息的分布式执行计划。
这类语句支持的通用 setting 参数如下:
参数 | 是否必填 | 说明 |
|---|---|---|
json | 否 | 设置是否以 JSON 格式打印计划,默认值为 0,表示不以 JSON 格式输出,设置为 1 表示打印 JSON 格式的计划。示例: |
stats | 否 | 设置是否输出统计信息行数和 cost,默认值为 1,表示输出,设置为 0 表示不输出。示例: |
profile | 否 | 设置是否输出执行时 profiles,默认值为 1,表示输出,设置为 0 表示不输出。示例: |
verbose | 否 | 设置是否输出输出逻辑计划 node 详细内容,默认值为 1,表示输出,设置为 0 表示不输出。示例: |
indexes | 否 | 设置是否打印索引使用情况,设置为 1 表示打印,设置为 0 表示不打印。示例: |
segment_id | 否 | 设置 segment id,可在打印分布式计划时指定打印具体 segment 内容。 |
segment_profile | 否 | 设置是否在打印分布式计划打印 segment 耗时,设置为 1 表示打印,设置为 0 表示不打印。示例: |
该语句用于展示带有运行时信息的执行计划。具体字段说明请参见 Explain Analyze字段说明。
EXPLAIN ANALYZE <SQL>;
除了通用参数外,EXPLAIN ANALYZE 还支持设置以下 setting 参数:
processor_profile_events = 1。EXPLAIN ANALYZE SELECT l_returnflag, l_linestatus, sum(l_quantity) AS sum_qty, sum(l_extendedprice) AS sum_base_price, sum(l_extendedprice * (1 - l_discount)) AS sum_disc_price, sum((l_extendedprice * (1 - l_discount)) * (1 + l_tax)) AS sum_charge, avg(l_quantity) AS avg_qty, avg(l_extendedprice) AS avg_price, avg(l_discount) AS avg_disc, count(*) AS count_order FROM lineitem WHERE l_shipdate <= (toDate('1998-12-01') - toIntervalDay('90')) GROUP BY l_returnflag, l_linestatus ORDER BY l_returnflag ASC, l_linestatus ASC
输出结果
该语句用于展示带有运行时信息的分布式执行计划,具体字段说明请参见 Explain Analyze字段说明。
EXPLAIN ANALYZE DISTRIBUTED <SQL>;
除了通用参数外,EXPLAIN ANALYZE DISTRIBUTED 还支持设置以下 setting 参数:
processor_profile_events = 1。profile_events_list='QueryAnalyzerTime',如果有多个参数值,可使用逗号(,)分隔。aggregate_profiles = 1,可打印每个 worker 聚合后的 profile,设置 aggregate_profiles = 0,系统将分别打印出每个 worker 的 profiles。EXPLAIN ANALYZE DISTRIBUTED SELECT l_returnflag, l_linestatus, sum(l_quantity) AS sum_qty, sum(l_extendedprice) AS sum_base_price, sum(l_extendedprice * (1 - l_discount)) AS sum_disc_price, sum((l_extendedprice * (1 - l_discount)) * (1 + l_tax)) AS sum_charge, avg(l_quantity) AS avg_qty, avg(l_extendedprice) AS avg_price, avg(l_discount) AS avg_disc, count(*) AS count_order FROM lineitem WHERE l_shipdate <= (toDate('1998-12-01') - toIntervalDay('90')) GROUP BY l_returnflag, l_linestatus ORDER BY l_returnflag ASC, l_linestatus ASC
输出结果
转储查询的抽象语法树(AST),支持所有类型的查询,不仅仅是 SELECT。
EXPLAIN AST <SQL>;
EXPLAIN AST SELECT l_returnflag, l_linestatus, sum(l_quantity) AS sum_qty, sum(l_extendedprice) AS sum_base_price, sum(l_extendedprice * (1 - l_discount)) AS sum_disc_price, sum((l_extendedprice * (1 - l_discount)) * (1 + l_tax)) AS sum_charge, avg(l_quantity) AS avg_qty, avg(l_extendedprice) AS avg_price, avg(l_discount) AS avg_disc, count(*) AS count_order FROM lineitem WHERE l_shipdate <= (toDate('1998-12-01') - toIntervalDay('90')) GROUP BY l_returnflag, l_linestatus ORDER BY l_returnflag ASC, l_linestatus ASC
输出结果
返回经过语法优化后的查询。
EXPLAIN SYNTAX <SQL>;
EXPLAIN SYNTAX SELECT l_returnflag, l_linestatus, sum(l_quantity) AS sum_qty, sum(l_extendedprice) AS sum_base_price, sum(l_extendedprice * (1 - l_discount)) AS sum_disc_price, sum((l_extendedprice * (1 - l_discount)) * (1 + l_tax)) AS sum_charge, avg(l_quantity) AS avg_qty, avg(l_extendedprice) AS avg_price, avg(l_discount) AS avg_disc, count(*) AS count_order FROM lineitem WHERE l_shipdate <= (toDate('1998-12-01') - toIntervalDay('90')) GROUP BY l_returnflag, l_linestatus ORDER BY l_returnflag ASC, l_linestatus ASC
输出结果
该语句用于展示 SQL Analyzer 阶段解析的一些信息,解析的信息主要包括,库名,表明,列名,函数名,settings 等信息。
EXPLAIN METADATA <SQL> 形式下使用;EXPLAIN METADATA <SQL>;
EXPLAIN METADATA 支持设置以下 setting 参数:
format_json = 1 表示输出内容是 JSON 格式。lineage = 1。explain metadata select t1.a, t2.b from metadata t1 join metadata2 t2 on t1.a=t2.a settings enable_optimizer=1;
输出结果
其中的字段说明如下:
EXPLAIN METADATA format_json = 1, lineage = 1 WITH t3 AS ( SELECT count() AS a, sum(b) AS b FROM cache2 GROUP BY c ) SELECT a, b FROM ( ( SELECT t3.b AS a, t2.b AS b FROM cache AS t2 INNER JOIN t3 ON t2.a = t3.a ) UNION ALL ( SELECT * FROM t3 ) ) AS t1 WHERE t1.a = 1 SETTINGS enable_optimizer = 1 FORMAT TabSeparatedRaw
输出结果
{ "UsedTablesInfo" : [ { "Database" : "test", "Table" : "cache", "Columns" : [ "a", "b" ] }, { "Database" : "test", "Table" : "cache2", "Columns" : [ "b", "c" ] } ], "UsedFunctions" : [ "count", "equals", "sum" ], "UsedSettings" : { "enable_optimizer" : "1" }, "InsertInfo" : { }, "FunctionsInfo" : [ ], "LineageInfo" : { "TableSources" : [ { "Database" : "test", "Table" : "cache2", "Columns" : [ { "Id" : 3, "Name" : "b" }, { "Id" : 2, "Name" : "c" } ] }, { "Database" : "test", "Table" : "cache", "Columns" : [ { "Id" : 1, "Name" : "a" }, { "Id" : 0, "Name" : "b" } ] } ], "ExpressionSources" : [ { "Sources" : [ { "Database" : "test", "Table" : "cache2" } ], "Expression" : [ { "Id" : 4, "Name" : "expr#count()" } ] } ], "OutputLineageInfo" : [ { "Name" : "b", "SourceIds" : [ 3, 0 ] }, { "Name" : "a", "SourceIds" : [ 4, 3 ] } ], "InsertLinageInfo" : { } } }
其中字段说明如下: