本文将介绍使用 GIS 时空分析功能所涉及的基本概念、SQL 语句,以及具体使用方法。
使用之前请确认已经开启服务。
Geometry('Point', <SRID>),或Geometry('Point')(缺省SRID为 0)。Geometry('LineString', <SRID>),或Geometry('LineString')(缺省SRID为 0)。Geometry('Polygon', <SRID>),或Geometry('Polygon')(缺省SRID为 0)。ByteHouse 提供了常用的 GIS 函数,可参考下文”SQL参考-函数“章节。
如果想创建一个 GIS 相关的表,需要在建表的时候声明对应的 geometry 列。
常见的声明数据类型包括:
Geometry('Point', 4326);Geometry('LineString', 4326),其中 4326 为 SRID 值,不填写的话 Geometry('Point')默认值为 0。如果想利用空间索引,需要在 order by 里面指定 geometry 列为主键索引。
CREATE TABLE IF NOT EXISTS gis_table ON CLUSTER XXX ( `id` UInt64, `point` Geometry('Point', 4326) ) ENGINE = HaMergeTree ORDER BY point SETTINGS index_granularity = 8192;
您也可以在可视化建表中使用本功能。
Geometry。Geometry类型,以加快查询速度。声明点类型的方式为:Geometry('Point')或者Geometry('Point', SRID),其中 SRID 是一个 UInt16 的整型代表了该几何列所对应的坐标系统。
Geometry('Point') 属于 DataTypeGeometry 类型,DataTypeGeometry内部存储了一个DataTypeTuple类型用来表示Point数据,一个Point底层存储为一个Tuple,即(Float64, Float64)。
当插入数据的时候,我们可以通过 insert into select 的方式把其他 table 中的两列数据通过(x, y)的形式插入 Point 几何列中,也可以通过字符串 Tuple 的形式插入到 point 中。
CREATE TABLE IF NOT EXISTS test_gis_tmp ( `id` UInt64, `lon` Float64, `lat` Float64 ) ENGINE = MergeTree ORDER BY id SETTINGS index_granularity = 8192; insert into test_gis_tmp values (1, 1, 1), (2, 2, 2), (3, 3.1, 3.1); CREATE TABLE IF NOT EXISTS test_gis_point ( `id` UInt64, `p` Geometry('Point', 4326) ) ENGINE = MergeTree ORDER BY id SETTINGS index_granularity = 8192; insert into test_gis_point select id, (lon, lat) from test_gis_tmp; insert into test_gis_point values (4, '(4, 4)'), (5, '(5.5, 5.5)'); select * from test_gis_point order by id;
预期结果:
1 (1,1) 2 (2,2) 3 (3.1,3.1) 4 (4,4) 5 (5.5,5.5)
LineString 类型与 Point类型用法类似,区别在于LineString中存储的是坐标数组。
-- LineString CREATE TABLE IF NOT EXISTS test_gis_linestring_type ( `p` Geometry('LineString', 4326) ) ENGINE = MergeTree ORDER BY p SETTINGS index_granularity = 8192; insert into test_gis_linestring_type values ([(0,0), (0, 0)]), ('[(2,2), (2,2)]'), (ST_GeomFromText('LINESTRING (3 3, 3 3)')); select ST_AsText(p) from test_gis_linestring_type order by p; DROP TABLE IF EXISTS test_gis_linestring_type;
与前面类似,Polygon中存储的是多个坐标数组。
CREATE TABLE IF NOT EXISTS test_gis_polygon_type ( `p` Geometry('Polygon', 4326) ) ENGINE = MergeTree ORDER BY p SETTINGS index_granularity = 8192; insert into test_gis_polygon_type values ([[(0,0), (1,0), (1,1), (0,1), (0,0)]]), ('[[(0,0), (2,0), (2,2), (0,2), (0,0)]]'), (ST_GeomFromText('POLYGON ((0 0, 3 0, 3 3, 0 3, 0 0))'));
ST_GeomFromText(WKT_Geometry)或者 ST_GeomFromText(WKT_Geometry, SRID)函数用来将 OGC 标准定义的几何文本表示字符串转化为上述的几何数据。
ST_AsText(GeometryType)与 ST_GeomFromText 相反,用来将 ck 中的几何数据转化为 WKT 表示的字符串形式。
CREATE TABLE IF NOT EXISTS test_gis_point ( `id` UInt64, `p` Geometry('Point', 4326) ) ENGINE = MergeTree ORDER BY id SETTINGS index_granularity = 8192; insert into test_gis_point values(1, ST_GeomFromText('POINT (0 0)')); insert into test_gis_point values(2, ST_GeomFromText('POINT (1 1)')); insert into test_gis_point values(3, ST_GeomFromText('POINT (2 2)')); insert into test_gis_point values(4, ST_GeomFromText('POINT (3.1 3.1)')); insert into test_gis_point values(5, ST_GeomFromText('POINT (4 4)', 4326)); insert into test_gis_point values(6, ST_GeomFromText('POINT (5.1 5.1)', 4326)); select ST_AsText(p) from test_gis_point order by id;
预期结果:
POINT (0 0) POINT (1 1) POINT (2 2) POINT (3.1 3.1) POINT (4 4) POINT (5.1 5.1)
ST_Distance(p1, p2) 函数用来计算笛卡尔空间中的两个几何之间的距离,返回为 Float64 类型。
其中,p1 和 p2 可以是常量或非常量类型。
CREATE TABLE IF NOT EXISTS test_gis_point ( `id` UInt64, `p` Geometry('Point', 4326) ) ENGINE = MergeTree ORDER BY id SETTINGS index_granularity = 8192; insert into test_gis_point values(1, ST_GeomFromText('POINT (0 0)')); insert into test_gis_point values(2, ST_GeomFromText('POINT (1 1)')); insert into test_gis_point values(3, ST_GeomFromText('POINT (2 2)')); insert into test_gis_point values(4, ST_GeomFromText('POINT (3 3)')); insert into test_gis_point values(5, ST_GeomFromText('POINT (4 4)')); select ST_Distance(p, ST_GeomFromText('POINT (1 1)')) from test_gis_point order by id; select '--'; select ST_Distance(ST_GeomFromText('POINT (1 1)'), p) from test_gis_point order by id;
预期结果:
1.4142135623730951 0 1.4142135623730951 2.8284271247461903 4.242640687119285 -- 1.4142135623730951 0 1.4142135623730951 2.8284271247461903 4.242640687119285
ST_Within(p1, p2)代表了几何 p1 是否被 p2 所包含,返回值为布尔类型,代表了是否被包含。目前函数支持 Point/LineString/Polygon 等类型。
如下图所示,分别代表了不同几何类型之间的被包含关系。
CREATE TABLE IF NOT EXISTS test_gis_point ( `id` UInt64, `p` Geometry('Point', 4326) ) ENGINE = MergeTree ORDER BY id SETTINGS index_granularity = 8192; insert into test_gis_point values(1, ST_GeomFromText('POINT (1 1)')); insert into test_gis_point values(2, ST_GeomFromText('POINT (2 2)')); insert into test_gis_point values(3, ST_GeomFromText('POINT (3 3)')); insert into test_gis_point values(4, ST_GeomFromText('POINT (4 4)')); insert into test_gis_point values(5, ST_GeomFromText('POINT (5 5)')); optimize table test_gis_point final; select id from test_gis_point where ST_Within(p, ST_GeomFromText('POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))')); select '--'; select id from test_gis_point where ST_Within(p, ST_GeomFromText('POLYGON ((0 0, 3 0, 3 3, 0 3, 0 0))')); select '--'; select id from test_gis_point where ST_Within(p, ST_GeomFromText('POLYGON ((0 0, 5 0, 5 5, 0 5, 0 0))'));
预期结果:
1 -- 1 2 3 -- 1 2 3 4 5
ST_MultiAddressFilter 函数是用于过滤点 point 的函数,用来给定一个中心点和多个半径,过滤出位于该多个圆中的几何数据,其中 WKT_Geometry 代表了 WKT 表示的一个矩形范围,用来利用 r tree 索引做主键过滤。distance 代表了圆的半径,(longitude, latitude)代表了不同的圆心。
ST_MultiAddressFilter 函数返回为布尔类型。
ST_MultiAddressFilter 支持的参数如下:
ST_MultiAddressFilter(point, WKT_Geometry, distance, longitude, latitude)
使用示例:
CREATE TABLE test_gis_point ( `id` UInt64, `p` Geometry('Point', 4326) ) ENGINE = MergeTree ORDER BY id SETTINGS index_granularity = 8192; insert into test_gis_point values(1, ST_GeomFromText('POINT (1 1)')); insert into test_gis_point values(2, ST_GeomFromText('POINT (2 1)')); insert into test_gis_point values(3, ST_GeomFromText('POINT (3 1)')); insert into test_gis_point values(4, ST_GeomFromText('POINT (4 1)')); insert into test_gis_point values(5, ST_GeomFromText('POINT (5 1)')); insert into test_gis_point values(6, ST_GeomFromText('POINT (6 1)')); insert into test_gis_point values(7, ST_GeomFromText('POINT (7 1)')); insert into test_gis_point values(8, ST_GeomFromText('POINT (8 1)')); insert into test_gis_point values(9, ST_GeomFromText('POINT (9 1)')); insert into test_gis_point values(10, ST_GeomFromText('POINT (10 1)')); optimize table test_gis_point final; SELECT p from test_gis_point WHERE ST_MultiAddressFilter(p, ST_GeomFromText('POLYGON ((0 0, 20 0, 20 20, 0 20, 0 0))'), 1., 5., 1.); select '--'; SELECT p from test_gis_point WHERE ST_MultiAddressFilter(p, ST_GeomFromText('POLYGON ((0 0, 20 0, 20 20, 0 20, 0 0))'), 120000., 5., 1.);
预期:
(5,1) -- (4,1) (5,1) (6,1)
ST_X/ST_Y 函数主要针对 Point 类型,即返回 Point 的 x 坐标值和 y 坐标值。
CREATE TABLE IF NOT EXISTS test_gis_point ( `id` UInt64, `p` Geometry('Point', 4326) ) ENGINE = MergeTree ORDER BY id SETTINGS index_granularity = 8192; insert into test_gis_point values(1, ST_GeomFromText('POINT (1 2)')); insert into test_gis_point values(2, ST_GeomFromText('POINT (2 3)')); insert into test_gis_point values(3, ST_GeomFromText('POINT (3 4)')); insert into test_gis_point values(4, ST_GeomFromText('POINT (4 5)')); insert into test_gis_point values(5, ST_GeomFromText('POINT (5 6)')); optimize table test_gis_point final; select ST_X(p), ST_Y(p) from test_gis_point;
预期:
1 2 2 3 3 4 4 5 5 6
GIS 空间查询时,建议将 GIS 字段设置为 INDEX 索引。当我们按照几何列进行排序(即 order by point),ByteHouse 会为该表构建 R-tree 索引,用来加速空间查询。
R-tree 索引是 mark 级别的,即每一个 mark 中的所有几何数据的最小 bouding box 会构成 R-tree 叶子结点中的一个条目。
当我们为空间几何函数配置了 R-tree 索引时,查询会先通过传入的几何参数通过 R 树索引进行主键过滤,然后过滤出的数据才会交给空间函数真正执行。
例如下面例子中,table 按照几何列p进行排序,因此该列上会构建 R-tree 索引,同时我们也为 ST_Within 配置了 R-tree 索引。因此,当我们进行 select id from test_gis_point where ST_Within(p, ST_GeomFromText('POLYGON ((0 0, 2 0, 2 2, 0 2, 0 0))'))查询时,在主键索引过滤过程中,ST_Within 会提取出 p 列和 ST_GeomFromText('POLYGON ((0 0, 2 0, 2 2, 0 2, 0 0))')常量,首先通过 R-tree 索引过滤 p 列中的数据是否与'POLYGON ((0 0, 2 0, 2 2, 0 2, 0 0))'相交,过滤后的数据才会交给 ST_Within 做真正的计算。
CREATE TABLE IF NOT EXISTS test_gis_point ( `id` UInt64, `p` Geometry('Point', 4326) ) ENGINE = MergeTree ORDER BY p SETTINGS index_granularity = 1; insert into test_gis_point values(1, ST_GeomFromText('POINT (1 1)')); insert into test_gis_point values(2, ST_GeomFromText('POINT (2 2)')); insert into test_gis_point values(3, ST_GeomFromText('POINT (3 3)')); insert into test_gis_point values(4, ST_GeomFromText('POINT (4 4)')); insert into test_gis_point values(5, ST_GeomFromText('POINT (5 5)')); optimize table test_gis_point final; select id from test_gis_point where ST_Within(p, ST_GeomFromText('POLYGON ((0 0, 2 0, 2 2, 0 2, 0 0))')); select '--'; select id from test_gis_point where ST_Within(p, ST_GeomFromText('POLYGON ((0 0, 5 0, 5 5, 0 5, 0 0))'));
预期结果:
1 2 -- 1 2 3 4 5