You need to enable JavaScript to run this app.
导航

位图计算(pg_roaringbitmap)

最近更新时间2024.02.23 15:45:09

首次发布时间2023.08.08 23:52:58

pg_roaringbitmap 插件是一款高效的位图存储和运算的插件。

实现原理

RoaringBitmap 算法主要解决传统 Bitmap 的空间占用固化的问题,其在降低 Bitmap 空间的同时,还提供高性能的 bitmap 运算。在最极端的场景下,传统的 bitmap 即使存储两个数字,也有可能占据大量的空间。例如,存储数字 0 和 数字 1000000,传统的 bitmap 需要提前申请 1000001 个 bit 位,大约 125KB 的空间;而 Roaringbitmap 在此种场景下,仅仅只需 8 Byte 即可。

经典 RoaringBitamp 算法

最经典的 RoaringBitamp 算法,将 32bit 的有符号 Integer 整数集 [-2147483648, 2147483647] 中的每个整数划分两部分:高 16Bit + 低 16Bit,高 16Bit 作为 一级索引进行存储检索,低16 Bit 作为二级数据存储于 Container 中,Container 有 两种类型:Array Container 和 Bitmap Container,如下图所示:
alt
上图 Roaring bitmap 中,存储高 16Bit 分别为 0x0000、0x0001、0x0002 的部分 4 字节整数值:

  • 高 16Bit 为 0x0000:使用 Array Container 有序数组存储,存储前 1000 个 62 的整数倍对应的数字。

  • 高 16Bit 为 0x0001:使用 Array Container 有序数组存储,存储 [216, 216+100) 区间内的 100 个整数。

  • 高 16Bit 为 0x0002:使用 Bitmap Container 位图存储,存储 [216, 3×216) 区间内 215 个偶数。

Array Container 和 Bitmap Container 的选择,取决于对应 Container 中存储的 16Bit 整数的个数,具体如下:

  • 当 Container 中存储的 16Bit 整数个数小于等于 4096 时,单个 Container 所占据的空间最大值为 4096 * 16Bit = 8KB,此时采用 array container 存储;

  • 当 Container 中存储的 16Bit 整数个数大于 4096 时,采用 Bitmap container 存储,单个 Bitmap container 所占据的空间固定为:216Bit = 8KB。

插件采用的 RoaringBitmap

相较于经典 RoaringBitmap 算法,pg_roaringbitmap 插件采用的算法增加了 run container,对某些具有步长属性的位图进行了适当的优化。如下图所示:
alt
上图 Roaring bitmap 中,存储高 16Bit 分别为 0x0000、0x0001、0x0002 的部分 4 字节整数值:

  • 高 16Bit 为 0x0000:使用 Array Container 有序数组存储,存储前 1000 个 62 的整数倍对应的数字

  • 高 16Bit 为 0x0001:使用 Run Container 压缩存储,存储 [216, 216+ 100), [216+101, 216+ 201), [216+300, 216+ 400) 三个区间总计 300 个整数,Container 占用空间仅 6Bit。

  • 高 16Bit 为 0x0002:使用 Bitmap Container 位图存储,存储 [2×216, 3×216) 区间内 215 个偶数。

安装插件

使用以下命令即可安装插件。

create extension roaringbitmap;

使用说明

数据类型

类型名称roaringbitmap

使用案例

select  '{}'::roaringbitmap;
select  '  {          }  '::roaringbitmap;
select  '{        1 }'::roaringbitmap;
select  '{-1,2,555555,-4}'::roaringbitmap;
select  '{ -1 ,  2  , 555555 ,  -4  }'::roaringbitmap;
select  '{ 1 ,  -2  , 555555 ,  -4  }'::roaringbitmap;
select  '{ 1 ,  -2  , 555555 ,  -4  ,2147483647,-2147483648}'::roaringbitmap;
select  roaringbitmap('{ 1 ,  -2  , 555555 ,  -4  }');
select  '{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255,256,257,258,259,260,261,262,263,264,265,266,267,268,269,270,271,272,273,274,275,276,277,278,279,280,281,282,283,284,285,286,287,288,289,290,291,292,293,294,295,296,297,298,299,300,301,302,303,304,305,306,307,308,309,310,311,312,313,314,315,316,317,318,319,320,321,322,323,324,325,326,327,328,329,330,331,332,333,334,335,336,337,338,339,340,341,342,343,344,345,346,347,348,349,350,351,352,353,354,355,356,357,358,359,360,361,362,363,364,365,366,367,368,369,370,371,372,373,374,375,376,377,378,379,380,381,382,383,384,385,386,387,388,389,390,391,392,393,394,395,396,397,398,399,400}'::roaringbitmap;

GUC

参数名称roaringbitmap.output_format

参数说明

字符类型,用于控制 roaringbitmap 类型的输出格式,取值范围为:array 和 bytea,默认值为 bytea。

  • array: 表示查询时,以数组格式输出 roaringbitmap 值。

  • bytea: 表示查询时,以 bytea 格式输出 roaringbitmap 值。

用法

set roaringbitmap.output_format = 'array';
select  '{-1,2,555555,-4}'::roaringbitmap;

set roaringbitmap.output_format = 'bytea';
select  '{-1,2,555555,-4}'::roaringbitmap;

set roaringbitmap.output_format = 'array';
select '\x3a300000030000000000000008000000ffff01002000000022000000240000000200237afcffffff'::roaringbitmap;
sysbench=> set roaringbitmap.output_format = "array";
SET
sysbench=> select    '{-1,2,5555555,-4}':: roaringbitmap;
roaringbitmap
------------------
{2,555555,-4,-1}
(1 row)

sysbench=>
sysbench=> set roaringbitmap.output_format = 'bytea';
SET
sysbench=> select '{-1,2,55555,-4}'::roaringbitmap;
                                  roaringbitmap
----------------------------------------------------------------------------------
\x3a300000030000000000000008000000ffff01002000000022000000240000000200237afcffffff


sysbench=>
sysbench=> select '\x3a300000030000000000000008000000ffff01002000000022000000240000000200237afcffffff'::roaringbitmap;
                                   roaringbitmap
-----------------------------------------------------------------------------------
\x3a300000030000000000000080000000ffff01002000000022000000240000000200237afcfffffff
(1 row)

sysbench=> set roaringbitmap.output_format = 'array'
SET
sysbench=> select '\x3a300000030000000000000008000000ffff01002000000022000000240000000200237afcffffff'::roaringbitmap;
roaringbitmap
----------------
{2,555555,-4,-1}
(1 row)

sysbench=>

函数和操作符

辅助函数

用于生成指定维度的整型数组。

create or replace function gen_array(dim int4) 
    returns int4[] 
as $$ 
    select array_agg((random()* 1000000)::int4) from generate_series(1, dim) 
$$ 
language sql 
volatile 
cost 1;

函数及操作符

函数名称操作符名称输入输出说明示例

rb_and

&

roaringbitmap, roaringbitmap

roaringbitmap

select rb_and(rb_build('{1,2,3,4,5}'),rb_build('{4,5,6,7}'));
select rb_build('{1,2,3,4,5}') & rb_build('{4,5,6,7}');
select rb_build(gen_array(4)) & rb_build(gen_array(5));
select rb_build(gen_array(1000)) & rb_build(gen_array(1000));

rb_or

|

roaringbitmap, roaringbitmap

roaringbitmap

select rb_or(rb_build('{1,2,3,4,5}'),rb_build('{4,5,6,7}'));
select rb_build('{1,2,3,4,5}')

select rb_build(gen_array(4))
select rb_build(gen_array(1000))

rb_xor

#

roaringbitmap, roaringbitmap

roaringbitmap

异或

select rb_xor(rb_build('{1,2,3,4,5}'),rb_build('{4,5,6,7}'));
select rb_build('{1,2,3,4,5}') # rb_build('{4,5,6,7}');

select rb_build(gen_array(4)) # rb_build(gen_array(5));
select rb_build(gen_array(1000)) # rb_build(gen_array(1000));

rb_andnot

-

roaringbitmap, roaringbitmap

roaringbitmap

与非

select rb_andnot(rb_build('{1,2,3,4,5}'),rb_build('{4,5,6,7}'));
select rb_build('{1,2,3,4,5}') - rb_build('{4,5,6,7}');

select rb_build(gen_array(4)) - rb_build(gen_array(5));
select rb_build(gen_array(1000)) - rb_build(gen_array(1000));

rb_add

|

roaringbitmap, integer

roaringbitmap

向位图中增加

select rb_add(rb_build('{1,2,3,4,5}'), 10);
select rb_build('{1,2,3,4,5}')

select rb_add(rb_build(gen_array(5), 10);
select rb_build(rb_build(gen_array(5))

|

integer, roaringbitmap

roaringbitmap

向位图中增加

select rb_add(10, rb_build('{1,2,3,4,5}'));
select 10

select rb_add(10, rb_build(gen_array(5));
select 10

rb_remove

-

roaringbitmap, integer

roaringbitmap

从位图中移除整数

select rb_remove(rb_build('{1,2,3,4,5}'), 3);
select rb_build('{1,2,3,4,5}') - 3;

rb_contains

@>

roaringbitmap, roaringbitmap

bool

是否包含

select rb_contains(rb_build('{1,2,3,4,5}'),rb_build('{4,5}'));
select rb_build('{1,2,3,4,5}') @> rb_build('{4,5}');

@>

roaringbitmap, integer

bool

是否包含

select rb_contains(rb_build('{1,2,3,4,5}'), 3);
select rb_build('{1,2,3,4,5}')  @> 3;

rb_containedby

<@

roaringbitmap, roaringbitmap

bool

是否存在于

select rb_containedby(rb_build('{3,4,5}'),rb_build('{1,2,3,4,5,6}'));
select rb_build('{3,4,5}') <@ rb_build('{1,2,3,4,5,6}');

<@

integer, roaringbitmap

bool

是否存在于

select rb_containedby(3, rb_build('{1,2,3,4,5}'));
select 3 <@ rb_build('{1,2,3,4,5}');

rb_intersect

&&

roaringbitmap, roaringbitmap

bool

是否相交

select rb_intersect(rb_build('{3,4,5}'),rb_build('{1,2,3,4,5,6}'));
select rb_build('{3,4,5}') && rb_build('{1,2,3,4,5,6}');

rb_equals

=

roaringbitmap, roaringbitmap

bool

是否相等

select rb_equals(rb_build('{3,4,5}'),rb_build('{1,2,3,4,5,6}'));
select rb_build('{3,4,5}') = rb_build('{1,2,3,4,5,6}');

select rb_equals(rb_build('{3,4,5}'),rb_build('{4,5,3}'));
select rb_build('{3,4,5}') = rb_build('{4,5,3}');

rb_not_equals

<>

roaringbitmap, roaringbitmap

bool

是否不等

select rb_not_equals(rb_build('{3,4,5}'),rb_build('{1,2,3,4,5,6}'));
select rb_build('{3,4,5}') <> rb_build('{1,2,3,4,5,6}');

select rb_not_equals(rb_build('{3,4,5}'),rb_build('{4,5,3}'));
select rb_build('{3,4,5}') <> rb_build('{4,5,3}');

函数

函数名称输入输出说明示例

rb_build

integer[]

roaringbitmap

通过整数数组创建位图

set roaringbitmap.output_format = 'array';

select rb_build('{1,2,3,4,5}');
select rb_build(gen_array(10));

rb_index

roaringbitmap, integer

bigint

返回整数在位图中从 0 开始的下标,不存在返回 -1

select rb_index(rb_build('{-1,2,-3,4,5}'), 5);
select rb_index(rb_build(gen_array(10)), 10);

rb_cardinality

roaringbitmap

bigint

返回基数

select rb_cardinality(rb_build('{-1,2,-3,4,5}'));
select rb_cardinality(rb_build(gen_array(10)));

rb_and_cardinality

roaringbitmap, roaringbitmap

bigint

位图与后返回基数

select rb_and_cardinality(rb_build('{1,2,3,4,5}'),rb_build('{4,5,6,7}'));

rb_or_cardinality

roaringbitmap, roaringbitmap

bigint

位图或后返回基数

select rb_or_cardinality(rb_build('{1,2,3,4,5}'),rb_build('{4,5,6,7}'));

rb_xor_cardinality

roaringbitmap, roaringbitmap

bigint

位图异或后返回基数

select rb_xor_cardinality(rb_build('{1,2,3,4,5}'),rb_build('{4,5,6,7}'));

rb_andnot_cardinality

roaringbitmap, roaringbitmap

bigint

位图与非后返回基数

select rb_andnot_cardinality(rb_build('{1,2,3,4,5}'),rb_build('{4,5,6,7}'));

rb_is_empty

roaringbitmap

bool

位图是否为空

select rb_is_empty(rb_build('{1,2,3,4,5}'));
select rb_is_empty(rb_build('{}'));

rb_fill

roaringbitmap, range_start bigint, range_end bigint

roaringbitmap

填充位图指定范围,不包含 range_end

select rb_fill(rb_build('{1,2,3,4,5,6,7,8,9}'), 10,  15);

rb_clear

roaringbitmap, range_start bigint, range_end bigint

roaringbitmap

清空位图指定范围整数,不包含 range_end

select rb_clear(rb_build('{1,2,3,4,5,6,7,8,9}'), 2, 5);

rb_flip

roaringbitmap, range_start bigint, range_end bigint

roaringbitmap

将指定范围反正,存在置为不存,不存在置为存在,不包含 range_end

select rb_flip(rb_build('{1,2,3,4,5,6,7,8,9}'), 2, 15);

rb_range

roaringbitmap, range_start bigint, range_end bigint

roaringbitmap

返回指定范围的子位图,不包含 range_end

select rb_range(rb_build('{1,2,3,4,5,6,7,8,9}'), 2, 7);

rb_range_cardinality

roaringbitmap, range_start bigint, range_end bigint

bigint

返回指定范围子位图的基数,不包含 range_end

select rb_range_cardinality(rb_build('{1,2,3,4,5,6,7,8,9}'), 2, 7);

rb_min

roaringbitmap

integer

返回 Bitmap 中最小的 Offset,如果 Bitmap 为空则返回 NULL。

select rb_min(rb_build('{67,12,23,4,35,6,7,8,9}'));

rb_max

roaringbitmap

integer

返回 Bitmap 中最大的 Offset,如果 Bitmap 为空则返回 NULL。

select rb_max(rb_build('{67,12,23,4,35,6,7,8,9}'));

rb_rank

roaringbitmap, integer

bigint

返回位图中小于等于指定 Offset 的基数。

select rb_rank(rb_build('{9,8,7,3,4,5,1,2,6}'), 5);
select rb_rank(rb_build('{9,8,7,3,4,1,2,6}'), 5);

rb_jaccard_dist

roaringbitmap, roaringbitmap

float8

返回两个位图的 Jaccar 系数,该数值越大,表示位图的相似度越高

select rb_jaccard_dist(rb_build('{1,2,3,4,5}'),rb_build('{4,5,1,2,3}'));
select rb_jaccard_dist(rb_build('{1,2,3,4,5}'),rb_build('{4,5,6,7}'));
select  rb_jaccard_dist(rb_build('{1,2,3,4,5}'),rb_build('{9,8,6,7}'));

rb_select

roaringbitmap, bitset_limit bigint,bitset_offset bigint=0,reverse boolean=false,range_start bigint=0,range_end bigint=4294967296

roaringbitmap

返回区间 [range_start,range_end) 的子集 [bitset_offset,bitset_offset+bitset_limit)

select rb_select(rb_build('{1,2,3,4,5,6,7,8,9}'), 5, 2);
select rb_select(rb_build('{1,2,3,4,5,6,7,8,9}'), 5, 2, false, 3, 9);
select rb_select(rb_build('{1,2,3,4,5,6,7,8,9}'), 5, 2, true, 3, 9);

rb_to_array

roaringbitmap

integer[]

将整数数组转换成位图

select rb_to_array(rb_build('{1,2,3,4,5,6,7,8,9}'));
select rb_to_array(rb_build(gen_array(10)));

rb_iterate

roaringbitmap

SETOF integer

将位图中的整数以记录的形式返回,和 rb_build_agg 效果相反

select rb_iterate(rb_build('{1,2,3,4,5,6,7,8,9}'));
select rb_iterate(rb_build(gen_array(10)));
聚合函数
drop table tbl_rb_agg;
create table tbl_rb_agg(id serial, rb roaringbitmap);

insert into tbl_rb_agg(rb) select rb_build('{1,2,3,4,5}');
insert into tbl_rb_agg(rb) select rb_build('{3,4,5,6,7}');
insert into tbl_rb_agg(rb) select rb_build('{4,5,6,7,8,9}');
select * from tbl_rb_agg;
sysbench=> drop table tbl_rb_agg;
DROP TABLE
sysbench=> create table tbl_rb_agg(id serial, rb roaaringbitmap);
CREATE TABLE
sysbench=>
sysbench=> insert into tbl_rb_agg(rb) select rb_build('{1,2,3,4,5}');
INSERT 0 1
sysbench=> insert into tbl_rb_agg(rb) select rb_build('{3,4,5,6,7}');
INSERT 0 1
sysbench=> insert into tbl_rb_agg(rb) select rb_build('{4,5,6,7,8,9}');
INSERT 0 1
sysbench=> select * from tbl_rb_agg;
 id |        rb
 ---+-------------————
  1 |  {1,2,3,4,5}
  2 |  {3,4,5,6,7}
  3 |  {4,5,6,7,8,9}
(3 rows)
函数名称输入输出说明示例

rb_or_agg

roaringbitmap

roaringbitmap

或聚合

select rb_or_agg(rb)  from tbl_rb_agg;

rb_or_cardinality_agg

roaringbitmap

bigint

或聚合并返回其基数

select rb_or_cardinality_agg(rb)  from tbl_rb_agg;

rb_and_agg

roaringbitmap

roaringbitmap

与聚合

select rb_and_agg(rb)  from tbl_rb_agg;

rb_and_cardinality_agg

roaringbitmap

bigint

与聚合并返回其基数

select rb_and_cardinality_agg(rb)  from tbl_rb_agg;

rb_xor_agg

roaringbitmap

roaringbitmap

异或聚合

select rb_xor_agg(rb)  from tbl_rb_agg;

rb_xor_cardinality_agg

roaringbitmap

bigint

异或聚合并返回其基数

select rb_xor_cardinality_agg(rb)  from tbl_rb_agg;

rb_build_agg

integer

roaringbitmap

通过整数集合构造位图,和 rb_iterate 效果相反

select rb_build_agg(id)  from tbl_rb_agg;