请注意:
下文中的一些示例引用自 ClickHouse 社区文档 并经过一定修改确保可以在 ByteHouse 中正常使用。
Selects the first encountered value.
The query can be executed in any order and even in a different order each time, so the result of this function is indeterminate.
To get a determinate result, you can use the ‘min’ or ‘max’ function instead of ‘any’.
In some cases, you can rely on the order of execution. This applies to cases when SELECT comes from a subquery that uses ORDER BY.
When a SELECT
query has the GROUP BY
clause or at least one aggregate function, ByteHouse (in contrast to MySQL) requires that all expressions in the SELECT
, HAVING
, and ORDER BY
clauses be calculated from keys or from aggregate functions. In other words, each column selected from the table must be used either in keys or inside aggregate functions. To get behavior like in MySQL, you can put the other columns in the any
aggregate function.
Syntax
any(column)
Arguments
column
– The column name.Returned value
Example
SELECT any(number) FROM numbers(10);
┌─any(number)─┐ │ 0 │ └─────────────┘
Selects a frequently occurring value using the heavy hitters algorithm. If there is a value that occurs more than in half the cases in each of the query’s execution threads, this value is returned. Normally, the result is nondeterministic.
Syntax
anyHeavy(column)
Arguments
column
– The column name.Returned value
Example
CREATE TABLE IF NOT EXISTS test.functionAnyHeavy(id Int) ENGINE=CnchMergeTree() ORDER BY id; INSERT INTO test.functionAnyHeavy values (1),(1),(1),(2),(3); SELECT anyHeavy(id) FROM test.functionAnyHeavy;
┌─anyHeavy(id)─┐ │ 1 │ └──────────────┘
Selects the last value encountered.
The result is just as indeterminate.
Syntax
anyLast(column)
Arguments
column
– The column name.Returned value
Example
SELECT anyLast(number) FROM numbers(10);
┌─anyLast(number)─┐ │ 9 │ └─────────────────┘
Calculates the arg
value for a maximum val
value. If there are several different values of arg
for maximum values of val
, returns the first of these values encountered.
Syntax
argMax(arg, val)
Arguments
arg
— Argument.val
— Value.Returned value
Example
CREATE TABLE IF NOT EXISTS test.functionArgMax (user String, salary Int) Engine=CnchMergeTree() ORDER BY user; INSERT INTO test.functionArgMax VALUES ('director',5000),('manager',3000),('worker',1000); SELECT argMax(user, salary) FROM test.functionArgMax;
┌─argMax(user, salary)─┐ │ director │ └──────────────────────┘
Calculates the arg
value for a minimum val
value. If there are several different values of arg
for minimum values of val
, returns the first of these values encountered.
Syntax
argMin(arg, val)
Arguments
arg
— Argument.val
— Value.Returned value
Example
CREATE TABLE IF NOT EXISTS test.functionArgMin (user String, salary Int) Engine=CnchMergeTree() ORDER BY user; INSERT INTO test.functionArgMin VALUES ('director',5000),('manager',3000),('worker',1000); SELECT argMin(user, salary) FROM test.functionArgMin;
┌─argMin(user, salary)─┐ │ worker │ └──────────────────────┘
Calculates the arithmetic mean.
Syntax
avg(x)
Arguments
x
— input values, must be Integer, Float or Decimal.Returned value
NaN
if the input parameter x
is empty.Example
CREATE TABLE IF NOT EXISTS test.functionAvg (x Int8) ENGINE=CnchMergeTree() ORDER BY x; INSERT INTO test.functionAvg SELECT * FROM numbers(6); SELECT avg(x) FROM test.functionAvg;
┌─avg(x)──┐ │ 2.5e+00 │ └─────────┘
CREATE TABLE IF NOT EXISTS test.functionAvg (x Int8) ENGINE=CnchMergeTree() ORDER BY x; SELECT avg(x) FROM test.functionAvg;
┌─avg(x)─┐ │ NaN │ └────────┘
Calculates the Pearson correlation coefficient: Σ((x - x̅)(y - y̅)) / sqrt(Σ((x - x̅)^2) * Σ((y - y̅)^2))
.
This function uses a numerically unstable algorithm. If you need numerical stability in calculations, use the corrStable
function. It works slower but provides a lower computational error.
Syntax
corr(x, y)
Arguments
x
— input values, must be Integer, Float.y
— input values, must be Integer, Float.Returned value
Example
CREATE TABLE IF NOT EXISTS test.functionCorr (x Int8, y Int8) ENGINE=CnchMergeTree() ORDER BY x; INSERT INTO test.functionCorr VALUES (1,5),(2,4),(3,3),(4,2),(5,1); SELECT corr(x,y) FROM test.functionCorr;
┌─corr(x, y)─┐ │ -1e+00 │ └────────────┘
Calculates the value of Σ((x - x̅)(y - y̅)) / n
.
This function uses a numerically unstable algorithm. If you need numerical stability in calculations, use the covarPopStable
function. It works slower but provides a lower computational error.
Syntax
covarPop(x, y)
Arguments
x
— input values, must be Integer, Float.y
— input values, must be Integer, Float.Returned value
Σ((x - x̅)(y - y̅)) / n
, always as Float64.Example
CREATE TABLE IF NOT EXISTS test.functionCovarPop (x Int8, y Int8) ENGINE=CnchMergeTree() ORDER BY x; INSERT INTO test.functionCovarPop VALUES (1,5),(2,4),(3,3),(4,2),(5,1); SELECT covarPop(x,y) FROM test.functionCovarPop;
┌─covarPop(x, y)─┐ │ -2e+00 │ └────────────────┘
Calculates the value of Σ((x - x̅)(y - y̅)) / (n - 1)
.
This function uses a numerically unstable algorithm. If you need numerical stability in calculations, use the covarSampStable
function. It works slower but provides a lower computational error.
Syntax
covarSamp(x, y)
Arguments
x
— input values, must be Integer, Float.y
— input values, must be Integer, Float.Returned value
Σ((x - x̅)(y - y̅)) / (n - 1)
, always as Float64.Example
CREATE TABLE IF NOT EXISTS test.functionCovarSamp (x Int8, y Int8) ENGINE=CnchMergeTree() ORDER BY x; INSERT INTO test.functionCovarSamp VALUES (1,5),(2,4),(3,3),(4,2),(5,1); SELECT covarSamp(x,y) FROM test.functionCovarSamp;
┌─covarSamp(x, y)─┐ │ -2.5e+00 │ └─────────────────┘
CREATE TABLE IF NOT EXISTS test.functionCovarSamp (x Int8, y Int8) ENGINE=CnchMergeTree() ORDER BY x; INSERT INTO test.functionCovarSamp VALUES (1,5); SELECT covarSamp(x,y) FROM test.functionCovarSamp;
┌─covarSamp(x, y)─┐ │ NaN │ └─────────────────┘
Creates an array of argument values.
Values can be added to the array in any (indeterminate) order.
The second version (with the max_size
parameter) limits the size of the resulting array to max_size
elements. For example, groupArray(1)(x)
is equivalent to [any (x)]
.
In some cases, you can still rely on the order of execution. This applies to cases when SELECT
comes from a subquery that uses ORDER BY
.
Syntax
groupArray(x) groupArray(max_size)(x)
Arguments
x
— input values.Returned value
Example
select groupArray(number) from numbers(10);
┌─groupArray(number)─────────────┐ │ [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] │ └────────────────────────────────┘
select groupArray(5)(number) from numbers(10);
┌─groupArray(5)(number)─┐ │ [0, 1, 2, 3, 4] │ └───────────────────────┘
Applies bitwise AND
for series of numbers.
Syntax
groupBitAnd(expr)
Arguments
expr
– An expression that results in UInt*
type.Return value
UInt*
type.Example
CREATE TABLE IF NOT EXISTS test.groupBitAnd (num UInt8) ENGINE=CnchMergeTree() ORDER BY num; INSERT INTO test.groupBitAnd VALUES (44),(28),(13),(85); SELECT groupBitAnd(num) FROM test.groupBitAnd;
┌─groupBitAnd(num)─┐ │ 4 │ └──────────────────┘
Bit representation of 44,28,13 and 85 are 00101100,00011100,00001101 and 01010101. The bit and of the four numbers is 00000100 which is equal to 4.
Applies bitwise OR
for series of numbers.
Syntax
groupBitOr(expr)
Arguments
expr
– An expression that results in UInt*
type.Returned value
UInt*
type.Example
CREATE TABLE IF NOT EXISTS test.functionGroupBitOr (num UInt8) ENGINE=CnchMergeTree ORDER BY num; INSERT INTO test.functionGroupBitOr VALUES (44),(28),(13),(85); SELECT groupBitOr(num) FROM test.functionGroupBitOr;
┌─groupBitOr(num)─┐ │ 125 │ └─────────────────┘
Bit representation of 44,28,13 and 85 are 00101100,00011100,00001101 and 01010101. The bit or of the four numbers is 01111101 which is equal to 125.
Applies bitwise XOR
for series of numbers.
Syntax
groupBitXor(expr)
Arguments
expr
– An expression that results in UInt*
type.Return value
UInt*
type.Example
CREATE TABLE IF NOT EXISTS test.functionGroupBitXor (num UInt8) ENGINE=CnchMergeTree ORDER BY num; INSERT INTO test.functionGroupBitXor VALUES (44),(28),(13),(85); SELECT groupBitXor(num) FROM test.functionGroupBitXor;
┌─groupBitXor(num)─┐ │ 104 │ └──────────────────┘
Bit representation of 44,28,13 and 85 are 00101100,00011100,00001101 and 01010101. The bit xor of the four numbers is 01101000 which is equal to 104.
Bitmap or Aggregate calculations from a unsigned integer column, return cardinality of type UInt64, if add suffix -State, then return bitmap object.
Syntax
groupBitmap(expr)
Arguments
expr
– An expression that results in UInt*
type.Return value
UInt64
type.Example
SELECT groupBitmapState(number) FROM numbers(4);
┌─groupBitmap(number)─┐ │ 4 │ └─────────────────────┘
Calculations the AND of a bitmap column, return cardinality of type UInt64, if add suffix -State, then return bitmap object.
Syntax
groupBitmapAnd(expr)
Arguments
expr
– An expression that results in AggregateFunction(groupBitmap, UInt*)
type.Return value
UInt64
type.Example
DROP TABLE IF EXISTS bitmap_column_expr_test2; CREATE TABLE bitmap_column_expr_test2(tag_id String,z AggregateFunction(groupBitmap, UInt32)) ENGINE = CnchMergeTree ORDER BY tag_id; INSERT INTO bitmap_column_expr_test2 VALUES ('tag1', bitmapBuild(cast([1,2,3,4,5,6,7,8,9,10] as Array(UInt32)))); INSERT INTO bitmap_column_expr_test2 VALUES ('tag2', bitmapBuild(cast([6,7,8,9,10,11,12,13,14,15] as Array(UInt32)))); INSERT INTO bitmap_column_expr_test2 VALUES ('tag3', bitmapBuild(cast([2,4,6,8,10,12] as Array(UInt32)))); SELECT groupBitmapAnd(z) FROM bitmap_column_expr_test2 WHERE like(tag_id, 'tag%'); ┌─groupBitmapAnd(z)─┐ │ 3 │ └───────────────────┘ SELECT arraySort(bitmapToArray(groupBitmapAndState(z))) FROM bitmap_column_expr_test2 WHERE like(tag_id, 'tag%'); ┌─arraySort(bitmapToArray(groupBitmapAndState(z)))─┐ │ [6,8,10] │ └──────────────────────────────────────────────────┘
Calculations the OR of a bitmap column, return cardinality of type UInt64, if add suffix -State, then return bitmap object. This is equivalent to groupBitmapMerge
.
Syntax
groupBitmapOr(expr)
Arguments
expr
– An expression that results in AggregateFunction(groupBitmap, UInt*)
type.Returned value
UInt64
type.Example
DROP TABLE IF EXISTS bitmap_column_expr_test2; CREATE TABLE bitmap_column_expr_test2(tag_id String,z AggregateFunction(groupBitmap, UInt32)) ENGINE = CnchMergeTree ORDER BY tag_id; INSERT INTO bitmap_column_expr_test2 VALUES ('tag1', bitmapBuild(cast([1,2,3,4,5,6,7,8,9,10] as Array(UInt32)))); INSERT INTO bitmap_column_expr_test2 VALUES ('tag2', bitmapBuild(cast([6,7,8,9,10,11,12,13,14,15] as Array(UInt32)))); INSERT INTO bitmap_column_expr_test2 VALUES ('tag3', bitmapBuild(cast([2,4,6,8,10,12] as Array(UInt32)))); SELECT groupBitmapOr(z) FROM bitmap_column_expr_test2 WHERE like(tag_id, 'tag%'); ┌─groupBitmapOr(z)─┐ │ 15 │ └──────────────────┘ SELECT arraySort(bitmapToArray(groupBitmapOrState(z))) FROM bitmap_column_expr_test2 WHERE like(tag_id, 'tag%'); ┌─arraySort(bitmapToArray(groupBitmapOrState(z)))─┐ │ [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15] │ └─────────────────────────────────────────────────┘
Calculations the XOR of a bitmap column, return cardinality of type UInt64, if add suffix -State, then return bitmap object.
Syntax
groupBitmapOr(expr)
Arguments
expr
– An expression that results in AggregateFunction(groupBitmap, UInt*)
type.Returned value
UInt64
type.Example
DROP TABLE IF EXISTS bitmap_column_expr_test2; CREATE TABLE bitmap_column_expr_test2(tag_id String, z AggregateFunction(groupBitmap, UInt32)) ENGINE = CnchMergeTree ORDER BY tag_id; INSERT INTO bitmap_column_expr_test2 VALUES ('tag1', bitmapBuild(cast([1,2,3,4,5,6,7,8,9,10] as Array(UInt32)))); INSERT INTO bitmap_column_expr_test2 VALUES ('tag2', bitmapBuild(cast([6,7,8,9,10,11,12,13,14,15] as Array(UInt32)))); INSERT INTO bitmap_column_expr_test2 VALUES ('tag3', bitmapBuild(cast([2,4,6,8,10,12] as Array(UInt32)))); SELECT groupBitmapXor(z) FROM bitmap_column_expr_test2 WHERE like(tag_id, 'tag%'); ┌─groupBitmapXor(z)─┐ │ 10 │ └───────────────────┘ SELECT arraySort(bitmapToArray(groupBitmapXorState(z))) FROM bitmap_column_expr_test2 WHERE like(tag_id, 'tag%'); ┌─arraySort(bitmapToArray(groupBitmapXorState(z)))─┐ │ [1,3,5,6,8,10,11,13,14,15] │ └──────────────────────────────────────────────────┘
Creates an array from different argument values. Memory consumption is the same as for the uniqExact function.
The second version (with the max_size
parameter) limits the size of the resulting array to max_size
elements.
Syntax
groupUniqArray(x) groupUniqArray(max_size)(x)
Arguments
x
— input values.Returned value
Example
select groupUniqArray(number) from numbers(10);
┌─groupUniqArray(number)─────────┐ │ [0, 8, 3, 9, 7, 2, 1, 6, 4, 5] │ └────────────────────────────────┘
select groupUniqArray(5)(number) from numbers(10);
┌─groupArray(5)(number)─┐ │ [0, 1, 2, 3, 4] │ └───────────────────────┘
Calculates an adaptive histogram. It does not guarantee precise results.
The functions uses A Streaming Parallel Decision Tree Algorithm . The borders of histogram bins are adjusted as new data enters a function. In common case, the widths of bins are not equal.
Syntax
histogram(number_of_bins)(values)
Arguments
number_of_bins
— Upper limit for the number of bins in the histogram. The function automatically calculates the number of bins. It tries to reach the specified number of bins, but if it fails, it uses fewer bins.values
— Expression resulting in input values.Returned values
[(lower_1, upper_1, height_1), ... (lower_N, upper_N, height_N)] - `lower` — Lower bound of the bin. - `upper` — Upper bound of the bin. - `height` — Calculated height of the bin.
Example
SELECT histogram(5)(number + 1) FROM numbers(20);
┌─histogram(5)(plus(number, 1))───────────────────────────────────────────┐ │ [(1,4.5,4),(4.5,8.5,4),(8.5,12.75,4.125),(12.75,17,4.625),(17,20,3.25)] │ └─────────────────────────────────────────────────────────────────────────┘
You can visualize a histogram with the bar function, for example:
WITH histogram(5)(rand() % 100) AS hist SELECT arrayJoin(hist).3 AS height, bar(height, 0, 6, 5) AS bar FROM numbers(20);
┌─height─┬─bar───┐ │ 2.125 │ █▋ │ │ 3.25 │ ██▌ │ │ 5.625 │ ████▏ │ │ 5.625 │ ████▏ │ │ 3.375 │ ██▌ │ └────────┴───────┘
In this case, you should remember that you do not know the histogram bin borders.
Aggregate function that calculates the maximum across a group of values.
Syntax
max(values)
Arguments
values
— input values.Returned values
Example
SELECT max(number) FROM numbers(20);
┌─max(number)─┐ │ 19 │ └─────────────┘
If you need non-aggregate function to choose a maximum of two values, see greatest
:
SELECT greatest(a, b) FROM table;
The median*
functions are the aliases for the corresponding quantile*
functions. They calculate median of a numeric data sample.
Functions:
median
— Alias for quantile.medianDeterministic
— Alias for quantileDeterministic.medianExact
— Alias for quantileExact.medianExactWeighted
— Alias for quantileExactWeighted.medianTiming
— Alias for quantileTiming.medianTimingWeighted
— Alias for quantileTimingWeighted.medianTDigest
— Alias for quantileTDigest.medianTDigestWeighted
— Alias for quantileTDigestWeighted.medianBFloat16
— Alias for quantileBFloat16.Alias for quantileDeterministic.
Alias for quantileExact.
Alias for quantileExactWeighted.
Alias for quantileTDigest.
Alias for quantileTDigestWeighted.
Alias for quantileTiming.
Alias for quantileTimingWeighted.
Computes an approximate quantile of a numeric data sequence.
This function applies reservoir sampling with a reservoir size up to 8192 and a random number generator for sampling. The result is non-deterministic. To get an exact quantile, use the quantileExact function.
When using multiple quantile*
functions with different levels in a query, the internal states are not combined (that is, the query works less efficiently than it could). In this case, use the quantiles function.
Syntax
quantile(level)(expr)
Alias: median
.
Arguments
level
— Level of quantile. Optional parameter. Constant floating-point number from 0 to 1. We recommend using a level
value in the range of [0.01, 0.99]
. Default value: 0.5. At level=0.5
the function calculates median .expr
— Expression over the column values resulting in numeric, Date, or DateTime.Returned value
Date
type.DateTime
type.Example
SELECT quantile(number) FROM numbers(10);
┌─quantile(number)─┐ │ 4.5e+00 │ └──────────────────┘
Computes an approximate quantile of a numeric data sequence.
This function applies reservoir sampling with a reservoir size up to 8192 and deterministic algorithm of sampling. The result is deterministic. To get an exact quantile, use the quantileExact function.
When using multiple quantile*
functions with different levels in a query, the internal states are not combined (that is, the query works less efficiently than it could). In this case, use the quantiles function.
Syntax
quantileDeterministic(level)(expr, determinator)
Alias: medianDeterministic
.
Arguments
level
— Level of quantile. Optional parameter. Constant floating-point number from 0 to 1. We recommend using a level
value in the range of [0.01, 0.99]
. Default value: 0.5. At level=0.5
the function calculates median .expr
— Expression over the column values resulting in numeric data types, Date or DateTime.determinator
— Number whose hash is used instead of a random number generator in the reservoir sampling algorithm to make the result of sampling deterministic. As a determinator you can use any deterministic positive number, for example, a user id or an event id. If the same determinator value occures too often, the function works incorrectly.Returned value
Date
type.DateTime
type.Example
SELECT quantileDeterministic(number, 1) FROM numbers(10);
┌─quantileDeterministic(number, 1)─┐ │ 4.5e+00 │ └──────────────────────────────────┘
Exactly computes the quantile of a numeric data sequence.
To get exact value, all the passed values are combined into an array, which is then partially sorted. Therefore, the function consumes O(n)
memory, where n
is a number of values that were passed. However, for a small number of values, the function is very effective.
When using multiple quantile*
functions with different levels in a query, the internal states are not combined (that is, the query works less efficiently than it could). In this case, use the quantiles function.
Syntax
quantileExact(level)(expr)
Alias: medianExact
.
Arguments
level
— Level of quantile. Optional parameter. Constant floating-point number from 0 to 1. We recommend using a level
value in the range of [0.01, 0.99]
. Default value: 0.5. At level=0.5
the function calculates median .expr
— Expression over the column values resulting in numeric data types, Date or DateTime.Returned value
Date
type.DateTime
type.Example
SELECT quantileExact(number) FROM numbers(10);
┌─quantileExact(number)─┐ │ 5 │ └───────────────────────┘
Exactly computes the quantile of a numeric data sequence, taking into account the weight of each element.
To get exact value, all the passed values are combined into an array, which is then partially sorted. Each value is counted with its weight, as if it is present weight
times. A hash table is used in the algorithm. Because of this, if the passed values are frequently repeated, the function consumes less RAM than quantileExact. You can use this function instead of quantileExact
and specify the weight 1.
When using multiple quantile*
functions with different levels in a query, the internal states are not combined (that is, the query works less efficiently than it could). In this case, use the quantiles function.
Syntax
quantileExactWeighted(level)(expr, weight)
Alias: medianExactWeighted
.
Arguments
level
— Level of quantile. Optional parameter. Constant floating-point number from 0 to 1. We recommend using a level
value in the range of [0.01, 0.99]
. Default value: 0.5. At level=0.5
the function calculates median .expr
— Expression over the column values resulting in numeric, Date or DateTime.weight
— Column with weights of sequence members. Weight is a number of value occurrences.Returned value
Date
type.DateTime
type.Example
CREATE TABLE IF NOT EXISTS test.functionQuantileExactWeighted (n UInt8, val UInt8) ENGINE=CnchMergeTree ORDER BY n; INSERT INTO test.functionQuantileExactWeighted VALUES (0,3),(1,2),(2,1),(5,4); SELECT quantileExactWeighted(n, val) FROM test.functionQuantileExactWeighted;
┌─quantileExactWeighted(n, val)─┐ │ 1 │ └───────────────────────────────┘
Computes an approximate quantile of a numeric data sequence using the t-digest algorithm.
Memory consumption is log(n)
, where n
is a number of values. The result depends on the order of running the query, and is nondeterministic.
The performance of the function is lower than performance of quantile or quantileTiming. In terms of the ratio of State size to precision, this function is much better than quantile
.
When using multiple quantile*
functions with different levels in a query, the internal states are not combined (that is, the query works less efficiently than it could). In this case, use the quantiles function.
Syntax
quantileTDigest(level)(expr)
Alias: medianTDigest
.
Arguments
level
— Level of quantile. Optional parameter. Constant floating-point number from 0 to 1. We recommend using a level
value in the range of [0.01, 0.99]
. Default value: 0.5. At level=0.5
the function calculates median .expr
— Expression over the column values resulting in numeric data types, Date or DateTime.Returned value
Date
type.DateTime
type.Example
SELECT quantileTDigest(number) FROM numbers(10)
┌─quantileTDigest(number)─┐ │ 4.5e+00 │ └─────────────────────────┘
Computes an approximate quantile of a numeric data sequence using the t-digest algorithm. The function takes into account the weight of each sequence member. The maximum error is 1%. Memory consumption is log(n)
, where n
is a number of values.
The performance of the function is lower than the performance of quantile or quantileTiming. In terms of the ratio of State size to precision, this function is much better than quantile
.
The result depends on the order of running the query and is nondeterministic.
When using multiple quantile*
functions with different levels in a query, the internal states are not combined (that is, the query works less efficiently than it could). In this case, use the quantiles function.