Implementing Bitmap aggregate functions #6955
Description
Currently, only Merge (for example, groupBitmapMerge) can be used for bitmap aggregation. It provides only OR semantics; other semantics such as AND and XOR are not available.
Use case
There are two kinds of use cases where bitmap aggregation would be useful.
First, a bitmap may be stored as a column value. This is very common in user profiling platforms for advertising and similar systems, where each record contains a user ID list stored as a bitmap, together with other metadata. Queries of the following kind are often needed to generate a new user list:
SELECT age, groupBitmapAnd(bitmap) FROM table GROUP BY age;
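To make the requested semantics concrete, here is a minimal Python sketch of what per-group OR, AND, and XOR aggregation would compute. It is only a model, not ClickHouse code: Python sets stand in for bitmaps, and the sample rows and the `group_bitmap` helper are hypothetical names introduced for illustration.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical sample rows: (age, user id list). A Python set stands in for a bitmap.
rows = [
    (20, {1, 2, 3, 4}),
    (20, {2, 3, 5}),
    (20, {3, 4, 5}),
    (30, {7, 8}),
    (30, {8, 9}),
]


def group_bitmap(rows, combine):
    """Fold all bitmaps that share an age with `combine`, mimicking GROUP BY age."""
    groups = defaultdict(list)
    for age, bitmap in rows:
        groups[age].append(bitmap)
    return {age: reduce(combine, bitmaps) for age, bitmaps in groups.items()}


# OR is what Merge already offers; AND and XOR are the semantics being requested.
or_result = group_bitmap(rows, set.union)
and_result = group_bitmap(rows, set.intersection)
xor_result = group_bitmap(rows, set.symmetric_difference)

print("OR :", or_result)
print("AND:", and_result)
print("XOR:", xor_result)
```

In this model, the AND result for a group keeps only the user IDs present in every bitmap of that group, and the XOR result keeps the IDs that appear an odd number of times.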
The second use case is closely related to the performance problem described in issue 6880. We have found that aggregate bitmap functions can execute much faster than their non-aggregate alternatives, for example:
CREATE TABLE cdp_tags ( \
tag_id String, \
mid_seqs AggregateFunction(groupBitmap, UInt32), \
cardinality UInt32 \
) engine=ReplacingMergeTree() \
ORDER BY (tag_id) SETTINGS index_granularity=1;
SELECT groupBitmapMerge(mid_seqs)
FROM cdp_tags
WHERE has(['first_buy_market_NA', 'first_buy_province_广东省', 'member_origin_channel_NA', 'mobile_province_NA'], tag_id)
vs
SELECT bitmapAndCardinality(bitmapOr(bitmapOr(
(
SELECT mid_seqs
FROM cdp_tags
WHERE tag_id = 'first_buy_market_NA'
),
(
SELECT mid_seqs
FROM cdp_tags
WHERE tag_id = 'first_buy_province_广东省'
)),
(
SELECT mid_seqs
FROM cdp_tags
WHERE tag_id = 'member_origin_channel_NA'
)),
(
SELECT mid_seqs
FROM cdp_tags
WHERE tag_id = 'mobile_province_NA'
))