Skip to content

Constness-aware JoinGet.#7359

Merged
akuzm merged 1 commit intoClickHouse:masterfrom
amosbird:joingetconst
Oct 17, 2019
Merged

Constness-aware JoinGet.#7359
akuzm merged 1 commit intoClickHouse:masterfrom
amosbird:joingetconst

Conversation

@amosbird
Copy link
Copy Markdown
Collaborator

@amosbird amosbird commented Oct 16, 2019

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

The problem is discovered by @yingfeng and his collegue when testing joinGet with bitmaps.

Consider the following query,

WITH joinGet('cdp_tags_kv', 'mid_seqs', 'tag1') AS bm1,
     joinGet('cdp_tags_kv', 'mid_seqs', 'tag2') AS bm2,
SELECT multiIf(bitmapContains(bm1, mid_seq), 1, bitmapContains(bm2, mid_seq), 2, 0) AS tag, count() AS gc
FROM cdp_orders
PREWHERE order_complete_time >= '2019-01-01 00:00:00' AND order_complete_time <= '2019-01-01 01:00:00'
GROUP BY tag;

without constness-aware, joinGet will duplicate bitmaps.

Category (leave one):

  • Performance Improvement

@yingfeng
Copy link
Copy Markdown

Great! it can help to bypass the performance issue caused by scalar subqueries mentioned in 7329 and 7077 .

In this case, given queries introducing huge aggregateStateData in scalar subqueries, one could store the aggregateStateData in a temporary table, and use joinGet to retrieve, such that scalar subqueries are not triggered without cumbersome string escaping and copying.

@yuzhichang
Copy link
Copy Markdown
Contributor

yuzhichang commented Oct 17, 2019

An Join table behaves append-only. Any row is not update-able.

CREATE TABLE mid_replacing_kv (mid_seq_from UInt32, mid_seq UInt32) ENGINE = Join(ANY, LEFT, mid_seq_from);

INSERT INTO mid_replacing_kv(mid_seq_from, mid_seq) VALUES (5, 4), (10, 3);
-- expect 4, get 4
SELECT joinGet('mid_replacing_kv', 'mid_seq', toUInt32(5));

INSERT INTO mid_replacing_kv(mid_seq_from, mid_seq) VALUES (5, 2);
-- expect 2, get 4
SELECT joinGet('mid_replacing_kv', 'mid_seq', toUInt32(5));

-- error
ALTER TABLE mid_replacing_kv UPDATE mid_seq=2 WHERE mid_seq_from=5;
Code: 48. DB::Exception: Received from localhost:9000. DB::Exception: Mutations are not supported by storage Join. 

So joinGet always returns const value.

FYI, following sql still oom.

-- With https://github.com/ClickHouse/ClickHouse/pull/7359
-- Out of memory
WITH ['tag1', 'tag2', 'tag3', 'tag4', 'tag5', 'tag6', 'tag7', 'tag8'] AS tag_ids,
     toDateTime('2000-01-01 00:00:00') AS ts_begin,
     addSeconds(toDateTime('2000-01-01 00:00:00'), 1e8) AS ts_end
SELECT arrayFirst(t -> bitmapContains(joinGet('cdp_tags_kv', 'mid_seqs', t), mid_seq), tag_ids) AS tag, count() AS gc, sum(order_total_sales) AS total
FROM cdp_orders
PREWHERE order_complete_time >= ts_begin AND order_complete_time < ts_end
GROUP BY tag;

@amosbird
Copy link
Copy Markdown
Collaborator Author

An Join table behaves append-only. Any row is not update-able.

It's updatable but not via the alter interface. #3973

FYI, following sql still oom.

That's because bitmap types are value owners not value handlers like StringRef, and you are generating non-constant blocks with same bitmaps. You can try combining it with LowCardinality. It might work.

@akuzm akuzm merged commit e9d7da3 into ClickHouse:master Oct 17, 2019
@akuzm akuzm added the pr-improvement Pull request with some product improvements label Oct 29, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-improvement Pull request with some product improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants