Respect max_block_size for array join to avoid possible OOM #54664
robot-ch-test-poll4 merged 3 commits into ClickHouse:master
Let's add a test.

Thank you! I wanted this feature for a very long time 🎉

Notice peak memory usage:

:) set max_block_size = 10;
SET max_block_size = 10
Query id: a82913e9-bd2d-41c4-99de-70f16f1d7f6b
Ok.
0 rows in set. Elapsed: 0.001 sec.
:) SELECT n % 10, count(1) from (SELECT range(0, number) as x FROM numbers(10000)) LEFT ARRAY JOIN x as n group by n % 10;
SELECT
n % 10,
count(1)
FROM
(
SELECT range(0, number) AS x
FROM numbers(10000)
)
LEFT ARRAY JOIN x AS n
GROUP BY n % 10
Query id: 72db26cd-1a66-46ad-9e53-4334cbf321ee
┌─modulo(n, 10)─┬─count()─┐
│ 0 │ 5004001 │
│ 1 │ 5003000 │
│ 2 │ 5002000 │
│ 3 │ 5001000 │
│ 4 │ 5000000 │
│ 5 │ 4999000 │
│ 6 │ 4998000 │
│ 7 │ 4997000 │
│ 8 │ 4996000 │
│ 9 │ 4995000 │
└───────────────┴─────────┘
10 rows in set. Elapsed: 0.372 sec. Processed 10.00 thousand rows, 80.00 KB (26.86 thousand rows/s., 214.86 KB/s.)
Peak memory usage: 8.61 MiB.
:) set max_block_size = 100000;
SET max_block_size = 100000
Query id: b4081541-2fde-47ac-983d-980d45d9a850
Ok.
0 rows in set. Elapsed: 0.001 sec.
:) SELECT n % 10, count(1) from (SELECT range(0, number) as x FROM numbers(10000)) LEFT ARRAY JOIN x as n group by n % 10;
SELECT
n % 10,
count(1)
FROM
(
SELECT range(0, number) AS x
FROM numbers(10000)
)
LEFT ARRAY JOIN x AS n
GROUP BY n % 10
Query id: 6fdae9a2-680f-4b67-a170-4ac00934bb49
┌─modulo(n, 10)─┬─count()─┐
│ 0 │ 5004001 │
│ 1 │ 5003000 │
│ 2 │ 5002000 │
│ 3 │ 5001000 │
│ 4 │ 5000000 │
│ 5 │ 4999000 │
│ 6 │ 4998000 │
│ 7 │ 4997000 │
│ 8 │ 4996000 │
│ 9 │ 4995000 │
└───────────────┴─────────┘
10 rows in set. Elapsed: 0.689 sec. Processed 10.00 thousand rows, 80.00 KB (14.51 thousand rows/s., 116.05 KB/s.)
Peak memory usage: 898.64 MiB.
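
To illustrate why the smaller max_block_size keeps peak memory low, here is a minimal Python sketch of the idea. It is not the ClickHouse implementation (that lives in the C++ array join code and may split blocks differently). The function names `left_array_join_blocks` and `count_by_last_digit` are hypothetical and exist only for this example. The sketch assumes that LEFT ARRAY JOIN emits one row with the default value 0 for an empty array, which matches the extra row counted in the `0` bucket above.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple


def left_array_join_blocks(
    arrays: Iterable[Sequence[int]], max_block_size: int
) -> Iterator[List[int]]:
    """Yield the unfolded rows in blocks of at most max_block_size rows.

    Hypothetical helper: a simplified model of a bounded array join,
    not ClickHouse's actual algorithm.
    """
    if max_block_size <= 0:
        raise ValueError("max_block_size must be positive")
    block: List[int] = []
    for arr in arrays:
        # LEFT ARRAY JOIN keeps rows with empty arrays; the joined column
        # then holds the type's default value (0 for integers).
        values = arr if len(arr) > 0 else [0]
        for value in values:
            block.append(value)
            if len(block) == max_block_size:
                yield block  # hand the full block downstream
                block = []   # start a new block, the old one can be freed
    if block:
        yield block  # final, possibly partial, block


def count_by_last_digit(
    n_rows: int, max_block_size: int
) -> Tuple[Dict[int, int], int]:
    """Model of: SELECT n % 10, count() ... LEFT ARRAY JOIN x AS n GROUP BY n % 10.

    Returns the counts per bucket and the largest block that was ever
    held in memory at once (a stand-in for peak memory).
    """
    arrays = (range(0, number) for number in range(n_rows))
    counts: Counter = Counter()
    largest_block = 0
    for block in left_array_join_blocks(arrays, max_block_size):
        largest_block = max(largest_block, len(block))
        counts.update(value % 10 for value in block)
    return dict(counts), largest_block


if __name__ == "__main__":
    # Scaled down from numbers(10000) to keep the run short.
    for size in (10, 100000):
        counts, largest = count_by_last_digit(100, size)
        print(size, largest, sorted(counts.items()))
```

In this model the aggregated counts are identical for both settings; only the size of the largest in-flight block changes, which is the effect the two runs above show at full scale.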
711f948 to 39ca44b

done
Hoping for your review, @alexey-milovidov. The failed test is not related to the array join changes: https://s3.amazonaws.com/clickhouse-test-reports/54664/32c5aee1c34b68cd992b0fb81565caa7fc702e4e/integration_tests__release__[3_4]/integration_run_parallel1_0.log
…apply max_block_size #3191

What changes were proposed in this pull request?

Note that the arrayJoin function and the array join step are executed in different places: the former in ExpressionActions::execute, the latter in ArrayJoinAction::execute. max_block_size currently takes effect only in the array join step, so the arrayJoin function must be transformed into an array join step for max_block_size to apply.

Note: this PR relies on ClickHouse/ClickHouse#54664, but it can be merged first. (Fixes: #3143)
@@ -0,0 +1,13 @@
SET max_block_size = 8192;

This test passed on 23.7
@@ -0,0 +1,16 @@
-- { echoOn }
set max_block_size = 10, enable_unaligned_array_join = true;

This test also passed on 23.7

@nickitat I didn't find any difference after this feature, and the tests have passed on previous ClickHouse versions.
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Respect max_block_size for array join to avoid possible OOM. Close #54290