[RFC] Fix memory tracking for OPTIMIZE TABLE/merges #18772
alexey-milovidov merged 3 commits into ClickHouse:master
Conversation
There is an issue: you run a long INSERT query (for example, inserting a terabyte of data in a single query) and you get constantly growing memory usage for the query, which is false, because the memory for indices is freed in background merges.
@alexey-milovidov are you sure that this is still relevant? Those index_columns will be released. Also, I've just started one huge INSERT, like:

```sql
CREATE TABLE test_memory
ENGINE = MergeTree
ORDER BY (k, v) AS
SELECT
    number AS k,
    reinterpretAsString(number) AS v
FROM numbers(toUInt64(1000000000000.))
```

With the binary from this PR I don't see excessive memory usage reported by MemoryTracker (right now 3e9 rows have been inserted already, and memory usage is ~70 MB, same as without this PR).
How is it possible if:
Because of BlockerInThread in MergeTreeDataPartWriterOnDisk::calculateAndSerializePrimaryIndex, memory was accounted incorrectly and grew constantly. And IIUC there is no need for that blocker, since INSERT SELECT shares the same thread group.
But reduce the scope, to avoid leaking too much memory, since there are old values in last_block_index_columns. (The scope of the MemoryTracker::BlockerInThread was increased in ClickHouse#8290.)
Force-pushed from 93fac66 to 0560868.
You are right, and the blocker cannot be removed here. However, the scope should be decreased, because otherwise deallocation of the old columns from last_block_index_columns will leak (this was introduced in #8290, cc @CurtizJ). Here is an example of OPTIMIZE TABLE that "eats too much memory" without these patches, while the process RSS does not grow:

```
$ pgrep -f clickhouse.*server$ | xargs ps u
USER       PID %CPU %MEM     VSZ    RSS TTY      STAT START   TIME COMMAND
azat     29353 77.9  1.0  6236800 669548 pts/13  Sl+  01:07   0:42 /src/ch/tmp/upstream/clickhouse server
```

And also you can see that …

P.S. Maybe these two patches should be split into two separate PRs…
@azat I think we should make all such blockers account memory for the server and not for the query/user.
@alexey-milovidov FWIW this is still relevant (and it does not require any changes even after #19146, since that PR changes the default for the BlockerInThread).
[RFC] Fix memory tracking for OPTIMIZE TABLE/merges (cherry picked from commit 4e11e7c)
@azat Does this also happen in 20.12? If yes, can we also cherry-pick the fix to 20.12?
Many thanks for the info, @azat.
Hi @azat, I found that the fix is not in the 20.12 branch. Can you help confirm? Does the new version of 20.12 come from the 20.12 branch? BTW, do you know how to generate a new release for 20.12?
Yes, you are right; for some reason it has been reverted in 83ee3cb.
Yes.
Changelog category (leave one):
- Backward Incompatible Change
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Fix memory tracking for OPTIMIZE TABLE/merges.

Detailed description / Documentation draft:
Because of BlockerInThread in
MergeTreeDataPartWriterOnDisk::calculateAndSerializePrimaryIndex, memory
was accounted incorrectly and grew constantly.
And IIUC there is no need for that blocker, since INSERT SELECT shares
the same thread group.
Plus, take memory limits and sampling into account (and because of this the change is marked as backward incompatible, since OPTIMIZE may now fail if the query has some memory limits).