Significantly decrease MemoryTracking drift by azat · Pull Request #16121 · ClickHouse/ClickHouse

azat · 2020-10-18T11:14:54Z

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Use total_memory_tracker when there is no other MemoryTracker object (this should significantly reduce the MemoryTracking drift)
Fix parent memory tracker during query detaching
Fix accounting for new/delete from different threads for VariableContext::Thread (supersedes #16122)

Detailed description / Documentation draft:
After this PR, the memory drift is ~0 after running 100 queries over HTTP, TCP, and TCP within a single session (see the test).

TL;DR;

Details

To track memory, ClickHouse explicitly creates a memory tracker object for each
thread, but until that object is created, memory allocations are not
accounted for.
There should not be many allocations without a memory tracker, since most of
the time it is created early enough, but even those few may be enough to
trigger problems.

Moreover, sometimes it is not possible to create one, for example when a
third-party library does not allow doing this explicitly:

- for example, before #15740 (Take memory allocations from librdkafka threads into account), allocations from librdkafka threads,
- or even worse, Poco threads, which have no routines for this at all.

This is not a problem for the MemoryTracking metric if the deallocation
is done from the same thread without a memory tracker (or vice versa),
but that is not always the case.

NOTE: this will slow down per-thread allocations made without a memory
tracker, since before this patch there was no memory tracking for them,
while now they are accounted for in total_memory_tracker, and for
total_memory_tracker max_untracked_memory is always reached.
But this should not be significant.
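
The fallback described above can be illustrated with a minimal C++ sketch. All names here (`SketchTracker`, `sketch_total_tracker`, `sketch_thread_tracker`, `trackAlloc`, `trackFree`) are hypothetical and are not ClickHouse's actual classes; the sketch also ignores `max_untracked_memory` batching and memory limits. It only tries to show why accounting an allocation in the total tracker when no per-thread tracker exists keeps the total balanced, even if the matching deallocation happens on a thread that does have a tracker.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for a nested memory tracker.
struct SketchTracker
{
    std::atomic<int64_t> amount{0};
    SketchTracker * parent = nullptr;

    void alloc(int64_t size)
    {
        assert(size >= 0);
        amount.fetch_add(size, std::memory_order_relaxed);
        if (parent)
            parent->alloc(size);
    }

    void free(int64_t size)
    {
        assert(size >= 0);
        // In this sketch the amount is allowed to go negative.
        amount.fetch_sub(size, std::memory_order_relaxed);
        if (parent)
            parent->free(size);
    }
};

// Process-wide tracker (plays the role of total_memory_tracker).
static SketchTracker sketch_total_tracker;

// Per-thread tracker; stays null until it is set explicitly.
static thread_local SketchTracker * sketch_thread_tracker = nullptr;

// Assumption of this sketch: fall back to the total tracker
// when the current thread has no tracker of its own.
static SketchTracker & trackerForCurrentThread()
{
    if (sketch_thread_tracker)
        return *sketch_thread_tracker;
    return sketch_total_tracker;
}

static void trackAlloc(int64_t size) { trackerForCurrentThread().alloc(size); }
static void trackFree(int64_t size) { trackerForCurrentThread().free(size); }
```

Without the fallback, an allocation made before the per-thread tracker exists would be invisible, while the later `trackFree` through a tracked thread would still decrement the total, which is one way the drift could appear.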

Changelog category (leave one):

Not for changelog (changelog entry is not required)

Details

HEAD:

7081b1047d50d403ae6ab4935f253107196464d9
990ad21b05254feae81341289e188d2d202b1ed1
43dd401ae5bd99ac954ad3d574c8f368c779b4f0 - worked, but problem in test
327d822b7f1b46b00d9f6167f389580a0ebbac0b - dumb attempt to make the test pass

Fixes: #15932

src/Common/MemoryTracker.cpp

alexey-milovidov · 2020-10-21T00:52:32Z

I like the idea, I have also tried to make something like that but then found it difficult.

azat · 2020-10-21T20:10:28Z

> I like the idea, I have also tried to make something like that but then found it difficult.

OK, great (it is also worth adding a message when it is synced with RSS, to see how much it differs later).

It is still in draft due to the gettid usage (the fact that the performance tests do not show a regression does not mean that there is none in this case, since most of the time the total memory tracker is not used).

alexey-milovidov · 2020-10-21T20:55:48Z

You can also compare the address of a dummy thread_local variable with the address of the same variable taken from the main thread.
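
A minimal sketch of this suggestion, assuming the goal is to tell the main thread apart from other threads without calling gettid. The names (`thread_marker`, `main_thread_marker`, `isMainThread`, `isMainThreadFromOtherThread`) are hypothetical, and the sketch assumes the code is linked into the executable, so that the namespace-scope initializer runs on the main thread before `main`.

```cpp
#include <cassert>
#include <thread>

namespace main_thread_sketch
{
// Each thread has its own copy of this variable,
// so its address is unique among live threads.
static thread_local char thread_marker = 0;

// Captured during static initialization, which is assumed
// to run on the main thread.
static const char * const main_thread_marker = &thread_marker;

static bool isMainThread()
{
    return &thread_marker == main_thread_marker;
}

// Helper for demonstration: evaluate isMainThread() on a new thread.
static bool isMainThreadFromOtherThread()
{
    bool result = true;
    std::thread worker([&result] { result = isMainThread(); });
    assert(worker.joinable());
    worker.join();
    return result;
}
}
```

The check is a single pointer comparison, which is why it may be cheaper than a system call on every allocation.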

alexey-milovidov · 2020-10-21T20:57:15Z

You can also use an atomic bool in the constructor of a thread-local variable and check it for non-zero.

azat · 2020-10-21T22:47:57Z

Manual testing shows significant improvements (in terms of accounting drift) for a storm of HTTP/TCP queries, I guess mostly due to memory accounting from the Poco thread pools (this requires a test, which I will add later).

azat · 2020-10-22T20:09:42Z

Added a test that checks that over 100 queries MemoryTracking changes (even by 1 byte) no more than 6% of the time (6 times), and it passes now (whereas before, the metric changed, and in some cases actually grew by ~4MB after each query)!

P.S. Actually, if there is no background activity, MemoryTracking will have the same value as before running those queries.

tests/queries/0_stateless/01540_MemoryTracking.sh

This should significantly reduce the MemoryTracking drift; the test shows that there is 0 drift after a query storm (100 queries, via http/tcp/tcp in one session).

TL;DR;

To track memory, ClickHouse creates a memory tracker object for each thread **explicitly**, but until it is created, memory allocations are not accounted for. There should not be many allocations without a memory tracker, since most of the time it is created early enough, but even those few may be enough to trigger problems.

Moreover, sometimes it is not possible to create one, for example when a third-party library does not allow doing this explicitly:

- for example, before ClickHouse#15740, allocations from librdkafka threads,
- or even worse, Poco threads, which have no routines for this at all.

This is not a problem for the `MemoryTracking` metric if the deallocation is done from the same thread without a memory tracker (or vice versa), but that is not always the case.

NOTE: this will slow down per-thread allocations made without a memory tracker, since before this patch there was no memory tracking for them, while now they are accounted for in total_memory_tracker, and for total_memory_tracker max_untracked_memory is always reached. But this should not be significant.

…ext::Thread

MemoryTracker assumes that for VariableContext::Thread new/delete may be called from different threads, hence the amount of memory can go negative. However, MemoryTracker is nested, so even though a negative amount is allowed for VariableContext::Thread, it is not allowed for any level above it, and hence MemoryTracking will not be decremented properly. Fix this by passing the initial size to the parent's free. This should fix memory drift for HTTP queries.
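
The idea in this commit message can be sketched as follows. This is a hedged illustration, not the code from src/Common/MemoryTracker.cpp: the `Level` enum, the `Tracker` struct, and the saturate-at-zero behaviour for non-thread levels are assumptions made for the example. What it tries to show is that `free` forwards the original `size` to the parent, regardless of what happened to the amount at the current level.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

namespace nested_sketch
{
// Hypothetical levels, loosely modelled on VariableContext.
enum class Level { Global, Query, Thread };

struct Tracker
{
    Level level;
    Tracker * parent;
    std::atomic<int64_t> amount{0};

    explicit Tracker(Level level_, Tracker * parent_ = nullptr)
        : level(level_), parent(parent_) {}

    void alloc(int64_t size)
    {
        assert(size >= 0);
        amount.fetch_add(size);
        if (parent)
            parent->alloc(size);
    }

    void free(int64_t size)
    {
        assert(size >= 0);
        if (level == Level::Thread)
        {
            // May go negative: memory allocated in one thread
            // can be freed in another.
            amount.fetch_sub(size);
        }
        else
        {
            // Assumption: upper levels saturate at zero
            // (the two steps are not atomic as a whole).
            int64_t new_amount = amount.fetch_sub(size) - size;
            if (new_amount < 0)
                amount.fetch_sub(new_amount);
        }

        // Forward the original size, not a clamped one,
        // so that every level above is decremented in full.
        if (parent)
            parent->free(size);
    }
};
}
```

With two thread-level trackers under one query-level tracker, allocating 100 bytes through the first and freeing them through the second leaves the second at -100 while the query and global levels return to 0.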


azat · 2020-10-23T23:31:00Z

So the problem was only in the test.

Due to metric_log there is some difference. Of course, I could rewrite the test to allow a small delta (~20K is enough), but I want to ensure/show that after some actions MemoryTracking does not change at all.
So the stateless test was replaced with an integration test (basically it is needed only to run a server instance without system.*_log).

azat · 2020-10-24T12:29:08Z

So now it looks OK (although there is a minimal conflict in the fasttest skip list; it is pretty trivial, so I will not resolve it myself, to avoid triggering CI).

alexey-milovidov · 2020-10-24T18:31:20Z

src/Common/MemoryTracker.cpp

        DB::TraceCollector::collect(DB::TraceType::MemorySample, StackTrace(), -size);
    }

+    Int64 accounted_size = size;


Don't understand this change.

src/Interpreters/ThreadStatusExt.cpp

This will avoid hiding some exceptions in logs when the server is under high memory pressure (i.e. when any new allocation leads to a MEMORY_LIMIT_EXCEEDED error). This became more relevant after all memory allocations started being tracked with MemoryTracker, by falling back to total_memory_tracker, in ClickHouse#16121.

By comparing only megabytes in the memory changes, instead of bytes as before, since a byte-exact comparison may be tricky, at least due to max_untracked_memory and how the thread pool handles it. It should be safe, since the test was originally written in ClickHouse#16121, which fixes issue ClickHouse#15932, where memory consumption grew by ~4MB per request.

robot-clickhouse added the pr-not-for-changelog label Oct 18, 2020

azat marked this pull request as draft October 18, 2020 11:15

azat force-pushed the total_memory_tracker-by-default branch from 2bfd016 to 48171f9 Compare October 18, 2020 12:56

azat mentioned this pull request Oct 18, 2020

MemoryTracker wrong total: Memory limit (total) exceeded, but no real usage #15932

Closed

alexey-milovidov reviewed Oct 21, 2020

src/Common/MemoryTracker.cpp (outdated)

alexey-milovidov self-assigned this Oct 21, 2020

azat force-pushed the total_memory_tracker-by-default branch from 48171f9 to 7081b10 Compare October 21, 2020 22:42

azat force-pushed the total_memory_tracker-by-default branch from 7081b10 to 990ad21 Compare October 22, 2020 07:49

azat changed the title ~~[WIP/RFC] Track memory even if the MemoryTracker is not set yet.~~ Track memory even if the MemoryTracker is not set yet. Oct 22, 2020

azat changed the title ~~Track memory even if the MemoryTracker is not set yet.~~ Significantly decrease MemoryTracking drift Oct 22, 2020

azat force-pushed the total_memory_tracker-by-default branch from 990ad21 to 52d247f Compare October 22, 2020 20:01

azat marked this pull request as ready for review October 22, 2020 20:01

azat mentioned this pull request Oct 22, 2020

Fix accounting for new/delete from different threads for VariableContext::Thread #16122

Closed

azat force-pushed the total_memory_tracker-by-default branch from 52d247f to 43dd401 Compare October 22, 2020 21:19

akuzm reviewed Oct 23, 2020

tests/queries/0_stateless/01540_MemoryTracking.sh (outdated)

azat added 4 commits October 23, 2020 21:07

Fix parent memory tracker during query detaching

0cccf30

Add a test for MemoryTracking drift

6c42ad5

v2: disable query profiling and logging in 01540_MemoryTracking (This should make MemoryTracker drift zero).

azat force-pushed the total_memory_tracker-by-default branch from 327d822 to 5f09e86 Compare October 23, 2020 18:12

Add a test for memory drift in user memory tracker (max_memory_usage_…

3f594ed

…for_user)

azat force-pushed the total_memory_tracker-by-default branch from 5f09e86 to 3f594ed Compare October 23, 2020 19:14

Make 01540_MemoryTracking integration

6e5b04f

01540_MemoryTracking is failing on CI for the following reasons: - log_queries (fixed, by adding log_queries=0) - profilers (fixed) - but what can't be fixed is metric_log and so on, so we need separate instance with separate configuration (sigh).

azat added 2 commits October 24, 2020 02:33

Disable syncing MemoryTracking with RSS for test_MemoryTracking

96da5f6

Tune TTL of the background query in 01541_max_memory_usage_for_user

c3c6ac3

Merge branch 'master' into total_memory_tracker-by-default

e00f6c4

alexey-milovidov reviewed Oct 24, 2020

src/Interpreters/ThreadStatusExt.cpp

Update ThreadStatusExt.cpp

34b9d15

alexey-milovidov approved these changes Oct 24, 2020

alexey-milovidov merged commit b193113 into ClickHouse:master Oct 24, 2020

azat deleted the total_memory_tracker-by-default branch October 24, 2020 22:49

This was referenced Oct 30, 2020

Process killed by Memory limit (total) exceeded #16537

Closed

Total memory tracker should account all server memory usage, not only queries. #10293

Closed

azat mentioned this pull request Jan 29, 2021

Avoid losing exception messages in logs under high memory pressure #19824

Merged

den-crane mentioned this pull request Apr 1, 2021

Frequent DB::Exception: Memory limit (total) exceeded while inserting #22437

Closed

azat mentioned this pull request Oct 13, 2021

Make test_MemoryTracking::test_http not flaky #30150

Merged

azat mentioned this pull request Apr 20, 2022

Memory limit may not work as expected if query has very large number of UNION ALL #22980

Closed

azat added a commit to azat/ClickHouse that referenced this pull request Apr 29, 2022

Remove outdated comment from ThreadStatusExt

16b1d2b

Refs: ClickHouse#16121 (comment) Signed-off-by: Azat Khuzhin <[email protected]>


Significantly decrease MemoryTracking drift #16121
alexey-milovidov merged 10 commits into ClickHouse:master from azat:total_memory_tracker-by-default
