Significantly decrease MemoryTracking drift #16121
alexey-milovidov merged 10 commits into ClickHouse:master from
Conversation
(force-pushed from 2bfd016 to 48171f9)
I like the idea. I also tried to make something like this, but found it difficult.
Ok, great (it's also worth adding a message when it is synced with RSS, to see how it differs later). It is still a draft due to `gettid` usage (the fact that the performance tests do not show a regression does not mean there is none in this case, since most of the time the total memory tracker is not used).
You can also compare the address of a dummy `thread_local` variable with the address of the same variable in the main thread.
Alternatively, you can set an atomic bool in the constructor of a thread-local variable and check it for non-zero.
(force-pushed from 48171f9 to 7081b10)
Manual testing shows significant improvements (in terms of accounting drift) for an http/tcp query storm, I guess mostly due to memory accounting from the Poco thread pools (requires some testing, will add later).
(force-pushed from 7081b10 to 990ad21)
(force-pushed from 990ad21 to 52d247f)
Added a test that checks that after 100 queries MemoryTracking does not change (even by 1 byte) more than 6% (6 times), and it passes now (while before, the metric changed, and in some cases grew by ~4MB after each query)! P.S. Actually, if there is no background activity, it will have the same MemoryTracking value as before running those queries.
(force-pushed from 52d247f to 43dd401)
This should significantly reduce the MemoryTracking drift; the test shows zero drift after a query storm (100 queries, via http/tcp/tcp in one session).

TL;DR: To track memory, ClickHouse creates a memory tracker object for each thread **explicitly**, but until it is created, memory allocations are not accounted. There should not be many allocations without a memory tracker, since most of the time it is created early enough, but even this may be enough to trigger some problems. Plus, sometimes it is not possible to create one, for example when a third-party library does not allow doing this explicitly:
- for example, before ClickHouse#15740, allocations from librdkafka threads,
- or, even worse, Poco threads, which have no routines to do this at all.

This won't be a problem for the `MemoryTracking` metric if the deallocation is done from the same thread without a memory tracker (or vice versa), but this is not always true.

NOTE that this will slow down per-thread allocations without a memory tracker: before this patch there was no memory tracking for them, while now they are accounted in total_memory_tracker, and for total_memory_tracker max_untracked_memory is always reached. But this should not be significant.
…ext::Thread MemoryTracker assumes that for VariableContext::Thread, new/delete may be called from different threads, hence the amount of memory can go negative. However, MemoryTracker is nested, so even though a negative amount is allowed for VariableContext::Thread, it is not allowed for anything above it, and hence MemoryTracking will not be decremented properly. Fix this by passing the initial size to the parent free. This should fix memory drift for HTTP queries.
v2: disable query profiling and logging in 01540_MemoryTracking
(This should make MemoryTracker drift zero).
(force-pushed from 327d822 to 5f09e86)
(force-pushed from 5f09e86 to 3f594ed)
01540_MemoryTracking is failing on CI for the following reasons:
- log_queries (fixed by adding log_queries=0)
- profilers (fixed)
- but what can't be fixed is metric_log and so on, so we need a separate instance with a separate configuration (sigh).
So the problem was only in the test. Due to metric_log it has some difference; of course I could rewrite the test to allow some small delta (~20K is enough), but I want to ensure/show that after some actions MemoryTracking does not change at all.
So now it looks OK (although there is a minimal conflict in the fasttest skip list; it is pretty trivial, so I will not resolve it myself, to avoid triggering CI).
Review comment on the diff:

```cpp
    DB::TraceCollector::collect(DB::TraceType::MemorySample, StackTrace(), -size);
}

Int64 accounted_size = size;
```

Don't understand this change.
This will avoid hiding some exceptions in logs when the server is under high memory pressure (i.e. when any new allocation will lead to a MEMORY_LIMIT_EXCEEDED error). This became more relevant after all memory allocations were tracked with MemoryTracker, by falling back to total_memory_tracker, in ClickHouse#16121.
Compare only megabytes of the memory changes, instead of bytes as before, since a byte-level comparison may be tricky, at least due to max_untracked_memory and how the thread pool handles it. This should be safe, since the test was originally written in ClickHouse#16121, which fixes issue ClickHouse#15932, which had ~4MB of memory consumption per request.
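The megabyte-granularity comparison can be sketched in shell (the byte values below are made up for illustration; this is not the actual test script): rounding to whole megabytes before diffing means sub-MB noise from max_untracked_memory is ignored.

```shell
#!/bin/sh
# Round a byte count down to whole megabytes.
to_mb() { echo $(( $1 / 1024 / 1024 )); }

before=4507648   # hypothetical MemoryTracking reading before the queries
after=4607999    # hypothetical reading after: drift is below 1 MB

if [ "$(to_mb $before)" = "$(to_mb $after)" ]; then
    echo "OK: no MB-level drift"
else
    echo "FAIL: drift of $(( $(to_mb $after) - $(to_mb $before) )) MB"
fi
```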
Refs: ClickHouse#16121 (comment)
Signed-off-by: Azat Khuzhin <[email protected]>
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Detailed description / Documentation draft:
After this PR the memory drift is ~0 after running 100 queries via http/tcp/tcp from one session. (test)
TL;DR:
Details
To track memory, ClickHouse creates a memory tracker object for each thread explicitly, but until it is created, memory allocations are not accounted.

There should not be many allocations without a memory tracker, since most of the time it is created early enough, but even this may be enough to trigger some problems.

Plus, sometimes it is not possible to create one, for example when a third-party library does not allow doing this explicitly.

This won't be a problem for the `MemoryTracking` metric if the deallocation is done from the same thread without a memory tracker (or vice versa), but this is not always true.

NOTE that this will slow down per-thread allocations without a memory tracker: before this patch there was no memory tracking for them, while now they are accounted in total_memory_tracker, and for total_memory_tracker max_untracked_memory is always reached. But this should not be significant.
Changelog category (leave one):
Details
HEAD:
Fixes: #15932