Skip to content

Significantly decrease MemoryTracking drift#16121

Merged
alexey-milovidov merged 10 commits intoClickHouse:masterfrom
azat:total_memory_tracker-by-default
Oct 24, 2020
Merged

Significantly decrease MemoryTracking drift#16121
alexey-milovidov merged 10 commits intoClickHouse:masterfrom
azat:total_memory_tracker-by-default

Conversation

@azat
Copy link
Copy Markdown
Member

@azat azat commented Oct 18, 2020

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Detailed description / Documentation draft:
After this PR the memory drift is ~0 after running 100 queries via http/tcp/tcp from one session. (test)

TL;DR;

Details

To track memory, clickhouse creates memory tracker object for each
thread explicitly, but until it is not created the memory
allocations are not under account.
There should not be lot of allocations w/o memory tracker, since most of
the time it is created early enough, but even this maybe enough to
trigger some problems.

Plus sometimes it is not possible to create it, for example some 3d
party library does not allow to do this explicitly:

  • for example before Take memory allocations from librdkafka threads into account #15740 allocations from librdkafka threads,
  • or even worse, poco threads, they don't have any routines to do this.
    This won't be a problem for MemoryTracking metric if the deallocation
    will be done from the same thread w/o memory tracker (or vise versa),
    but this is not always true.

NOTE, that this will slow down per-thread allocations w/o memory
tracker, since before this patch there were no memory tracking for them
while now they will be accounted in total_memory_tracker, and for
total_memory_tracker max_untracked_memory is always reached.
But this should not be significant.

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)
Details

HEAD:

  • 7081b1047d50d403ae6ab4935f253107196464d9
  • 990ad21b05254feae81341289e188d2d202b1ed1
  • 43dd401ae5bd99ac954ad3d574c8f368c779b4f0 - worked, but problem in test
  • 327d822b7f1b46b00d9f6167f389580a0ebbac0b - dump attempt to make the test pass

Fixes: #15932

@robot-clickhouse robot-clickhouse added the pr-not-for-changelog This PR should not be mentioned in the changelog label Oct 18, 2020
@azat azat marked this pull request as draft October 18, 2020 11:15
@azat azat force-pushed the total_memory_tracker-by-default branch from 2bfd016 to 48171f9 Compare October 18, 2020 12:56
@alexey-milovidov
Copy link
Copy Markdown
Member

I like the idea, I have also tried to make something like that but then found it difficult.

@alexey-milovidov alexey-milovidov self-assigned this Oct 21, 2020
@azat
Copy link
Copy Markdown
Member Author

azat commented Oct 21, 2020

I like the idea, I have also tried to make something like that but then found it difficult.

Ok, great (also it worth add a message when it is synced with RSS to see how it will differs later)

It still in draft due to gettid usage (that fact that performance does not shows the regression does not mean that there is no regression in this case, since most of the time total memory tracker is not used)

@alexey-milovidov
Copy link
Copy Markdown
Member

alexey-milovidov commented Oct 21, 2020

You can also compare addresses of dummy thread_local variable (with the address of the same var from main thread).

@alexey-milovidov
Copy link
Copy Markdown
Member

Also you can use atomic bool in constructor of thread-local variable and check it for non-zero.

@azat azat force-pushed the total_memory_tracker-by-default branch from 48171f9 to 7081b10 Compare October 21, 2020 22:42
@azat
Copy link
Copy Markdown
Member Author

azat commented Oct 21, 2020

Manual testing shows significant improvements (in terms of accounting drift), for http/tcp queries storm, I guess mosty due to memory accounting from the poco thread pools (require some test, will add later)

@azat azat force-pushed the total_memory_tracker-by-default branch from 7081b10 to 990ad21 Compare October 22, 2020 07:49
@azat azat changed the title [WIP/RFC] Track memory even if the MemoryTracker is not set yet. Track memory even if the MemoryTracker is not set yet. Oct 22, 2020
@azat azat changed the title Track memory even if the MemoryTracker is not set yet. Significantly decrease MemoryTracking drift Oct 22, 2020
@azat azat force-pushed the total_memory_tracker-by-default branch from 990ad21 to 52d247f Compare October 22, 2020 20:01
@azat azat marked this pull request as ready for review October 22, 2020 20:01
@azat
Copy link
Copy Markdown
Member Author

azat commented Oct 22, 2020

Added a test that checks that after 100 queries the MemoryTracking does not changed (even to 1 byte) more then 6% (6 times), and it passed now (while before the metric is changed, and actually grows in some cases to ~4MB, after each query)!

P.S. actually if there will be no background activity it will have the same MemoryTracking value as before running those queries

@azat azat force-pushed the total_memory_tracker-by-default branch from 52d247f to 43dd401 Compare October 22, 2020 21:19
azat added 4 commits October 23, 2020 21:07
This should significantly reduce the MemoryTracking drift, test shows
that there is 0 drift after query storm (100 queries, via http/tcp/tcp
in one session).

TL;DR;

To track memory, clickhouse creates memory tracker object for each
thread **explicitly**, but until it is not created the memory
allocations are not under account.
There should not be lot of allocations w/o memory tracker, since most of
the time it is created early enough, but even this maybe enough to
trigger some problems.

Plus sometimes it is not possible to create it, for example some 3d
party library does not allow to do this explicitly:
- for example before ClickHouse#15740 allocations from librdkafka threads,
- or even worse, poco threads, they don't have any routines to do this.
This won't be a problem for `MemoryTracking` metric if the deallocation
will be done from the same thread w/o memory tracker (or vise versa),
but this is not always true.

NOTE, that this will slow down per-thread allocations w/o memory
tracker, since before this patch there were no memory tracking for them
while now they will be accounted in total_memory_tracker, and for
total_memory_tracker max_untracked_memory is always reached.
But this should not be significant.
…ext::Thread

MemoryTracker assumes that for VariableContext::Thread new/delete may be
called from different threads, hence the amount of memory can go
negative.

However the MemoryTracker is nested, so even if the negative amount is
allowed for VariableContext::Thread it does not allowed for anything
upper, and hence the MemoryTracking will not be decremented properly.

Fix this, by passing initial size to the parent free.

This should fix memory drift for HTTP queries.
v2: disable query profiling and logging in 01540_MemoryTracking
    (This should make MemoryTracker drift zero).
@azat azat force-pushed the total_memory_tracker-by-default branch from 327d822 to 5f09e86 Compare October 23, 2020 18:12
@azat azat force-pushed the total_memory_tracker-by-default branch from 5f09e86 to 3f594ed Compare October 23, 2020 19:14
01540_MemoryTracking is failing on CI for the following reasons:
- log_queries (fixed, by adding log_queries=0)
- profilers (fixed)
- but what can't be fixed is metric_log and so on, so we need separate
  instance with separate configuration (sigh).
@azat
Copy link
Copy Markdown
Member Author

azat commented Oct 23, 2020

So the problem was only in the test

Due to metric_log it has some difference, of course I could rewrite the test to allow some small delta, ~20K is enough, but I want ensure/show that after some actions the MemoryTracking does not changes at all.
So the stateless test was replaced with integration test (basically it is needed only to run instance of the server w/o system.*_log)

@azat
Copy link
Copy Markdown
Member Author

azat commented Oct 24, 2020

So now looks OK (although there is minimal conflict in the fasttest skip list, it is pretty trivial so I will not solve it by myself to avoid trigerring CI)

DB::TraceCollector::collect(DB::TraceType::MemorySample, StackTrace(), -size);
}

Int64 accounted_size = size;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't understand this change.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

@alexey-milovidov alexey-milovidov merged commit b193113 into ClickHouse:master Oct 24, 2020
@azat azat deleted the total_memory_tracker-by-default branch October 24, 2020 22:49
azat added a commit to azat/ClickHouse that referenced this pull request Jan 29, 2021
This will avoid hiding some exceptions in logs, when the server is under
high memory pressure (i.e. when any new allocation will lead to
MEMORY_LIMIT_EXCEEDED error).

This became more relevent after all memory allocations was tracked with
MemoryTracker, by falling back to total_memory_tracking, in ClickHouse#16121
azat added a commit to azat/ClickHouse that referenced this pull request Oct 14, 2021
By comparing only megabytes in the memory changes, instead of bytes as
before, since it may be tricky at least due to max_untracked_memory and
how thread pool handle it.

It should be safe, since originally it was written in ClickHouse#16121 which fixes
issue ClickHouse#15932, which has ~4MB consumption of memory per request.
azat added a commit to azat/ClickHouse that referenced this pull request Apr 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-not-for-changelog This PR should not be mentioned in the changelog

Projects

None yet

Development

Successfully merging this pull request may close these issues.

MemoryTracker wrong total: Memory limit (total) exceeded, but no real usage

4 participants