[RFC] Intercept malloc/free and other C-style functions for memory management#84082
[RFC] Intercept malloc/free and other C-style functions for memory management#84082nikitamikhaylov merged 24 commits intomasterfrom
malloc/free and other C-style functions for memory management#84082Conversation
|
Workflow [PR], commit [378ece1] Summary: ❌
|
d5f7c10 to
9e2c239
Compare
cc40257 to
ea664f6
Compare
There was a problem hiding this comment.
Need to use __real_* in src/Common/FiberStack.cpp as well.
It will be interesting to look at performance results (one of issues that could arise is that we will call MemoryTracker::allocImpl on each mallocs, due to lack of ThreadStatus somewhere in third party threads, i.e. librdkafka or possible delta-kernel-rs, and we don't have perf tests for this, so we need to do some manual testing)
Also I'm not sure that it is OK to do this for sanitizers, but CI will show...
And we need to make 100% sure that there will be no double accounting
6464fa6 to
bb3c759
Compare
bb3c759 to
88e5fd5
Compare
Funny, Kafka storage already creates But, delta-kernel-rs can be affected, not sure how it is used, but, if it creates threads and do allocations from it, they may became slower. Can you verify this? |
|
Also, let's add a unit test for malloc/free and ensure that |
|
About Current PR: On master: Honestly I don't see much diff here. Yes it is visible, but it's kind of difficult to estimate the impact. Looks Ok for me. |
In a newer version with this sampling it shows that the overhead is 4.5%, what about the query execution times? |
c0b661a to
d3f3a5f
Compare
|
Stress test (msan): #83988 |
cf60107 to
bb1ede1
Compare
|
Stateless tests (amd_asan, distributed plan): #84577 |
azat
left a comment
There was a problem hiding this comment.
Apart from missing NULL handling in the *alloc functions, LGTM
|
Stateless tests (amd_asan, distributed plan, sequential), Stateless tests (amd_binary, ParallelReplicas, s3 storage, sequential): |
19dc7e4 to
378ece1
Compare
|
Stress tests - |
This is an open wound in ClickHouse's memory management as all the allocations from external libraries through
malloc/calloc/posix_memalignare not visible for the ClickHouse memory tracker.There were previous attempt to fix it for the most annoying places, for example: #27252 and #63584, however the solution is not ideal and there are still some problems: #83165
This PR tries to utilize the linker's
--wrapcommand that replaces the symbols at link time. It seems working, but I need to clarify certain things.Cc @azat @alexey-milovidov
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
All the allocations done by external libraries are now visible to ClickHouse's memory tracker and accounted properly. This may result in "increased" reported memory usage for certain queries or failures with
MEMORY_LIMIT_EXCEEDED.Documentation entry for user-facing changes