Flush jemalloc profile in CI #85139

Merged
antonio2368 merged 7 commits into master from flush-memory-profiles-in-ciu
Aug 11, 2025

Member

@antonio2368 antonio2368 commented Aug 6, 2025

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

  • try creating flamegraph with jeprof

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Fixes: #82035

Contributor

clickhouse-gh bot commented Aug 6, 2025

Workflow [PR], commit [2d01463]

Summary:

| job_name | test_name | status | info | comment |
|---|---|---|---|---|
| Stateless tests (amd_msan, parallel, 2/2) | | failure | | |
| | 02765_queries_with_subqueries_profile_events | FAIL | | |

@clickhouse-gh clickhouse-gh bot added the pr-not-for-changelog label (This PR should not be mentioned in the changelog) Aug 6, 2025
@antonio2368 antonio2368 force-pushed the flush-memory-profiles-in-ciu branch from 5bf4fbb to 9624673 August 7, 2025 13:11
@antonio2368 antonio2368 marked this pull request as ready for review August 8, 2025 09:26
@antonio2368
Member Author

It works. We can improve the flush timing later (e.g. by enabling lg_prof_interval or by flushing at some step from MemoryTracker).
Currently I create an SVG and a TXT report from the latest heap profile of each PID.
PDF output requires some extra tools, which I would rather add in a different PR to avoid rebuilding the Docker images in this one.
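
A minimal sketch of the "latest heap profile of each PID" selection described above, not the script used in this PR. It assumes the profile files follow jemalloc's documented naming layout `<prefix>.<pid>.<seq>.<type>.heap`; the helper names (`parseHeapProfileName`, `latestProfilePerPid`, `jeprofCommand`) are hypothetical, and the command line only illustrates jeprof's `--svg` and `--text` output modes.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// PID and dump sequence number taken from a heap profile file name.
// Assumed layout: <prefix>.<pid>.<seq>.<type>.heap
struct HeapProfileName
{
    long long pid;
    long long seq;
};

static bool isNumber(const std::string & s)
{
    return !s.empty() && s.size() <= 18
        && std::all_of(s.begin(), s.end(), [](unsigned char c) { return std::isdigit(c) != 0; });
}

// Returns std::nullopt when the name does not match the assumed layout.
std::optional<HeapProfileName> parseHeapProfileName(const std::string & name)
{
    std::vector<std::string> parts;
    size_t start = 0;
    while (true)
    {
        const size_t dot = name.find('.', start);
        if (dot == std::string::npos)
        {
            parts.push_back(name.substr(start));
            break;
        }
        parts.push_back(name.substr(start, dot - start));
        start = dot + 1;
    }

    // The prefix may contain dots itself, so the fields are read from the right.
    if (parts.size() < 5 || parts.back() != "heap")
        return std::nullopt;
    const std::string & pid = parts[parts.size() - 4];
    const std::string & seq = parts[parts.size() - 3];
    if (!isNumber(pid) || !isNumber(seq))
        return std::nullopt;
    return HeapProfileName{std::stoll(pid), std::stoll(seq)};
}

// Keeps, for every PID, the file name with the highest sequence number.
std::map<long long, std::string> latestProfilePerPid(const std::vector<std::string> & names)
{
    std::map<long long, std::pair<long long, std::string>> best;
    for (const auto & name : names)
    {
        const auto parsed = parseHeapProfileName(name);
        if (!parsed)
            continue;
        const auto it = best.find(parsed->pid);
        if (it == best.end() || parsed->seq > it->second.first)
            best[parsed->pid] = {parsed->seq, name};
    }

    std::map<long long, std::string> result;
    for (const auto & [pid, entry] : best)
        result[pid] = entry.second;
    return result;
}

// Builds a shell command such as "jeprof --svg <binary> <profile> > <profile>.svg".
// `format` is expected to be "svg" or "text".
std::string jeprofCommand(const std::string & binary, const std::string & profile, const std::string & format)
{
    return "jeprof --" + format + " " + binary + " " + profile + " > " + profile + "." + format;
}
```

Calling `latestProfilePerPid` on a directory listing and passing each result to `jeprofCommand` twice (once with `"svg"`, once with `"text"`) would give one pair of reports per server process.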

@antonio2368 antonio2368 requested a review from azat August 8, 2025 09:28
Member

@azat azat left a comment

I am not sure that I like these two new server settings, but I guess it is OK for now. PTAL at the comments.

Also, do you have an example of profiles that have been gathered on CI?

Member Author

antonio2368 commented Aug 8, 2025

> I am not sure that I like these two new server settings, but I guess it is OK for now. PTAL at the comments.

What would be a better approach if I want it only in CI? Also, I think it could be useful when debugging an instance instead of relying on lg_prof_interval, because we can focus on peaks instead of collecting a bunch of profiles for uninteresting periods.

> Also, do you have an example of profiles that have been gathered on CI?

Pick any stateless check with a release or debug build:
https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=85139&sha=latest&name_0=PR&name_1=Stateless+tests+%28amd_binary%2C+old+analyzer%2C+s3+storage%2C+DatabaseReplicated%2C+parallel%29
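
The idea of flushing on memory peaks rather than at a fixed interval can be illustrated with a small sketch. This is not the code from the PR: `PeakStepTrigger` and its `step_bytes` parameter are hypothetical, and the sketch only assumes that some tracker reports the currently tracked amount of memory on each allocation.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: signals a flush only when tracked memory has grown
// by at least `step_bytes` above the level recorded at the previous flush.
class PeakStepTrigger
{
public:
    explicit PeakStepTrigger(int64_t step_bytes) : step(step_bytes) {}

    // Returns true for exactly one caller per reached level, so concurrent
    // threads that observe the same peak do not all flush a profile.
    bool shouldFlush(int64_t current_bytes)
    {
        int64_t last = last_flushed.load(std::memory_order_relaxed);
        while (current_bytes >= last + step)
        {
            // On failure `last` is reloaded and the condition is re-checked.
            if (last_flushed.compare_exchange_weak(last, current_bytes, std::memory_order_relaxed))
                return true;
        }
        return false;
    }

private:
    const int64_t step;
    std::atomic<int64_t> last_flushed{0};
};
```

With a step of 100 bytes, reports of 50, 100, 150 and 200 bytes would trigger a flush at 100 and 200 only, which is the "focus on peaks" behavior described in the comment above.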

@antonio2368 antonio2368 requested a review from azat August 8, 2025 11:53
@antonio2368 antonio2368 added this pull request to the merge queue Aug 11, 2025
Merged via the queue into master with commit 19a4498 Aug 11, 2025
123 of 124 checks passed
@antonio2368 antonio2368 deleted the flush-memory-profiles-in-ciu branch August 11, 2025 08:06
@robot-clickhouse robot-clickhouse added the pr-synced-to-cloud label (The PR is synced to the cloud repo) Aug 11, 2025
azat added a commit to azat/ClickHouse that referenced this pull request Sep 1, 2025
After ClickHouse#85139, the deadlock became possible again during memory allocations:

    1  0x00007f4e4c997002 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libc.so.6
    2  0x000055b5a697daaf in pthread_mutex_lock (arg=0x7f4c30020590) at ./ci/tmp/build/./src/Common/ThreadFuzzer.cpp:447
    3  0x000055b5ab409b93 in Poco::MutexImpl::lockImpl (this=0x80) at ./base/poco/Foundation/include/Poco/Mutex_POSIX.h:60
    4  Poco::FastMutex::lock (this=0x80) at ./base/poco/Foundation/include/Poco/Mutex.h:229
    5  Poco::ScopedLock<Poco::FastMutex>::ScopedLock (this=<optimized out>, mutex=...) at ./base/poco/Foundation/include/Poco/ScopedLock.h:37
    6  0x000055b5b65e9b05 in Poco::Thread::name (this=0x7f4c300204e8) at ./base/poco/Foundation/include/Poco/Thread.h:267
    7  Poco::Message::init (this=this@entry=0x7f4c2315c650) at ./ci/tmp/build/./base/poco/Foundation/src/Message.cpp:127
    8  0x000055b5b65e9e66 in Poco::Message::Message (this=0x7f4c2315c650, source=..., text=..., prio=<optimized out>, file=..., line=71, fmt_str=..., fmt_str_args=...) at ./ci/tmp/build/./base/poco/Foundation/src/Message.cpp:61
    9  0x000055b5a694d236 in DB::flushJemallocProfile (file_prefix=...) at ./ci/tmp/build/./src/Common/Jemalloc.cpp:71
    10 0x000055b5a694680c in MemoryTracker::allocImpl (this=<optimized out>, size=32, throw_if_memory_exceeded=<optimized out>, query_tracker=<optimized out>, _sample_probability=<optimized out>) at ./ci/tmp/build/./src/Common/MemoryTracker.cpp:338
    11 0x000055b5a6892064 in trackMemory<> (size=25, trace=...) at ./src/Common/memory.h:134
    12 operator new (size=25) at ./ci/tmp/build/./src/Common/AllocationInterceptors.cpp:55
    20 0x000055b5b65e9b36 in std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::basic_string (this=0x7f4c2315cbb0, __str=...) at ./contrib/llvm-project/libcxx/include/string:1005
    21 Poco::Thread::name (this=0x7f4c300204e8) at ./base/poco/Foundation/include/Poco/Thread.h:269
    22 Poco::Message::init (this=this@entry=0x7f4c2315ccc8) at ./ci/tmp/build/./base/poco/Foundation/src/Message.cpp:127
    23 0x000055b5b65e9e66 in Poco::Message::Message (this=0x7f4c2315ccc8, source=..., text=..., prio=<optimized out>, file=..., line=49, fmt_str=..., fmt_str_args=...) at ./ci/tmp/build/./base/poco/Foundation/src/Message.cpp:61
    24 0x000055b5a6a0cf93 in ServerErrorHandler::logMessageImpl (this=<optimized out>, priority=<optimized out>, msg=...) at ./src/Common/ErrorHandlers.h:49
    25 0x000055b5b65b252c in Poco::ErrorHandler::logMessage (priority=Poco::Message::PRIO_TEST, msg=...) at ./ci/tmp/build/./base/poco/Foundation/src/ErrorHandler.cpp:97
    26 0x000055b5b666eab4 in Poco::Net::TCPServerDispatcher::enqueue (this=0x7f4c31b3e000, socket=...) at ./ci/tmp/build/./base/poco/Net/src/TCPServerDispatcher.cpp:146
    27 0x000055b5b666d94c in Poco::Net::TCPServer::run (this=0x7f4c300204c0) at ./ci/tmp/build/./base/poco/Net/src/TCPServer.cpp:148
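
Reading the trace bottom-up: Poco::Thread::name (frame 21) copies the thread name while holding its mutex, the copy allocates, the allocation reaches MemoryTracker::allocImpl, which calls flushJemallocProfile, and the log message created there calls Poco::Thread::name on the same thread object (frame 6) and blocks on the mutex that is already held. One general way to avoid this class of problem is to keep the allocation path free of locks, allocations, and logging, and to defer the real work to a safe point. The sketch below illustrates that pattern only; it is an assumption, not the fix applied in the referencing commit, and `DeferredFlush` is a hypothetical name.

```cpp
#include <atomic>

// Hypothetical helper: the allocation hook only raises a flag; the flush
// (with its logging) runs later from a context that holds no locks.
class DeferredFlush
{
public:
    // Safe to call from an allocation hook: no locks, no allocation, no logging.
    void requestFromAllocationHook() noexcept
    {
        requested.store(true, std::memory_order_release);
    }

    // Called from a safe point, e.g. a background thread.
    // Runs `action` once per batch of requests and reports whether it ran.
    template <typename Action>
    bool runIfRequested(Action && action)
    {
        if (!requested.exchange(false, std::memory_order_acq_rel))
            return false;
        action();
        return true;
    }

private:
    std::atomic<bool> requested{false};
};
```

The trade-off is that the profile is flushed slightly after the triggering allocation instead of exactly at it.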
Labels

pr-not-for-changelog (This PR should not be mentioned in the changelog)
pr-synced-to-cloud (The PR is synced to the cloud repo)

Development

Successfully merging this pull request may close these issues.

Flush jemalloc profile on MEMORY_LIMIT_EXCEEDED on CI

3 participants