
Fix data-races in client, session_timezone overrides and some data-races in some corner cases#82444

Merged
azat merged 10 commits intoClickHouse:masterfrom
azat:client-fix-data-races
Jun 30, 2025

Conversation

@azat
Member

@azat azat commented Jun 23, 2025

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Fix data-races in client (by not using global context) and session_timezone overrides (previously, if session_timezone was set to a non-empty value in, e.g., users.xml or client options, and to an empty value in the query context, the value from users.xml was used, which is wrong; now the query context always has priority over the global context)

The problem is that the client calls setSettings() on the global context, but this produces a data-race with code that reads the global settings all over the place (see the linked comment in #81893):

  • ThreadStatus::finalizePerformanceCounters()
  • ThreadStatus::initGlobalProfiler()

But since we no longer modify the global context, we need to set a proper query context for the thread, so that session_timezone works after, e.g., SET session_timezone='UTC' or SELECT now() SETTINGS session_timezone='UTC'.
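A minimal sketch of this approach, under a heavily simplified model: the global context is treated as read-only once the client has started, every query gets its own copy with the per-query overrides applied, and the thread is pointed at that copy while the query runs. Context, QueryContextScope, getCurrentContext() and makeQueryContext() are hypothetical names used only for illustration; they do not mirror the real DB::Context or CurrentThread::QueryScope interfaces.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, heavily simplified stand-in for a context that carries settings.
struct Context {
    std::string session_timezone;
};
using ContextPtr = std::shared_ptr<const Context>;

// Global context: built once at startup and never modified afterwards,
// so threads can read it concurrently without synchronization.
const ContextPtr global_context = std::make_shared<Context>(Context{"Europe/Amsterdam"});

// Context of the query the current thread is executing (null if none).
thread_local ContextPtr current_query_context;

// Attaches a query context to the current thread and restores the
// previous one when the scope ends.
class QueryContextScope {
public:
    explicit QueryContextScope(ContextPtr ctx)
        : previous(std::exchange(current_query_context, std::move(ctx))) {}
    ~QueryContextScope() { current_query_context = std::move(previous); }
    QueryContextScope(const QueryContextScope &) = delete;
    QueryContextScope & operator=(const QueryContextScope &) = delete;

private:
    ContextPtr previous;
};

// Readers prefer the thread's query context and fall back to the global one.
ContextPtr getCurrentContext() {
    return current_query_context ? current_query_context : global_context;
}

// Instead of mutating the global context (the old client behaviour), copy it
// and apply the per-query override, e.g. SETTINGS session_timezone='UTC'.
ContextPtr makeQueryContext(const std::string & tz) {
    auto ctx = std::make_shared<Context>(*global_context);
    ctx->session_timezone = tz;
    return ctx;  // published as shared_ptr<const Context>, frozen from here on
}
```

With this layout a SET or SETTINGS clause only writes to a fresh, query-local copy before it is published, so threads that read the global context have nothing to race with.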

Also, after the client stopped using the global context, one more issue popped up: incorrect handling of session_timezone. For the client it was always OK, since the client used to adjust the global context directly, but on the server, if the query context overrode it with an empty value while it was non-empty in, e.g., users.xml, the value from the global context was used.
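To make the precedence rule concrete, here is a small model, assuming each context level either leaves session_timezone unset (std::nullopt) or sets it explicitly, where an empty string still counts as an explicit value. resolveBuggy() reproduces the old behaviour described above and resolveFixed() the new one; both are hypothetical helpers that only illustrate the rule, not ClickHouse's settings code.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical model of one setting at one context level:
// std::nullopt = not set at this level, any string (including "") = set explicitly.
struct TimezoneSetting {
    std::optional<std::string> value;
};

// Old behaviour: an empty query-level value is treated like "not set",
// so the value from the global context (e.g. users.xml) leaks through.
std::string resolveBuggy(const TimezoneSetting & query, const TimezoneSetting & global) {
    if (query.value && !query.value->empty())
        return *query.value;
    return global.value.value_or("");
}

// New behaviour: if the query context sets the value, it always wins,
// even when that value is empty.
std::string resolveFixed(const TimezoneSetting & query, const TimezoneSetting & global) {
    if (query.value)
        return *query.value;
    return global.value.value_or("");
}
```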

This PR also fixes various other UBs in some corner cases that were found while running CI on it.

This is actually the proper fix on top of #81759, in addition to the incomplete #82233.

Fixes: #81893

@clickhouse-gh
Contributor

clickhouse-gh bot commented Jun 23, 2025

Workflow [PR], commit [362026f]

Summary:

  • Stateless tests (amd_debug): failure
    • 00980_merge_alter_settings: FAIL
  • Bugfix validation (functional tests): failure
  • Integration tests (tsan, 6/6): failure
    • test_storage_rabbitmq/test.py::test_rabbitmq_sharding_between_queues_publish: FAIL

@azat azat changed the title Do not use global context as main context for the client Fix data-race in client by using separate context from global Jun 23, 2025
@clickhouse-gh clickhouse-gh bot added the pr-bugfix Pull request with bugfix, not backported by default label Jun 23, 2025
@azat azat added the 🍃 green ci 🌿 Fixing flaky tests in CI label Jun 23, 2025
@azat azat force-pushed the client-fix-data-races branch from 978a0ac to 2accd50 June 23, 2025 19:36
@azat azat marked this pull request as draft June 23, 2025 21:05
@azat azat marked this pull request as ready for review June 24, 2025 12:44
@azat azat force-pushed the client-fix-data-races branch from d45764f to 434c949 June 24, 2025 14:45
@azat azat changed the title Fix data-race in client by using separate context from global Fix data-races in client and session_timezone overrides Jun 24, 2025
@azat azat force-pushed the client-fix-data-races branch from 434c949 to 526becd June 24, 2025 15:36
@Algunenano Algunenano self-assigned this Jun 24, 2025
Member

@Algunenano Algunenano left a comment

Minor comments

@azat azat force-pushed the client-fix-data-races branch from 4499c45 to 4d969ed June 24, 2025 19:52
@azat
Member Author

azat commented Jun 24, 2025

Stress test (amd_tsan) — Sanitizer assert (in stderr.log)

Stress test (amd_debug) — Hung check failed, possible deadlock found (see hung_check.log)
Stress test (amd_ubsan) — Hung check failed, possible deadlock found (see hung_check.log)

I cannot find the reason, and it may be related (these changes broke some obscure places in the past), so I need to take a deeper look; I will fix some places in CI to get stack traces.

@azat azat marked this pull request as draft June 28, 2025 12:17
@azat
Member Author

azat commented Jun 28, 2025

Stress test (amd_ubsan) — Hung check failed, possible deadlock found (see hung_check.log)

Finally, after all the debugging I've added, I found it!

Details
Thread 2 (Thread 0x7efe0c3e0640 (LWP 293871) "clickhouse-clie"):
#0  0x00007efe0cb9081c in read () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x0000558d24646b2f in DB::ReadBufferFromFileDescriptor::readImpl (this=this@entry=0x7efe0c3d8710, to=0x7efe0c3d80b0 "", min_bytes=min_bytes@entry=1, max_bytes=536, offset=0) at ./ci/tmp/build/./src/IO/ReadBufferFromFileDescriptor.cpp:73
#2  0x0000558d2464725e in DB::ReadBufferFromFileDescriptor::nextImpl (this=0x7efe0c3d8710) at ./ci/tmp/build/./src/IO/ReadBufferFromFileDescriptor.cpp:122
#3  0x0000558d1658bd32 in DB::ReadBuffer::next() ()
#4  0x0000558d24b304c4 in DB::ReadBuffer::eof (this=0x7efe0c3d8710) at ./src/IO/ReadBuffer.h:96
#5  SignalListener::run (this=<optimized out>) at ./ci/tmp/build/./src/Common/SignalHandlers.cpp:309
#6  0x0000558d371ed02e in Poco::ThreadImpl::runnableEntry (pThread=<optimized out>) at ./base/poco/Foundation/src/Thread_POSIX.cpp:335
#7  0x00007efe0cb10ac3 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#8  0x00007efe0cba2850 in ?? () from /lib/x86_64-linux-gnu/libc.so.6

Thread 1 (Thread 0x7efe0c990b80 (LWP 293870) "clickhouse-clie"):
#0  0x00007efe0cb0d2c0 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007efe0cb14002 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x0000558d246086d6 in pthread_mutex_lock (arg=0x558d45420940 <(anonymous namespace)::getLoggerMutex()::$_0::operator()() const::buffer>) at ./ci/tmp/build/./src/Common/ThreadFuzzer.cpp:446
#3  0x0000558d3a6db189 in std::__1::__libcpp_mutex_lock[abi:ne190107](pthread_mutex_t*) (__m=0x558d45420940 <(anonymous namespace)::getLoggerMutex()::$_0::operator()() const::buffer>) at ./contrib/llvm-project/libcxx/include/__thread/support/pthread.h:95
#4  std::__1::mutex::lock (this=0x558d45420940 <(anonymous namespace)::getLoggerMutex()::$_0::operator()() const::buffer>) at ./ci/tmp/build/./contrib/llvm-project/libcxx/src/mutex.cpp:29
#5  0x0000558d371a9df5 in std::__1::lock_guard<std::__1::mutex>::lock_guard[abi:ne190107](std::__1::mutex&) (this=0x7ffdcc075d00, __m=...) at ./contrib/llvm-project/libcxx/include/__mutex/lock_guard.h:33
#6  Poco::Logger::getShared (name=..., should_be_owned_by_shared_ptr_if_created=false) at ./ci/tmp/build/./base/poco/Foundation/src/Logger.cpp:355
#7  0x0000558d245bbfb4 in getLogger<14ul> (name=...) at ./src/Common/Logger.h:41
#8  MemoryTracker::debugLogBigAllocationWithoutCheck (this=<optimized out>, size=21580033) at ./ci/tmp/build/./src/Common/MemoryTracker.cpp:222
#9  0x0000558d245bd0a7 in MemoryTracker::allocImpl (this=0x558d45270360 <DB::MainThreadStatus::getInstance()::thread_status+80>, size=21580033, throw_if_memory_exceeded=<optimized out>, query_tracker=0x0, _sample_probability=-1) at ./ci/tmp/build/./src/Common/MemoryTracker.cpp:400
#10 0x0000558d2460aab3 in DB::ThreadStatus::flushUntrackedMemory (this=0x558d45270310 <DB::MainThreadStatus::getInstance()::thread_status>) at ./ci/tmp/build/./src/Common/ThreadStatus.cpp:245
#11 0x0000558d24480b1e in CurrentMemoryTracker::allocImpl (size=64, throw_if_memory_exceeded=false) at ./ci/tmp/build/./src/Common/CurrentMemoryTracker.cpp:67
#12 0x0000558d2447c4b9 in trackMemory<> (size=64, trace=...) at ./src/Common/memory.h:210
#13 operator new (size=94065240705344) at ./ci/tmp/build/./src/Common/new_delete.cpp:53
#14 0x0000558d371a9b67 in Poco::Logger::unsafeGet (name=..., get_shared=true) at ./ci/tmp/build/./base/poco/Foundation/src/Logger.cpp:396
#15 0x0000558d371a9e02 in Poco::Logger::getShared (name=..., should_be_owned_by_shared_ptr_if_created=false) at ./ci/tmp/build/./base/poco/Foundation/src/Logger.cpp:356
#16 0x0000558d245bbfb4 in getLogger<14ul> (name=...) at ./src/Common/Logger.h:41
#17 MemoryTracker::debugLogBigAllocationWithoutCheck (this=<optimized out>, size=21580033) at ./ci/tmp/build/./src/Common/MemoryTracker.cpp:222
#18 0x0000558d245bd0a7 in MemoryTracker::allocImpl (this=0x558d45270360 <DB::MainThreadStatus::getInstance()::thread_status+80>, size=21580033, throw_if_memory_exceeded=<optimized out>, query_tracker=0x0, _sample_probability=-1) at ./ci/tmp/build/./src/Common/MemoryTracker.cpp:400
#19 0x0000558d2460aab3 in DB::ThreadStatus::flushUntrackedMemory (this=0x558d45270310 <DB::MainThreadStatus::getInstance()::thread_status>) at ./ci/tmp/build/./src/Common/ThreadStatus.cpp:245
#20 0x0000558d2f8785c1 in DB::ThreadStatus::detachFromGroup (this=0x558d45270310 <DB::MainThreadStatus::getInstance()::thread_status>) at ./ci/tmp/build/./src/Interpreters/ThreadStatusExt.cpp:372
#21 0x0000558d2f87be42 in DB::CurrentThread::detachFromGroupIfNotDetached () at ./ci/tmp/build/./src/Interpreters/ThreadStatusExt.cpp:716
#22 DB::CurrentThread::QueryScope::~QueryScope (this=<optimized out>) at ./ci/tmp/build/./src/Interpreters/ThreadStatusExt.cpp:755
#23 0x0000558d24868a98 in std::__1::__optional_destruct_base<DB::CurrentThread::QueryScope, false>::~__optional_destruct_base[abi:ne190107]() (this=0xfffffffffffffe00) at ./contrib/llvm-project/libcxx/include/optional:293
#24 DB::Client::~Client (this=0x7ffdcc076ce0) at ./ci/tmp/build/./programs/client/Client.cpp:73
#25 0x0000558d24883ac4 in mainEntryClickHouseClient (argc=227, argv=<optimized out>) at ./ci/tmp/build/./programs/client/Client.cpp:1140
#26 0x0000558d163ae1e1 in main (argc_=<optimized out>, argv_=<optimized out>) at ./ci/tmp/build/./programs/main.cpp:338
[Inferior 1 (process 293870) detached]

Process group 293869 should be killed
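Reading Thread 1 bottom-up, the main thread deadlocks on itself: Poco::Logger::getShared() holds the logger mutex (frame #15), the allocation in Poco::Logger::unsafeGet() (frames #14 and #13) flushes untracked memory into the MemoryTracker (frames #11 to #9), which calls MemoryTracker::debugLogBigAllocationWithoutCheck() and getLogger() again (frames #8 and #7), and that tries to lock the same non-recursive mutex (frames #6 to #0). One generic way to break such a cycle is a thread-local reentrancy guard around the logging path. The sketch below is a self-contained model with hypothetical names (onAllocation(), logBigAllocation(), an arbitrary 16 MiB threshold); it is not necessarily what the actual fix does.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical model of the cycle in the trace: a logger registry guarded by a
// non-recursive mutex, whose own allocations are reported to a memory tracker
// that may itself want to log.
std::mutex logger_mutex;
std::vector<std::string> log_lines;

// Memory allocated by this thread but not yet reported to the tracker.
thread_local std::size_t untracked_bytes = 0;
// True while this thread is inside logBigAllocation().
thread_local bool in_big_allocation_log = false;

constexpr std::size_t big_allocation_threshold = std::size_t{16} << 20;  // arbitrary 16 MiB

void onAllocation(std::size_t size);

// Sets a flag for the lifetime of the object, also when an exception is thrown.
struct FlagGuard {
    bool & flag;
    explicit FlagGuard(bool & f) : flag(f) { flag = true; }
    ~FlagGuard() { flag = false; }
};

// Stand-in for debugLogBigAllocationWithoutCheck() + getLogger().
void logBigAllocation(std::size_t size) {
    // Without this check the nested call would lock logger_mutex a second time
    // on the same thread, which is the self-deadlock seen in the trace.
    if (in_big_allocation_log)
        return;
    FlagGuard guard(in_big_allocation_log);

    std::lock_guard<std::mutex> lock(logger_mutex);
    onAllocation(64);  // the registry allocates while holding the mutex
    log_lines.push_back("big allocation: " + std::to_string(size));
}

// Stand-in for flushing untracked memory into the tracker on allocation.
// As in the trace, the counter is reset only after the tracker returns,
// so a nested allocation still sees the big pending amount.
void onAllocation(std::size_t size) {
    untracked_bytes += size;
    if (untracked_bytes >= big_allocation_threshold) {
        logBigAllocation(untracked_bytes);
        untracked_bytes = 0;
    }
}
```

The nested call returns early instead of blocking, so the big allocation is logged exactly once and the mutex is taken only once per thread.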

Stateless tests (amd_tsan, s3 storage, 2/3)

There is also one more data-race in the client, reproduced in the 02561_null_as_default_more_formats and 02435_rollback_cancelled_queries tests, which is likely why the expect test fails, but I don't have the other side of the race...

Because the process deadlocks while handling the TSan report!

UPD: should be fixed by 362026f

Details
2025-06-28 19:55:18 #0  0x000055add6e78d7a in __sanitizer::FutexWait(__sanitizer::atomic_uint32_t*, unsigned int) ()
2025-06-28 19:55:18 #1  0x000055add6e796aa in __sanitizer::Semaphore::Wait() ()
2025-06-28 19:55:18 #2  0x000055add6f00ec0 in __tsan::TraceSwitchPartImpl(__tsan::ThreadState*) ()
2025-06-28 19:55:18 #3  0x000055add6f03062 in __tsan::TraceRestartFuncEntry(__tsan::ThreadState*, unsigned long) ()
2025-06-28 19:55:18 #4  0x000055add6e95183 in __tsan::ScopedInterceptor::ScopedInterceptor(__tsan::ThreadState*, char const*, unsigned long) ()
2025-06-28 19:55:18 #5  0x000055add6ecd38a in __interceptor_pthread_setcanceltype ()
2025-06-28 19:55:18 #6  0x000055adf620cd49 in clock_nanosleep (clk=1, flags=1, req=0x7ffc4aae2958, rem=0x0) at ./ci/tmp/build/./base/glibc-compatibility/musl/clock_nanosleep.c:22
2025-06-28 19:55:18 #7  0x000055adf1583434 in sleepForNanoseconds (nanoseconds=<optimized out>) at ./ci/tmp/build/./base/base/sleep.cpp:50
2025-06-28 19:55:18 #8  0x000055adf158357e in sleepForMicroseconds (microseconds=1000000) at ./ci/tmp/build/./base/base/sleep.cpp:56
2025-06-28 19:55:18 #9  sleepForMilliseconds (milliseconds=1000) at ./ci/tmp/build/./base/base/sleep.cpp:61
2025-06-28 19:55:18 #10 sleepForSeconds (seconds=seconds@entry=1) at ./ci/tmp/build/./base/base/sleep.cpp:66
2025-06-28 19:55:18 #11 0x000055ade0ecfc5a in signalHandler (sig=6, info=<optimized out>, context=<optimized out>) at ./ci/tmp/build/./src/Common/SignalHandlers.cpp:134
2025-06-28 19:55:18 #12 0x000055add6ea07a6 in __tsan::CallUserSignalHandler(__tsan::ThreadState*, bool, bool, int, __sanitizer::__sanitizer_siginfo*, void*) ()
2025-06-28 19:55:18 #13 0x000055add6ea0cfb in sighandler(int, __sanitizer::__sanitizer_siginfo*, void*) ()
2025-06-28 19:55:18 #14 <signal handler called>
2025-06-28 19:55:18 #15 0x00007feecd7fd9fc in pthread_kill () from /lib/x86_64-linux-gnu/libc.so.6
2025-06-28 19:55:18 #16 0x00007feecd7a9476 in raise () from /lib/x86_64-linux-gnu/libc.so.6
2025-06-28 19:55:18 #17 0x00007feecd78f7f3 in abort () from /lib/x86_64-linux-gnu/libc.so.6
2025-06-28 19:55:18 #18 0x000055add6e9f267 in __interceptor_abort ()
2025-06-28 19:55:18 #19 0x000055add6e8192c in __sanitizer::Abort() ()
2025-06-28 19:55:18 #20 0x000055add6e801be in __sanitizer::Die() ()
2025-06-28 19:55:18 #21 0x000055add6f119ae in __tsan::OutputReport(__tsan::ThreadState*, __tsan::ScopedReport const&) ()
2025-06-28 19:55:18 #22 0x000055add6f12b48 in __tsan::ReportRace(__tsan::ThreadState*, __tsan::RawShadow*, __tsan::Shadow, __tsan::Shadow, unsigned long) ()
2025-06-28 19:55:18 #23 0x000055add6e941fd in __tsan_memcpy ()
2025-06-28 19:55:18 #24 0x000055ade756a37a in DB::SettingsTraits::Data::operator= (this=this@entry=0x729800006008) at ./ci/tmp/build/./src/Core/Settings.cpp:6975
2025-06-28 19:55:18 #25 0x000055ade73aef0d in DB::BaseSettings<DB::SettingsTraits>::operator= (this=0x729800006000) at ./src/Core/BaseSettings.h:109
2025-06-28 19:55:18 #26 DB::SettingsImpl::operator= (this=0x729800006000) at ./ci/tmp/build/./src/Core/Settings.cpp:6981
2025-06-28 19:55:18 #27 DB::Settings::operator= (this=0x7204000011e0, other=...) at ./ci/tmp/build/./src/Core/Settings.cpp:7188
2025-06-28 19:55:18 #28 0x000055ade8d4facb in DB::Context::setSettings (this=0x727800003c00, settings_=...) at ./ci/tmp/build/./src/Interpreters/Context.cpp:2634
2025-06-28 19:55:18 #29 0x000055adec1c9b18 in DB::ClientBase::processParsedSingleQuery(std::__1::basic_string_view<char, std::__1::char_traits<char> >, std::__1::shared_ptr<DB::IAST>, bool&, unsigned long)::$_0::operator()() const (this=0x7ffc4aae51a8) at ./ci/tmp/build/./src/Client/ClientBase.cpp:2184
2025-06-28 19:55:18 #30 BasicScopeGuard<DB::ClientBase::processParsedSingleQuery(std::__1::basic_string_view<char, std::__1::char_traits<char> >, std::__1::shared_ptr<DB::IAST>, bool&, unsigned long)::$_0>::invoke (this=0x7ffc4aae51a8) at ./base/base/../base/scope_guard.h:101
2025-06-28 19:55:18 #31 BasicScopeGuard<DB::ClientBase::processParsedSingleQuery(std::__1::basic_string_view<char, std::__1::char_traits<char> >, std::__1::shared_ptr<DB::IAST>, bool&, unsigned long)::$_0>::~BasicScopeGuard (this=this@entry=0x7ffc4aae51a8) at ./base/base/../base/scope_guard.h:50
2025-06-28 19:55:18 #32 0x000055adec1bdf11 in DB::ClientBase::processParsedSingleQuery (this=this@entry=0x7ffc4aae57e0, query_=..., parsed_query=..., is_async_insert_with_inlined_data=@0x7ffc4aae542f: false, insert_query_without_data_length=18446618574765148096) at ./ci/tmp/build/./src/Client/ClientBase.cpp:2249
2025-06-28 19:55:18 #33 0x000055adec1cb0df in DB::ClientBase::executeMultiQuery (this=this@entry=0x7ffc4aae57e0, all_queries_text=...) at ./ci/tmp/build/./src/Client/ClientBase.cpp:2620
2025-06-28 19:55:18 #34 0x000055adec1cc827 in DB::ClientBase::processQueryText (this=this@entry=0x7ffc4aae57e0, text=...) at ./ci/tmp/build/./src/Client/ClientBase.cpp:2814
2025-06-28 19:55:18 #35 0x000055adec1d6f8b in DB::ClientBase::runNonInteractive (this=0x7ffc4aae57e0) at ./ci/tmp/build/./src/Client/ClientBase.cpp:3454
2025-06-28 19:55:18 #36 0x000055ade0ca051c in DB::Client::main (this=this@entry=0x7ffc4aae57e0) at ./ci/tmp/build/./programs/client/Client.cpp:403
2025-06-28 19:55:18 #37 0x000055ade0ca0c61 in non-virtual thunk to DB::Client::main(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) ()
2025-06-28 19:55:18 #38 0x000055adf16df37f in Poco::Util::Application::run (this=0x7ffc4aae6118) at ./ci/tmp/build/./base/poco/Util/src/Application.cpp:315
2025-06-28 19:55:18 #39 0x000055ade0caea0c in mainEntryClickHouseClient (argc=argc@entry=222, argv=argv@entry=0x726c00000000) at ./ci/tmp/build/./programs/client/Client.cpp:1139
2025-06-28 19:55:18 #40 0x000055add6f1ca05 in main (argc_=<optimized out>, argv_=<optimized out>) at ./ci/tmp/build/./programs/main.cpp:338

PR/Stateless tests (amd_tsan, s3 storage, 3/3)

  • 00900_parquet_time_to_ch_date_time

Could this also be related?

@azat
Member Author

azat commented Jun 29, 2025

Looks better, but the data-race is not fixed yet, so we need to catch it again with a proper TSan report.

azat added 9 commits June 30, 2025 11:27
…se_query

Got it while running 1000 tests with concurrency 32
The problem is that the client calls setSettings() in the global settings,
but this produces a data-race with accessing global settings all over the
place [1]:

- ThreadStatus::finalizePerformanceCounters()
- ThreadStatus::initGlobalProfiler()

  [1]: ClickHouse#81893 (comment)

v2: Initialize query_id in a proper context
v3: Fix apply settings from config on the client
v4 + v5: Fix other settings adjustments
This time I was fixing it for session_timezone, but there are other
reasons to have it properly configured.
Though I am not sure that it is worth it; previously the code was more generic
…Check()

This fixes a possible deadlock in clickhouse-client [1] (the Thread 1 / Thread 2 stack traces are the same as in the Stress test (amd_ubsan) comment above).


  [1]: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=82444&sha=f3fc8c98f4994d88b927cebfdc31c4bcba3d2a6f&name_0=PR&name_1=Stress%20test%20%28amd_ubsan%29
Otherwise it may lead to a data-race:

    WARNING: ThreadSanitizer: data race (pid=26127)
      Write of size 6 at 0x7298000099e8 by main thread:
        0 __tsan_memcpy <null> (clickhouse-82444-tsan+0x8be71be) (BuildId: e5eeeb8fc2a0ab65c098bb1f3db89853f2439448)
        1 DB::SettingsTraits::Data::operator=(DB::SettingsTraits::Data const&) ci/tmp/build/./src/Core/Settings.cpp:6978:1 (clickhouse-82444-tsan+0x192ae4f9) (BuildId: e5eeeb8fc2a0ab65c098bb1f3db89853f2439448)
        2 DB::BaseSettings<DB::SettingsTraits>::operator=(DB::BaseSettings<DB::SettingsTraits> const&) ci/tmp/build/./src/Core/BaseSettings.h:109:60 (clickhouse-82444-tsan+0x190f2bcc) (BuildId: e5eeeb8fc2a0ab65c098bb1f3db89853f2439448)
        3 DB::SettingsImpl::operator=(DB::SettingsImpl const&) ci/tmp/build/./src/Core/Settings.cpp:6984:8 (clickhouse-82444-tsan+0x190f2bcc)
        4 DB::Settings::operator=(DB::Settings const&) ci/tmp/build/./src/Core/Settings.cpp:7191:11 (clickhouse-82444-tsan+0x190f2bcc)
        5 DB::Context::setSettings(DB::Settings const&) ci/tmp/build/./src/Interpreters/Context.cpp:2635:15 (clickhouse-82444-tsan+0x1aa94c0a) (BuildId: e5eeeb8fc2a0ab65c098bb1f3db89853f2439448)
        6 DB::ClientBase::processParsedSingleQuery(std::__1::basic_string_view<char, std::__1::char_traits<char>>, std::__1::shared_ptr<DB::IAST>, bool&, unsigned long)::$_0::operator()() const ci/tmp/build/./src/Client/ClientBase.cpp:2184:9 (clickhouse-82444-tsan+0x1df0f4d7) (BuildId: e5eeeb8fc2a0ab65c098bb1f3db89853f2439448)
        7 BasicScopeGuard<DB::ClientBase::processParsedSingleQuery(std::__1::basic_string_view<char, std::__1::char_traits<char>>, std::__1::shared_ptr<DB::IAST>, bool&, unsigned long)::$_0>::invoke() ci/tmp/build/./base/base/../base/scope_guard.h:101:9 (clickhouse-82444-tsan+0x1df0f4d7)
        8 BasicScopeGuard<DB::ClientBase::processParsedSingleQuery(std::__1::basic_string_view<char, std::__1::char_traits<char>>, std::__1::shared_ptr<DB::IAST>, bool&, unsigned long)::$_0>::~BasicScopeGuard() ci/tmp/build/./base/base/../base/scope_guard.h:50:26 (clickhouse-82444-tsan+0x1df0f4d7)
        9 DB::ClientBase::processParsedSingleQuery(std::__1::basic_string_view<char, std::__1::char_traits<char>>, std::__1::shared_ptr<DB::IAST>, bool&, unsigned long) ci/tmp/build/./src/Client/ClientBase.cpp:2249:5 (clickhouse-82444-tsan+0x1df021ca) (BuildId: e5eeeb8fc2a0ab65c098bb1f3db89853f2439448)

      Previous read of size 1 at 0x7298000099e8 by thread T4:
        #0 DB::SettingFieldNumber<bool>::operator bool() const ci/tmp/build/./src/Core/SettingsFields.h:40:36 (clickhouse-82444-tsan+0x1b114e5f) (BuildId: e5eeeb8fc2a0ab65c098bb1f3db89853f2439448)
        #1 DB::ThreadStatus::finalizePerformanceCounters() ci/tmp/build/./src/Interpreters/ThreadStatusExt.cpp:547:17 (clickhouse-82444-tsan+0x1b114e5f)
        #2 DB::ThreadStatus::detachFromGroup() ci/tmp/build/./src/Interpreters/ThreadStatusExt.cpp:375:5 (clickhouse-82444-tsan+0x1b113aec) (BuildId: e5eeeb8fc2a0ab65c098bb1f3db89853f2439448)
        #3 DB::CurrentThread::detachFromGroupIfNotDetached() ci/tmp/build/./src/Interpreters/ThreadStatusExt.cpp:716:21 (clickhouse-82444-tsan+0x1b1119b3) (BuildId: e5eeeb8fc2a0ab65c098bb1f3db89853f2439448)
        #4 DB::ThreadGroupSwitcher::~ThreadGroupSwitcher() ci/tmp/build/./src/Interpreters/ThreadStatusExt.cpp:261:9 (clickhouse-82444-tsan+0x1b1119b3)
        #5 DB::ThreadPoolCallbackRunnerLocal<void, ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>, std::__1::function<void ()>>::operator()(std::__1::function<void ()>&&, Priority, std::__1::optional<unsigned long>)::'lambda'()::operator()() ci/tmp/build/./src/Common/threadPoolCallbackRunner.h:179:9 (clickhouse-82444-tsan+0x14beafde) (BuildId: e5eeeb8fc2a0ab65c098bb1f3db89853f2439448)

      Thread T4 'ThreadPool' (tid=26133, running) created by thread T3 at:
        ...
        12 DB::ThreadPoolCallbackRunnerLocal<void, ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>, std::__1::function<void ()>>::operator()(std::__1::function<void ()>&&, Priority, std::__1::optional<unsigned long>) ci/tmp/build/./src/Common/threadPoolCallbackRunner.h:188:22 (clickhouse-82444-tsan+0x14be522d) (BuildId: e5eeeb8fc2a0ab65c098bb1f3db89853f2439448)
        13 DB::ParallelParsingInputFormat::scheduleParserThreadForUnitWithNumber(unsigned long) ci/tmp/build/./src/Processors/Formats/Impl/ParallelParsingInputFormat.h:287:9 (clickhouse-82444-tsan+0x1e495627) (BuildId: e5eeeb8fc2a0ab65c098bb1f3db89853f2439448)
        14 DB::ParallelParsingInputFormat::segmentatorThreadFunction(std::__1::shared_ptr<DB::ThreadGroup>) ci/tmp/build/./src/Processors/Formats/Impl/ParallelParsingInputFormat.cpp:45:13 (clickhouse-82444-tsan+0x1e495627)

Refs: https://pastila.nl/?0001fdef/9a58e9d59c32d45100a481de26dccf68#T4iNrMFUnu4F2hWST5wLdQ==
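In the report above, the main thread overwrites the settings via Context::setSettings() from the scope guard in ClientBase::processParsedSingleQuery(), while thread T4, a pool thread scheduled by ParallelParsingInputFormat, still reads a bool setting in ThreadStatus::finalizePerformanceCounters(). The general rule for this class of race is that every thread reading the settings must have finished before they are overwritten. The sketch below shows only that ordering with plain std::thread; QuerySettings, log_profile_events and runWorkersThenRestore() are hypothetical names, and this is not necessarily what the commit above changes.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for the settings object in the report.
struct QuerySettings {
    bool log_profile_events = true;  // hypothetical bool setting read by workers
};

// Starts `workers` threads that read `settings` (as the parser threads do when
// they finalize their performance counters), waits for all of them, and only
// then overwrites the settings (as the client's scope guard does).
int runWorkersThenRestore(QuerySettings & settings, const QuerySettings & restored, int workers) {
    std::atomic<int> reads{0};
    std::vector<std::thread> pool;
    for (int i = 0; i < workers; ++i)
        pool.emplace_back([&settings, &reads] {
            if (settings.log_profile_events)
                reads.fetch_add(1, std::memory_order_relaxed);
        });

    // join() makes every read above happen-before the write below.
    // Writing before joining is the kind of race TSan reported.
    for (auto & t : pool)
        t.join();

    settings = restored;
    return reads.load();
}
```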
@azat azat force-pushed the client-fix-data-races branch from d8f311f to 362026f June 30, 2025 09:41
@azat azat changed the title Fix data-races in client and session_timezone overrides Fix data-races in client, session_timezone overrides and some corner cases Jun 30, 2025
@azat azat changed the title Fix data-races in client, session_timezone overrides and some corner cases Fix data-races in client, session_timezone overrides and some data-races in some corner cases Jun 30, 2025
@azat
Member Author

azat commented Jun 30, 2025

Stateless tests (amd_binary, ParallelReplicas, s3 storage) — Failed: 1, Passed: 7121, Skipped: 893

Stateless tests (amd_debug) — Failed: 1, Passed: 8030, Skipped: 121

Integration tests (tsan, 6/6) — fail: 1, passed: 638

  • test_storage_rabbitmq

@azat azat closed this Jun 30, 2025
@azat azat reopened this Jun 30, 2025
@azat azat marked this pull request as ready for review June 30, 2025 15:27
@azat
Member Author

azat commented Jun 30, 2025

OK, I think I fixed all the issues that I saw. @Algunenano, do you want to take a final look?

P.S. The history should be clean; I've removed all the testing commits (including merges).

Member

@Algunenano Algunenano left a comment

LGTM. I wonder why we have profile events counters for tasks in the client (related to 362026f). I doubt we do anything with them.

@azat azat enabled auto-merge June 30, 2025 15:48
@azat azat added this pull request to the merge queue Jun 30, 2025
Merged via the queue into ClickHouse:master with commit e90e5e1 Jun 30, 2025
202 of 241 checks passed
@azat azat deleted the client-fix-data-races branch June 30, 2025 16:12
@robot-clickhouse robot-clickhouse added the pr-synced-to-cloud The PR is synced to the cloud repo label Jun 30, 2025
robot-ch-test-poll3 added a commit that referenced this pull request Jun 30, 2025
Cherry pick #82444 to 25.6: Fix data-races in client, session_timezone overrides and some data-races in some corner cases
robot-clickhouse added a commit that referenced this pull request Jun 30, 2025
…verrides and some data-races in some corner cases
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the pr-backports-created-cloud deprecated label, NOOP label Jun 30, 2025
@robot-clickhouse-ci-2 robot-clickhouse-ci-2 added the pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore label Jun 30, 2025
clickhouse-gh bot added a commit that referenced this pull request Jun 30, 2025
Backport #82444 to 25.6: Fix data-races in client, session_timezone overrides and some data-races in some corner cases
@robot-clickhouse robot-clickhouse added the pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR label Jul 2, 2025

Labels

🍃 green ci 🌿 Fixing flaky tests in CI pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore pr-backports-created-cloud deprecated label, NOOP pr-bugfix Pull request with bugfix, not backported by default pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR pr-synced-to-cloud The PR is synced to the cloud repo v25.6-must-backport

Development

Successfully merging this pull request may close these issues.

Data races in client (leads to expect tests failures)

5 participants