
Thread pool: move thread creation out of lock#68694

Merged
serxa merged 26 commits into ClickHouse:master from
filimonov:thread_pool_thread_creation_out_of_lock
Oct 5, 2024
Conversation

@filimonov
Contributor

@filimonov filimonov commented Aug 22, 2024

Changelog category (leave one):

  • Performance Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Optimized thread creation in the ThreadPool to minimize lock contention. Thread creation is now performed outside of the critical section to avoid delays in job scheduling and thread management under high load conditions. This leads to a much more responsive ClickHouse under heavy concurrent load.

Documentation entry for user-facing changes

The thread creation process within the ThreadPool has been modified to address potential delays caused by holding a lock during thread creation. Previously, threads were created within a locked section, which could lead to significant contention and delays, especially when thread creation is slow. To mitigate this, the thread creation has been moved outside of the critical section.
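The pattern described above can be sketched as follows (an illustrative sketch only, not the actual ClickHouse implementation; the struct and member names are invented): the slow `std::thread` construction happens before the pool mutex is taken, so the critical section only covers the cheap list insertion.

```cpp
#include <list>
#include <mutex>
#include <thread>

// Illustrative sketch: constructing std::thread can be slow, and doing it
// while holding the pool mutex serializes all thread creation and blocks
// job scheduling for every other caller.
struct PoolSketch
{
    std::mutex mutex;
    std::list<std::thread> threads;

    void addThreadOutsideLock()
    {
        // Slow operation (thread creation) happens with no lock held...
        std::thread t([] { /* worker loop would go here */ });
        // ...and only the cheap list insertion enters the critical section.
        std::lock_guard lock(mutex);
        threads.push_back(std::move(t));
    }
};
```

The real pool additionally tracks job queues, thread limits, and error handling; this sketch shows only the lock-scope change.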

Same test https://gist.github.com/filimonov/7e7adde17421d4a9f83c6fea2be8f802 - before and after the change - results are in a comment below.

It looks like a clear win, giving about 10% better QPS, much better response times, and better thread pool and CPU usage, even though thread creation itself became slower - that is because several threads can now be created simultaneously, so kernel-side contention is higher. But thread creation no longer blocks the rest of the thread pool.

P.S. I have no idea how to test it in CI/CD, but there is a chance that performance tests will show something.

@Avogar Avogar added the can be tested Allows running workflows for external contributors label Aug 22, 2024
@robot-clickhouse-ci-2 robot-clickhouse-ci-2 added the pr-performance Pull request with some performance improvements label Aug 22, 2024
@robot-ch-test-poll1
Contributor

robot-ch-test-poll1 commented Aug 22, 2024

This is an automated comment for commit 48e4092 with description of existing statuses. It's updated for the latest CI running

✅ Click here to open a full report in a separate page

Successful checks
| Check name | Description | Status |
| --- | --- | --- |
| AST fuzzer | Runs randomly generated queries to catch program errors. The build type is optionally given in parenthesis. If it fails, ask a maintainer for help | ✅ success |
| Builds | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| ClickBench | Runs [ClickBench](https://github.com/ClickHouse/ClickBench/) with instant-attach table | ✅ success |
| Compatibility check | Checks that clickhouse binary runs on distributions with old libc versions. If it fails, ask a maintainer for help | ✅ success |
| Docker keeper image | The check to build and optionally push the mentioned image to docker hub | ✅ success |
| Docker server image | The check to build and optionally push the mentioned image to docker hub | ✅ success |
| Docs check | Builds and tests the documentation | ✅ success |
| Fast test | Normally this is the first check that is ran for a PR. It builds ClickHouse and runs most of stateless functional tests, omitting some. If it fails, further checks are not started until it is fixed. Look at the report to see which tests fail, then reproduce the failure locally as described here | ✅ success |
| Flaky tests | Checks if new added or modified tests are flaky by running them repeatedly, in parallel, with more randomization. Functional tests are run 100 times with address sanitizer, and additional randomization of thread scheduling. Integration tests are run up to 10 times. If at least once a new test has failed, or was too long, this check will be red. We don't allow flaky tests, read the doc | ✅ success |
| Install packages | Checks that the built packages are installable in a clear environment | ✅ success |
| Integration tests | The integration tests report. In parenthesis the package type is given, and in square brackets are the optional part/total tests | ✅ success |
| Performance Comparison | Measure changes in query performance. The performance test report is described in detail here. In square brackets are the optional part/total tests | ✅ success |
| Stateful tests | Runs stateful functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc | ✅ success |
| Stateless tests | Runs stateless functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc | ✅ success |
| Stress test | Runs stateless functional tests concurrently from several clients to detect concurrency-related errors | ✅ success |
| Style check | Runs a set of checks to keep the code style clean. If some of tests failed, see the related log from the report | ✅ success |
| Unit tests | Runs the unit tests for different release types | ✅ success |
| Upgrade check | Runs stress tests on server version from last release and then tries to upgrade it to the version from the PR. It checks if the new server can successfully startup without any errors, crashes or sanitizer asserts | ✅ success |

@serxa serxa self-assigned this Sep 2, 2024
Member

@serxa serxa left a comment


There are a few concerns with this approach:

  1. The thread limit is not respected in a corner case (see comment in code).
  2. The unlock/lock introduces two critical sections, plus extra synchronization via std::future. This could hurt performance in the normal case, when threads are created quickly.

I think a better approach would be to delegate the task of inserting a thread into the list to the thread itself. But it should be done with some care: we need to count the number of all threads, including "initializing" threads, using atomics.

  1. Add std::atomic<UInt64> ThreadPoolImpl::threads_count;
  2. When we start a thread in scheduleImpl() we only try to acquire a "slot" for the new thread and respect the limits without locking the mutex (see code below). This allows multiple threads to be created simultaneously.
  3. I think thread creation can be moved to immediately after the critical section (after ++scheduled_jobs;). The main point is to have only one small critical section and avoid futures.
  4. When the worker starts and locks the mutex for the first time, it should first insert itself into the threads list and get the iterator it needs, then start job processing as usual.
  5. Take care of decrementing threads_count when a thread removes itself from threads.
  6. Wait for threads_count == 0 (busy wait, maybe with sleeps) in the destructor.
```cpp
UInt64 threads_count_value = threads_count.load();
while (true)
{
    if (threads_count_value < std::min(max_threads, scheduled_jobs))
    {
        if (threads_count.compare_exchange_strong(threads_count_value, threads_count_value + 1))
        {
            // Thread creation code-path
            break;
        }
        // else retry (threads count has changed)
    }
    else
    {
        // Limit exceeded code-path (do not create new thread)
        break;
    }
}
```
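For reference, the slot-acquisition loop above can be packaged as a standalone helper (a sketch under the same assumptions; the function name and parameters are invented for illustration):

```cpp
#include <algorithm>
#include <atomic>
#include <cstdint>

// Try to reserve one "thread slot" without taking any mutex.
// Returns true if a slot was acquired (the caller may then create a thread),
// false if the limit min(max_threads, scheduled_jobs) is already reached.
bool tryAcquireThreadSlot(std::atomic<uint64_t> & threads_count,
                          uint64_t max_threads, uint64_t scheduled_jobs)
{
    uint64_t value = threads_count.load();
    while (true)
    {
        if (value < std::min(max_threads, scheduled_jobs))
        {
            if (threads_count.compare_exchange_strong(value, value + 1))
                return true; // slot acquired
            // compare_exchange_strong reloaded `value` on failure; retry
        }
        else
            return false; // limit reached, do not create a new thread
    }
}
```

Because the compare-exchange retries on contention, several schedulers can race for slots concurrently without ever exceeding the limit.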

@filimonov
Contributor Author

> I think thread creation can be moved to the instant after critical section (after ++scheduled_jobs;).

That may lead to other side effects - it breaks some of the invariants that exist currently.

Namely, currently there is a strong guarantee that at the moment a job is accepted, at least one thread exists to run it.

I.e. consider the scenario where we push the job but later fail to create a thread - that job would stay in the queue and never be executed...

Actually, I tried a lot of variants - with 2 mutexes, 'outsourcing' thread creation, having the worker push itself into the list... but they either fail with deadlocks (for example, if you create a thread before the critical section in schedule, the worker can take the lock earlier and wait on the condvar forever), or have side effects, or the complexity grows too much.

@ilejn also attempted to create a lock-free thread pool - but again, it is quite hard to guarantee all the existing invariants (again, no hard guarantees of having the desired number of threads). And it is hard to prove that everything will still work correctly if they become a bit less strict...

Atomics are also quite expensive here due to huge contention...

The global thread pool expansion scenario is not a hot path for sure, so I don't think those 2 locks will hurt too much. But maybe it's indeed not the best for local pools...

Anyway, thanks for the review, will try to address the issues you mentioned.
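The invariant in question can be illustrated with a toy scheduler (hypothetical and heavily simplified; thread creation is modeled by a flag instead of a real std::thread): a job is only accepted once a worker is guaranteed to exist, so a later thread-creation failure can never strand an accepted job.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>

// Toy model of the "at least one thread exists when a job is accepted"
// guarantee. `creation_fails` stands in for a real pthread_create failure.
struct ToyScheduler
{
    std::mutex mutex;
    std::queue<std::function<void()>> jobs;
    size_t threads = 0;

    bool schedule(std::function<void()> job, bool creation_fails)
    {
        std::unique_lock lock(mutex);
        if (threads == 0)
        {
            if (creation_fails)
                return false;  // job rejected: no worker exists to run it
            ++threads;         // worker "created" before the job is accepted
        }
        jobs.push(std::move(job));
        return true;           // a worker exists, so this job will run
    }
};
```

If thread creation were moved after the job is enqueued, a failed creation would leave an accepted job in the queue with nothing to execute it.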

@filimonov filimonov force-pushed the thread_pool_thread_creation_out_of_lock branch from 9c95e64 to d6e662a Compare September 9, 2024 13:25
@filimonov filimonov marked this pull request as draft September 9, 2024 13:26
@filimonov filimonov marked this pull request as ready for review September 9, 2024 15:29
@filimonov filimonov requested a review from serxa September 9, 2024 15:29
@filimonov filimonov force-pushed the thread_pool_thread_creation_out_of_lock branch from f7c497e to debb501 Compare September 9, 2024 19:19
Member

@serxa serxa left a comment


Looks good overall. I still don't like the extra sync costs. A single std::mutex is better than std::future: its cost is ~2 atomic ops, so it might be good enough. We now have a separate mutex in every thread, which is good - less contention. And we now have a shorter critical section on the main mutex, which is also better. So it is now hard to tell which implementation is better. I want to run a few synthetic tests for ThreadPool just to make a comparison with some figures.

@filimonov filimonov marked this pull request as draft September 16, 2024 15:34
@filimonov filimonov marked this pull request as ready for review September 18, 2024 09:30
@filimonov filimonov requested a review from serxa September 18, 2024 09:30
@filimonov
Contributor Author

filimonov commented Sep 18, 2024

@serxa I was able to get rid of the extra locks using atomics.

@filimonov
Contributor Author

filimonov commented Sep 20, 2024

https://s3.amazonaws.com/clickhouse-test-reports/68694/209f491e2ae35bd5be7cedaea03447aacc65fc56/stress_test__tsan_/stderr.log

seems related... The thread tries to access properties of the pool after it has been destructed. Will check it...

UPD: fixed.

@filimonov
Contributor Author

filimonov commented Sep 20, 2024

Synthetic tests results (12 cores laptop, ubuntu 22)

Before

localhost:9000, queries: 2000, QPS: 91.536, RPS: 63416263.944, MiB/s: 483.827, result RPS: 91.536, result MiB/s: 0.001.

0.000%		0.056 sec.	
10.000%		2.413 sec.	
20.000%		3.040 sec.	
30.000%		3.545 sec.	
40.000%		3.917 sec.	
50.000%		4.322 sec.	
60.000%		4.868 sec.	
70.000%		5.306 sec.	
80.000%		5.850 sec.	
90.000%		6.825 sec.	
95.000%		7.553 sec.	
99.000%		9.198 sec.	
99.900%		10.210 sec.	
99.990%		11.060 sec.


Queries executed: 50000.

localhost:9000, queries: 50000, QPS: 1360.362, RPS: 1360.362, MiB/s: 0.001, result RPS: 1360.362, result MiB/s: 0.001.

0.000%		0.001 sec.	
10.000%		0.004 sec.	
20.000%		0.006 sec.	
30.000%		0.008 sec.	
40.000%		0.009 sec.	
50.000%		0.010 sec.	
60.000%		0.011 sec.	
70.000%		0.013 sec.	
80.000%		0.016 sec.	
90.000%		0.032 sec.	
95.000%		0.107 sec.	
99.000%		0.428 sec.	
99.900%		1.170 sec.	
99.990%		2.548 sec.	


Row 1:
──────
sum(ProfileEvent_GlobalThreadPoolExpansions):                 5798
sum(ProfileEvent_GlobalThreadPoolShrinks):                    4883
sum(ProfileEvent_GlobalThreadPoolThreadCreationMicroseconds): 15846601
sum(ProfileEvent_GlobalThreadPoolLockWaitMicroseconds):       24515791498
sum(ProfileEvent_GlobalThreadPoolJobs):                       102413
sum(ProfileEvent_GlobalThreadPoolJobWaitTimeMicroseconds):    2180480667
sum(ProfileEvent_LocalThreadPoolExpansions):                  49472
sum(ProfileEvent_LocalThreadPoolShrinks):                     48996
sum(ProfileEvent_LocalThreadPoolThreadCreationMicroseconds):  4687873636
sum(ProfileEvent_LocalThreadPoolLockWaitMicroseconds):        20154394438
sum(ProfileEvent_LocalThreadPoolJobs):                        52358
sum(ProfileEvent_LocalThreadPoolBusyMicroseconds):            20795340229
sum(ProfileEvent_LocalThreadPoolJobWaitTimeMicroseconds):     19762541701
Row 1:
──────
quantiles(0.001, 0.5, 0.999)(CurrentMetric_GlobalThread):          [785,1701,5488.696]
quantiles(0.001, 0.5, 0.999)(CurrentMetric_GlobalThreadActive):    [689,909,4575.11]
quantiles(0.001, 0.5, 0.999)(CurrentMetric_GlobalThreadScheduled): [689,966,4665.624]

After

localhost:9000, queries: 2000, QPS: 100.308, RPS: 73195401.707, MiB/s: 558.436, result RPS: 100.308, result MiB/s: 0.001.

0.000%		0.008 sec.	
10.000%		0.233 sec.	
20.000%		0.511 sec.	
30.000%		0.707 sec.	
40.000%		0.898 sec.	
50.000%		1.130 sec.	
60.000%		1.384 sec.	
70.000%		1.584 sec.	
80.000%		1.994 sec.	
90.000%		2.764 sec.	
95.000%		3.523 sec.	
99.000%		4.579 sec.	
99.900%		5.624 sec.	
99.990%		5.926 sec.	


localhost:9000, queries: 50000, QPS: 1506.679, RPS: 1506.679, MiB/s: 0.001, result RPS: 1506.679, result MiB/s: 0.001.

0.000%		0.001 sec.	
10.000%		0.004 sec.	
20.000%		0.006 sec.	
30.000%		0.007 sec.	
40.000%		0.008 sec.	
50.000%		0.010 sec.	
60.000%		0.011 sec.	
70.000%		0.013 sec.	
80.000%		0.017 sec.	
90.000%		0.036 sec.	
95.000%		0.078 sec.	
99.000%		0.317 sec.	
99.900%		0.974 sec.	
99.990%		1.631 sec.	

Row 1:
──────
sum(ProfileEvent_GlobalThreadPoolExpansions):                 4957
sum(ProfileEvent_GlobalThreadPoolShrinks):                    4957
sum(ProfileEvent_GlobalThreadPoolThreadCreationMicroseconds): 1703607416
sum(ProfileEvent_GlobalThreadPoolLockWaitMicroseconds):       28217693
sum(ProfileEvent_GlobalThreadPoolJobs):                       100861
sum(ProfileEvent_GlobalThreadPoolJobWaitTimeMicroseconds):    74449960
sum(ProfileEvent_LocalThreadPoolExpansions):                  47834
sum(ProfileEvent_LocalThreadPoolShrinks):                     47834
sum(ProfileEvent_LocalThreadPoolThreadCreationMicroseconds):  945375547
sum(ProfileEvent_LocalThreadPoolLockWaitMicroseconds):        78741193
sum(ProfileEvent_LocalThreadPoolJobs):                        51104
sum(ProfileEvent_LocalThreadPoolBusyMicroseconds):            21127410099
sum(ProfileEvent_LocalThreadPoolJobWaitTimeMicroseconds):     143136141

Row 1:
──────
quantiles(0.001, 0.5, 0.999)(CurrentMetric_GlobalThread):          [1702.9999999999998,1703,3095.3309999999997]
quantiles(0.001, 0.5, 0.999)(CurrentMetric_GlobalThreadActive):    [703,1007.5,3001.5869999999995]
quantiles(0.001, 0.5, 0.999)(CurrentMetric_GlobalThreadScheduled): [703,1007.5,3001.5869999999995]

About 10% higher QPS,
much more stable times,
proportion of busy / job wait time: 51% / 49% before, 99.4% / 0.6% after.
Fewer threads were working (so the same amount of tasks was processed by fewer threads).
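For clarity, the busy / job-wait proportions come from the ProfileEvent sums in the metric dumps above, e.g. for the local pool (a sketch; the helper name is made up):

```cpp
// How the busy vs. job-wait split is derived from the ProfileEvent sums
// (microsecond totals copied from the "before" and "after" dumps above).
double busyFraction(double busy_us, double wait_us)
{
    return busy_us / (busy_us + wait_us);
}
// Local pool, before: busyFraction(20795340229, 19762541701) ≈ 0.51
// Local pool, after:  busyFraction(21127410099, 143136141)   ≈ 0.99
```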

Results from another test on a bigger instance (128 cores, Red Hat 8) - a query from numbers_mt with a high max_threads and other load in the background:

-- before
0 rows in set. Elapsed: 68.922 sec. Processed 17.62 billion rows, 140.96 GB (255.65 million rows/s., 2.05 GB/s.)

-- after
0 rows in set. Elapsed: 54.987 sec. Processed 18.87 billion rows, 150.96 GB (343.18 million rows/s., 2.75 GB/s.)

Sounds like a clear win, visible in high-concurrency scenarios.

Member

@serxa serxa left a comment


Looks good. Requires a little bit of polishing.

@filimonov filimonov force-pushed the thread_pool_thread_creation_out_of_lock branch from f490b29 to a03822e Compare October 2, 2024 19:52
@filimonov
Contributor Author

filimonov commented Oct 2, 2024

#0  0x00007f081f5f89fc in pthread_kill () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x00007f081f5a4476 in raise () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#2  0x00007f081f58a7f3 in abort () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#3  0x00005619ea6ba4ff in __interceptor_abort ()
No symbol table info available.
#4  0x00005619f295e44b in terminate_handler () at ./build_docker/./src/Common/SignalHandlers.cpp:157
        terminating = true
        buf_size = <optimized out>
        buf = "\377\377\377\377\2407\001\000\360\037Terminate called for uncaught exception:\nCode: 439. DB::Exception: Cannot schedule a task: fault injected (threads=67, jobs=0). (CANNOT_SCHEDULE_TASK), Stack trace (when copying this message"...
        out = {<DB::WriteBufferFromFileBase> = {<DB::BufferWithOwnMemory<DB::WriteBuffer>> = {<DB::WriteBuffer> = {<DB::BufferBase> = {pos = 0x7f06ee68ed40 "\377\377\377\377\2407\001", bytes = 4090, working_buffer = {begin_pos = 0x7f06ee68ed40 "\377\377\377\377\2407\001", end_pos = 0x7f06ee68fd40 "\260\t\343\004\032V"}, internal_buffer = {begin_pos = 0x7f06ee68ed40 "\377\377\377\377\2407\001", end_pos = 0x7f06ee68fd40 "\260\t\343\004\032V"}, padded = false}, _vptr$WriteBuffer = 0x561a04e309b0 <vtable for DB::WriteBufferFromFileDescriptor+16>, finalized = false, canceled = false, nextimpl_working_buffer_offset = 0}, memory = {<boost::noncopyable_::noncopyable> = {<boost::noncopyable_::base_token> = {<No data fields>}, <No data fields>}, <Allocator<false, false>> = {static clear_memory = 127}, static pad_right = 63, m_capacity = 0, m_size = 0, m_data = 0x0, alignment = 0, allow_gwp_asan_force_sample = true}}, <No data fields>}, fd = 8, throttler = {__ptr_ = 0x0, __cntrl_ = 0x0}, file_name = {static __endian_factor = 1, __r_ = {<std::__1::__compressed_pair_elem<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::__rep, 0, false>> = {__value_ = {{__l = {__data_ = 0x7f05a21c3d00 "", __size_ = 1, __cap_ = 139662171323696, __is_long_ = 0}, __s = {__data_ = "\000=\034\242\005\177\000\000\001\000\000\000\000\000\000\0000=\034\242\005\177", __padding_ = 0x7f06ee68fddf "", __size_ = 0 '\000', __is_long_ = 0 '\000'}, __r = {__words = {139662171323648, 1, 139662171323696}}}}}, <std::__1::__compressed_pair_elem<std::__1::allocator<char>, 1, true>> = {<std::__1::allocator<char>> = {<std::__1::__non_trivial_if<true, std::__1::allocator<char> >> = {<No data fields>}, <No data fields>}, <No data fields>}, <No data fields>}, static npos = 18446744073709551615}, use_adaptive_buffer_size = false, adaptive_max_buffer_size = 4096}
        log_message = <optimized out>
        signal_pipe = <optimized out>
#5  0x0000561a04c60914 in std::__terminate (func=0x5619f295e260 <terminate_handler()>) at ./build_docker/./contrib/llvm-project/libcxxabi/src/cxa_handlers.cpp:59
No locals.
#6  0x0000561a04c60826 in std::terminate () at ./build_docker/./contrib/llvm-project/libcxxabi/src/cxa_handlers.cpp:88
        unwind_exception = <optimized out>
        exception_header = 0x72200056df00
        globals = <optimized out>
#7  0x00005619fcd74d6e in DB::MergeTreeData::loadOutdatedDataParts (this=0x728000470040, is_async=true) at ./build_docker/./src/Storages/MergeTree/MergeTreeData.cpp:2166
No locals.
#8  0x00005619fce08ee7 in DB::MergeTreeData::loadDataParts(bool, std::__1::optional<std::__1::unordered_set<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > >)::$_4::operator()() const (this=<optimized out>) at ./build_docker/./src/Storages/MergeTree/MergeTreeData.cpp:2006
No locals.

#70257

@filimonov
Contributor Author

filimonov commented Oct 3, 2024

RabbitMQ issue - #45160 (details in integration_run_parallel4_0.tar.zst test_storage_rabbitmq/_instances-0-gw1 )

looks like the container just starts too slowly and the 1-minute timeout is not enough:

2024-10-02 23:22:16 [ 607 ] DEBUG : Stderr: Container rootteststoragerabbitmq-gw1-rabbitmq1-1  Creating (cluster.py:140, run_and_check)

...
rabbitmq1-1  | 2024-10-02 23:22:22.896241+00:00 [info] <0.230.0>  data dir       : /var/lib/rabbitmq/mnesia/rabbit@rabbitmq1

2024-10-02 23:22:22.894333+00:00 [debug] <0.230.0> == Prelaunch DONE ==
2024-10-02 23:22:22.894698+00:00 [info] <0.230.0> 
2024-10-02 23:22:22.894698+00:00 [info] <0.230.0>  Starting RabbitMQ 3.12.6 on Erlang 25.3.2.7 [jit]


2024-10-02 23:22:22.896348+00:00 [debug] <0.230.0> == Plugins (prelaunch phase) ==
2024-10-02 23:22:22.896407+00:00 [debug] <0.230.0> Setting plugins up

...

2024-10-02 23:22:23.511204+00:00 [debug] <0.291.0> Feature flags: [global unregister] @ rabbit@rabbitmq1
2024-10-02 23:22:23.511362+00:00 [debug] <0.230.0> 
2024-10-02 23:22:23.511384+00:00 [debug] <0.230.0> == Boot steps ==
2024-10-02 23:23:17.408942+00:00 [info] <0.230.0> Running boot step pre_boot defined by app rabbit

... 

2024-10-02 23:23:19.807306+00:00 [debug] <0.230.0> DB: initialization successeful

2024-10-02 23:23:20.194852+00:00 [debug] <0.230.0> Change boot state to `core_started`
2024-10-02 23:23:20.196174+00:00 [debug] <0.625.0> 
2024-10-02 23:23:20.196236+00:00 [debug] <0.625.0> == Postlaunch phase ==
2024-10-02 23:23:20.196255+00:00 [debug] <0.134.0> Boot state/systemd: sending

2024-10-02 23:23:21.213615+00:00 [info] <0.625.0> Server startup complete; 3 plugins started.



2024-10-02 23:23:18 [ 607 ] DEBUG : Can't connect to RabbitMQ Command '('docker', 'exec', '-i', '-e', 'RABBITMQ_ERLANG_COOKIE=rootteststoragerabbitmq-gw1-rabbitmq1-1', 'rootteststoragerabbitmq-gw1-rabbitmq1-1', 'rabbitmqctl', 'await_startup')' timed out after 60 seconds (cluster.py:2364, wait_rabbitmq_to_start)
2024-10-02 23:23:29 [ 607 ] DEBUG : Failed to start cluster:  (cluster.py:3101, start)
2024-10-02 23:23:29 [ 607 ] DEBUG : Cannot wait RabbitMQ container (cluster.py:3102, start)
2024-10-02 23:23:36 [ 607 ] DEBUG : Stderr: Container rootteststoragerabbitmq-gw1-rabbitmq1-1  Stopping (cluster.py:140, run_and_check)
2024-10-02 23:23:36 [ 607 ] DEBUG : Stderr: Container rootteststoragerabbitmq-gw1-rabbitmq1-1  Stopped (cluster.py:140, run_and_check)
2024-10-02 23:23:36 [ 607 ] DEBUG : Stderr: Container rootteststoragerabbitmq-gw1-rabbitmq1-1  Removing (cluster.py:140, run_and_check)
2024-10-02 23:23:36 [ 607 ] DEBUG : Stderr: Container rootteststoragerabbitmq-gw1-rabbitmq1-1  Removed (cluster.py:140, run_and_check)

At 2024-10-02 23:22:16 the container was started; one minute later (at 2024-10-02 23:23:18) the test reported "can't connect to RabbitMQ", and at 2024-10-02 23:23:29 it said "Cannot wait RabbitMQ container".

The rabbit logs show that startup completed at 2024-10-02 23:23:21 (3 seconds after 'can't connect').

@filimonov filimonov mentioned this pull request Oct 3, 2024
@filimonov filimonov force-pushed the thread_pool_thread_creation_out_of_lock branch from a03822e to 48e4092 Compare October 4, 2024 14:08
@filimonov filimonov requested a review from serxa October 4, 2024 20:27
@serxa serxa added this pull request to the merge queue Oct 5, 2024
Merged via the queue into ClickHouse:master with commit 1961304 Oct 5, 2024
@robot-clickhouse robot-clickhouse added the pr-synced-to-cloud The PR is synced to the cloud repo label Oct 5, 2024
ilejn pushed a commit to Altinity/ClickHouse that referenced this pull request Dec 19, 2024
…_creation_out_of_lock

Thread pool: move thread creation out of lock
Enmk added a commit to Altinity/ClickHouse that referenced this pull request Dec 20, 2024
…read_creation_out_of_lock

24.8.8 Backport PR ClickHouse#68694 Thread pool: move thread creation out of lock
@azat azat mentioned this pull request Jun 30, 2025
1 task

Labels

can be tested Allows running workflows for external contributors pr-performance Pull request with some performance improvements pr-synced-to-cloud The PR is synced to the cloud repo


6 participants