
Fix deadlock in PipelineExecutor downscaling logic #86089

Merged
serxa merged 15 commits into master from fix-deadlock-with-cpusched
Sep 18, 2025

Conversation

@serxa
Member

@serxa serxa commented Aug 23, 2025

The pipeline shutdown condition is that the number of idle threads equals the total number of threads and there is no more work. It was checked only when a thread transitioned into the idle state (i.e., put itself into threads_queue). However, with the preemption and downscaling logic, the total number of threads can also decrease dynamically, which may likewise trigger the pipeline's shutdown condition. Without this fix, the pipeline hangs.

It is hard to add a test for this change because the scenario is rare: it happens only when a thread is downscaled just after it has executed the last task of the whole pipeline. Existing tests cover this code path, but downscales are rare in them.

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

@clickhouse-gh
Contributor

clickhouse-gh bot commented Aug 23, 2025

Workflow [PR], commit [8335bbc]

Summary:

job_name                status
Stress test (amd_tsan)  failure

@clickhouse-gh clickhouse-gh bot added the pr-not-for-changelog This PR should not be mentioned in the changelog label Aug 23, 2025
@serxa serxa marked this pull request as draft August 24, 2025 19:36
@serxa
Member Author

serxa commented Aug 24, 2025

I've discovered that a similar problem may occur with preempted threads. For example, a query might have 6 threads, 2 of which are preempted, waiting either for the 1-second preemption timeout to downscale or for a newly granted slot to continue working. In the meantime, the other 4 threads do the rest of the work and finish by going into the idle state. This creates a hanging query that waits for its preempted threads.

In this PR, I will add the logic to properly wake and shut down preempted threads when the query finishes, and update the pipeline shutdown condition to take preempted threads into account.

@serxa
Member Author

serxa commented Aug 29, 2025

There are too many failed tests, but all seem unrelated. Let's rerun one more time.

@serxa serxa marked this pull request as ready for review August 29, 2025 18:38
@serxa
Member Author

serxa commented Sep 1, 2025

All failed tests are unrelated:

  • 02177_issue_31009 (#86335)
  • 02443_detach_attach_partition (infamously known to be flaky)
  • test_storage_kafka_sasl/test.py::test_kafka_sasl (flaky)
  • test_storage_s3_queue/test_4.py::test_list_and_delete_race (#86506)

@alesapin alesapin self-assigned this Sep 5, 2025
Member

@alesapin alesapin left a comment


It would be nice to have a test...

@serxa
Member Author

serxa commented Sep 16, 2025

Okay, I think I can add one with a reduced preemption timeout to trigger the issue more readily.

@serxa
Member Author

serxa commented Sep 17, 2025

The sanitizer has found a related issue. Investigating...

UPD. I was not careful enough and introduced the following data race. TaskQueue itself is not thread-safe and relies on external synchronization (the ExecutorTasks::mutex). ExecutorTasks::preempt() acquires the mutex. If the preempted thread had a local task, it pushes that task back into task_queue and calls tryWakeUpAnyOtherThreadWithTasks(). That helper may call lock.unlock() internally (to wake a thread outside the critical section). Immediately after it returns, preempt() checks task_queue.empty() without re-acquiring the lock. That read races with other threads popping tasks (which hold the mutex), exactly as TSan reports. This is fixed in 8335bbc.

@serxa serxa enabled auto-merge September 18, 2025 17:17
@serxa serxa added this pull request to the merge queue Sep 18, 2025
Merged via the queue into master with commit a00aad3 Sep 18, 2025
119 of 123 checks passed
@serxa serxa deleted the fix-deadlock-with-cpusched branch September 18, 2025 17:34
@robot-ch-test-poll3 robot-ch-test-poll3 added the pr-synced-to-cloud The PR is synced to the cloud repo label Sep 18, 2025
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added pr-backports-created-cloud deprecated label, NOOP pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR labels Sep 18, 2025
robot-ch-test-poll2 added a commit that referenced this pull request Sep 18, 2025
Cherry pick #86089 to 25.8: Fix deadlock in PipelineExecutor downscaling logic
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore label Sep 18, 2025
clickhouse-gh bot added a commit that referenced this pull request Sep 18, 2025
Backport #86089 to 25.8: Fix deadlock in PipelineExecutor downscaling logic

Labels

  • pr-backports-created: Backport PRs are successfully created; it won't be processed by the CI script anymore
  • pr-backports-created-cloud: deprecated label, NOOP
  • pr-must-backport-synced: The `*-must-backport` labels are synced into the cloud Sync PR
  • pr-not-for-changelog: This PR should not be mentioned in the changelog
  • pr-synced-to-cloud: The PR is synced to the cloud repo
  • v25.8-must-backport
