
Fix deadlock in PipelineExecutor downscaling logic #86089

Merged
serxa merged 15 commits into master from fix-deadlock-with-cpusched
Sep 18, 2025

Conversation

@serxa
Member

@serxa serxa commented Aug 23, 2025

The pipeline shutdown condition is that the number of idle threads equals the total number of threads and there is no more work. It was checked only when a thread transitioned into the idle state (i.e., put itself into threads_queue). However, with the preemption and downscaling logic, the total number of threads can also decrease dynamically, which may likewise trigger the pipeline's shutdown condition. Without this fix, the pipeline hangs.

It is hard to add a test for this change because the scenario is rare: it happens only when a thread is downscaled just after it has executed the last task of the whole pipeline. Existing tests cover this code path, but downscales are rare in them.

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

@clickhouse-gh
Contributor

clickhouse-gh bot commented Aug 23, 2025

Workflow [PR], commit [8335bbc]

Summary:

job_name                status
Stress test (amd_tsan)  failure

@clickhouse-gh clickhouse-gh bot added the pr-not-for-changelog This PR should not be mentioned in the changelog label Aug 23, 2025
@serxa serxa marked this pull request as draft August 24, 2025 19:36
@serxa
Member Author

serxa commented Aug 24, 2025

I've discovered that a similar problem may occur with preempted threads. For example, a query might have 6 threads, 2 of which are preempted, waiting either for the 1-second preemption timeout to downscale or for a newly granted slot to continue working. In the meantime, the other 4 threads do the rest of the work and finish by going into the idle state. This creates a hanging query that waits for its preempted threads.

In this PR, I will add the logic to properly wake and shut down preempted threads when the query finishes, and update the pipeline shutdown condition to take preempted threads into account.

@serxa
Member Author

serxa commented Aug 29, 2025

There are too many failed tests, but all seem unrelated. Let's rerun one more time.

@serxa serxa marked this pull request as ready for review August 29, 2025 18:38
@serxa
Member Author

serxa commented Sep 1, 2025

All failed tests are unrelated:

  • 02177_issue_31009 (#86335)
  • 02443_detach_attach_partition (infamously known to be flaky)
  • test_storage_kafka_sasl/test.py::test_kafka_sasl (flaky)
  • test_storage_s3_queue/test_4.py::test_list_and_delete_race (#86506)

@alesapin alesapin self-assigned this Sep 5, 2025
Member

@alesapin alesapin left a comment


It would be nice to have a test...

@serxa
Member Author

serxa commented Sep 16, 2025

Okay, I think I can add one with a reduced preemption timeout to trigger the issue more readily.

@serxa
Member Author

serxa commented Sep 17, 2025

The sanitizer has found a related issue. Investigating...

UPD. I was not careful enough and introduced the following data race. TaskQueue itself is not thread-safe and relies on external synchronization (the ExecutorTasks::mutex). ExecutorTasks::preempt() acquires the mutex. If the preempted thread had a local task, it pushes that task back into task_queue and calls tryWakeUpAnyOtherThreadWithTasks(). That helper may call lock.unlock() internally (to wake a thread outside the critical section). Immediately after it returns, preempt() checks task_queue.empty() without re-acquiring the lock. That read races with other threads popping tasks (which hold the mutex), exactly as TSan reports. This is fixed in 8335bbc.

@serxa serxa enabled auto-merge September 18, 2025 17:17
@serxa serxa added this pull request to the merge queue Sep 18, 2025
Merged via the queue into master with commit a00aad3 Sep 18, 2025
119 of 123 checks passed
@serxa serxa deleted the fix-deadlock-with-cpusched branch September 18, 2025 17:34
@robot-ch-test-poll3 robot-ch-test-poll3 added the pr-synced-to-cloud The PR is synced to the cloud repo label Sep 18, 2025
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added pr-backports-created-cloud deprecated label, NOOP pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR labels Sep 18, 2025
robot-ch-test-poll2 added a commit that referenced this pull request Sep 18, 2025
Cherry pick #86089 to 25.8: Fix deadlock in PipelineExecutor downscaling logic
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore label Sep 18, 2025
clickhouse-gh bot added a commit that referenced this pull request Sep 18, 2025
Backport #86089 to 25.8: Fix deadlock in PipelineExecutor downscaling logic

Labels

  • pr-backports-created: Backport PRs are successfully created; it won't be processed by the CI script anymore
  • pr-backports-created-cloud: deprecated label, NOOP
  • pr-must-backport-synced: The `*-must-backport` labels are synced into the cloud Sync PR
  • pr-not-for-changelog: This PR should not be mentioned in the changelog
  • pr-synced-to-cloud: The PR is synced to the cloud repo
  • v25.8-must-backport
