[9.0.2] Fix infinite spin-wait and missing cancellation propagation in TaskDeduplicator. (https://github.com/bazelbuild/bazel/pull/28938) #28990

Merged
iancha1992 merged 1 commit into bazelbuild:release-9.0.2 from bazel-io:cp28938-9.0.2-141434
Mar 13, 2026

@bazel-io
Member

The test TaskDeduplicatorTest.executeIfNeeded_executeAndCancelLoop_noErrors sporadically hung during the ExecutorService.close() call. This was due to two issues:

  1. In executeIfNew, if a thread encountered a RefcountedFuture that was already canceled (refcount = 0), it would call Thread.yield() and continue the while(true) loop. It relied on a listener attached to the canceled future to eventually remove it from the map. However, when using virtual threads, many threads could enter this spin loop simultaneously, saturating the underlying carrier threads. This prevented the listener from being scheduled, creating a deadlock where the spinning threads never saw the entry removed.

  2. RefcountedFuture was not propagating the cancel() call to its delegate future. This meant that even if all callers canceled their interest in a task, the task would continue to execute in the background, wasting resources and increasing contention.
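The first issue can be modeled in a single thread. The sketch below is a hypothetical simplification, not Bazel's code: `SpinLoopSketch`, `Entry`, `spinUntilJoined`, and the iteration budget are names and devices invented for illustration. It shows that a loop which only yields after a failed `retain()` makes no progress until something else removes the stale entry from the map.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model of the pre-fix loop shape; not the actual TaskDeduplicator. */
final class SpinLoopSketch {

  /** Minimal refcount holder; a count of zero models a canceled RefcountedFuture. */
  static final class Entry {
    private final AtomicInteger refcount;

    Entry(int initial) {
      this.refcount = new AtomicInteger(initial);
    }

    /** Registers one more caller, or fails if the count has already dropped to zero. */
    boolean retain() {
      while (true) {
        int c = refcount.get();
        if (c <= 0) {
          return false;
        }
        if (refcount.compareAndSet(c, c + 1)) {
          return true;
        }
      }
    }
  }

  /**
   * Pre-fix shape: after a failed retain() the caller yields and retries, relying on
   * some other party to remove the stale entry. The budget is an assumption added so
   * the sketch terminates; the loop described in the PR was unbounded.
   *
   * @return the iteration on which the caller joined or created an entry, or -1 if
   *     the budget ran out while the stale entry was still in the map
   */
  static int spinUntilJoined(ConcurrentHashMap<String, Entry> map, String key, int budget) {
    for (int i = 1; i <= budget; i++) {
      Entry existing = map.putIfAbsent(key, new Entry(1));
      if (existing == null || existing.retain()) {
        return i;
      }
      Thread.yield(); // Nothing in this loop removes the stale entry.
    }
    return -1;
  }
}
```

With a zero-refcount entry left in the map, `spinUntilJoined` exhausts any budget; once the entry is removed externally (the role the listener played), the next call succeeds on its first iteration.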

This PR makes the following changes:

  1. executeIfNew and maybeJoinExecution are modified to explicitly call inFlightTasks.remove(key, future) if retain() fails. This ensures the spin loop is broken immediately by the next thread to encounter the canceled future, rather than waiting for an asynchronous listener.

  2. RefcountedFuture.cancel now calls delegate.cancel(mayInterruptIfRunning) when the internal reference count drops to zero.

  3. Thread.yield() and the associated @SuppressWarnings("ThreadPriorityCheck") are removed, as they are no longer necessary.
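The changes above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the PR: `DedupSketch`, the `Supplier`-based signature, and the use of `CompletableFuture` as the delegate are hypothetical, and the real `executeIfNew` may have different semantics. Only `inFlightTasks.remove(key, future)`, `retain()`, and `delegate.cancel(mayInterruptIfRunning)` come from the description.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical, simplified stand-in for the deduplicator after the fix. */
final class DedupSketch<K, V> {

  /** Reference-counted handle on a shared delegate future (assumed shape). */
  static final class RefcountedFuture<T> {
    final CompletableFuture<T> delegate;
    private final AtomicInteger refcount = new AtomicInteger(1);

    RefcountedFuture(CompletableFuture<T> delegate) {
      this.delegate = delegate;
    }

    /** Registers one more caller, or fails if every caller has already canceled. */
    boolean retain() {
      while (true) {
        int c = refcount.get();
        if (c <= 0) {
          return false;
        }
        if (refcount.compareAndSet(c, c + 1)) {
          return true;
        }
      }
    }

    /** Drops one caller's interest; the last caller out cancels the delegate too. */
    boolean cancel(boolean mayInterruptIfRunning) {
      if (refcount.decrementAndGet() == 0) {
        return delegate.cancel(mayInterruptIfRunning);
      }
      return false;
    }
  }

  final ConcurrentHashMap<K, RefcountedFuture<V>> inFlightTasks = new ConcurrentHashMap<>();

  /** Joins the in-flight task for the key, or starts a new one (assumed semantics). */
  RefcountedFuture<V> executeIfNew(K key, Supplier<CompletableFuture<V>> task) {
    while (true) {
      RefcountedFuture<V> existing = inFlightTasks.get(key);
      if (existing == null) {
        RefcountedFuture<V> fresh = new RefcountedFuture<>(task.get());
        if (inFlightTasks.putIfAbsent(key, fresh) == null) {
          // Listener-based cleanup stays, but correctness no longer depends on it.
          fresh.delegate.whenComplete((v, t) -> inFlightTasks.remove(key, fresh));
          return fresh;
        }
        fresh.delegate.cancel(false); // Lost the insertion race; discard and retry.
        continue;
      }
      if (existing.retain()) {
        return existing;
      }
      // retain() failed: evict the canceled entry right here instead of yielding.
      inFlightTasks.remove(key, existing);
    }
  }
}
```

The two-argument `remove(key, existing)` only evicts the mapping if it still points at the canceled future, so a fresh entry inserted concurrently by another thread is left alone.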

Fixes #28302.

Closes #28938.

PiperOrigin-RevId: 883147973
Change-Id: I28c1db252573a4c39b1a9e53d32e218327340054

Commit a0760f1

@bazel-io requested a review from a team as a code owner March 13, 2026 14:15
@bazel-io added the team-Remote-Exec (Issues and PRs for the Execution (Remote) team) and awaiting-review (PR is awaiting review from an assigned reviewer) labels Mar 13, 2026
@bazel-io requested a review from fmeum March 13, 2026 14:15
@iancha1992 enabled auto-merge March 13, 2026 18:43
@iancha1992 added this pull request to the merge queue Mar 13, 2026
@github-merge-queue (bot) removed this pull request from the merge queue due to failed status checks Mar 13, 2026
@iancha1992 added this pull request to the merge queue Mar 13, 2026
Merged via the queue into bazelbuild:release-9.0.2 with commit 7122353 Mar 13, 2026
46 checks passed
@github-actions (bot) removed the awaiting-review (PR is awaiting review from an assigned reviewer) label Mar 13, 2026