Fix deadlock in ServiceEndpointWatcher when disposing change token registration#7255
Conversation
There was a problem hiding this comment.
Pull request overview
This PR fixes a critical deadlock issue in ServiceEndpointWatcher that occurs when disposing change token registrations while holding a lock. The deadlock happens when one thread holds the lock and waits for a callback to complete, while the callback thread is blocked trying to acquire the same lock.
Changes:
- Moved
_changeTokenRegistration?.Dispose()outside the lock inRefreshAsyncInternal()by capturing the registration before disposal - Applied the same pattern to
DisposeAsync()for both_changeTokenRegistrationand_pollingTimer - Added explanatory comments documenting the deadlock scenario and why disposal must occur outside the lock
|
/backport to release/10.2 |
|
Started backporting to release/10.2: https://github.com/dotnet/extensions/actions/runs/21645137209 |
|
@joperezr backporting to "release/10.2" failed, the patch most likely resulted in conflicts: $ git am --3way --empty=keep --ignore-whitespace --keep-non-patch changes.patch
Patch format detection failed.
Error: The process '/usr/bin/git' failed with exit code 128Please backport manually! |
d957975 to
f278292
Compare
…gistration Move _changeTokenRegistration.Dispose() outside the lock to avoid deadlock. CancellationTokenRegistration.Dispose() blocks waiting for any in-flight callback to complete, but the callback (RefreshAsync) tries to acquire the same lock, causing a deadlock. The fix captures the registration reference while holding the lock, then disposes it after releasing the lock. Applied to both RefreshAsyncInternal and DisposeAsync methods.
f278292 to
730ee47
Compare
|
/backport to release/10.2 |
|
Started backporting to release/10.2: https://github.com/dotnet/extensions/actions/runs/21650268580 |
Summary
Fixes a deadlock in
ServiceEndpointWatcherthat occurs when disposing the change token registration while holding_lock.The Problem
A deadlock can occur with the following sequence:
RefreshAsyncInternal(), acquires_lock, and calls_changeTokenRegistration?.Dispose()CancellationTokenRegistration.Dispose()blocks waiting for any in-flight callback to completeRefreshAsync(force: false), which tries to acquire_lock_lockand waits for Thread B's callback → Thread B waits for_lockheld by Thread AThe Fix
Move
_changeTokenRegistration?.Dispose()outside the lock:Applied to both
RefreshAsyncInternal()andDisposeAsync()methods.Testing
Microsoft Reviewers: Open in CodeFlow