[release/8.0-staging] Fixes deadlock for IncrementingPollingCounter callbacks #108648
Merged
noahfalk merged 1 commit into dotnet:release/8.0-staging on Oct 11, 2024
This is a modified backport of dotnet#105548. It mostly preserves the logic of the original fix in .NET 9 with a few adjustments:
- Added a config switch, System.Diagnostics.Tracing.CounterCallbackOnTimerThread, that must be set to true to opt in to the fix behavior. The .NET 9 fix was documented as a breaking change because it slightly modifies the timing and thread used for the first call to an IncrementingPollingCounter callback. I did not want anyone on 8.0 to be opted into this by default.
- The opt-in switch sets the property CounterCallbackOnTimerThread, and I added this condition to several of the if checks in the code. It's more than would be strictly necessary, but it makes it obvious when code reviewing individual methods that the new code paths are unreachable unless the app opts in.
- The original 9.0 change had a bit more refactoring that wasn't essential (renaming a method, removing an unneeded lock() scope), and I removed that here to reduce the code delta.
Tagging subscribers to this area: @tarekgh, @tommcdon, @pjanotti
jeffschwMSFT
approved these changes
Oct 8, 2024
lgtm. please get a code review. we will take for consideration in 8.0.x
tarekgh
approved these changes
Oct 8, 2024
Modified backport of #105548 to release/8.0-staging
/cc @noahfalk @eterekhin
Customer Impact
The servicing request comes from Microsoft Exchange team via internal email. This bug causes their service to occasionally hang at startup when a monitoring tool has enabled listening to the System.Runtime EventCounters. We've already had variants of this bug reported by multiple external customers, for example #93175.
The underlying issue is a deadlock caused by a lock ordering issue between the static constructor lock and the EventListener lock. It is fixed by changing the thread we issue the IncrementingPollingCounter callback on so that the EventListener lock isn't held when the callback runs.
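The general hazard described above can be sketched in a few lines. This is an illustration only, not the runtime's code: `CallbackPoller`, `_gate`, and both method names are hypothetical stand-ins. `PollUnderLock` runs a user callback while holding a lock, so if that callback has to wait on some other lock (such as a type's static constructor running on another thread that in turn wants `_gate`), the two threads can block each other. `PollOutsideLock` copies the delegate under the lock and invokes it after releasing it, which is the same idea as making sure the EventListener lock is not held when the callback runs.

```csharp
#nullable enable
using System;

// Hypothetical stand-in for a component that polls user callbacks.
// It is NOT the EventSource/EventListener implementation.
public sealed class CallbackPoller
{
    private readonly object _gate = new object();
    private Func<double>? _callback;

    public void Register(Func<double> callback)
    {
        lock (_gate) { _callback = callback; }
    }

    // Hazard: the user callback executes while _gate is held. If the callback
    // blocks on a second lock whose owner is waiting for _gate, both threads hang.
    public double PollUnderLock()
    {
        lock (_gate)
        {
            return _callback is null ? 0.0 : _callback();
        }
    }

    // Safer pattern: read the delegate under the lock, invoke it after release,
    // so user code never runs while _gate is held.
    public double PollOutsideLock()
    {
        Func<double>? callback;
        lock (_gate) { callback = _callback; }
        return callback is null ? 0.0 : callback();
    }
}
```

Both methods return the same value in the single-threaded case; they differ only in whether user code runs inside the locked region.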
Regression
To the best of my understanding, this bug has been present since the counters were first introduced in .NET Core 3. However, it's possible that specific details have shifted over time, allowing the bug to be hit more easily.
Testing
I manually tested in a debugger, stepping through all the modified code and verifying the expected behavior.
Risk
Low - I have guarded all the changed behavior with an opt-in AppContext switch (System.Diagnostics.Tracing.CounterCallbackOnTimerThread) and verified in the debugger that the switch operates as expected. The code change is also relatively isolated and has gotten some testing in our 9.0 development branches.
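For readers who want to try the opt-in, here is a minimal sketch of setting the switch from code. The switch name comes from this PR; the `CounterSwitch` helper class and its method names are hypothetical. It assumes the switch is set before the runtime's counter code reads it, so `OptIn` would need to run very early in startup; exactly when the runtime reads the value is not stated here.

```csharp
using System;

// Hypothetical helper; only the switch name is taken from the PR description.
public static class CounterSwitch
{
    public const string Name =
        "System.Diagnostics.Tracing.CounterCallbackOnTimerThread";

    // Opts the current process in to the new callback behavior.
    // Assumption: this is called before any EventCounters are enabled.
    public static void OptIn() => AppContext.SetSwitch(Name, true);

    // Reports whether the switch is currently set to true.
    public static bool IsOptedIn() =>
        AppContext.TryGetSwitch(Name, out bool enabled) && enabled;
}
```

Calling `CounterSwitch.OptIn()` as the first statement of `Main` is one option. AppContext switches can also generally be supplied through the app's runtime configuration (for example, `configProperties` in runtimeconfig.json), which avoids any ordering concern within the process.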
More details about the code change
This is a modified backport of #105548. It mostly preserves the logic of the original fix in .NET 9 with a few adjustments: