Fix SegmentResortChains synchronization#107468
Merged
ChrisAhna merged 1 commit intodotnet:mainfrom Sep 6, 2024
Merged
Conversation
Member
|
Great catch!
The several VM callsites that create/destroy GC handles in preemptive mode are likely a corner-case bug farm. We may want to open an issue on refactoring the code so that the GC handles are created/destroyed in cooperative mode only. This fix LGTM as something that is easy to backport. |
AaronRobinsonMSFT
approved these changes
Sep 6, 2024
Member
Author
|
Thanks for the reviews! |
jtschuster
pushed a commit
to jtschuster/runtime
that referenced
this pull request
Sep 17, 2024
sirntar
pushed a commit
to sirntar/runtime
that referenced
this pull request
Sep 30, 2024
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #107467.
See the issue for details on the problem that is being fixed.
With this change in place, the repro app discussed in the issue runs indefinitely without ever failing (in the same environment where, without this change, the repro app always gets stuck in an infinite loop within the first several minutes of execution). This observed behavior is consistent across Checked and Release builds of the rutime.
Note that the new lock acquire statement in StandardSegmentIterator is identical to the ones which already run in FullSegmentIterator under certain conditions.
The new code only translates to a new lock acquire during any Gen1-or-larger foreground GCs which specifically occur in an environment where preceding handle table activity in the process has driven "whole-block" allocs/frees and has therefore forced "fResortChains" back to true.
I believe that the handle table lock is generally lightly contended while foreground GC is in progress. So even in scenarios which manage to chronically drive "whole-block" allocs/frees in the steady state (and thus push the new lock acquire to the worst-case "one per handle table per Gen1-or-larger foreground GC" rate), I believe that the cost of the new lock acquire will generally be negligible.
(Other than adding a new lock acquire, I believe the only other option for fixing this problem would be to somehow rework the system to eliminate and ban any notion that handle table slots can be created or destroyed by the coreclr!Thread ctor or any other preemptive-mode code.)