
Fix SegmentResortChains synchronization #107468

Merged
ChrisAhna merged 1 commit into dotnet:main from ChrisAhna:chrisahna-resortchainsfix
Sep 6, 2024

Conversation

@ChrisAhna
Member

Fixes #107467.

See the issue for details on the problem that is being fixed.

With this change in place, the repro app discussed in the issue runs indefinitely without ever failing (in the same environment where, without this change, the repro app always gets stuck in an infinite loop within the first several minutes of execution). This observed behavior is consistent across Checked and Release builds of the runtime.

Note that the new lock acquire statement in StandardSegmentIterator is identical to the ones which already run in FullSegmentIterator under certain conditions.

The new code translates to an additional lock acquire only during Gen1-or-larger foreground GCs that occur in an environment where preceding handle table activity in the process has driven "whole-block" allocs/frees and has therefore forced "fResortChains" back to true.

I believe that the handle table lock is generally lightly contended while foreground GC is in progress. So even in scenarios which manage to chronically drive "whole-block" allocs/frees in the steady state (and thus push the new lock acquire to the worst-case "one per handle table per Gen1-or-larger foreground GC" rate), I believe that the cost of the new lock acquire will generally be negligible.

(Other than adding a new lock acquire, I believe the only other option for fixing this problem would be to somehow rework the system to eliminate and ban any notion that handle table slots can be created or destroyed by the coreclr!Thread ctor or any other preemptive-mode code.)
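The pattern being fixed can be sketched in simplified form. This is an illustrative model only, not coreclr code: `fResortChains` is the real flag name from the handle table, but the `HandleTable` type and method names below are hypothetical stand-ins. The point is that the iterator must take the table lock before resorting the segment chain, as FullSegmentIterator already did, because preemptive-mode handle activity can mutate the chain and set the flag concurrently.

```cpp
#include <algorithm>
#include <atomic>
#include <mutex>
#include <vector>

// Hypothetical simplification of a handle table: segments live in a chain
// that must be re-sorted after whole-block allocs/frees occur.
struct HandleTable {
    std::mutex lock;                       // the per-table lock
    std::atomic<bool> fResortChains{false};
    std::vector<int> segmentChain;         // stands in for the segment chain

    // Preemptive-mode code (e.g. handle creation from the Thread ctor)
    // can mutate the chain and set the flag at any time.
    void OnWholeBlockAllocOrFree(int seg) {
        std::lock_guard<std::mutex> g(lock);
        segmentChain.push_back(seg);
        fResortChains.store(true, std::memory_order_release);
    }

    // The fixed iterator path: acquire the table lock before resorting,
    // so a concurrent mutation cannot corrupt the chain and leave the
    // iterator spinning forever.
    void StandardSegmentIterator_Prepare() {
        if (fResortChains.load(std::memory_order_acquire)) {
            std::lock_guard<std::mutex> g(lock);   // the new lock acquire
            std::sort(segmentChain.begin(), segmentChain.end());
            fResortChains.store(false, std::memory_order_release);
        }
    }
};
```

Without the lock in `StandardSegmentIterator_Prepare`, the sort could interleave with `OnWholeBlockAllocOrFree` and observe a half-updated chain, which is the class of corruption the infinite loop in the issue traced back to.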

@jkotas
Member

jkotas commented Sep 6, 2024

Great catch!

> rework the system to eliminate and ban any notion that handle table slots can be created or destroyed by the coreclr!Thread ctor or any other preemptive-mode code

The several VM callsites that create/destroy GC handles in preemptive mode are likely a corner-case bug farm. We may want to open an issue on refactoring the code so that the GC handles are created/destroyed in cooperative mode only.
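The suggested refactoring could enforce the invariant mechanically. A minimal sketch, with entirely hypothetical names (a thread-local flag and a `GCCoopGuard` RAII type stand in for the real thread state and coreclr's mode-switch helpers): handle creation simply refuses preemptive-mode callers, turning a rare race into an immediate, debuggable error.

```cpp
#include <stdexcept>

// Hypothetical illustration (not coreclr code): a thread-local flag stands
// in for the real per-thread GC mode.
thread_local bool g_cooperativeMode = false;

// RAII switch into cooperative mode, restoring the prior mode on exit,
// loosely modeled on coreclr's mode-switch helpers.
struct GCCoopGuard {
    bool prev;
    GCCoopGuard() : prev(g_cooperativeMode) { g_cooperativeMode = true; }
    ~GCCoopGuard() { g_cooperativeMode = prev; }
};

// Ban preemptive-mode callers outright instead of racing the GC.
int CreateHandle() {
    if (!g_cooperativeMode)
        throw std::logic_error("GC handles must be created in cooperative mode");
    static int next = 1;   // stand-in for real handle allocation
    return next++;
}
```

With such a guard in place, every VM callsite that creates or destroys handles in preemptive mode would fail fast during testing rather than corrupting shared state intermittently.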

This fix LGTM as something that is easy to backport.

@jkotas jkotas left a comment
Member

Thank you!

@ChrisAhna
Member Author

Thanks for the reviews!

@ChrisAhna ChrisAhna merged commit 1c4755d into dotnet:main Sep 6, 2024
@marklio marklio added the tenet-reliability Reliability/stability related issue (stress, load problems, etc.) label Sep 13, 2024
jtschuster pushed a commit to jtschuster/runtime that referenced this pull request Sep 17, 2024
sirntar pushed a commit to sirntar/runtime that referenced this pull request Sep 30, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Oct 13, 2024

Labels

area-VM-coreclr tenet-reliability Reliability/stability related issue (stress, load problems, etc.)


Development

Successfully merging this pull request may close these issues.

Rare failures can occur when SegmentResortChains races with coreclr!Thread construction

4 participants