Skip to content

Conversation

@BrzVlad
Copy link
Member

@BrzVlad BrzVlad commented Mar 19, 2025

This PR ports over some bridge tests that we had in mono/mono. In order to enforce finalization of some objects that are logically dead, we reuse the FinalizerHelpers.PerformNoPinAction, which runs an Action while ensuring objects allocated by it are seen dead by the GC. Because mono GC is conservative, we go through extra lengths to ensure no pinning occurs.

Mono currently has two algorithms for building the SCC/xref set over the dead bridge objects: new and tarjan. The new bridge is the legacy one which should be more stable and this PR attempts to reuse the bridge output compare functionality to catch potential bugs with the tarjan bridge (which is the default used bridge). When using the bridge compare functionality, the GC will randomly keep alive half of the bridge objects (since we don't have a java counterpart that responds with the liveness). We also pass the additional debug flag bridge=BridgeBase which means that all objects that are instances of BridgeBase or a derived type are treated as a bridge object and they take part in the SCC construction.

This testing approach is very cheap but it has some potential pitfalls: maybe the new bridge itself has some bugs, the output between the bridges differs for benign reasons and the tarjan bridge needs to have some functionality disabled in order for this comparison to work (scc_precise_merge and disable_non_bridge_scc, which are optimizations meant to speed up the bridge processing, even if they don't produce the theoretical correct set of SCCs/xrefs).

Currently there are quite a bit of tests crashing that need investigation. In the future we could also add more complex random graph generators for stress testing.

This PR also fixes a few logging issues in the tarjan bridge due to bitrot.

@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @BrzVlad
See info in area-owners.md if you want to be subscribed.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds and ports several bridge tests used to ensure proper finalization of logically dead objects and to compare two GC bridge algorithms (legacy "new" and default "tarjan") in the mono runtime. The changes include a new test runner in BridgeTester.cs and a comprehensive set of bridge and graph tests in Bridge.cs to exercise various object linking, fragmentation, and cycle handling scenarios.

Reviewed Changes

Copilot reviewed 2 out of 5 changed files in this pull request and generated 2 comments.

File Description
src/tests/GC/Features/Bridge/BridgeTester.cs Adds a test runner that launches the Bridge test application with specific GC environment variables
src/tests/GC/Features/Bridge/Bridge.cs Introduces multiple bridge/graph test scenarios and helper methods to exercise the GC bridge behavior
Files not reviewed (3)
  • src/mono/mono/metadata/sgen-tarjan-bridge.c: Language not supported
  • src/tests/GC/Features/Bridge/Bridge.csproj: Language not supported
  • src/tests/GC/Features/Bridge/BridgeTester.csproj: Language not supported

@BrzVlad
Copy link
Member Author

BrzVlad commented Mar 19, 2025

cc @Maoni0

@Maoni0
Copy link
Member

Maoni0 commented Mar 19, 2025

this is great! thanks so much @BrzVlad. I think Spider does test the case where we need to include a non bridge object in the graph. but I wonder if we could add a simple test that demonstrates that (which would make debugging much easier to just see how the algo works in general)? basically the idea is that you have multiple bridge objects (or bridge objects that would form multiple SCCs) that point to one non bridge object which points to multiple bridge objects, right?

@BrzVlad
Copy link
Member Author

BrzVlad commented Mar 20, 2025

Given Spider test only has 3 bridge objects allocated, it doesn't make sense to include non bridge objects in the set of SCCs to be passed, since there is a very small number of xrefs that you can have anyway. I think the test was meant to test the ability of the tarjan bridge to ignore large numbers of SCCs (that are irrelevant because they contain only non-bridge objects).

I haven't looked into the inclusion of non bridge objects in the final set of SCCs, but what you are describing sounds like what I would expect to trigger this optimization. Note that when we have bridge output comparison enabled, this optimization is disabled.

These tests were lying around from mono days and they create quite complex graphs. But the NestedCycles test that @filipnavara wrote in order to reproduce a specific crash is a pretty good starting point to see how the algorithm works, if this is what you are asking.

@BrzVlad BrzVlad force-pushed the fix-tarjan-bridge branch from 5a662af to e5daa05 Compare March 24, 2025 07:46
@Maoni0
Copy link
Member

Maoni0 commented Mar 24, 2025

  ScanData *sd = find_data (dyn_array_ptr_get (&registered_bridges, i));

can you also get rid of int here too please?


Refers to: src/mono/mono/metadata/sgen-tarjan-bridge.c:995 in e5daa05. [](commit_id = e5daa05, deletion_comment = False)

Copy link
Member

@Maoni0 Maoni0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

other than the couple of nits this LGTM! I might add a couple of test cases later when I've looked more at how the bridge works but that doesn't need to be in this PR.

@BrzVlad BrzVlad force-pushed the fix-tarjan-bridge branch from e5daa05 to a526d3c Compare March 24, 2025 10:15
BrzVlad added 4 commits March 24, 2025 12:16
It serves no purpose and it would later crash the runtime since we didn't patch the lockword back in place.
These are ported from some of the bridge tests we had on mono/mono. In order to test them we compare between the output of the new and the tarjan bridge.
@BrzVlad BrzVlad force-pushed the fix-tarjan-bridge branch from a526d3c to 71fc2cc Compare March 24, 2025 10:17
@BrzVlad BrzVlad merged commit e0f1b21 into dotnet:main Mar 24, 2025
112 of 115 checks passed
filipnavara pushed a commit to filipnavara/runtime that referenced this pull request Apr 8, 2025
* [mono][sgen] Fix DUMP_GRAPH debug option build for tarjan bridge

* [mono][sgen] Don't create ScanData* during debug dumping of SCCs

It serves no purpose and it would later crash the runtime since we didn't patch the lockword back in place.

* [mono][sgen] Fix some null deref crashes in DUMP_GRAPH debug option

* [mono][tests] Add bridge tests

These are ported from some of the bridge tests we had on mono/mono. In order to test them we compare between the output of the new and the tarjan bridge.
akoeplinger added a commit that referenced this pull request Apr 11, 2025
* Fix dump_processor_state debug code to compile and work on Android (#112970)

Fix typo in GC bridge comparison message (SCCS -> XREFS)

* [mono] Add a few bridge tests (#113703)

* [mono][sgen] Fix DUMP_GRAPH debug option build for tarjan bridge

* [mono][sgen] Don't create ScanData* during debug dumping of SCCs

It serves no purpose and it would later crash the runtime since we didn't patch the lockword back in place.

* [mono][sgen] Fix some null deref crashes in DUMP_GRAPH debug option

* [mono][tests] Add bridge tests

These are ported from some of the bridge tests we had on mono/mono. In order to test them we compare between the output of the new and the tarjan bridge.

* Fix an edge case in the Tarjan GC bridge that leads to losing xref information (#112825)

* Fix an edge case in the Tarjan SCC that lead to losing xref information

In the Tarjan SCC bridge processing there's a color graph used to find out
connections between SCCs. There was a rare case which only manifested when
a cycle in the object graph points to another cycle that points to a bridge
object. We only recognized direct bridge pointers but not pointers to other
non-bridge SCCs that in turn point to bridges and where we already calculated
the xrefs. These xrefs were then lost.

* Add test case to sgen-bridge-pathologies and add an assert to catch the original bug

* Add review

---------

Co-authored-by: Vlad Brezae <[email protected]>

* [SGen/Tarjan] Handle edge case with node heaviness changing due to deduplication (#113044)

* [SGen/Tarjan] Handle edge case with node heaviness changing due to deduplication

Do early deduplication

Fix Windows build

Add test cases to sgen-bridge-pathologies

* Move test code

* Remove old code

* Add extra check (no change to functionality)

* Disable test on wasm

---------

Co-authored-by: Vlad Brezae <[email protected]>
Co-authored-by: Alexander Köplinger <[email protected]>
filipnavara pushed a commit to filipnavara/runtime that referenced this pull request Apr 15, 2025
* [mono][sgen] Fix DUMP_GRAPH debug option build for tarjan bridge

* [mono][sgen] Don't create ScanData* during debug dumping of SCCs

It serves no purpose and it would later crash the runtime since we didn't patch the lockword back in place.

* [mono][sgen] Fix some null deref crashes in DUMP_GRAPH debug option

* [mono][tests] Add bridge tests

These are ported from some of the bridge tests we had on mono/mono. In order to test them we compare between the output of the new and the tarjan bridge.
@github-actions github-actions bot locked and limited conversation to collaborators Apr 24, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants