Use SOS to dump managed stack traces from a dump on Windows #82867

jkoritzinsky · 2023-03-01T23:56:00Z

Download SOS as part of our Helix jobs and load it into cdb to dump managed stack traces when a test crashes or times out.

Also introduces a synthetic test failure to validate the behavior (will be removed before merging).
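
For readers unfamiliar with the approach, here is a minimal sketch of the kind of cdb invocation described above. It is not the code from this PR: the function name and the example paths are hypothetical. It assumes only the documented cdb options `-z` (open a dump file) and `-c` (run debugger commands at startup), plus the SOS `clrstack -all` command. Paths containing spaces may need extra quoting that this sketch does not handle.

```python
def build_cdb_command(cdb_exe, dump_file, sos_dll):
    """Build an argv list that opens a dump in cdb and prints managed stacks.

    All arguments are caller-supplied paths; nothing here is taken from the PR.
    """
    debugger_commands = [
        f".load {sos_dll}",   # load the SOS debugger extension
        "!clrstack -all",     # managed stack trace for every managed thread
        "q",                  # quit so the process does not wait for input
    ]
    # cdb accepts several commands in one -c string, separated by semicolons.
    return [cdb_exe, "-z", dump_file, "-c", ";".join(debugger_commands)]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(build_cdb_command("cdb.exe", r"C:\dumps\test.dmp", r"C:\sos\sos.dll"))
```

The resulting list can be passed to `subprocess.run` on a machine that has the Windows debugging tools installed.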

ghost · 2023-03-01T23:56:06Z

I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label.

ghost · 2023-03-01T23:59:25Z

Tagging subscribers to this area: @dotnet/runtime-infrastructure
See info in area-owners.md if you want to be subscribed.

Issue Details

Author:	jkoritzinsky
Assignees:	jkoritzinsky
Labels:	`area-Infrastructure`
Milestone:	-

stephentoub · 2023-03-02T01:27:13Z

src/tests/Common/helixpublishwitharcade.proj

Before dotnet-sos existed, we used ClrMD directly in RemoteExecutor:
https://github.com/dotnet/arcade/blob/bdc59254cf108e1d48451dc43bb9ebc331cdca7b/src/Microsoft.DotNet.RemoteExecutor/src/RemoteInvokeHandle.cs#L177-L225
We might want to standardize on one or the other.

jkoritzinsky · 2023-03-02

ClrMD there is doing process attach only for timeouts, whereas we're handling both timeouts and crashes here, so it's a little different. Definitely worth considering consolidation though.

jkoritzinsky · 2023-03-02

There was some talk about standardizing this sort of mechanism (dumping stack traces from a dump to stdout for the build analysis tooling), but with the recent changes to the EngSrv teams, I don't know how much of that work will still happen.

stephentoub · 2023-03-02

Right, I understand they're handling different cases. But they can both handle both, so it's overkill to have two different tools used for the same purpose. Simply suggesting we choose one and stick with it.

jkoritzinsky · 2023-03-02

I'll look at the RemoteExecutor implementation and see if we can consolidate something useful.

stephentoub · 2023-03-02

Cool, thanks.

jkoritzinsky · 2023-03-07

We'll need to use something like this pattern to get both native and managed stack traces, and we'll likely want to use this model as well to solve #83047 (which focuses more on crashes than timeouts) if we don't move to using `dotnet test` for the libraries tests in Helix. The RemoteExecutor version has a much better UX around the output, though, as it has more structure to work with (from ClrMD's APIs).
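
To make "both native and managed stack traces" concrete, here is a hedged sketch of how a debugger command string could combine the two. It is an illustration, not the implementation in this PR: the helper name and its flags are hypothetical. It assumes the standard cdb command `~*k` (native stack for every thread) and the SOS command `!clrstack -all` (managed stack for every managed thread).

```python
def stack_dump_commands(sos_dll, native=True, managed=True):
    """Return a semicolon-separated cdb command string for dumping stacks.

    sos_dll is a caller-supplied path to the SOS extension (hypothetical here).
    """
    commands = []
    if native:
        commands.append("~*k")                  # native stacks, all threads
    if managed:
        commands.append(f".load {sos_dll}")     # SOS is only needed for managed frames
        commands.append("!clrstack -all")       # managed stacks, all managed threads
    commands.append("q")                        # always quit at the end
    return ";".join(commands)


if __name__ == "__main__":
    print(stack_dump_commands(r"C:\sos\sos.dll"))
```

The output is flat debugger text, which is consistent with the point above that the ClrMD-based approach has more structure to work with.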

src/tests/Common/Coreclr.TestWrapper/CoreclrTestWrapperLib.cs

jkoritzinsky · 2023-03-07T19:17:19Z

Marking this ready for review as the change works (see the logs for the CoreCLR Windows runtime tests), but I've marked it no-merge because I still need to remove the induced test failure.

@hoyosjs for review.

src/tests/Common/helixpublishwitharcade.proj

hoyosjs · 2023-03-15T20:56:57Z

@jkoritzinsky this looks good now. It's not loading the managed PDBs for line numbers, but that's a separate issue.

jkoritzinsky · 2023-03-16T20:14:34Z

The networking failures are known issues, and the wasm timeout is unrelated. Merging this in. I'll file an issue to follow up on providing a consistent story for dumping stacks on crashes.

ghost assigned jkoritzinsky Mar 1, 2023

jkoritzinsky added the area-Infrastructure label Mar 1, 2023

stephentoub reviewed Mar 2, 2023

danmoseley reviewed Mar 2, 2023

src/tests/Common/Coreclr.TestWrapper/CoreclrTestWrapperLib.cs (outdated, resolved)

runfoapp bot mentioned this pull request Mar 2, 2023

Test failure: System.Security.Cryptography.X509Certificates.Tests.CertificateCreation.CertificateRequestChainTests/CreateChain_Hybrid #25979

Closed

jkoritzinsky force-pushed the sos branch from f0c5718 to 4108e0d Compare March 2, 2023 18:21

runfoapp bot mentioned this pull request Mar 2, 2023

Long Running Test: Interop/MonoAPI/MonoMono/PInvokeDetach/PInvokeDetach.sh #73040

Closed

jkoritzinsky added the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Mar 7, 2023

jkoritzinsky marked this pull request as ready for review March 7, 2023 19:16

hoyosjs approved these changes Mar 15, 2023

src/tests/Common/helixpublishwitharcade.proj (outdated, resolved)

build-analysis bot mentioned this pull request Mar 15, 2023

[release/6.0] Doublelinklist GC failures on Mono #83245

Closed

jkoritzinsky added 2 commits March 15, 2023 17:27

Download and install sos to use it to dump managed stacks.

c61415f

Specify target arch

c70f311

jkoritzinsky force-pushed the sos branch from abb89b7 to c70f311 Compare March 16, 2023 00:36

jkoritzinsky removed the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Mar 16, 2023

jkoritzinsky mentioned this pull request Mar 16, 2023

Crashes when initializing System.Net.Security.Native #83540

Closed

jkoritzinsky merged commit ef43a7b into dotnet:main Mar 16, 2023

jkoritzinsky deleted the sos branch March 16, 2023 20:14

am11 mentioned this pull request Mar 29, 2023

Add mariner build images dotnet/dotnet-buildtools-prereqs-docker#832

Merged

ghost locked as resolved and limited conversation to collaborators Apr 16, 2023
