@jkoritzinsky
Member

Download SOS as part of our Helix jobs and load it into cdb to dump managed stack traces when a test crashes or times out.

Also introduces a synthetic test failure to validate the behavior (will be removed before merging).
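
For readers unfamiliar with the flow described above, here is a minimal sketch of how a cdb invocation that loads SOS and prints both native and managed stacks might be assembled. It is not the script from this PR: the function name `build_cdb_args`, the example paths, and the assumption that `cdb.exe` is on `PATH` are all hypothetical. The debugger commands used (`.load`, `~*k`, SOS's `!clrstack -all`, `q`/`qd`) and the cdb options (`-z` for a dump file, `-p` for attaching to a process, `-c` for startup commands) are standard, but the exact command set the Helix jobs run may differ.

```python
from typing import List, Optional


def build_cdb_args(
    sos_path: str,
    dump_path: Optional[str] = None,
    pid: Optional[int] = None,
) -> List[str]:
    """Build an argument list for cdb that prints native and managed stacks.

    Hypothetical helper: exactly one of dump_path (crash case) or
    pid (timeout case, attach to a live process) must be given.
    """
    if (dump_path is None) == (pid is None):
        raise ValueError("pass exactly one of dump_path or pid")

    if dump_path is not None:
        target = ["-z", dump_path]  # open a crash dump
        quit_cmd = "q"              # end the debugging session
    else:
        target = ["-p", str(pid)]   # attach to a running process
        quit_cmd = "qd"             # quit and detach, leaving the process alive

    commands = [
        f".load {sos_path}",  # load the SOS debugger extension
        "~*k",                # native stack for every thread
        "!clrstack -all",     # managed stack for every managed thread
        quit_cmd,
    ]
    # Assumption: cdb.exe is reachable on PATH on the test machine.
    return ["cdb.exe", *target, "-c", "; ".join(commands)]
```

The resulting list could be passed to `subprocess.run` on a Windows machine that has the debugging tools installed, with the output captured into the test log.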

@ghost
ghost commented Mar 1, 2023

I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label.

@ghost
ghost commented Mar 1, 2023

Tagging subscribers to this area: @dotnet/runtime-infrastructure
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: jkoritzinsky
Assignees: jkoritzinsky
Labels:

area-Infrastructure

Milestone: -

Member Author

ClrMD there is doing process attach only for timeouts, whereas we're handling both timeouts and crashes here, so it's a little different. Definitely worth considering consolidation though.

Member Author

There were some talks about standardizing this sort of mechanism (dumping stack traces from a dump to stdout for the build analysis tooling), but with the recent changes to the EngSrv teams, I don't know how much of that work will still happen.

Member

Right, I understand they're handling different cases. But they can both handle both, so it's overkill to have two different tools used for the same purpose. Simply suggesting we choose one and stick with it.

Member Author

I'll look at the RemoteExecutor implementation and see if we can consolidate something useful.

Member

Cool, thanks.

Member Author

We'll need to use something like this pattern to get both native and managed stack traces. We'll likely also want to use this model to solve #83047 (which focuses more on crashes than timeouts) if we don't move to using dotnet test for the libraries tests in Helix. The RemoteExecutor version has a much better UX around the output, though, as it has more structure to work with (from ClrMD's APIs).

@jkoritzinsky
Member Author

Marking this ready for review as the change works (see the logs for the CoreCLR Windows runtime tests), but I've marked it as no-merge because I still need to remove the induced test failure.

@hoyosjs for review.

@hoyosjs
Member

hoyosjs commented Mar 15, 2023

@jkoritzinsky this looks good now. It's not loading the PDBs for managed code for line numbers, but that's separate.

@jkoritzinsky jkoritzinsky removed the NO-MERGE label ("The PR is not ready for merge yet (see discussion for detailed reasons)") Mar 16, 2023

@jkoritzinsky
Member Author

Networking failures are known issues, and the wasm timeout is unrelated. Merging this in. I'll file an issue to follow up on providing a consistent story for dumping stacks on crashes.

@jkoritzinsky jkoritzinsky merged commit ef43a7b into dotnet:main Mar 16, 2023
@jkoritzinsky jkoritzinsky deleted the sos branch March 16, 2023 20:14
@ghost ghost locked as resolved and limited conversation to collaborators Apr 16, 2023