-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Use SOS to dump managed stack traces from a dump on Windows #82867
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label. |
|
Tagging subscribers to this area: @dotnet/runtime-infrastructure Issue DetailsDownload SOS as part of our Helix jobs and load it into cdb to dump managed stack traces when a test crashes or times out. Also introduces a synthetic test failures to validate behavior (will be removed before merging).
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before dotnet-sos existed, we used clrmd directly in RemoteExecutor:
https://github.com/dotnet/arcade/blob/bdc59254cf108e1d48451dc43bb9ebc331cdca7b/src/Microsoft.DotNet.RemoteExecutor/src/RemoteInvokeHandle.cs#L177-L225
We might want to standardize on one or the other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ClrMD there is doing process attach only for timeouts, whereas we're handling both timeouts and crashes here, so it's a little different. Definitely worth considering consolidation though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There were some talks about standardizing this sort of mechanism (dumping stack traces from a dump to stdout for the build analysis tooling), but with the recent changes to the EngSrv teams, I don't know how much of that work will still happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I understand they're handling different cases. But they can both handle both, so it's overkill to have two different tools used for the same purpose. Simply suggesting we choose one and stick with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll look at the RemoteExecutor implementation and see if we can consolidate something useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll need to use something like this pattern to get both native and managed stack traces, and we'll likely want to use this model as well to solve #83047 (which focuses more on crashes than timeouts) if we don't move to using dotnet test for the libraries tests in Helix. The RemoteExecutor version has a much better UX around the output though as it has more structure to work with (from CLRMD's APIs).
|
Marking this ready for review as the change works (see the logs for the CoreCLR windows runtime tests), but marked as no-merge as I need to remove the induced test failure. @hoyosjs for review. |
|
@jkoritzinsky this looks good now. It's not loading the PDBs for managed for for line numbers, but that's separate. |
|
Networking failures are known issues, wasm timeout is unrelated. Merging this in. I'll file an issue to follow-up on providing a consistent story for dumping stacks on crashes. |
Download SOS as part of our Helix jobs and load it into cdb to dump managed stack traces when a test crashes or times out.
Also introduces a synthetic test failures to validate behavior (will be removed before merging).