Disable core dumps when running libraries tests on macOS #65405

Merged
elinor-fung merged 1 commit into dotnet:main from elinor-fung:macNoDump
Feb 16, 2022

@elinor-fung
Member

@elinor-fung elinor-fung commented Feb 15, 2022

We had a single PR, with a real bug crashing all the tests, take out the whole queue. This disables core dumps for test runs on macOS entirely until we have a better story.

cc @dotnet/area-infrastructure-libraries @danmoseley @stephentoub

@ghost

ghost commented Feb 15, 2022

Tagging subscribers to this area: @dotnet/area-infrastructure-libraries
See info in area-owners.md if you want to be subscribed.

Issue Details

@dotnet/area-infrastructure-libraries

Author: elinor-fung
Assignees: -
Labels:

area-Infrastructure-libraries

Milestone: -

# See discussions in:
# https://github.com/dotnet/core-eng/issues/15333
# https://github.com/dotnet/core-eng/issues/15597
ulimit -c 0
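
A minimal sketch of how a test run script might scope the `ulimit -c 0` line above to macOS only. The function name `maybe_disable_core_dumps` and its OS-name parameter are hypothetical and added purely for illustration; the change under review is only the `ulimit -c 0` line itself.

```shell
# Hypothetical helper: disable system core dumps only when the given
# OS name (as reported by `uname -s`) is Darwin, i.e. macOS.
maybe_disable_core_dumps() {
  os_name="$1"
  if [ "$os_name" = "Darwin" ]; then
    # A core file size limit of 0 blocks stops the kernel from writing core files
    # for this shell and every process it launches afterwards.
    ulimit -c 0
  fi
}

# Apply to the current shell before launching the tests.
maybe_disable_core_dumps "$(uname -s)"
```

Because the limit is inherited by child processes, the call has to happen in the same shell that later starts the test host, not in a separate subshell.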
Member

This will disable system cores. Are there any places where we hook up the runtime's coredump features like we do for the coreclr-style tests? I am not aware of any, and they shouldn't be as bad in terms of size. Those could still let us get dumps reasonably.

Member Author

I did not find that for libraries tests.

I did query for dump uploads from runtime/non-libraries tests which I believe use createdump and they still look in the 6 GB range?

WorkItems
| where QueueName startswith "osx.1200.amd64"
| where Started > ago(10d)
| join (Files | where FileName startswith "core" ) on WorkItemId
| join kind=inner Jobs on JobId
| project
  Queued,
  WorkItemFriendlyName, ExitCode, QueueName,
  PhaseName = tostring(parse_json(Properties)["System.PhaseName"]),
  Pipeline = tostring(parse_json(Properties).DefinitionName),
  FileName,
  SizeBytesLong,
  Source,
  Build = tostring(parse_json(Properties).BuildNumber)
| where Pipeline == "runtime"
| where PhaseName !startswith("libraries")
| order by Queued desc

Member

There are two things: we should remove this https://github.com/dotnet/runtime/blob/512a9ffcb8aa99ffc25fac76dc0c431543e45ba9/src/tests/Common/Coreclr.TestWrapper/CoreclrTestWrapperLib.cs#L223 and yeah, we are using --withheap. That pretty much grabs everything in the managed process. It gives us the best results, but if the size is too large, maybe we should consider dropping to --normal or --triage to at least get stacks.
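
As a rough sketch of the trade-off described above, the runtime's crash dump settings can also be chosen through its documented environment variables (`DOTNET_DbgEnableMiniDump`, `DOTNET_DbgMiniDumpType`, `DOTNET_DbgMiniDumpName`), where type 1 is a normal minidump, 2 includes the heap, 3 is a triage dump, and 4 is a full dump. The helper names `dump_type_value` and `configure_runtime_dumps`, and the `/tmp` fallback for the dump folder, are assumptions made for illustration and are not part of this PR.

```shell
# Hypothetical helper: map a dump type name to a DOTNET_DbgMiniDumpType value.
dump_type_value() {
  case "$1" in
    normal)   echo 1 ;;  # smallest; module lists, thread lists, and stacks
    withheap) echo 2 ;;  # adds the managed heap; largest of the minidump kinds
    triage)   echo 3 ;;  # like normal, with personal information removed
    full)     echo 4 ;;  # all process memory
    *) echo "unknown dump type: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: enable runtime-generated dumps of the chosen type.
configure_runtime_dumps() {
  dump_type=$(dump_type_value "$1") || return 1
  export DOTNET_DbgEnableMiniDump=1
  export DOTNET_DbgMiniDumpType="$dump_type"
  # %d is replaced with the process ID; the /tmp fallback is an assumption.
  export DOTNET_DbgMiniDumpName="${HELIX_DUMP_FOLDER:-/tmp}/coredump.%d.dmp"
}

# Example: prefer smaller triage dumps over heap dumps.
configure_runtime_dumps triage
```

Switching the argument from `withheap` to `triage` or `normal` is the kind of change suggested above for keeping stacks while cutting dump size.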

Member

It would be great if the libraries tests could automatically wrap in coredump. I can't remember ever needing a dump from any process other than the test process (or possibly a child). So we could leave system dumps off at that point.

Member Author

Also update here and set DbgMiniDumpType, I expect:

<DbgEnableMiniDump Condition="'$(TargetsWindows)' != 'true'">1</DbgEnableMiniDump> <!-- Enable minidumps for all scenarios -->
<DbgEnableElfDumpOnMacOS Condition="'$(TargetsOSX)' == 'true'">1</DbgEnableElfDumpOnMacOS> <!-- Enable minidumps for OSX -->
<DbgMiniDumpName Condition="'$(TargetsWindows)' != 'true'">$HELIX_DUMP_FOLDER/coredump.%d.dmp</DbgMiniDumpName>

Member Author

Filed #65422

@elinor-fung elinor-fung merged commit 03494f3 into dotnet:main Feb 16, 2022
@elinor-fung elinor-fung deleted the macNoDump branch February 16, 2022 02:51
@danmoseley
Member

We're expecting to enable this again pretty soon - matter of days, hopefully? Just waiting on @ChadNedzlek to put in some throttling?

@elinor-fung
Member Author

Even with the throttling, we would still have caused a lot of stress - it would have self-resolved eventually, but since the number of work items is more than the number of machines, a single PR causing wide-spread crashes could take out the entire queue for hours (there was a solid First Responders thread on this).

Related: https://github.com/dotnet/core-eng/issues/15598

@ghost ghost locked as resolved and limited conversation to collaborators Mar 18, 2022