Skip to content

Attempt to reduce test build cost#74471

Merged
davidwrighton merged 2 commits intodotnet:mainfrom
davidwrighton:try_to_make_tests_faster
Aug 25, 2022
Merged

Attempt to reduce test build cost#74471
davidwrighton merged 2 commits intodotnet:mainfrom
davidwrighton:try_to_make_tests_faster

Conversation

@davidwrighton
Copy link
Member

@davidwrighton davidwrighton commented Aug 24, 2022

An evaluation of the cost of "Copy native test components to test output folder" indicates that there are 2 major performance bottlenecks in that routine.

There are the findings I had

  1. The routine spends significant time loading every test project into msbuild to determine what to do. This processing takes as much as 400ms per test on hardware which has slow IO performance.
  2. On Unix platforms, when producing the test script binaries, we make the scripts executable by using chmod. This ALSO takes as long as 400ms on the slowest platforms.
  3. Actually copying the native binaries around is extremely cheap and fast, and the cost to generate the test scripts is also fairly low.

This change addresses both of these issues.

  1. For the issue that we look into each test project to determine what to do, I observed that for tests which are pri1 tests, when running a pri0 only job, we don't need to even examine any of the tests marked as exclusively pri1. This is a bit of a hack, but it does seem to work quite well.
  2. For the chmod issue, since we don't actually maintain the executable bit when sent to helix, I set an msbuild property when generating the scripts during the "Copy native test components to test output folder" to just not set that flag.

Results:
Mac OSX test run jobs take ~8 minutes to run this portion of the build, down from ~27 minutes. Performance improvements are visible on all architectures/OS pairings, but they aren't as dramatic.

@ghost
Copy link

ghost commented Aug 24, 2022

Tagging subscribers to this area: @hoyosjs
See info in area-owners.md if you want to be subscribed.

Issue Details

null

Author: davidwrighton
Assignees: -
Labels:

area-Infrastructure-coreclr

Milestone: -

@davidwrighton davidwrighton marked this pull request as ready for review August 24, 2022 22:14
@davidwrighton
Copy link
Member Author

@jkoritzinsky I think this is still running the same set of tests, but I don't actually know how to validate that. Do you?

@jkoritzinsky
Copy link
Member

I don't think I have a script for verifying that today. @trylek might have one.

I know that Helix is a little unreliable for recording successful tests in Kusto, but otherwise that might be a good place to look.

@hoyosjs
Copy link
Member

hoyosjs commented Aug 24, 2022

My best guess is the AzDO API for the test run, since that's where Helix reports to and then there's back ingestion into Kusto

<!-- Fast filter to avoid files with Pri1 projects, this isn't completely accurate, but its close. Any Pri1 tests not caught by this will be skipped,
and any tests intended as Pri0 will end up running as Pri1, which isn't terrible. This trick allows us to avoid loading each project into msbuild
and evaluating it, which is a substantial improvement to the performance of the "Copy native test components to test output folder" task in our test runs -->
<Pri1Projects Include="@(AllProjects)" Condition="$([System.Text.RegularExpressions.Regex]::IsMatch($([System.IO.File]::ReadAllText('%(AllProjects.FullPath)')), '&lt;CLRTestPriority&gt;1&lt;/CLRTestPriority&gt;'))" />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beautiful!

Copy link
Member

@trylek trylek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me, thanks David! I suspected there may be some low hanging fruits. One suggestion down on my quality week list :-).

@trylek
Copy link
Member

trylek commented Aug 24, 2022

I don't have any targeted script for evaluating the set of tests run. AzDO UI does report the total number of tests so comparing it to neighboring runs without this change should give us a pretty decent indication whether this is basically fine or whether there's some big problem lurking somewhere.

@trylek
Copy link
Member

trylek commented Aug 24, 2022

When merging the JIT/Methodical tests, I used to compare the XUnit xml outputs from the non-merged and the merged runs until I hit a match but I no longer remember whether I had any automation for this.

@davidwrighton
Copy link
Member Author

I've validated that AzDO thinks we run the same number of tests.

@davidwrighton davidwrighton merged commit 8d4364c into dotnet:main Aug 25, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Sep 24, 2022
@davidwrighton davidwrighton deleted the try_to_make_tests_faster branch April 13, 2023 18:53
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants