Add support for (experimental) profile-based trimming #108049

MichalStrehovsky · 2024-09-20T06:18:39Z

This was my hackathon project of this year. Since it’s in a pretty leaf-y location within the product, I think it would be fine to check it in under unsupported switches for experimentation.

The question I was trying to answer is "How can we identify code that is statically reachable but is not needed at runtime (and we might be able to get rid of it by reorganizing the code a bit)?". The answer to that is profiling and profile-based code generation.

The idea is simple:

* We build a special version of the program that keeps track of what methods executed.
* At process termination, this data is written out to a file.
* We then recompile the program, passing in the method list as one of the inputs.
* When compiling a method that was in the list, compile as usual.
* When compiling a method that wasn’t in the list, replace it with a failfast.
* We still run the usual dependency analysis so the failfast methods are going to stop graph expansion at method boundaries. One could do better than this (cut off at basic block boundaries) but that’s a lot more work with a small benefit.
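
The effect of the last bullet can be illustrated with a toy call graph. This is only a sketch: the method names, the `Trimmer` class and the string-keyed graph are hypothetical stand-ins, not the compiler's actual dependency analysis. It shows that a method missing from the profile becomes a leaf, so its callees are never visited.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical static call graph: method name -> callees.
var callGraph = new Dictionary<string, string[]>
{
    ["Main"] = new[] { "Print", "HandleError" },
    ["Print"] = Array.Empty<string>(),
    ["HandleError"] = new[] { "FormatException", "Log" },
    ["FormatException"] = Array.Empty<string>(),
    ["Log"] = Array.Empty<string>(),
};

// Methods that were observed to run during profiling.
var profile = new HashSet<string> { "Main", "Print" };

var (full, stubs) = Trimmer.Analyze(callGraph, "Main", profile);
Console.WriteLine("Compiled normally: " + string.Join(", ", full.OrderBy(m => m)));
Console.WriteLine("Failfast stubs:    " + string.Join(", ", stubs.OrderBy(m => m)));
// Compiled normally: Main, Print
// Failfast stubs:    HandleError

static class Trimmer
{
    public static (HashSet<string> Full, HashSet<string> Stubs) Analyze(
        IReadOnlyDictionary<string, string[]> callGraph, string root, ISet<string> profile)
    {
        var full = new HashSet<string>();
        var stubs = new HashSet<string>();
        var pending = new Stack<string>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            string method = pending.Pop();
            if (full.Contains(method) || stubs.Contains(method))
                continue;

            if (!profile.Contains(method))
            {
                // Body is replaced with a failfast, so its callees are never visited.
                stubs.Add(method);
                continue;
            }

            full.Add(method);
            foreach (string callee in callGraph[method])
                pending.Push(callee);
        }

        return (full, stubs);
    }
}
```

`FormatException` and `Log` are statically reachable through `HandleError`, but they drop out entirely because graph expansion stops at the stubbed method.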

The profiling file format is simple: a list of assembly MVIDs, followed by an array of bools. The index within the array corresponds to a MethodDef token within the assembly. So if the bool at index N of assembly A is true, the method at RID N in the assembly with MVID A was reached. Using MVIDs (GUIDs) instead of assembly names helps identify mismatches between profile data and input assemblies. Best to build with `-p:Deterministic=true`.
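
As a rough illustration of an MVID-keyed bool table, here is a minimal sketch. The exact byte layout of reach.mprof is not spelled out above, so the layout used here (an assembly count, then for each assembly a 16-byte MVID, a method count and one byte per RID) and the `ReachabilityProfile` type are assumptions made for the example only.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

var mvid = Guid.NewGuid();
var profile = new ReachabilityProfile();
// Index = MethodDef RID. RIDs start at 1, so index 0 is unused in this sketch.
profile.Assemblies[mvid] = new[] { false, true, false, true };

using var stream = new MemoryStream();
profile.Write(stream);
stream.Position = 0;
var loaded = ReachabilityProfile.Read(stream);

Console.WriteLine(loaded.IsReached(mvid, rid: 1));           // the method at RID 1 ran
Console.WriteLine(loaded.IsReached(mvid, rid: 2));           // the method at RID 2 never ran
Console.WriteLine(loaded.IsReached(Guid.NewGuid(), rid: 1)); // unknown MVID: no data

sealed class ReachabilityProfile
{
    public Dictionary<Guid, bool[]> Assemblies { get; } = new();

    // Assumed layout: [int32 count] then per assembly [16-byte MVID][int32 length][bools].
    public void Write(Stream output)
    {
        using var writer = new BinaryWriter(output, Encoding.UTF8, leaveOpen: true);
        writer.Write(Assemblies.Count);
        foreach (var (assemblyMvid, reached) in Assemblies)
        {
            writer.Write(assemblyMvid.ToByteArray());
            writer.Write(reached.Length);
            foreach (bool flag in reached)
                writer.Write(flag);
        }
    }

    public static ReachabilityProfile Read(Stream input)
    {
        using var reader = new BinaryReader(input, Encoding.UTF8, leaveOpen: true);
        var profile = new ReachabilityProfile();
        int assemblyCount = reader.ReadInt32();
        for (int i = 0; i < assemblyCount; i++)
        {
            var assemblyMvid = new Guid(reader.ReadBytes(16));
            var reached = new bool[reader.ReadInt32()];
            for (int rid = 0; rid < reached.Length; rid++)
                reached[rid] = reader.ReadBoolean();
            profile.Assemblies[assemblyMvid] = reached;
        }
        return profile;
    }

    // A method counts as reached only if its assembly's MVID is known and its flag is set.
    public bool IsReached(Guid assemblyMvid, int rid) =>
        Assemblies.TryGetValue(assemblyMvid, out var reached)
        && rid >= 0 && rid < reached.Length && reached[rid];
}
```

Keying by MVID means that a rebuilt assembly, which gets a new MVID, simply has no data instead of silently reusing flags that no longer line up with its tokens.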

# Usage

Build the app with `-p:_InstrumentReachability=true`. Run the app and exercise all the necessary codepaths. This produces a reach.mprof file in the current directory. You can run the app multiple times; the profile data will get merged automatically.

Then build the app again with `-p:_ReachabilityInstrumentationFile=reach.mprof`. The output of this build will be a profile-based version.

Just to give an idea of how small things will be:

* dotnet new webapiaot: 4.1 MB (down from 8.7 MB)
* Hello World with OptimizationPreference=Size, StackTraceSupport=false, UseSystemResourceKeys=true: 425 kB

The app may or may not actually work. Hello world works fine. Webapiaot hits an issue where our ability to optimize things better in the profile-based version leads to new codepaths being executed. It can be worked around by forcing something that triggers the optimization into profile data. For example, for webapiaot this helps:

```csharp
using System;
using System.Reflection; // GetCustomAttribute<T>() extension method

typeof(Holder).GetCustomAttribute<MyAttribute>().TheType.IsConstructedGenericType.ToString();
class MyAttribute : Attribute { public MyAttribute(Type t) => TheType = t; public Type TheType; }
class Gen<T> { }
[My(typeof(Gen<>))]
class Holder { }
```

(We need to call `IsConstructedGenericType` on a type that got its MethodTable optimized away.)

It is not strictly necessary for the outputs to be runnable for this to be useful. The idea is to generate DGML/MSTAT of the unprofiled version, then DGML/MSTAT of the profiled version, and diff them. The diff might highlight parts of the app that could maybe be removed by reorganizing the code.

Not everything will be removable. For example, in the Hello World, most of exception handling is gone, but one can still run into exceptions at runtime (I just didn’t when I profiled it).

# Implementation

There are 3 parts: a compiler component to generate instrumented outputs, a CoreLib change to save profile data on app exit, and a compiler component to consume profile data.

## Generating profile data

ReachabilityInstrumentationProvider is the main workhorse. It’s an ILProvider that wraps whatever IL we got from the input assembly and prefixes each method body with two instructions: ldc.i4.1 followed by stsfld. The stsfld targets a compiler-generated RVA static field. The compiler lays out all these fields in a way that their in-memory positions correspond to profile data positions (so saving profile data just means copying a memory range to a file).

ReachabilityDataBlobNode is responsible for creating the data blob and laying out all the RVA static fields that the code refers to.

Last but not least, InitializeMethod generates a small stub that informs CoreLib where to find the data blob at runtime. We hook up InitializeMethod into StartupCodeMain.
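
To make the two-instruction prefix concrete, here is a minimal sketch at the raw IL byte level. The `IlInstrumenter` helper and the field token value are hypothetical; the real provider works with the compiler's own IL abstractions, not with hand-patched byte arrays. The opcode encodings (0x17 for ldc.i4.1, 0x80 plus a 4-byte token for stsfld) are the standard ECMA-335 ones.

```csharp
using System;
using System.Buffers.Binary;

byte[] original = { 0x2A }; // a method body consisting of a single `ret`
byte[] instrumented = IlInstrumenter.Prefix(original, markerFieldToken: 0x04000001);
Console.WriteLine(BitConverter.ToString(instrumented)); // 17-80-01-00-00-04-2A

static class IlInstrumenter
{
    const byte Ldc_I4_1 = 0x17; // push the constant 1
    const byte Stsfld = 0x80;   // store to a static field; followed by a 4-byte field token

    // Prepends "ldc.i4.1; stsfld <field>" to an IL byte stream.
    // Branch targets are relative, so they survive the shift. Exception handling
    // region offsets and maxstack would need adjusting, which this sketch ignores.
    public static byte[] Prefix(ReadOnlySpan<byte> ilBody, int markerFieldToken)
    {
        var result = new byte[6 + ilBody.Length];
        result[0] = Ldc_I4_1;
        result[1] = Stsfld;
        BinaryPrimitives.WriteInt32LittleEndian(result.AsSpan(2, 4), markerFieldToken);
        ilBody.CopyTo(result.AsSpan(6));
        return result;
    }
}
```

The cost per method is six bytes of IL and one byte-sized store at entry, which keeps the instrumented build close to the original in behavior.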

## Saving profile data

Within CoreLib, we define two methods. One is called at startup and informs us where the profile data blob lives. The other is the very last managed method executed; it saves the data blob to a file.
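
A save step that merges with a previous run could look roughly like the sketch below. It assumes a simplified single-assembly file (a 16-byte MVID followed by one byte per method) and a hypothetical `ProfileSaver` helper; throwing on a mismatch is also an assumption, since the actual CoreLib code and its handling of rejected updates are not shown here.

```csharp
using System;
using System.IO;

string path = Path.Combine(Path.GetTempPath(), $"reach-sketch-{Guid.NewGuid():N}.bin");
var mvid = Guid.NewGuid();

ProfileSaver.Save(path, mvid, new byte[] { 1, 0, 0 }); // first run reached method 0
ProfileSaver.Save(path, mvid, new byte[] { 0, 0, 1 }); // second run reached method 2

byte[] flags = File.ReadAllBytes(path)[16..];          // skip the MVID header
Console.WriteLine(string.Join(",", flags));            // 1,0,1
File.Delete(path);

static class ProfileSaver
{
    public static void Save(string path, Guid mvid, ReadOnlySpan<byte> reached)
    {
        byte[] header = mvid.ToByteArray();
        byte[] merged = reached.ToArray();

        if (File.Exists(path))
        {
            byte[] previous = File.ReadAllBytes(path);
            bool sameBuild = previous.Length == header.Length + merged.Length
                && previous.AsSpan(0, header.Length).SequenceEqual(header);
            if (!sameBuild)
                throw new InvalidDataException("Existing profile comes from a different build.");

            // A method counts as reached if any run reached it.
            for (int i = 0; i < merged.Length; i++)
                merged[i] |= previous[header.Length + i];
        }

        using var file = File.Create(path);
        file.Write(header);
        file.Write(merged);
    }
}
```

Because merging is a plain bitwise OR, the order of runs does not matter, and a stale file from a different build is refused instead of being merged.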

## Consuming profile data

This is another ILProvider that either returns the underlying IL unmodified, or replaces it with a failfast call.

Cc @dotnet/ilc-contrib

dotnet-policy-service · 2024-09-20T06:19:06Z

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

EgorBo · 2024-09-20T11:29:05Z

Do you plan to use MIBC for this? (so not only you can extract the reachability info, but also improve performance with PGO)

kekekeks · 2024-09-20T11:35:30Z

Will the profile only be available with NAOT builds? I.e., would one still have to make sure that the trimmed app runs with NAOT first in order to collect the profile?

I was thinking about generating such profile for non-corlib stuff using CoreCLR profiling APIs while forcing various codepaths to think that the code runs with NAOT by IL-patching IsDynamicCodeSupported and RuntimeInformation.FrameworkDescription.

kekekeks · 2024-09-20T11:39:14Z

I guess two-stage profiling would be quite useful: a CoreCLR-based one to make the app run with NAOT in the first place (without having to spend several hours on adjusting trimming configuration by trial and error), and then the precise one from an instrumented NativeAOT build to produce a smaller binary.

GerardSmit · 2024-09-20T12:12:01Z

> You can run the app multiple times, the profile data will get merged automatically.

Does this mean that, after the following steps:

1. Create a web application
2. Create a new controller
3. Visit the controller, generate the reach.mprof file and commit it to git
4. Create a new controller
5. Only visit the new controller and let the profile data merge into the reach.mprof file

both controllers still get included (including all the underlying actions, which could run more code like DB access)?
Or do you need to recreate the reach.mprof file every time (visit both controllers before publishing)?

MichalStrehovsky · 2024-09-20T14:14:57Z

> Do you plan to use MIBC for this? (so not only you can extract the reachability info, but also improve performance with PGO)

This only logs what methods run, no basic blocks. I went for 20% of effort and 80% of effect. If we ever have proper profile collection, half of this could probably be deleted, we wouldn't need the instrumentation and corelib part of this.

> You can run the app multiple times, the profile data will get merged automatically.
>
> Does this mean that:

You can run it multiple times, but you cannot recompile it in between. The file format uses tokens and MVIDs. If you change the code, they get shuffled and the update is rejected.

To be very clear, the only purpose of this is to:

* Find out the best possible scenario when it comes to size (how much more one could save in the very ideal and unrealistic case).
* Find out if there are any things that could be factored differently so that regular trimming can get rid of them.

Do not ever ship anything compiled like this; it explodes randomly (e.g. if you never profiled a contended lock situation and a lock in your app becomes contended, the app will just crash). I even made the extra effort to prefix the MSBuild properties that activate this with underscores, to deter anyone from checking in code that uses them.

EgorBo · 2024-09-20T14:30:13Z

> This only logs what methods run, no basic blocks.

MIBC is expected to collect data about basic blocks, it's just that by default it may skip some blocks/methods to make it lightweight. That can be changed via

DOTNET_JitEdgeProfiling=0
DOTNET_JitMinimalJitProfiling=0

> I went for 20% of effort and 80% of effect.

Understandable

agocke

We did a group code review and I asked most questions there, so LGTM thanks

agocke · 2024-11-08T21:38:55Z

src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/ReachabilityInstrumentationProvider.cs

+            public override TypeSystemContext Context
+            {
+                get
+                {
+                    return _context;
+                }
+            }


FYI you could use an auto-property to make these shorter if you want, e.g.

Suggested change

-            public override TypeSystemContext Context
-            {
-                get
-                {
-                    return _context;
-                }
-            }
+            public override TypeSystemContext Context { get; }

MichalStrehovsky · 2024-11-11T14:50:26Z

/ba-g build took too long and timed out in an unrelated leg

MichalStrehovsky added the area-NativeAOT-coreclr label Sep 20, 2024

MichalStrehovsky self-assigned this Sep 20, 2024

Fixes

3f1af03

am11 added the Hackathon (Issues picked for Hackathon) label Sep 21, 2024

build-analysis bot mentioned this pull request Sep 22, 2024

Test failure: Assertion failed 'likelihood <= 1.0' #108100

Closed

agocke approved these changes Nov 8, 2024

MichalStrehovsky added 2 commits November 11, 2024 12:55

Merge branch 'main' into instrtrim

1599456

Feedback

70acb34

MichalStrehovsky merged commit 2c29c1d into dotnet:main Nov 11, 2024
86 of 88 checks passed

MichalStrehovsky deleted the instrtrim branch November 11, 2024 14:50

build-analysis bot mentioned this pull request Nov 11, 2024

The Operation will be canceled. The next steps may not contain expected logs. dotnet/dnceng#3008

Open

3 tasks

matouskozak mentioned this pull request Nov 15, 2024

[Perf] Windows/x64: 4 Improvements on 11/13/2024 8:07:55 AM dotnet/perf-autofiling-issues#44853

Closed

github-actions bot locked and limited conversation to collaborators Dec 12, 2024
