Add support for (experimental) profile-based trimming #108049
This was my hackathon project this year. Since it’s in a pretty leaf-y location within the product, I think it would be fine to check it in under unsupported switches for experimentation.
The question I was trying to answer is "How can we identify code that is statically reachable but is not needed at runtime (and we might be able to get rid of it by reorganizing the code a bit)?". The answer to that is profiling and profile-based code generation.
The idea is simple:
* We build a special version of the program that keeps track of what methods executed.
* At process termination, this data is written out to a file.
* We then recompile the program, passing in the method list as one of the inputs.
* When compiling a method that was in the list, compile as usual.
* When compiling a method that wasn’t in the list, replace it with a failfast.
* We still run the usual dependency analysis so the failfast methods are going to stop graph expansion at method boundaries. One could do better than this (cut off at basic block boundaries) but that’s a lot more work with a small benefit.
The profiling file format is simple: list of assembly MVIDs, followed by an array of bools. Index within the array corresponds to a MethodDef token within the assembly. So if bool at index N of assembly A is true, method in assembly with MVID A at RID N is reachable. Using MVIDs (GUID) instead of assembly name helps identify mismatches between profile data and input assemblies. Best to build with `-p:Deterministic=true`.
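The format described above can be modelled roughly as follows. This is a minimal sketch, not the actual ILCompiler or CoreLib code: the `ReachabilityProfile` type and all of its members are hypothetical names, and indexing the flag array directly by RID (leaving slot 0 unused) is an assumption. The real on-disk encoding of reach.mprof may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory model of the profile: one flag array per assembly, keyed by MVID.
public sealed class ReachabilityProfile
{
    private readonly Dictionary<Guid, bool[]> _flagsByMvid = new();

    // Registers an assembly with room for RIDs 1..methodCount (slot 0 is unused in this sketch).
    public void AddAssembly(Guid mvid, int methodCount)
        => _flagsByMvid[mvid] = new bool[methodCount + 1];

    // Records that the method with the given MethodDef RID executed.
    public void MarkExecuted(Guid mvid, int rid) => _flagsByMvid[mvid][rid] = true;

    // A method counts as reachable only if its assembly is known and its flag is set.
    public bool IsReachable(Guid mvid, int rid)
        => _flagsByMvid.TryGetValue(mvid, out var flags)
           && rid >= 0 && rid < flags.Length && flags[rid];

    // Merges another run into this one by OR-ing the flags.
    // Data for an unknown MVID or a different method count is rejected as a mismatch.
    public void MergeFrom(ReachabilityProfile other)
    {
        foreach (var (mvid, otherFlags) in other._flagsByMvid)
        {
            if (!_flagsByMvid.TryGetValue(mvid, out var flags) || flags.Length != otherFlags.Length)
                throw new InvalidOperationException($"Profile mismatch for assembly {mvid}");

            for (int i = 0; i < flags.Length; i++)
                flags[i] |= otherFlags[i];
        }
    }
}
```

Keying by MVID means that a rebuilt assembly (which gets a new MVID unless the build is deterministic and the inputs are unchanged) fails the lookup instead of silently mapping flags onto the wrong methods.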
# Usage
Build the app with `-p:_InstrumentReachability=true`. Run the app and exercise all the necessary codepaths. This produces a reach.mprof file in the current directory. You can run the app multiple times; the profile data will get merged automatically.
Then build the app again with `-p:_ReachabilityInstrumentationFile=reach.mprof`. Output of this build will be a profile-based version.
Just to give an idea of how small things will be:
* dotnet new webapiaot: 4.1 MB (down from 8.7 MB)
* Hello World with OptimizationPreference=Size, StackTraceSupport=false, UseSystemResourceKeys=true: 425 kB
The app may or may not actually work. Hello world works fine. Webapiaot hits an issue where our ability to optimize things better in the profile-based version leads to new codepaths being executed. It can be worked around by forcing something that triggers the optimization into profile data. For example, for webapiaot this helps:
```csharp
typeof(Holder).GetCustomAttribute<MyAttribute>().TheType.IsConstructedGenericType.ToString();
class MyAttribute : Attribute { public MyAttribute(Type t) => TheType = t; public Type TheType; }
class Gen<T> { }
[My(typeof(Gen<>))]
class Holder { }
```
(We need to call `IsConstructedGenericType` on a type that got its MethodTable optimized away.)
It is not strictly necessary for the outputs to be runnable for this to be useful. The idea is to generate DGML/MSTAT of the unprofiled version, then DGML/MSTAT of the profiled version, and diff them. The diff might highlight parts of the app that could maybe be removed by reorganizing the code.
Not everything will be removable. For example, in the Hello World, most of exception handling is gone, but one can still run into exceptions at runtime (I just didn’t when I profiled it).
# Implementation
There are 3 parts: a compiler component to generate instrumented outputs, a CoreLib change to save profile data on app exit, and a compiler component to consume profile data.
## Generating profile data
ReachabilityInstrumentationProvider is the main workhorse. It’s an ILProvider that wraps whatever IL we got from the input assembly and prefixes each method body with two instructions: ldc.i4.1 followed by stsfld. The stsfld targets a compiler-generated RVA static field. The compiler lays out all these fields in a way that their in-memory positions correspond to profile data positions (so saving profile data just means copying a memory range to a file).
ReachabilityDataBlobNode is responsible for creating the data blob and laying out all the RVA static fields that the code refers to.
Last but not least, InitializeMethod generates a small stub that informs corelib where to find the data blob at runtime. We hook up InitializeMethod into StartupCodeMain.
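In C# terms, the effect of the injected prefix is roughly what the sketch below does by hand. It is only an illustration: `ReachabilityBlob`, `InstrumentedExample`, and the slot numbers are hypothetical, and a managed byte array stands in for the compiler-generated RVA static fields, which the real implementation lays out inside a data blob.

```csharp
using System;

// Stand-in for the compiler-generated data blob. In the real implementation these are
// RVA static fields laid out by the compiler, not a managed array.
public static class ReachabilityBlob
{
    public static readonly byte[] Flags = new byte[4];
}

public static class InstrumentedExample
{
    // Equivalent of a method whose IL was prefixed with "ldc.i4.1; stsfld <flag field>".
    public static int Add(int x, int y)
    {
        ReachabilityBlob.Flags[1] = 1; // injected prefix: mark this method as executed
        return x + y;                  // original method body, unchanged
    }

    public static int Multiply(int x, int y)
    {
        ReachabilityBlob.Flags[2] = 1; // injected prefix for a different method slot
        return x * y;
    }
}
```

Because each method only ever stores the constant 1 into its own slot, the instrumentation needs no locking and no per-call bookkeeping beyond a single store.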
## Saving profile data
Within CoreLib, we define two methods - one is called at startup and informs us where profile data blob lives. The other is the very last managed method executed. It saves the data blob to a file.
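The two hooks could look something like the sketch below. All names here (`ReachabilitySaverSketch`, `Initialize`, `Save`) are hypothetical and not the actual CoreLib methods. The sketch also assumes the file holds only the raw flag bytes, whereas the real file additionally records assembly MVIDs, and it guesses that merging across runs is a byte-wise OR with the previous file.

```csharp
using System;
using System.IO;

public static class ReachabilitySaverSketch
{
    private static byte[] s_blob = Array.Empty<byte>();

    // Called once at startup to tell the saver where the profile data lives.
    public static void Initialize(byte[] blob) => s_blob = blob;

    // Called as the last step before exit: merge with any previous run, then write the file.
    public static void Save(string path)
    {
        if (s_blob.Length == 0)
            return; // nothing was registered, so there is nothing to save

        byte[] data = (byte[])s_blob.Clone();

        if (File.Exists(path))
        {
            byte[] previous = File.ReadAllBytes(path);
            if (previous.Length != data.Length)
                throw new InvalidDataException("Existing profile does not match this build.");

            // A method is considered executed if any run executed it.
            for (int i = 0; i < data.Length; i++)
                data[i] |= previous[i];
        }

        File.WriteAllBytes(path, data);
    }
}
```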
## Consuming profile data
This is another ILProvider that either returns the underlying IL unmodified, or replaces it with a failfast call.
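The per-method decision can be sketched as below. This is an assumption-laden illustration rather than the real ILProvider: `BodyKind`, `ProfileConsumerSketch`, and `ChooseBody` are hypothetical names, the profile is modelled as a dictionary from MVID to flags, and treating an assembly that is missing from the profile as "not executed" is a guess. `Environment.FailFast` stands in for whatever failfast helper the compiler actually emits.

```csharp
using System;
using System.Collections.Generic;

public enum BodyKind { Original, FailFast }

public static class ProfileConsumerSketch
{
    // Keep the original IL only for methods whose flag was set while profiling.
    public static BodyKind ChooseBody(IReadOnlyDictionary<Guid, bool[]> profile, Guid mvid, int rid)
    {
        bool executed = profile.TryGetValue(mvid, out var flags)
                        && rid >= 0 && rid < flags.Length && flags[rid];
        return executed ? BodyKind.Original : BodyKind.FailFast;
    }

    // What a replaced method effectively becomes: it terminates the process when reached.
    public static void ReplacedMethodBody()
        => Environment.FailFast("Method was not present in the reachability profile.");
}
```

Since the replaced body calls nothing except the failfast helper, dependency analysis has nothing further to follow from it, which is what stops graph expansion at the method boundary.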
Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
Do you plan to use MIBC for this? (That way you could not only extract the reachability info, but also improve performance with PGO.)
Will the profile only be available with NAOT builds? I.e., would one still have to make sure that the trimmed app runs with NAOT first in order to collect the profile? I was thinking about generating such a profile for non-CoreLib code using the CoreCLR profiling APIs, while IL-patching various codepaths so that they think the code is running with NAOT.
I guess two-stage profiling would be quite useful: a CoreCLR-based one to get the app running with NAOT in the first place (without having to spend several hours adjusting the trimming configuration by trial and error), and then the precise one from an instrumented NativeAOT build to produce a smaller binary.
Does this mean that:
That both controllers still get included (including all the underlying actions, which could run more code like DB access)?
This only logs what methods run, no basic blocks. I went for 20% of effort and 80% of effect. If we ever have proper profile collection, half of this could probably be deleted, we wouldn't need the instrumentation and corelib part of this.
You can run it multiple times, you cannot recompile it. The file format uses tokens and MVIDs. If you change stuff, they get shuffled and update is rejected. To be very clear, the only purpose of this is to:
Do not ever ship anything compiled like this; it explodes randomly (e.g. if you never profiled a contended lock situation and a lock in your app becomes contended, the app will just crash). I even made the extra effort to start the MSBuild properties that activate this with underscores to deter anyone from checking in code that uses them.
MIBC is expected to collect data about basic blocks, it's just that by default it may skip some blocks/methods to make it lightweight. That can be changed via
Understandable
agocke left a comment
We did a group code review and I asked most questions there, so LGTM thanks
public override TypeSystemContext Context
{
    get
    {
        return _context;
    }
}
FYI you could use an auto-property to make these shorter if you want, e.g.
public override TypeSystemContext Context { get; }
/ba-g build took too long and timed out in an unrelated leg
Cc @dotnet/ilc-contrib