@MichalStrehovsky
Member

This was my hackathon project this year. Since it’s in a pretty leaf-y location within the product, I think it would be fine to check it in under unsupported switches for experimentation.

The question I was trying to answer is "How can we identify code that is statically reachable but is not needed at runtime (and that we might be able to get rid of by reorganizing the code a bit)?" The answer to that is profiling and profile-based code generation.

The idea is simple:

  • We build a special version of the program that keeps track of what methods executed.
  • At process termination, this data is written out to a file.
  • We then recompile the program, passing in the method list as one of the inputs.
  • When compiling a method that was in the list, compile as usual.
  • When compiling a method that wasn’t in the list, replace it with a failfast.
  • We still run the usual dependency analysis, so the failfast methods stop graph expansion at method boundaries. One could do better than this (cut off at basic block boundaries), but that’s a lot more work for a small benefit.

The profiling file format is simple: a list of assembly MVIDs, followed by an array of bools. The index within the array corresponds to a MethodDef token within the assembly. So if the bool at index N for assembly A is true, the method at RID N in the assembly with MVID A is reachable. Using MVIDs (GUIDs) instead of assembly names helps identify mismatches between the profile data and the input assemblies. It is best to build with -p:Deterministic=true so that the MVIDs stay stable across builds.
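
To make the layout concrete, here is a minimal in-memory sketch of the same idea: one bool array per assembly, keyed by MVID and indexed by MethodDef RID. The `ReachabilityProfile` type and its members are hypothetical names for illustration, not the types in this PR, and the actual on-disk encoding is not reproduced here. Leaving index 0 unused (RIDs are 1-based) is also an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory model of the profile: one bool array per assembly,
// keyed by the assembly's MVID and indexed by MethodDef RID.
public sealed class ReachabilityProfile
{
    private readonly Dictionary<Guid, bool[]> _perAssembly = new();

    // Registers an assembly with room for RIDs 1..methodCount.
    // Assumption: index 0 is left unused because RIDs start at 1.
    public void AddAssembly(Guid mvid, int methodCount)
        => _perAssembly[mvid] = new bool[methodCount + 1];

    // Records that the method with the given RID executed.
    public void MarkReached(Guid mvid, int rid)
        => _perAssembly[mvid][rid] = true;

    // Returns null when the MVID is unknown, which signals that the profile
    // does not match the input assembly (for example, it was rebuilt).
    public bool? IsReached(Guid mvid, int rid)
    {
        if (!_perAssembly.TryGetValue(mvid, out var flags))
            return null;
        return rid >= 0 && rid < flags.Length && flags[rid];
    }
}
```

Keying on the MVID rather than the assembly name is what lets a lookup distinguish "this method never ran" from "this profile belongs to a different build".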

Usage

Build the app with -p:_InstrumentReachability=true. Run the app and exercise all the necessary codepaths. This produces a reach.mprof file in the current directory. You can run the app multiple times; the profile data will get merged automatically.

Then build the app again with -p:_ReachabilityInstrumentationFile=reach.mprof. The output of this build will be a profile-based version.
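
The merging of repeated runs can be pictured as a logical OR over the per-method flags. The sketch below is an assumption about how such a merge could look, not the code in CoreLib; `ProfileMerger` is a hypothetical name, and rejecting data when the MVID or method count differs is modeled on the fact that the format is tied to tokens and MVIDs.

```csharp
using System;

public static class ProfileMerger
{
    // Merges the flags from a new run into the existing flags for one assembly.
    // A method counts as reached if any run reached it (logical OR).
    public static bool[] Merge(Guid existingMvid, bool[] existing, Guid newMvid, bool[] fresh)
    {
        // Assumption: a different MVID or method count means the assembly was
        // recompiled, so the old data no longer lines up with the tokens.
        if (existingMvid != newMvid || existing.Length != fresh.Length)
            throw new InvalidOperationException(
                "Profile does not match the assembly; delete it and profile again.");

        var merged = new bool[existing.Length];
        for (int i = 0; i < merged.Length; i++)
            merged[i] = existing[i] || fresh[i];
        return merged;
    }
}
```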

Just to give an idea of how small things will be:

  • dotnet new webapiaot: 4.1 MB (down from 8.7 MB)
  • Hello World with OptimizationPreference=Size, StackTraceSupport=false, UseSystemResourceKeys=true: 425 kB

The app may or may not actually work. Hello world works fine. Webapiaot hits an issue where our ability to optimize things better in the profile-based version leads to new codepaths being executed. It can be worked around by forcing something that triggers the optimization into the profile data. For example, for webapiaot this helps:

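using System;
using System.Reflection; // for the GetCustomAttribute<T>() extension method

// Forces IsConstructedGenericType to run on a type that has no MethodTable.
typeof(Holder).GetCustomAttribute<MyAttribute>().TheType.IsConstructedGenericType.ToString();
class MyAttribute : Attribute { public MyAttribute(Type t) => TheType = t; public Type TheType; }
class Gen<T> { }
[My(typeof(Gen<>))]
class Holder { }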

(We need to call IsConstructedGenericType on a type that got its MethodTable optimized away.)

It is not strictly necessary for the outputs to be runnable for this to be useful. The idea is to generate DGML/MSTAT of the unprofiled version, then DGML/MSTAT of the profiled version, and diff them. The diff might highlight parts of the app that could be removed by reorganizing the code.
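
As a rough sketch of that diffing step, suppose the per-method sizes of both builds have already been extracted into name-to-size dictionaries (that extraction is not shown, and `SizeDiff` is a hypothetical helper, not part of any existing tool). The interesting entries are then the ones that exist only in the unprofiled build, largest first:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SizeDiff
{
    // Returns the entries that are present in the baseline (unprofiled) build
    // but absent from the profiled build, ordered by size, largest first.
    public static List<KeyValuePair<string, int>> RemovedByProfile(
        IReadOnlyDictionary<string, int> baseline,
        IReadOnlyDictionary<string, int> profiled)
    {
        return baseline
            .Where(kv => !profiled.ContainsKey(kv.Key))
            .OrderByDescending(kv => kv.Value)
            .ToList();
    }
}
```

The entries at the top of that list are the candidates worth checking for a refactoring that would let regular trimming remove them.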

Not everything will be removable. For example, in the Hello World, most of exception handling is gone, but one can still run into exceptions at runtime (I just didn’t when I profiled it).

Implementation

There are 3 parts: a compiler component to generate instrumented outputs, a CoreLib change to save profile data on app exit, and a compiler component to consume profile data.

Generating profile data

ReachabilityInstrumentationProvider is the main workhorse. It’s an ILProvider that wraps whatever IL we got from the input assembly and prefixes each method body with two instructions: ldc.i4.1 followed by stsfld. The stsfld targets a compiler-generated RVA static field. The compiler lays out all these fields in a way that their in-memory positions correspond to profile data positions (so saving profile data just means copying a memory range to a file).
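
In C# terms, the effect of that two-instruction prefix is roughly the following. This is a hand-written equivalent for illustration only: the names are hypothetical, and ordinary C# static fields do not get the contiguous RVA layout that the compiler arranges for the real flags.

```csharp
// Hypothetical C# equivalent of what the instrumented IL does.
public static class InstrumentedExample
{
    // Stand-ins for the compiler-generated flags, one per method.
    public static bool Reached_Add;
    public static bool Reached_Unused;

    public static int Add(int a, int b)
    {
        Reached_Add = true; // corresponds to: ldc.i4.1; stsfld <flag for Add>
        return a + b;       // the original method body follows unchanged
    }

    public static int Unused(int a)
    {
        Reached_Unused = true; // only set if the method actually runs
        return -a;
    }
}
```

After a run that only calls `Add`, `Reached_Add` is true and `Reached_Unused` is still false, which is the information the profile records.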

ReachabilityDataBlobNode is responsible for creating the data blob and laying out all the RVA static fields that the code refers to.

Last but not least, InitializeMethod generates a small stub that informs CoreLib where to find the data blob at runtime. We hook up InitializeMethod into StartupCodeMain.

Saving profile data

Within CoreLib, we define two methods: one is called at startup and informs us where the profile data blob lives. The other is the very last managed method executed; it saves the data blob to a file.

Consuming profile data

This is another ILProvider that either returns the underlying IL unmodified, or replaces it with a failfast call.
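
The decision this provider makes can be sketched as follows. The types here are simplified and hypothetical (the real provider works on the compiler's method and IL abstractions, which are not reproduced), and compiling assemblies without profile data as usual is an assumption.

```csharp
using System;
using System.Collections.Generic;

public enum BodyKind { Original, FailFast }

// Hypothetical, simplified stand-in for the profile-consuming provider.
public sealed class ProfileGuidedBodySelector
{
    private readonly IReadOnlyDictionary<Guid, bool[]> _profile;

    public ProfileGuidedBodySelector(IReadOnlyDictionary<Guid, bool[]> profile)
        => _profile = profile;

    // Decides whether a method keeps its original body or gets a failfast stub.
    public BodyKind Select(Guid mvid, int rid)
    {
        // Assumption: assemblies with no profile data are compiled as usual.
        if (!_profile.TryGetValue(mvid, out var flags))
            return BodyKind.Original;

        bool reached = rid >= 0 && rid < flags.Length && flags[rid];
        return reached ? BodyKind.Original : BodyKind.FailFast;
    }
}
```

A method that gets `BodyKind.FailFast` behaves roughly as if its whole body were a single call such as `Environment.FailFast(...)`, so it no longer pulls its callees into the dependency graph.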

Cc @dotnet/ilc-contrib

@dotnet-policy-service
Contributor

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

@EgorBo
Member

EgorBo commented Sep 20, 2024

Do you plan to use MIBC for this? (so not only you can extract the reachability info, but also improve performance with PGO)

@kekekeks

Will the profile only be available with NAOT builds? I.e., would one still have to make sure that the trimmed app runs with NAOT first in order to collect the profile?

I was thinking about generating such a profile for non-CoreLib code using CoreCLR profiling APIs, while forcing various codepaths to think that the code runs with NAOT by IL-patching IsDynamicCodeSupported and RuntimeInformation.FrameworkDescription.

@kekekeks

I guess two-stage profiling would be quite useful: a CoreCLR-based one to make the app run with NAOT in the first place (without having to spend several hours adjusting the trimming configuration by trial and error), and then the precise one from an instrumented NativeAOT build to produce a smaller binary.

@GerardSmit
Contributor

GerardSmit commented Sep 20, 2024

> You can run the app multiple times, the profile data will get merged automatically.

Does this mean that:

  1. Create a web application
  2. Create a new controller
  3. Visit the controller, generate the reach.mprof-file and commit it to git
  4. Create a new controller
  5. Only visit the new controller and let the profile data merge into the reach.mprof-file

Do both controllers then still get included (including all the underlying actions, which could run more code such as DB access)?
Or do you need to recreate the reach.mprof-file every time (visit both controllers before publishing)?

@MichalStrehovsky
Member Author

> Do you plan to use MIBC for this? (so not only you can extract the reachability info, but also improve performance with PGO)

This only logs what methods run, no basic blocks. I went for 20% of effort and 80% of effect. If we ever have proper profile collection, half of this could probably be deleted; we wouldn't need the instrumentation and CoreLib parts of this.

> You can run the app multiple times, the profile data will get merged automatically.

> Does this mean that:

You can run it multiple times, but you cannot recompile it. The file format uses tokens and MVIDs. If you change stuff, they get shuffled and the update is rejected.

To be very clear, the only purpose of this is to:

  1. Find out the best possible scenario when it comes to size (how much more one could save in the very ideal and unrealistic case)
  2. Find out if there are any things that could be factored differently so that regular trimming can get rid of them.

Do not ever ship anything compiled like this; it explodes randomly (e.g. if you never profiled a contended lock situation and a lock in your app becomes contended, the app will just crash). I even made the extra effort to prefix the MSBuild properties that activate this with underscores, to deter anyone from checking in code that uses them.

@EgorBo
Member

EgorBo commented Sep 20, 2024

> This only logs what methods run, no basic blocks.

MIBC is expected to collect data about basic blocks, it's just that by default it may skip some blocks/methods to make it lightweight. That can be changed via

DOTNET_JitEdgeProfiling=0
DOTNET_JitMinimalJitProfiling=0

> I went for 20% of effort and 80% of effect.

Understandable

@am11 added the Hackathon (Issues picked for Hackathon) label Sep 21, 2024

@agocke
Member

agocke left a comment

We did a group code review and I asked most questions there, so LGTM thanks

Comment on lines 240 to 246
public override TypeSystemContext Context
{
    get
    {
        return _context;
    }
}
Member

FYI you could use an auto-property to make these shorter if you want, e.g.

Suggested change
- public override TypeSystemContext Context
- {
-     get
-     {
-         return _context;
-     }
- }
+ public override TypeSystemContext Context { get; }

@MichalStrehovsky
Member Author

/ba-g build took too long and timed out in an unrelated leg

@MichalStrehovsky merged commit 2c29c1d into dotnet:main on Nov 11, 2024
86 of 88 checks passed
@MichalStrehovsky deleted the instrtrim branch on November 11, 2024 14:50
mikelle-rogers pushed a commit to mikelle-rogers/runtime that referenced this pull request Dec 10, 2024
* Add support for (experimental) profile-based trimming

@github-actions github-actions bot locked and limited conversation to collaborators Dec 12, 2024

Labels

area-NativeAOT-coreclr Hackathon Issues picked for Hackathon

6 participants