NativeAOT authoring observations
As I'm continuing to iterate to shave off overhead from the new serializer for Npgsql, and slowly getting more familiar with the code patterns that work well with NativeAOT I wanted to report on my findings so far. I will be using this issue to centralize future observations of a similar nature as well.
Async
One class of issues in particular I want to highlight is the cost of everything relating to async code. Having any of these methods in generic types is one major source of bloat due to their inherent IL type + codegen explosion, multiplied by the number of canonical instantiations.
It might be worthwhile for the C# and runtime team to take a look at what can be done to reduce its footprint, as this interaction is supremely problematic for more casual authors. If there is no significant improvement to be made here it could help to expose relevant tools for authors to reduce it on a case by case basis.
Some of those tools could be, for instance, public methods on ValueTask and ValueTask<T> to move from the generic form to the non-generic one and back, allowing us external authors to support pooled IValueTaskSource and IValueTaskSource<T> while doing so.
Tasks have inherent support for up/downcasting to do this, since Task<T> derives from Task. For ValueTask these APIs could be used in the same way as up/downcasting Tasks, for the purpose of pushing this bulky async codegen to non-generic types. The internal method ValueTask.DangerousCreateFromTypedValueTask already exists, which validates that this is useful internally.
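To make the Task up/downcasting point concrete, here is a minimal sketch. The names `TaskCasting`, `CompleteAsync` and `GetResult` are hypothetical and only for illustration; this is not code from Npgsql or the runtime. The only async state machine lives in a non-generic method, and the generic part is reduced to a cast.

```csharp
using System.Threading.Tasks;

// Hypothetical helper, for illustration only.
public static class TaskCasting
{
    // Non-generic async method: the compiler emits a single state machine here,
    // regardless of how many Task<T> instantiations are passed in (upcast to Task).
    public static async Task<Task> CompleteAsync(Task task)
    {
        await task.ConfigureAwait(false);
        return task; // hand the same instance back so the caller can downcast it
    }

    // Generic but not async: each instantiation is only a cast and a property read.
    // Assumes 'completed' really is a completed Task<T>; otherwise the cast throws.
    public static T GetResult<T>(Task completed) => ((Task<T>)completed).Result;
}
```

A caller would write something like `Task done = await TaskCasting.CompleteAsync(typedTask);` followed by `TaskCasting.GetResult<string>(done)`. ValueTask<T> has no public equivalent of this round trip today, which is the gap described above.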
Async Types
Next up I started looking at the general cost of even having apis returning ValueTask types, it's not cheap... Not only do you pay for the Task types but also for ValueTask<T>, IValueTaskSource<T>, ValueTask<T>.ValueTaskSourceAsTask, ValueTaskAwaiter<T> and others.
I have two links here, one before dropping about 20(?) mostly reference type instantiations of ValueTask<T> and one report after doing so. The difference is a hefty 80kb, with methods taking up just 24kb of that difference. The remainder is mostly types and type dictionary metadata.
Before: https://github.com/NinoFloris/Slon/actions/runs/4507705639
After: https://github.com/NinoFloris/Slon/actions/runs/4507808228
The mstats are attached if you want to explore the System.Private.Corelib types in more detail.
Reference Types
Onto the next papercut, reference type instantiations in particular. Their methods are shared via __Canon; however, all their concrete types are still required to exist in full, take a look:
These all add to the binary size, but I'm not sure what exactly each of them uniquely adds. Could these EETypes/method tables (what is the correct name for this?) be shared in any way via __Canon, only keeping a concrete generic context around per reference type instantiation? I understand the latter would always be needed to have correct type testing etc. inside methods.
Improving this situation seems like it may also reduce the previously discussed ValueTask<T> type bloat?
Value Type Code Sharing
Finally, I would urge runtime devs to consider what it would take to share code across value types of the same size/layout, a la GC shapes in Go. For a type like int that could allow eligible code for, say, List<T> to be reused across int, uint, enums, ValueTuple<int>, DateOnly and other types wrapping a single int. The same would go for other primitives like long that have a lot of representationally isomorphic types to share code with.
I understand this cuts into the ability to do runtime intrinsic optimizations (like uints never being negative values etc). I also see how this may complicate codegen as this sharing is only possible when the different instantiations don't actually produce different bodies (i.e. int.Equals will produce different code than DateOnly.Equals), however there will be many methods that don't depend on the generic type's methods at all, just their data representation. For an initial good enough experiment it might just be sufficient to add a stage that aliases same-type method instantiations by their body being byte for byte identical?
Such a stage may also help with sharing code across all instantiations when it doesn't make use of the generic context at all.
IIRC there is already some global deduplication mode (which impacts stack traces) but only sharing per generic type seems to be more suitable to be enabled by default?
I can see the theoretical version of these things working but I'm obviously not sure what this would mean more concretely. (and the practical problems flowing from this, which I'm surely glossing over here)
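As a rough illustration of the value type sharing idea, the sketch below does by hand what the proposal asks the compiler to do automatically: a layout-only operation for any 4-byte, reference-free struct is routed to one int-based body. `LayoutSharing` and `ReverseInt32` are hypothetical names. The sketch assumes that reinterpreting such structs as int is acceptable on the target platform (alignment included), and I have not measured whether it actually reduces binary size.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical helper, for illustration only.
public static class LayoutSharing
{
    // The single concrete body; it depends only on the 4-byte representation.
    static void ReverseInt32(Span<int> items) => items.Reverse();

    public static void Reverse<T>(Span<T> items) where T : struct
    {
        // int, uint, int-backed enums, ValueTuple<int> and DateOnly all take this path.
        if (Unsafe.SizeOf<T>() == sizeof(int) &&
            !RuntimeHelpers.IsReferenceOrContainsReferences<T>())
        {
            ReverseInt32(MemoryMarshal.Cast<T, int>(items));
        }
        else
        {
            // Fallback: the regular per-T instantiation.
            items.Reverse();
        }
    }
}
```

For example, `LayoutSharing.Reverse(dates.AsSpan())` on a `DateOnly[]` goes through `ReverseInt32`, while a `long[]` takes the fallback. A method like `Equals` could not be shared this way, since its body differs per type, which is the limitation noted above.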
Conclusion
If we're really serious about NativeAOT being 'effortlessly' competitive (so no crazy authoring) these issues must be explored. If only just to understand the problematic elements better.
All in all it's been challenging to keep size down to acceptable levels in this particular area of generics, async and serializer-like code.
@DamianEdwards Is there a world in which we drive dotnet/aspnetcore#45910 stage 2 efforts across internal and external collaborators more effectively than just github issues? Is that something you're open to?
