[NativeAOT-LLVM] Add a portable implementation of RhGetCodeTarget #2333
Conversation
> internal sealed class LlvmMethodBodyNode : LLVMMethodCodeNode, IMethodBodyNode

I deleted the LlvmMethodBodyNode/LLVMMethodCodeNode fork; it did not seem to serve any meaningful purpose. MethodCodeNode does not use such a pattern either.
> public bool IsSpecialUnboxingThunk => ((CompilerTypeSystemContext)Method.Context).IsSpecialUnboxingThunk(_method);
> public override IEnumerable<DependencyListEntry> GetStaticDependencies(NodeFactory factory)
> public ISymbolNode GetUnboxingThunkTarget(NodeFactory factory)

The ISpecialUnboxThunkNode implementation is copied from MethodCodeNode.
...ILCompiler.Compiler/Compiler/DependencyAnalysis/Target_Wasm/UnboxingStubTargetMappingNode.cs
@dotnet/nativeaot-llvm

This may be interesting to upstream to address dotnet/runtime#73906.
> uint32_t instantiatingStubCount = allMappingsCount - simpleStubCount;
> if (!initMap(m_unboxingStubTargetsMap, &allMappings->Data[0], simpleStubCount) ||
>     !initMap(m_unboxingAndInstantiatingStubTargetsMap, &allMappings->Data[simpleStubCount], instantiatingStubCount))
The core runtime should not need this mapping for anything. Is that correct?
Would it make more sense to initialize this map lazily and do the whole thing in C#, like we do that for other maps that are needed by reflection?
> The core runtime should not need this mapping for anything. Is that correct?

Yes. It's only consumed at the CoreLib layer and above.

> Would it make more sense to initialize this map lazily and do the whole thing in C#, like we do that for other maps that are needed by reflection?

I considered it. The main problem is that the functionality is used by delegate equality, which seems low-level enough to be non-trivial to initialize robustly in managed code.

Additionally, the SHash closed hashing table implementation is very nicely fitted to this use case; a managed dictionary would use 3x+ more memory.
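To illustrate the memory argument: in a closed-hashing (open-addressing) table like SHash, each slot is a single inline key/value pair, with no per-entry node allocations or object headers as in a managed dictionary. Below is a minimal, hypothetical sketch of that layout with linear probing; it is not the runtime's actual SHash implementation, and the names are illustrative only.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal open-addressing ("closed hashing") table with linear probing.
// The entire table is one flat allocation of inline slots; memory use is
// simply slotCount * sizeof(Slot).
struct Slot { uintptr_t Key; uintptr_t Value; }; // Key == 0 means "empty"

class ClosedHashMap
{
    std::vector<Slot> m_slots;

public:
    // Capacity must strictly exceed the number of entries inserted,
    // otherwise probing would never find a free/empty slot.
    explicit ClosedHashMap(size_t capacity) : m_slots(capacity) {}

    void Insert(uintptr_t key, uintptr_t value)
    {
        size_t i = key % m_slots.size();
        while (m_slots[i].Key != 0)           // linear-probe to a free slot
            i = (i + 1) % m_slots.size();
        m_slots[i] = { key, value };
    }

    bool TryGet(uintptr_t key, uintptr_t* value) const
    {
        size_t i = key % m_slots.size();
        while (m_slots[i].Key != 0)           // probe until an empty slot
        {
            if (m_slots[i].Key == key)
            {
                *value = m_slots[i].Value;
                return true;
            }
            i = (i + 1) % m_slots.size();
        }
        return false;                         // hit an empty slot: not present
    }
};
```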
Does it need to be used by the delegate equality in the first place? It looks like it is a left-over from some flavor of multi-module mode that allowed multiple unboxing stubs to exist for a given method. That should not be a thing in current native AOT. You can try to upstream removing it from delegate equality.
Even if it is still needed for some reason, delegate equality should not be used by low-level runtime code where the lazy initialization would lead to problems.
A reverse map like this one is a common problem for reflection-like functionality. I think it would be fine to implement this reverse map in a similar way. We have a few of those. For example, look for FunctionPointersToOffsets. (This one is also more complicated than it needs to be due to support for multimodule flavors that we do not care about anymore.)
In fact, it looks like this reverse map was implemented in managed code at some point. There are an unused UnboxingAndInstantiatingStubMap and a struct UnboxingAndInstantiatingStubMapEntry that one would need to implement reverse maps like this.
> Does it need to be used by the delegate equality in the first place? It looks like it is a left-over from some flavor of multi-module mode that allowed multiple unboxing stubs to exist for a given method. That should not be a thing in current native AOT. You can try to upstream removing it from delegate equality.
After doing some research, I agree it looks unnecessary.

I will pursue its removal and a managed implementation for these maps.
Force-pushed 5d120c4 to ec99e4e (compare).
@dotnet/nativeaot-llvm PTAL
In this implementation:
1) The compiler generates an array of tuples { Stub, Target }.
This is done for both simple and instantiating stubs.
2) At runtime startup, pointers into this array are stored in
two hash tables, with the stubs as keys. This allows for
reasonably efficient target retrieval.
Costs:
1) The size of the array is ~5KB for the reflection test, which
has ~500 simple unboxing stubs and ~100 instantiating ones.
Thus for a medium-sized program, overhead on the order of
~50KB is expected.
2) The hash table construction startup penalty should be on the
order of 10us for the reflection test and scaling linearly for
larger programs, which seems acceptable.
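The startup step described above can be sketched as follows. This is a simplified illustration only: the `StubTargetMapping` struct, the map type, and the function names are hypothetical stand-ins for the actual compiler-generated layout and the runtime's SHash-based tables.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical layout of one compiler-generated { Stub, Target } tuple.
struct StubTargetMapping
{
    void* Stub;   // address of the unboxing stub
    void* Target; // address of the method body it forwards to
};

// Sketch of the startup step: build a hash table keyed by stub address,
// pointing back into the generated array (here a single table stands in
// for the separate simple/instantiating stub tables).
static std::unordered_map<void*, const StubTargetMapping*> s_stubTargets;

static void InitStubTargetMap(const StubTargetMapping* mappings, uint32_t count)
{
    for (uint32_t i = 0; i < count; i++)
        s_stubTargets[mappings[i].Stub] = &mappings[i];
}

// RhGetCodeTarget-style lookup: return the unwrapped target, or the input
// address itself when it is not a known unboxing stub.
static void* GetCodeTarget(void* address)
{
    auto it = s_stubTargets.find(address);
    return (it != s_stubTargets.end()) ? it->second->Target : address;
}
```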
This reverts commit 82c5c95.
dotnet/corert@08d78ae

The original motivation for this was handling import stubs:

```
Function pointer equality comparison was not handling cross-module pointers
correctly when optimizations were enabled (causes target pointers to be
wrapped in jump stubs sometimes). The delegate equality comparison was
hitting this bug.
```

We do not have import stubs anymore and unwrapping unboxing stubs serves no purpose here. Microbenchmarks of delegate equality show ~3x improvement with this change:

```
Bench_DelegateEquality_Positive_OpenStatic<10000000>() took: 355 ms
Bench_DelegateEquality_Positive_ClosedStatic<10000000>() took: 367 ms
Bench_DelegateEquality_Positive_ClosedInstance<10000000>() took: 371 ms

Bench_DelegateEquality_Positive_OpenStatic<10000000>() took: 121 ms
Bench_DelegateEquality_Positive_ClosedStatic<10000000>() took: 120 ms
Bench_DelegateEquality_Positive_ClosedInstance<10000000>() took: 122 ms
```

Additionally, there is some desire to upstream changes for a portable RhGetCodeTarget implementation. Not having to deal with it at this relatively low-level layer will make things more robust.
In this implementation:
1) The compiler generates an array of tuples { Stub, Target }.
This is done for both simple and instantiating stubs.
2) At runtime, this information is sorted, with lookups done
via binary search.
Costs:
1) The size of the array is ~5KB for the HelloWasm test, which
has ~400 simple unboxing stubs and ~100 instantiating ones.
For a medium-sized application that has ~50K stubs, that's
400K on 32 bit (and double that on 64 bit).
2) Binary search is not "cheap" and, unfortunately, we will often
hit it on the startup path because of ComputeLdftnReverseLookup.
Fortunately, we should rarely hit it in steady state, as it will
only be used for Delegate.MethodInfo and Delegate.CreateDelegate
(the latter only in rare scenarios).
3) Managed code that supports all of this clocks in at about 5K,
which is rather large, but is mostly a reflection of how our
codegen is not that great.
Code structure choices:
1) ILC - layered under MetadataManager like the other maps used by
reflection.
2) Runtime - layered in CoreLib for consistency with the existing
(non-portable) implementation.
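The sorted-array approach from steps 1-2 above can be sketched as follows (the actual implementation lives in managed code in CoreLib; this C++ sketch uses hypothetical names and stands in only for the sort-then-binary-search idea):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical { Stub, Target } entry; illustrative names, not the actual
// ILC/CoreLib data structures.
struct StubTargetEntry
{
    uintptr_t Stub;
    uintptr_t Target;
};

// One-time step: sort the entries by stub address so lookups can use
// binary search.
static void SortStubTargets(std::vector<StubTargetEntry>& entries)
{
    std::sort(entries.begin(), entries.end(),
              [](const StubTargetEntry& a, const StubTargetEntry& b)
              { return a.Stub < b.Stub; });
}

// O(log n) lookup; returns the input address unchanged when it is not a
// known unboxing stub, mirroring RhGetCodeTarget's contract.
static uintptr_t GetCodeTarget(const std::vector<StubTargetEntry>& entries,
                               uintptr_t address)
{
    auto it = std::lower_bound(entries.begin(), entries.end(), address,
                               [](const StubTargetEntry& e, uintptr_t a)
                               { return e.Stub < a; });
    return (it != entries.end() && it->Stub == address) ? it->Target : address;
}
```

The trade-off noted in the costs above follows directly: no startup hash table construction and no extra table memory, at the price of a logarithmic-time probe on each lookup.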
Force-pushed ec99e4e to 42bf2dd (compare).
Contributes to #2307.