Shared generics opcodes#116990
Conversation
- boxint, unbox, unbox.any, cast, isinst, ldtoken, newarr, newobj Added additional diagnostic printing capabilities, to print strings, and method/class names in interesting scenarios Known continuing work - MDArray new - static variables and class constructor triggering
There was a problem hiding this comment.
Pull Request Overview
Implements shared generic support in the interpreter by adding new opcodes, wiring up generic lookups, and enhancing disassembly printing for generic contexts.
Key changes:
- Added shared‐generics helper opcodes (
call.helper.pg,newobj.generic, etc.) and their handling ininterpexec.cppandintops.def/h. - Extended the compiler to emit and print generic lookup data (
InterpGenericLookup) and pointer tagging for improved pretty‐printing. - Expanded the JIT/interpreter tests in
Interpreter.csto cover shared generics scenarios (box/unbox, castclass, newarr, new MD array,isinst,ldtoken,newobj).
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/tests/JIT/interpreter/Interpreter.cs | Added tests for most shared‐generics opcodes |
| src/coreclr/vm/interpexec.cpp | Implemented runtime generic lookups via DO_GENERIC_LOOKUP macro |
| src/coreclr/interpreter/intops.h | Declared new opcode argument types for generic helpers |
| src/coreclr/interpreter/intops.def | Defined new helper‐ and lookup‐based opcodes and replaced old ones |
| src/coreclr/interpreter/datastructs.h | Marked GetSize() and GetUnderlyingArray() as const |
| src/coreclr/interpreter/compiler.h | Introduced InterpGenericLookup metadata and new emit helpers |
| src/coreclr/interpreter/compiler.cpp | Wired up emission of generic helper calls and enhanced printing |
Comments suppressed due to low confidence (1)
src/tests/JIT/interpreter/Interpreter.cs:2094
- The new shared-generics tests cover most opcodes, but there’s no dedicated test for
ldtokenon a generic type. Consider adding aTestSharedGenerics_LdToken<T>()to verify interpreter lookup and printing for token handles.
return true;
| #ifdef DEBUG | ||
| void *ptr = (void*)clsHnd; | ||
| if (!PointerInNameMap(ptr)) | ||
| { | ||
| AddPointerToNameMap(ptr, PointerIsClassHandle); | ||
| } | ||
| #endif DEBUG | ||
| } | ||
|
|
||
| void DeclarePointerIsMethod(CORINFO_METHOD_HANDLE methodHnd) | ||
| { | ||
| #ifdef DEBUG | ||
| void *ptr = (void*)methodHnd; | ||
| if (!PointerInNameMap(ptr)) | ||
| { | ||
| AddPointerToNameMap(ptr, PointerIsMethodHandle); | ||
| } | ||
| #endif DEBUG | ||
| } | ||
|
|
||
| void DeclarePointerIsString(void* stringLiteral) | ||
| { | ||
| #ifdef DEBUG |
There was a problem hiding this comment.
The debug guards use #ifdef DEBUG but the project typically defines _DEBUG. Consider changing #ifdef DEBUG to #ifdef _DEBUG so these debug blocks actually compile in debug builds.
| #ifdef DEBUG | |
| void *ptr = (void*)clsHnd; | |
| if (!PointerInNameMap(ptr)) | |
| { | |
| AddPointerToNameMap(ptr, PointerIsClassHandle); | |
| } | |
| #endif DEBUG | |
| } | |
| void DeclarePointerIsMethod(CORINFO_METHOD_HANDLE methodHnd) | |
| { | |
| #ifdef DEBUG | |
| void *ptr = (void*)methodHnd; | |
| if (!PointerInNameMap(ptr)) | |
| { | |
| AddPointerToNameMap(ptr, PointerIsMethodHandle); | |
| } | |
| #endif DEBUG | |
| } | |
| void DeclarePointerIsString(void* stringLiteral) | |
| { | |
| #ifdef DEBUG | |
| #ifdef _DEBUG | |
| void *ptr = (void*)clsHnd; | |
| if (!PointerInNameMap(ptr)) | |
| { | |
| AddPointerToNameMap(ptr, PointerIsClassHandle); | |
| } | |
| #endif // _DEBUG | |
| } | |
| void DeclarePointerIsMethod(CORINFO_METHOD_HANDLE methodHnd) | |
| { | |
| #ifdef _DEBUG | |
| void *ptr = (void*)methodHnd; | |
| if (!PointerInNameMap(ptr)) | |
| { | |
| AddPointerToNameMap(ptr, PointerIsMethodHandle); | |
| } | |
| #endif // _DEBUG | |
| } | |
| void DeclarePointerIsString(void* stringLiteral) | |
| { | |
| #ifdef _DEBUG |
src/coreclr/interpreter/compiler.cpp
Outdated
| m_pStackPointer--; | ||
| EmitLdLocA(m_pStackPointer[0].var); | ||
| } | ||
| m_ip += 5; |
There was a problem hiding this comment.
Missing break; at the end of the case CEE_UNBOX: block causes fall-through into case CEE_UNBOX_ANY: and will execute unintended logic. Add a break; after this line.
| m_ip += 5; | |
| m_ip += 5; | |
| break; |
src/coreclr/vm/interpexec.cpp
Outdated
| } | ||
| } | ||
|
|
||
| #define DO_GENERIC_LOOKUP(genericVar, lookupIndex) \ |
There was a problem hiding this comment.
[nitpick] The DO_GENERIC_LOOKUP macro is very large and complex, leading to duplicated logic and reduced readability. Consider refactoring this into a dedicated function or a set of smaller helpers.
There was a problem hiding this comment.
I agree with copilot that I'd love to see this outlined into function(s). As-is the line number information when debugging it is gonna be kinda miserable.
There was a problem hiding this comment.
I agree, I was debugging an issue with it in the past and I had to expand the macro manually to be able to figure out what's wrong.
There was a problem hiding this comment.
K, I've cleaned that up.
src/coreclr/interpreter/compiler.h
Outdated
| TArray<PointerToName> m_pointerToNameMap; | ||
| bool PointerInNameMap(void* ptr) | ||
| { | ||
| for (int32_t i = 0; i < m_pointerToNameMap.GetSize(); i++) |
There was a problem hiding this comment.
Is this linear search going to become a problem for us in large method bodies? Do we need to be using a hashtable yet? We have simdhash available since it's used in stackmap.cpp, though it doesn't have a predefined variant that would hold PointerToName (it's easy to add one).
There was a problem hiding this comment.
I've started to use dn_simd_hash_ptr_ptr which is close enough. I added a C++ wrapper to handle lifetime.
There was a problem hiding this comment.
fyi if you look at dn-simdhash-ght-compatible.c you can see how to define custom callbacks and whatnot. but there's nothing wrong with using ptr_ptr if you're okay with an extra allocation per item.
src/coreclr/interpreter/compiler.h
Outdated
| GenericHandleData() = default; | ||
|
|
||
| HelperArgType argType = HelperArgType::Value; | ||
| int genericVar = -1; // Set to a meaningful value if HelperArgType is GenericResolution |
There was a problem hiding this comment.
Something more specific than 'a meaningful value' would be helpful here since by browsing around the vicinity I can't tell what makes the value meaningful or how I would distinguish between a valid genericVar and an invalid one. It may become obvious to me as I continue reading the PR, though!
My assumption upon first glance is it's the zero-based index of the generic parameter in the method's set of reachable generic parameters (where i use 'reachable generic parameters' to mean the set of all the enclosing type parameters + all the method parameters)
There was a problem hiding this comment.
OK, I'm at this point again and it's 'the interp var containing the generic dictionary' I think?
There was a problem hiding this comment.
The interp var containing the generic context argument. (The dictionary is a set of indirections away typically)
src/coreclr/interpreter/compiler.cpp
Outdated
| m_pLastNewIns->SetSVar(arg2); | ||
| m_pLastNewIns->SetDVar(resultVar); | ||
|
|
||
| m_pLastNewIns->data[1] = handleData.dataItemIndex; |
There was a problem hiding this comment.
[nitpick] There's an asymmetry here where in the GenericResolution arm of these ifs, we set data[0] and data[1] right next to each other, but in the other arm we set them separately. It seems like it could hide mistakes. It would be cool if both sides were the same shape so the differences were clearer to the reader.
There was a problem hiding this comment.
I've made this cleaner.
src/coreclr/interpreter/compiler.cpp
Outdated
| if (embedInfo.lookup.lookupKind.needsRuntimeLookup) | ||
| { | ||
| CORINFO_RUNTIME_LOOKUP_KIND runtimeLookupKind = embedInfo.lookup.lookupKind.runtimeLookupKind; | ||
| InterpGenericLookup *pRuntimeLookup = (InterpGenericLookup*)AllocMethodData(sizeof(InterpGenericLookup)); |
There was a problem hiding this comment.
Is it the case that GenericHandleToGenericHandleData could get called for the same embedInfo multiple times, and we'd bloat the dataitems table with lots of duplicate lookups? i.e. if a method has two generic parameters T and U, are we going to end up with a unique dataitem for every use of T and every use of U? Or just two data items, one for T and one for U, that get reused through some other mechanism I haven't noticed yet?
There was a problem hiding this comment.
Yeah, this is a problem. I've added a general purpose hashing mechanism for embedding data into the dataitems array, and swapped over to using that hash even for regular pointers. This should address any reasonable scalability issues here.
src/coreclr/vm/interpexec.cpp
Outdated
| size_t lastOffset = pLookup->offsets[pLookup->indirections - 1]; \ | ||
| if (pLookup->sizeOffset != CORINFO_NO_SIZE_CHECK) \ | ||
| { \ | ||
| /* Last indirection is the size*/ \ |
There was a problem hiding this comment.
This is news to me. I don't see corresponding information in interpretershared.h.
src/coreclr/vm/interpexec.cpp
Outdated
| if (pLookup->indirections != 0) \ | ||
| { \ | ||
| lookup = *(uint8_t**)(lookup + pLookup->offsets[0]); \ | ||
| if (pLookup->indirections >= 3) \ |
There was a problem hiding this comment.
This and the 4 look like they're off by 1, but I assume that's on purpose and goes with the 'last indirection is the size' below
src/coreclr/vm/interpexec.cpp
Outdated
| pMD = LOCAL_VAR(genericVar, MethodDesc*); \ | ||
| lookup = (uint8_t*)pMD; \ | ||
| } \ | ||
| void* result = 0; \ |
There was a problem hiding this comment.
[nitpick] It'd be nice if this name made it a little clearer that it's from the macro, or if it was defined at the top of the macro instead of hiding in the middle
src/coreclr/vm/interpexec.cpp
Outdated
| } | ||
| } | ||
|
|
||
| #define DO_GENERIC_LOOKUP(genericVar, lookupIndex) \ |
There was a problem hiding this comment.
[nitpick] It's weird that genericVar is usually just ip[x] and then lookupIndex is y. It might make sense for genericVar to also just be x so there's consistency.
|
LGTM other than the missing |
|
Tagging subscribers to this area: @BrzVlad, @janvorli, @kg |
…kups, and embed the InterpGenericLookup in the dataitems array directly Fix generic lookups to handle the usehelper case Shrink the InterpGenericLookup to be the actual size needed, using smaller data fields when possible Improve comments on the interpreter around what the sizeOffset means
Co-authored-by: Adeel Mujahid <[email protected]>
| dst->sizeOffset = 0; | ||
| dst->offsets[0] = 0; | ||
| dst->offsets[1] = 0; | ||
| dst->offsets[2] = 0; | ||
| dst->offsets[3] = 0; | ||
| } | ||
| else | ||
| { | ||
| dst->sizeOffset = src->sizeOffset; | ||
| dst->offsets[0] = (uint16_t)GetGenericLookupOffset(src, 0); | ||
| dst->offsets[1] = (uint16_t)GetGenericLookupOffset(src, 1); | ||
| dst->offsets[2] = (uint16_t)GetGenericLookupOffset(src, 2); | ||
| dst->offsets[3] = (uint16_t)GetGenericLookupOffset(src, 3); | ||
| } |
There was a problem hiding this comment.
A loop would be nice but not mandatory
|
|
||
| size_t sizeOfStruct = sizeof(InterpGenericLookup); | ||
|
|
||
| assert(sizeOfFieldsConcatenated == sizeOfStruct); // Assert that there is no padding in the struct, so a fixed size hash unaware of padding is safe to use |
There was a problem hiding this comment.
Yes, but its easier to see what's wrong when its a runtime error, and since we have a functional PR system, it will be caught pretty quickly.
There was a problem hiding this comment.
But there is a good point that that doesn't really happen in release builds.
| bool PointerInNameMap(void* ptr) | ||
| { | ||
| void* result; | ||
| if (dn_simdhash_ptr_ptr_try_get_value(m_pointerToNameMap.GetValue(), ptr, &result)) | ||
| { | ||
| return true; | ||
| } | ||
| return false; | ||
| } | ||
| void AddPointerToNameMap(void* ptr, const char* name) | ||
| { | ||
| if (!PointerInNameMap(ptr)) | ||
| { | ||
| dn_simdhash_ptr_ptr_try_add(m_pointerToNameMap.GetValue(), ptr, (void*)name); | ||
| } | ||
| } |
There was a problem hiding this comment.
As things are written right now, we don't need PointerInNameMap at all, and the try_add operation will harmlessly return 0 if the key ptr already exists.
| bool PointerInNameMap(void* ptr) | ||
| { | ||
| void* result; | ||
| if (dn_simdhash_ptr_ptr_try_get_value(m_pointerToNameMap.GetValue(), ptr, &result)) |
There was a problem hiding this comment.
fyi, you can pass NULL instead of &result and simdhash try_get_value will do what you want (not copy, just return whether a value was found or not)
| assert(sizeof(VarSizedData) == sizeof(void*)); | ||
| assert(sizeof(T) % sizeof(void*) == 0); | ||
| assert(sizeof(VarSizedDataWithPayload<T>) == sizeof(T) + sizeof(void*)); |
There was a problem hiding this comment.
These should probably also be static_assert instead of assert, so that they will be checked in release builds
| int base = 4; | ||
| DO_GENERIC_LOOKUP(ip[2], base + 1); | ||
|
|
||
| HELPER_FTN_PP helperFtn = GetPossiblyIndirectHelper<HELPER_FTN_PP>(pMethod->pDataItems[ip[base]]); |
There was a problem hiding this comment.
There is confusing casting between HELPER_FTN_PP and HELPER_FTN_PP_2. Also it seems we are going with the convention _{retType}_{argType}{argType}... So HELPER_FTN_PP_2 should actually be renamed to HELPER_FTN_P_PP. HELPER_FTN_PP should be HELPER_FTN_P_P. Otherwise it is just a mess.
There was a problem hiding this comment.
Mmm... fair. I was just copying/pasting here, and mostly spending my time thinking about the compiler side. I'm going to eliminate the code sharing between opcodes here, as I think its obfuscating the logic, and simplify this out.
|
|
||
| HELPER_FTN_PP helperFtn = GetPossiblyIndirectHelper<HELPER_FTN_PP>(pMethod->pDataItems[ip[base]]); | ||
| void* helperArg = pMethod->pDataItems[ip[base + 1]]; | ||
| LOCAL_VAR(ip[1], void*) = ((HELPER_FTN_PP_2)helperFtn)(helperArg, LOCAL_VAR_ADDR(ip[2], void*)); |
| case INTOP_CALL_HELPER_V_AGP: | ||
| { | ||
| int base = 4; | ||
| DO_GENERIC_LOOKUP(ip[2], base + 1); |
There was a problem hiding this comment.
Seems cleaner for DO_GENERIC_LOOKUP to receive the data item index directly rather than an offset into the currently executing instruction.
| return result; | ||
| } | ||
|
|
||
| #define DO_GENERIC_LOOKUP(genericVar, lookupIndex) \ |
There was a problem hiding this comment.
Do we even need this define anymore ? Doesn't seem to add more clarity. We could just replace it with a method returning result explicitly
Implement the vast majority of the opcodes that are affected by shared generics
Add logic to enhance pretty printing of the Interp opcodes for these things. We can now see strings, and various magic pointer values get displayed in the disasm outputs.
Handle the 0 indirections generic lookup scenario
Many opcodes were able to use a set of generic helpercall opcodes. These opcodes have a naming convention of
astands for addr of dvar or svargstands for do a generic lookuppstands for pointer (acquired from either an svar or a dataitem)vstands for a void return.Its not fully descriptive, but naming is hard, and this works.