-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Cleanup LOCKADD handling #18267
Cleanup LOCKADD handling #18267
Conversation
|
@CarolEidt I've yet to convince myself that adding an indirection to interlocked nodes is a step forward. But even without that there's still room for improvement, especially around |
|
jit-diff summary: |
|
There are 2 types of diffs: - mov rax, rdx
+ mov eax, edxThis is because codegen previously called - mov rdx, rbx
lock
- add dword ptr [rcx], edx
+ add dword ptr [rcx], ebxThis is because |
LOCKADD nodes are generated rather early and there's no reason for that: * The CORINFO_INTRINSIC_InterlockedAdd32/64 intrinsics are not actually used. Even if they would be used we can still import them as XADD nodes and rely on lowering to generate LOCKADD when needed. * gtExtractSideEffList transforms XADD into LOCKADD but this can be done in lowering. LOCKADD is an XARCH specific optimization after all. Additionally: * Avoid the need for special handling in LSRA by making GT_LOCKADD a "no value" oper. * Split LOCKADD codegen from XADD/XCHG codegen, attempting to use the same code for all 3 just makes things more complex. * The address is always in a register so there's no real need to create an indir node on the fly, the relevant emitter functions can be called directly. The last point above is actually a CQ issue - we always generate `add [reg], imm`, more complex address modes are not used. Unfortunately this problem starts early, when the importer spills the address to a local variable. If that ever gets fixed then we'll could probably generate a contained LEA in lowering.
|
This is a much more elegant solution than I was envisioning - I love it when the best solution actually simplifies things instead of adding more complexity. |
CarolEidt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Up to you, this change does not prevent adding an indir if we find that's valuable. That said, I went again through #14547 (where the issue started) and it looks like I've made a mistake. I dropped a |
| noway_assert(addrReg != targetReg); | ||
| GenTree* addr = node->gtGetOp1(); | ||
| GenTree* data = node->gtGetOp2(); | ||
| emitAttr size = emitTypeSize(data->TypeGet()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This must be emitActualTypeSize, otherwise the assert bellow will likely fail if the data operand is a small int indir (or any other node that may be small int).
We have a test for such a situation but it uses exchange rather than add so it does not catch this: https://github.com/dotnet/coreclr/blob/master/tests/src/JIT/Regression/JitBlue/GitHub_10714/GitHub_10714.cs (funny, I added that but I don't remember it).
| node->ClearUnusedValue(); | ||
| // Make sure the types are identical, since the node type is changed to VOID | ||
| // CodeGen relies on op2's type to determine the instruction size. | ||
| assert(node->gtGetOp2()->TypeGet() == node->TypeGet()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be genActualType(gtGetOp2()->TypeGet()) == node->TypeGet().
|
Fix for actual type mistake available in #18303. The issue is mostly theoretical, in practice it doesn't seem possible to hit this. |
Port #18267 fix value numbering when selecting a constant to release/2.2
Cleanup LOCKADD handling Commit migrated from dotnet/coreclr@e8661fe
LOCKADDnodes are generated rather early and there's no reason for that:CORINFO_INTRINSIC_InterlockedAdd32/64intrinsics are not actually used. Even if they would be used we can still import them asXADDnodes and rely on lowering to generateLOCKADDwhen needed.gtExtractSideEffListtransformsXADDintoLOCKADDbut this can be done in lowering.LOCKADDis an XARCH specific optimization after all.Additionally:
LOCKADDa "no value" oper.LOCKADDcodegen fromXADD/XCHGcodegen, attempting to use the same code for all 3 just makes things more complex.The last point above is actually a CQ issue - we always generate
add [reg], imm, more complex address modes are not used. Unfortunately this problem starts early, when the importer spills the address to a local variable. If that ever gets fixed then we'll could probably generate a containedLEAin lowering.Contributes to #14557