
Conversation

@mikedn

@mikedn mikedn commented Jun 3, 2018

LOCKADD nodes are generated rather early and there's no reason for that:

  • The CORINFO_INTRINSIC_InterlockedAdd32/64 intrinsics are not actually used. Even if they were used, we could still import them as XADD nodes and rely on lowering to generate LOCKADD when needed.
  • gtExtractSideEffList transforms XADD into LOCKADD, but this can be done in lowering instead; LOCKADD is an XARCH-specific optimization, after all (see the sketch after this description).

Additionally:

  • Avoid the need for special handling in LSRA by making LOCKADD a "no value" oper.
  • Split LOCKADD codegen from XADD/XCHG codegen, attempting to use the same code for all 3 just makes things more complex.
  • The address is always in a register so there's no real need to create an indir node on the fly, the relevant emitter functions can be called directly.

The last point above is actually a CQ issue - we always generate `add [reg], imm`; more complex address modes are not used. Unfortunately this problem starts early, when the importer spills the address to a local variable. If that ever gets fixed then we could probably generate a contained LEA in lowering.

Contributes to #14557
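To make the lowering bullet concrete, here is a minimal C++ sketch of the kind of transform described above: an XADD whose value is unused becomes a "no value" LOCKADD. The helper name LowerXAddToLockAdd is hypothetical; the GenTree accessors used (OperIs, IsUnusedValue, ClearUnusedValue, SetOper, gtGetOp2) exist in the JIT, but this is not the PR's exact diff.

```cpp
// Sketch only, not the PR's exact change: lower an XADD whose result is unused
// into a "no value" LOCKADD (an XARCH-specific optimization).
void Lowering::LowerXAddToLockAdd(GenTreeOp* node) // hypothetical helper name
{
    assert(node->OperIs(GT_XADD) && node->IsUnusedValue());

    // Nobody consumes the old value, so a plain `lock add` is enough.
    node->ClearUnusedValue();

    // CodeGen will rely on op2's type for the instruction size, so it must
    // agree (up to widening) with the node's type before it becomes VOID.
    assert(genActualType(node->gtGetOp2()->TypeGet()) == node->TypeGet());

    node->SetOper(GT_LOCKADD);
    node->gtType = TYP_VOID;
}
```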

@mikedn
Author

mikedn commented Jun 4, 2018

@CarolEidt I've yet to convince myself that adding an indirection to interlocked nodes is a step forward. But even without that there's still room for improvement, especially around LOCKADD: there's no need to generate it early, it should be GTK_NOVALUE, and there's no need for the indirForm hack, etc.

@mikedn
Author

mikedn commented Jun 4, 2018

jit-diff summary:

Total bytes of diff: -22 (0.00% of base)
    diff is an improvement.
Total byte diff includes 0 bytes from reconciling methods
        Base had    0 unique methods,        0 unique bytes
        Diff had    0 unique methods,        0 unique bytes
Top file improvements by size (bytes):
         -15 : System.Private.CoreLib.dasm (0.00% of base)
          -3 : System.IO.Pipes.dasm (-0.01% of base)
          -2 : System.Linq.Parallel.dasm (0.00% of base)
          -1 : System.Collections.Concurrent.dasm (0.00% of base)
          -1 : System.Net.Security.dasm (0.00% of base)
5 total files with size differences (5 improved, 0 regressed), 125 unchanged.
Top method improvements by size (bytes):
          -6 : System.Private.CoreLib.dasm - MemoryFailPoint:.ctor(int):this
          -3 : System.IO.Pipes.dasm - PipeCompletionSource`1:RegisterForCancellation(struct):this (3 methods)
          -3 : System.Private.CoreLib.dasm - SpinLock:EnterSpin(int):this
          -3 : System.Private.CoreLib.dasm - MemoryFailPoint:Dispose(bool):this
          -1 : System.Collections.Concurrent.dasm - WorkStealingQueue:TryLocalPop(byref):bool:this
11 total methods with size differences (11 improved, 0 regressed), 145751 unchanged.

@mikedn
Author

mikedn commented Jun 4, 2018

There are 2 types of diffs:

-       mov      rax, rdx
+       mov      eax, edx

This is because codegen previously called inst_RV_RV without a type, so it defaulted to TYP_I_IMPL even when TYP_INT would have sufficed.
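Roughly, the fix is to pass the data node's type explicitly instead of relying on inst_RV_RV's TYP_I_IMPL default; a hedged sketch of the call shape (the variable names here are illustrative, not the exact PR code):

```cpp
// Old: type argument omitted, so the TYP_I_IMPL default forced a 64-bit mov.
// inst_RV_RV(INS_mov, targetReg, dataReg);

// New: use the data operand's type, so a TYP_INT operation emits `mov eax, edx`.
inst_RV_RV(INS_mov, targetReg, dataReg, data->TypeGet());
```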

-       mov      rdx, rbx
        lock     
-       add      dword ptr [rcx], edx
+       add      dword ptr [rcx], ebx

This is because BuildNode was creating a definition for LOCKADD even if one wasn't actually needed.
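Since LOCKADD no longer defines a target register, codegen can use the data register (or a contained immediate) directly against the address, which is what removes the extra mov. A rough C++ sketch of such a codegen path follows; the function name and some accessor spellings are assumptions, and the emitter calls (emitIns_I_AR, emitIns_AR_R) are used here as an illustration rather than a claim about the PR's exact code.

```cpp
// Sketch of GT_LOCKADD codegen: no destination register is defined, so the
// data operand is applied directly to [addrReg] with a lock prefix.
void CodeGen::genCodeForLockAdd(GenTreeOp* node) // name assumed for the sketch
{
    assert(node->OperIs(GT_LOCKADD));

    GenTree*  data    = node->gtGetOp2();
    regNumber addrReg = node->gtGetOp1()->gtRegNum;
    emitAttr  size    = emitActualTypeSize(data->TypeGet());

    instGen(INS_lock);

    if (data->isContainedIntOrIImmed())
    {
        // lock add [addrReg], imm
        getEmitter()->emitIns_I_AR(INS_add, size, (int)data->AsIntCon()->IconValue(), addrReg, 0);
    }
    else
    {
        // lock add [addrReg], dataReg - no copy into a separate target register.
        getEmitter()->emitIns_AR_R(INS_add, size, data->gtRegNum, addrReg, 0);
    }
}
```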

@mikedn mikedn changed the title from "[WIP] Cleanup LOCKADD handling" to "Cleanup LOCKADD handling" Jun 4, 2018
@CarolEidt

This is a much more elegant solution than I was envisioning - I love it when the best solution actually simplifies things instead of adding more complexity.
I don't see any reason not to #14557 after this - do you?


@CarolEidt CarolEidt left a comment


LGTM

@CarolEidt CarolEidt merged commit e8661fe into dotnet:master Jun 4, 2018
@mikedn
Author

mikedn commented Jun 5, 2018

> I don't see any reason not to #14557 after this - do you?

Up to you, this change does not prevent adding an indir if we find that's valuable.

That said, I went again through #14547 (where the issue started) and it looks like I've made a mistake: I dropped a getActualType from the LOCKADD data node, but that was supposed to be done only for XADD/XCHG. Strange that no test failed; I'll have to investigate.

noway_assert(addrReg != targetReg);
GenTree* addr = node->gtGetOp1();
GenTree* data = node->gtGetOp2();
emitAttr size = emitTypeSize(data->TypeGet());
Author


This must be `emitActualTypeSize`, otherwise the assert below will likely fail if the data operand is a small int indir (or any other node that may be small int).

We have a test for such a situation but it uses exchange rather than add so it does not catch this: https://github.com/dotnet/coreclr/blob/master/tests/src/JIT/Regression/JitBlue/GitHub_10714/GitHub_10714.cs (funny, I added that but I don't remember it).
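For reference, the corrected size computation suggested here would look something like this (sketch):

```cpp
emitAttr size = emitActualTypeSize(data->TypeGet());
```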

node->ClearUnusedValue();
// Make sure the types are identical, since the node type is changed to VOID
// CodeGen relies on op2's type to determine the instruction size.
assert(node->gtGetOp2()->TypeGet() == node->TypeGet());
Author


This needs to be `genActualType(gtGetOp2()->TypeGet()) == node->TypeGet()`.

@mikedn
Copy link
Author

mikedn commented Jun 5, 2018

Fix for the actual-type mistake is available in #18303. The issue is mostly theoretical; in practice it doesn't seem possible to hit it.

AndyAyersMS added a commit that referenced this pull request Nov 28, 2018
Port #18267 fix value numbering when selecting a constant to release/2.2
@mikedn mikedn deleted the lockadd3 branch March 9, 2019 20:37
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022