Skip to content

Conversation

@kunalspathak
Copy link
Contributor

In #68085, I relaxed the heuristics under which we would use SIMD registers for copy block unroll code. The code generated uses temporary registers to addresses to copy (could be GC references) and they are not reported. As such, we should mark the unroll code region as GC non-interruptible to make sure GC doesn't kick in and collect the wrong objects.

Note: We were marking gtBlkOpGcUnsafe only for situation where destination address was local, but never checked about the source address. This PR expands it to situation when either src/dst addresses are non-local (on heap).

Fixes: #68530

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 11, 2022
@ghost ghost assigned kunalspathak May 11, 2022
@ghost
Copy link

ghost commented May 11, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

In #68085, I relaxed the heuristics under which we would use SIMD registers for copy block unroll code. The code generated uses temporary registers to addresses to copy (could be GC references) and they are not reported. As such, we should mark the unroll code region as GC non-interruptible to make sure GC doesn't kick in and collect the wrong objects.

Note: We were marking gtBlkOpGcUnsafe only for situation where destination address was local, but never checked about the source address. This PR expands it to situation when either src/dst addresses are non-local (on heap).

Fixes: #68530

Author: kunalspathak
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@kunalspathak kunalspathak marked this pull request as ready for review May 12, 2022 04:53
@kunalspathak
Copy link
Contributor Author

@dotnet/jit-contrib @BruceForstall

@BruceForstall
Copy link
Contributor

BruceForstall commented May 12, 2022

Related, but not necessary for fixing the bug: genCodeForCpObj() handles unrolling for structs with pointers where the dst address is on the stack. In that case, no write barrier helper is called and we only use ldp/stp, ldr/str sequence. However, we still mark the dst register as a byref for GC, which seems unnecessary.

But when is that CpObj code even used (currently)? It looks like only if the copy is larger than the unroll limit (64).

@AndyAyersMS
Copy link
Member

How hard is it to properly report the GC references?

I suppose these copies can't be all that big (64K max?) but my recollection from past conversations on things like this is that we really shouldn't block GC for more than say 1000 iterations, and this seems like it could end up running longer.

@BruceForstall
Copy link
Contributor

BruceForstall commented May 12, 2022

We only unroll for 64 or 128 bytes in the case we disable GC.

Although for BLK for anything that isn't unrolled we call memcpy, and that prevents GC also.

@BruceForstall
Copy link
Contributor

How hard is it to properly report the GC references?

The problem is the unrolling code wants to create a "byref" like pointer that is based on an actual object/byref pointer, but doesn't actually point to or within the object. So we can't report it to the GC because it wouldn't get updated properly.

if (blkNode->OperIs(GT_STORE_OBJ))
{
if (!blkNode->AsObj()->GetLayout()->HasGCPtr())
if (!blkNode->AsObj()->GetLayout()->HasGCPtr() || (isDstAddrLocal && (size <= copyBlockUnrollLimit)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a bug: before, if (isDstAddrLocal && (size <= copyBlockUnrollLimit)), we would set gtBlkOpGcUnsafe=true. Now, we won't unless !isSrcAddrLocal.

So, if both isSrcAddrLocal and isDstAddrLocal are true, we won't disable GC. However, the thing we are copying contains GC pointers in this case, and the GC pointers loaded into temporary registers aren't reported, so we have a GC hole.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true. I missed that case.

{
blkNode->SetOper(GT_STORE_BLK);
}
else if (isDstAddrLocal && (size <= copyBlockUnrollLimit))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like we don't need the isDstAddrLocal check: why can't we use this code path even if copying with src=heap, dst=heap, or src=stack, dst=heap?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to consolidate the blkNode->SetOper(GT_STORE_BLK) case, but in general, we need to disable GC for following unroll cases:

  • isDstAddrLocal = true (as of today)
  • isDstAddrLocal = false (if dst is on heap)
  • isSrcAddrLocal = false (if src is on heap)
  • isSrcAddrLocal = true (since we want to disable GC regardless if dst is on stack/heap, src state won't matter).

So basically, if we are doing GT_STORE_BLK and unrolling, we need to disable GC. With that said, I think we should disable the GC inside genCodeForCpBlkUnroll() only for cases where we will be doing the adjustments i.e. srcOffsetAdjustment != 0 || dstOffsetAdjustment != 0 to limit it to scenarios that are really affected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With that being the case, I missed a case for objects that has GC pointers. Fixed in #70053

{
blkNode->gtBlkOpKind = GenTreeBlk::BlkOpKindUnroll;

if (!isSrcAddrLocal || !isDstAddrLocal)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these changes are specific to arm64. Shouldn't it be ifdef'ed so it doesn't affect arm32?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hhm, today blkNode->gtBlkOpGcUnsafe = true is marked for both arm/arm64. This PR just extends it to more scenarios?

@ghost ghost added needs-author-action An issue or pull request that requires more info or actions from the author. and removed needs-author-action An issue or pull request that requires more info or actions from the author. labels May 12, 2022
@kunalspathak
Copy link
Contributor Author

Failure is #69136

@kunalspathak kunalspathak dismissed BruceForstall’s stale review May 17, 2022 22:20

Need to merge to unblock clean CI runs

@kunalspathak kunalspathak merged commit 1a8f089 into dotnet:main May 17, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Jul 1, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Test failure JIT/Directed/VectorABI/VectorMgdMgdStatic_r/VectorMgdMgdStatic_r.sh

3 participants