-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Arm64: Make unroll code for cpblk non-interruptible #69202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsIn #68085, I relaxed the heuristics under which we would use SIMD registers for copy block unroll code. The code generated uses temporary registers to addresses to copy (could be GC references) and they are not reported. As such, we should mark the unroll code region as GC non-interruptible to make sure GC doesn't kick in and collect the wrong objects. Note: We were marking Fixes: #68530
|
|
@dotnet/jit-contrib @BruceForstall |
|
Related, but not necessary for fixing the bug: genCodeForCpObj() handles unrolling for structs with pointers where the dst address is on the stack. In that case, no write barrier helper is called and we only use ldp/stp, ldr/str sequence. However, we still mark the dst register as a byref for GC, which seems unnecessary. But when is that CpObj code even used (currently)? It looks like only if the copy is larger than the unroll limit (64). |
|
How hard is it to properly report the GC references? I suppose these copies can't be all that big (64K max?) but my recollection from past conversations on things like this is that we really shouldn't block GC for more than say 1000 iterations, and this seems like it could end up running longer. |
|
We only unroll for 64 or 128 bytes in the case we disable GC. Although for BLK for anything that isn't unrolled we call memcpy, and that prevents GC also. |
The problem is the unrolling code wants to create a "byref" like pointer that is based on an actual object/byref pointer, but doesn't actually point to or within the object. So we can't report it to the GC because it wouldn't get updated properly. |
| if (blkNode->OperIs(GT_STORE_OBJ)) | ||
| { | ||
| if (!blkNode->AsObj()->GetLayout()->HasGCPtr()) | ||
| if (!blkNode->AsObj()->GetLayout()->HasGCPtr() || (isDstAddrLocal && (size <= copyBlockUnrollLimit))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a bug: before, if (isDstAddrLocal && (size <= copyBlockUnrollLimit)), we would set gtBlkOpGcUnsafe=true. Now, we won't unless !isSrcAddrLocal.
So, if both isSrcAddrLocal and isDstAddrLocal are true, we won't disable GC. However, the thing we are copying contains GC pointers in this case, and the GC pointers loaded into temporary registers aren't reported, so we have a GC hole.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's true. I missed that case.
| { | ||
| blkNode->SetOper(GT_STORE_BLK); | ||
| } | ||
| else if (isDstAddrLocal && (size <= copyBlockUnrollLimit)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like we don't need the isDstAddrLocal check: why can't we use this code path even if copying with src=heap, dst=heap, or src=stack, dst=heap?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to consolidate the blkNode->SetOper(GT_STORE_BLK) case, but in general, we need to disable GC for following unroll cases:
- isDstAddrLocal = true (as of today)
- isDstAddrLocal = false (if dst is on heap)
- isSrcAddrLocal = false (if src is on heap)
- isSrcAddrLocal = true (since we want to disable GC regardless if dst is on stack/heap, src state won't matter).
So basically, if we are doing GT_STORE_BLK and unrolling, we need to disable GC. With that said, I think we should disable the GC inside genCodeForCpBlkUnroll() only for cases where we will be doing the adjustments i.e. srcOffsetAdjustment != 0 || dstOffsetAdjustment != 0 to limit it to scenarios that are really affected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With that being the case, I missed a case for objects that has GC pointers. Fixed in #70053
src/coreclr/jit/lowerarmarch.cpp
Outdated
| { | ||
| blkNode->gtBlkOpKind = GenTreeBlk::BlkOpKindUnroll; | ||
|
|
||
| if (!isSrcAddrLocal || !isDstAddrLocal) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All these changes are specific to arm64. Shouldn't it be ifdef'ed so it doesn't affect arm32?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hhm, today blkNode->gtBlkOpGcUnsafe = true is marked for both arm/arm64. This PR just extends it to more scenarios?
|
Failure is #69136 |
Need to merge to unblock clean CI runs
In #68085, I relaxed the heuristics under which we would use SIMD registers for copy block unroll code. The code generated uses temporary registers to addresses to copy (could be GC references) and they are not reported. As such, we should mark the unroll code region as GC non-interruptible to make sure GC doesn't kick in and collect the wrong objects.
Note: We were marking
gtBlkOpGcUnsafeonly for situation where destination address was local, but never checked about the source address. This PR expands it to situation when either src/dst addresses are non-local (on heap).Fixes: #68530