-
Notifications
You must be signed in to change notification settings - Fork 5.3k
JIT: Run single EH region repair pass after layout #108634
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
/azp run runtime-coreclr outerloop, Antigen, Fuzzlyn |
|
Azure Pipelines successfully started running 3 pipeline(s). |
|
No diffs. Antigen and Fuzzlyn runs didn't hit anything related. Outerloop failures are #104927. cc @dotnet/jit-contrib, @AndyAyersMS PTAL. This doesn't look like a surgical fix, but I think it's preferable to trying to patch up the edge cases of the old implementation. That approach felt pattern-matchy and thus fragile, and keeping the temporary end block of a parent EH region up-to-date during moves in |
src/coreclr/jit/jiteh.cpp
Outdated
| { | ||
| assert(compHndBBtabCount != 0); | ||
| bool* const tryEndsSet = new (this, CMK_Generic) bool[compHndBBtabCount * 2]{}; | ||
| bool* const hndEndsSet = tryEndsSet + compHndBBtabCount; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose we could use bitvectors here instead, and avoid heap allocations altogether, at the expense (?) of indexing speed. Any preference here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could instead to walk the EH table first and null out ebdTryLast and ebdHndLast, then use the null values there as your "not yet set" indicators?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I'm guessing it's faster/easier to vectorize zeroing out an array versus nulling out each clause's end pointers, though the reduction in local state seems ideal.
AndyAyersMS
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you run diffs you also need to look at all the zero-sized diffs to see if there are any EH clause diffs.
I don't know if you can see those in the CI reports.
src/coreclr/jit/jiteh.cpp
Outdated
| { | ||
| assert(compHndBBtabCount != 0); | ||
| bool* const tryEndsSet = new (this, CMK_Generic) bool[compHndBBtabCount * 2]{}; | ||
| bool* const hndEndsSet = tryEndsSet + compHndBBtabCount; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could instead to walk the EH table first and null out ebdTryLast and ebdHndLast, then use the null values there as your "not yet set" indicators?
I ran asmdiffs locally, and I didn't see anything reported, so I'm assuming this is truly a no-diff change? If we saw a change in the EH clauses due to different block placement, we'd see codegen diffs too, and if we saw the clauses change without any blocks moving, then that would probably be a correctness issue and trigger asserts, right? In other words, for a given block layout, there's only one correct set of EH clauses, so any diffs for the latter are wrong. |
|
Maybe another way of putting it -- do you see SPMI diffs on the repro case (with release compilers)? Presumably you should? Or else does the release JIT get the right results anyway? |
I see. With Checked compilers, I can see the EH clauses are now correct post-layout. Here's the problematic one: And the correct one (notice the end block for T2 being pushed back by one): I don't think we can dump EH tables in Release builds by default, though I can tweak this locally and take a look. Are you asking about this to evaluate the risk of this bug in product builds? |
In part, yes. It is good at this stage to be able to say what might happen if we don't fix this. |
|
Also note SPMI doesn't rely on dumping, it is comparing the binary data the JIT sends back to the VM (which includes the EH info), so release mode diffs "work" but may be harder to understand. |
|
Got it, I created a collection of the repro, and replayed it with Release builds. I got a zero-sized diff, so without the fix, it looks like we're propagating the incorrect EH clause info to the VM. For this particular method, that doesn't seem to affect behavior at runtime, but I can imagine this breaking stack walking for methods that throw in inconvenient places. |
|
@AndyAyersMS since we're seeing diffs with Release builds, do you think a .NET 9 fix is justified, or would you like to see a repro where stack walking breaks from this? I can try to put something together. |
|
I'd say we should go ahead and backport, given that we know of a case where we got things wrong. |
|
/backport to release/9.0 |
|
Started backporting to release/9.0: https://github.com/dotnet/runtime/actions/runs/11257587584 |
|
@amanasifkhalid backporting to release/9.0 failed, the patch most likely resulted in conflicts: $ git am --3way --empty=keep --ignore-whitespace --keep-non-patch changes.patch
Applying: Add EH region ends pass
Applying: Remove old EH fixup logic
error: sha1 information is lacking or useless (src/coreclr/jit/compiler.h).
error: could not build fake ancestor
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Patch failed at 0002 Remove old EH fixup logic
Error: The process '/usr/bin/git' failed with exit code 128Please backport manually! |
|
@amanasifkhalid an error occurred while backporting to release/9.0, please check the run log for details! Error: git am failed, most likely due to a merge conflict. |
Fixes dotnet#108608. To simplify EH maintenance logic, leave EH regions in a deconstructed phase during block reordering, and correct the end block pointers of each region in a pass over the block list after. By walking the list backwards, we can short-circuit finding the end of each region, and we don't run into the issue of an EH region not being able to determine if it's at the end of its parent region due to some awkward placement of a sibling region.
Fixes #108608. To simplify EH maintenance logic, leave EH regions in a deconstructed phase during block reordering, and correct the end block pointers of each region in a pass over the block list after. By walking the list backwards, we can short-circuit finding the end of each region, and we don't run into the issue of an EH region not being able to determine if it's at the end of its parent region due to some awkward placement of a sibling region.