Skip to content

Conversation

@amanasifkhalid
Copy link
Contributor

@amanasifkhalid amanasifkhalid commented Oct 8, 2024

Fixes #108608. To simplify EH maintenance logic, leave EH regions in a deconstructed phase during block reordering, and correct the end block pointers of each region in a pass over the block list after. By walking the list backwards, we can short-circuit finding the end of each region, and we don't run into the issue of an EH region not being able to determine if it's at the end of its parent region due to some awkward placement of a sibling region.

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 8, 2024
@amanasifkhalid
Copy link
Contributor Author

/azp run runtime-coreclr outerloop, Antigen, Fuzzlyn

@azure-pipelines
Copy link

Azure Pipelines successfully started running 3 pipeline(s).

@amanasifkhalid amanasifkhalid marked this pull request as ready for review October 8, 2024 14:16
@amanasifkhalid
Copy link
Contributor Author

No diffs. Antigen and Fuzzlyn runs didn't hit anything related. Outerloop failures are #104927.

cc @dotnet/jit-contrib, @AndyAyersMS PTAL. This doesn't look like a surgical fix, but I think it's preferable to trying to patch up the edge cases of the old implementation. That approach felt pattern-matchy and thus fragile, and keeping the temporary end block of a parent EH region up-to-date during moves in fgMoveColdBlocks incurred diffs, since we use these temporary end pointers as insertion points for cold blocks. The diffs looked like an improvement, but we probably want to avoid codegen churn for .NET 9 at this point, and I'm planning on making fgMoveColdBlocks more EH-agnostic for .NET 10 anyway. I hope the simplicity of the new approach justifies its case for .NET 9, though I'm open to suggestions for more focused fixes -- either way, this is the approach I'm thinking of taking for .NET 10.

{
assert(compHndBBtabCount != 0);
bool* const tryEndsSet = new (this, CMK_Generic) bool[compHndBBtabCount * 2]{};
bool* const hndEndsSet = tryEndsSet + compHndBBtabCount;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose we could use bitvectors here instead, and avoid heap allocations altogether, at the expense (?) of indexing speed. Any preference here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could instead to walk the EH table first and null out ebdTryLast and ebdHndLast, then use the null values there as your "not yet set" indicators?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I'm guessing it's faster/easier to vectorize zeroing out an array versus nulling out each clause's end pointers, though the reduction in local state seems ideal.

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you run diffs you also need to look at all the zero-sized diffs to see if there are any EH clause diffs.

I don't know if you can see those in the CI reports.

{
assert(compHndBBtabCount != 0);
bool* const tryEndsSet = new (this, CMK_Generic) bool[compHndBBtabCount * 2]{};
bool* const hndEndsSet = tryEndsSet + compHndBBtabCount;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could instead to walk the EH table first and null out ebdTryLast and ebdHndLast, then use the null values there as your "not yet set" indicators?

@amanasifkhalid
Copy link
Contributor Author

When you run diffs you also need to look at all the zero-sized diffs to see if there are any EH clause diffs.

I ran asmdiffs locally, and I didn't see anything reported, so I'm assuming this is truly a no-diff change? If we saw a change in the EH clauses due to different block placement, we'd see codegen diffs too, and if we saw the clauses change without any blocks moving, then that would probably be a correctness issue and trigger asserts, right? In other words, for a given block layout, there's only one correct set of EH clauses, so any diffs for the latter are wrong.

@AndyAyersMS
Copy link
Member

Maybe another way of putting it -- do you see SPMI diffs on the repro case (with release compilers)? Presumably you should? Or else does the release JIT get the right results anyway?

@amanasifkhalid
Copy link
Contributor Author

Maybe another way of putting it -- do you see SPMI diffs on the repro case (with release compilers)? Presumably you should? Or else does the release JIT get the right results anyway?

I see. With Checked compilers, I can see the EH clauses are now correct post-layout. Here's the problematic one:

---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight   [IL range]   [jump]                            [EH region]        [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000]  1                             1    [000..019)-> BB02(0.5),BB11(0.5)     ( cond )                     i
BB02 [0037]  1       BB01                  8    [0B3..0CF)-> BB03(0.5),BB11(0.5)     ( cond )                     i bwd
BB03 [0003]  1  2    BB02                  2    [01A..01E)-> BB04(1),BB06(0)         ( cond ) T2      try {       i keep loophead hascall bwd
BB04 [0042]  2  2    BB03,BB06             2    [01E..???)-> BB05(1)                 (always) T2                  i keep hascall bwd
BB05 [0014]  4  2    BB13,BB05,BB04,BB08 256    [026..07E)-> BB07(0.5),BB05(0.5)     ( cond ) T2                  i loophead nullcheck bwd bwd-src
BB06 [0044]  1  2    BB03                  0    [01E..01E)-> BB04(1)                 (always) T2                  rare internal
BB07 [0011]  1  1    BB05                128    [???..???)-> BB08(1),BB09(0)         ( cond ) T1      try {       i keep hascall bwd
BB08 [0045]  2  1    BB07,BB09           128    [057..???)-> BB05(1)                 (always) T1                  i keep hascall bwd
BB09 [0047]  1  1    BB07                  0    [???..???)-> BB08(1)                 (always) T1      } }         rare internal
BB10 [0008]  0  0                          0    [03E..04F)                           (throw ) T0      try { }     i rare keep bwd
BB11 [0041]  2       BB01,BB02             1    [019..0D0)                           (return)                     i
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ funclets follow
BB12 [0009]  1  2  0                       0    [04F..057)                           (throw ) T2 H0 F fault { }   i rare keep flet nullcheck bwd
BB13 [0012]  1  2  1                       0    [061..064)-> BB05(1)                 ( cret ) T2 H1 F catch { }   i rare keep xentry flet bwd
BB14 [0016]  1     2                       0    [080..095)                           (falret)    H2 F fault { }   i rare keep xentry flet bwd
---------------------------------------------------------------------------------------------------------------------------------------------------------------------

***************  Exception Handling table
index  eTry, eHnd
  0  ::   2        - Try at BB10..BB10 [03E..04F), Fault   at BB12..BB12 [04F..057)
  1  ::   2        - Try at BB07..BB09 [058..061), Handler at BB13..BB13 [061..064)
  2  ::            - Try at BB03..BB09 [01B..080), Fault   at BB14..BB14 [080..095)

And the correct one (notice the end block for T2 being pushed back by one):

---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight   [IL range]   [jump]                            [EH region]        [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000]  1                             1    [000..019)-> BB02(0.5),BB11(0.5)     ( cond )                     i
BB02 [0037]  1       BB01                  8    [0B3..0CF)-> BB03(0.5),BB11(0.5)     ( cond )                     i bwd
BB03 [0003]  1  2    BB02                  2    [01A..01E)-> BB04(1),BB06(0)         ( cond ) T2      try {       i keep loophead hascall bwd
BB04 [0042]  2  2    BB03,BB06             2    [01E..???)-> BB05(1)                 (always) T2                  i keep hascall bwd
BB05 [0014]  4  2    BB13,BB05,BB04,BB08 256    [026..07E)-> BB07(0.5),BB05(0.5)     ( cond ) T2                  i loophead nullcheck bwd bwd-src
BB06 [0044]  1  2    BB03                  0    [01E..01E)-> BB04(1)                 (always) T2                  rare internal
BB07 [0011]  1  1    BB05                128    [???..???)-> BB08(1),BB09(0)         ( cond ) T1      try {       i keep hascall bwd
BB08 [0045]  2  1    BB07,BB09           128    [057..???)-> BB05(1)                 (always) T1                  i keep hascall bwd
BB09 [0047]  1  1    BB07                  0    [???..???)-> BB08(1)                 (always) T1      }           rare internal
BB10 [0008]  0  0                          0    [03E..04F)                           (throw ) T0      try { } }   i rare keep bwd
BB11 [0041]  2       BB01,BB02             1    [019..0D0)                           (return)                     i
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ funclets follow
BB12 [0009]  1  2  0                       0    [04F..057)                           (throw ) T2 H0 F fault { }   i rare keep flet nullcheck bwd
BB13 [0012]  1  2  1                       0    [061..064)-> BB05(1)                 ( cret ) T2 H1 F catch { }   i rare keep xentry flet bwd
BB14 [0016]  1     2                       0    [080..095)                           (falret)    H2 F fault { }   i rare keep xentry flet bwd
---------------------------------------------------------------------------------------------------------------------------------------------------------------------

***************  Exception Handling table
index  eTry, eHnd
  0  ::   2        - Try at BB10..BB10 [03E..04F), Fault   at BB12..BB12 [04F..057)
  1  ::   2        - Try at BB07..BB09 [058..061), Handler at BB13..BB13 [061..064)
  2  ::            - Try at BB03..BB10 [01B..080), Fault   at BB14..BB14 [080..095)

I don't think we can dump EH tables in Release builds by default, though I can tweak this locally and take a look. Are you asking about this to evaluate the risk of this bug in product builds?

@AndyAyersMS
Copy link
Member

I don't think we can dump EH tables in Release builds by default, though I can tweak this locally and take a look. Are you asking about this to evaluate the risk of this bug in product builds?

In part, yes. It is good at this stage to be able to say what might happen if we don't fix this.

@AndyAyersMS
Copy link
Member

Also note SPMI doesn't rely on dumping, it is comparing the binary data the JIT sends back to the VM (which includes the EH info), so release mode diffs "work" but may be harder to understand.

@amanasifkhalid
Copy link
Contributor Author

amanasifkhalid commented Oct 8, 2024

Got it, I created a collection of the repro, and replayed it with Release builds. I got a zero-sized diff, so without the fix, it looks like we're propagating the incorrect EH clause info to the VM. For this particular method, that doesn't seem to affect behavior at runtime, but I can imagine this breaking stack walking for methods that throw in inconvenient places.

@amanasifkhalid
Copy link
Contributor Author

@AndyAyersMS since we're seeing diffs with Release builds, do you think a .NET 9 fix is justified, or would you like to see a repro where stack walking breaks from this? I can try to put something together.

@AndyAyersMS
Copy link
Member

I'd say we should go ahead and backport, given that we know of a case where we got things wrong.

@amanasifkhalid amanasifkhalid merged commit 87273d6 into dotnet:main Oct 9, 2024
@amanasifkhalid amanasifkhalid deleted the eh-region-ends branch October 9, 2024 14:55
@amanasifkhalid
Copy link
Contributor Author

/backport to release/9.0

@github-actions
Copy link
Contributor

github-actions bot commented Oct 9, 2024

@github-actions
Copy link
Contributor

github-actions bot commented Oct 9, 2024

@amanasifkhalid backporting to release/9.0 failed, the patch most likely resulted in conflicts:

$ git am --3way --empty=keep --ignore-whitespace --keep-non-patch changes.patch

Applying: Add EH region ends pass
Applying: Remove old EH fixup logic
error: sha1 information is lacking or useless (src/coreclr/jit/compiler.h).
error: could not build fake ancestor
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Patch failed at 0002 Remove old EH fixup logic
Error: The process '/usr/bin/git' failed with exit code 128

Please backport manually!

@github-actions
Copy link
Contributor

github-actions bot commented Oct 9, 2024

@amanasifkhalid an error occurred while backporting to release/9.0, please check the run log for details!

Error: git am failed, most likely due to a merge conflict.

rzikm pushed a commit to rzikm/dotnet-runtime that referenced this pull request Oct 11, 2024
Fixes dotnet#108608. To simplify EH maintenance logic, leave EH regions in a deconstructed phase during block reordering, and correct the end block pointers of each region in a pass over the block list after. By walking the list backwards, we can short-circuit finding the end of each region, and we don't run into the issue of an EH region not being able to determine if it's at the end of its parent region due to some awkward placement of a sibling region.
@github-actions github-actions bot locked and limited conversation to collaborators Nov 9, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Assertion failed 'bbNumTryLast <= bbNumOuterTryLast' during 'Optimize layout'

2 participants