Ensure that register typing constraints are respected at join points#555
Ensure that register typing constraints are respected at join points#555mshinwell merged 1 commit intoocaml:4.03from
Conversation
| (* Float unboxing code must be sure to avoid this case. *) | ||
| assert false | ||
|
|
||
| let ge_component comp1 comp2 = |
There was a problem hiding this comment.
Couldn't this be implemented as comp1 = lub_component comp1 comp2? That would avoid having to check the code twice
There was a problem hiding this comment.
I tend to think it's clearer written out, actually.
|
Patch looks correct to me. |
|
Devil's advocate here: the bug has been here since we introduced GADTs in 4.00.0. Is it really urgent to fix it now? What's your assessment of the risk level for this change? |
|
My understanding is that the issue is sensibly more severe in presence of flambda than it was in previous releases, as it will aggressively inline these cases. |
|
I'm also kind of surprised this hasn't caused trouble before, given how long the GADT support has been merged for. It doesn't really seem an ideal change just before a release. On balance I think I'm pretty convinced about the soundness of the change, but I'm not completely sure about potential performance degradation. @xavierleroy : do you have an opinion on that? |
|
Can you characterize the problematic joint points? Are they exactly those that match on a GADT at the source level? If so, that would make it easier to reason about potential performance impact (most of the code doesn't use GADTs and would be unaffected, and I understand there are a few performance-sensitive uses of GADTs that could be inspected manually). Have you tried tracing the cases where the join decreases precision, for example on a compiler bootstrap? |
|
I'm not sure I have an exact characterisation of the problematic join points, but I think a GADT match must be involved, with one of the branches returning a result that forces the backend to allocate a register of type "Int" for that result (in fact one of type "Addr" would probably cause it to go wrong too) and another branch returning a value that might be boxed. In some cases the backend would then use the "Int" register as the result for the whole match, which might then go wrong if the live range for that register then extends forwards across a GC point (and if the boxed branch is taken in the match). For performance, it's actually cases that do not use GADTs, but might end up getting penalised, which we're worried about. One way we might be able to assess this is using OCamlPro's benchmarking suite which we were using for flambda, perhaps. |
|
@garrigue Do you have an opinion on this one? |
Actually one thing I am not sure about: Is Addr supposed to include Int? At the moment if one branch returns Addr and another returns Int then they will both be stored in an Addr register. Is that correct? If Addr is allowed to hold Int then it is not clear to me what the difference between Val and Addr is. |
|
Val is registered with the GC, Addr is not, as far as I recall. |
|
(I will check about having Int in Addr) |
|
Hmm, so I think if we use Addr occasionally rather than Int, it isn't that much of a big deal. It will inhibit CSE at certain points. That might be all, in fact, with the current code. |
|
Not to hijack the discussion, but could someone explain to me what the join points are? |
|
When two control flow paths come together, such as at the end of a conditional expression. |
|
Thanks. And normally join points are places where we may register GC roots? |
|
No. The roots are effectively "registered" at a point where one OCaml function may call another. The problem is that at the join point, when generating code, a decision has to be made about which register to use to hold the result of the conditional. In some cases the backend reuses the result register from one of the two code paths (and generates a move at the end of the other code path into that register). However this is wrong if the reused register will not be registered as a root (at any subsequent point where it might need to be, before it becomes dead) and the value being moved into it on the other code path may need to be registered. |
|
Awesome. Thanks again. |
asmcomp/cmm.ml
Outdated
| | Int, Val -> Val | ||
| | Addr, Int -> Addr | ||
| | Addr, Addr -> Addr | ||
| | Addr, Val -> Val |
There was a problem hiding this comment.
"Addr" denotes a derived pointer that might point in the middle of a heap block. The GC is unable to treat such derived pointers correctly: it cannot follow them and it cannot update them after moving the underlying object. This is why it is a compile-time fatal error to have a variable of type Addr live across a GC or call point.
Now, if you have a conditional where x : Addr in one branch and x : Val in the other, x may be a derived pointer and therefore must be given type Addr after the join point. If it dies before the next GC / call point, everything will be fine. Otherwise it is a serious code generation error and the compiler should bomb.
In summary: Addr, Val -> Addr and symmetrically.
There was a problem hiding this comment.
Ah yes, my mistake, the relation is indeed the other way around.
|
Patch updated following @xavierleroy 's comment |
|
Apart the Addr corner case, the patch looks good to me. I'm not worried at all about performance degradation: most C-- variables, and especially most of those involved in a join point, are of type Val already. |
|
One last, cosmetic suggestion. While checking the code I drew the lattice of types on a sheet of paper. What about including it in a comment as ASCII art? E.g. |
f522408 to
37de8a1
Compare
|
Good idea. I've added to the comment a note about [Addr] as well. |
| let allocate_garbage () = | ||
| for i = 0 to 100 do | ||
| ignore (Array.make 200 0.0) | ||
| done |
There was a problem hiding this comment.
Whether or not this function allocates anything on the minor heap is dependent on word size.
On 32-bit machines, a 200-float array is 400 words, bigger than Max_young_wosize, and nothing is allocated on the minor heap. Is this the intent?
|
Even in theory, what is the possible performance degradation of this patch? That the frame tables may be slightly larger? |
Additional roots for the GC to try to mark even though they are always integers. |
|
Yes, larger frame tables, and a bit more time spent in the GC scanning the stack. |
runtime: CAML_TRACE_VERSION is now set to a Multicore specific value
…10354 and PR#10387) (ocaml#555) * Refactor Proc.op_is_pure and fix Mach.operation_can_raise These two predicates test properties of Mach operations. For the standard operations, the results are independent of the target platform. For the platform-specific operations, results vary. The "is pure" predicate was implemented in a platform-specific file, $ARCH/proc.ml. The treatment of standard operations was duplicated. The "can raise" predicate was implemented in a platform-independent file, mach.ml. All specific operations were assumed not to raise, which is incorrect for ARM and ARM64 and their "shiftcheckbound" specific operation. (See ocaml#10339.) This commit refactors the two predicates as follows: - `Arch.operation_is_pure` and `Arch.operation_can_raise` predicates are added to each platform description file. They deal with specific operations only. - `Mach.operation_is_pure` was added and `Mach.operation_can_raise` was fixed to implement the test for standard operations and delegate to the Arch functions for specific operations. * Fix handling of exception-raising specific operations during spilling The ARM and ARM64 architectures have `Ishiftcheckbound` specific instructions that can raise an Invalid_argument exception. This exception can, in turn, be handled in the same function (see PR#10339 for an example). However, these specific instructions were not taken into account during insertion of spill code in asmcomp/spill.ml. The consequence is a variable that is reloaded from stack in the exception handler, yet has not been spilled to stack before. This commit fixes the issue by using the recently-fixed `Mach.operation_can_raise` predicate to handle operations during spill code insertion, instead of an ad-hoc pattern-matching. Fixes: PR#10339 * Fix handling of exception-raising specific operations during liveness analysis (ocaml#10387) This is a follow-up to 15e6354 and ocaml#10354. Use the `Mach.operation_can_raise` predicate instead of an ad-hoc pattern matching to spot operations where the variables live at entry to the enclosing exception handler must also be live. The ad-hoc pattern matching was missing the `Ishiftcheckbound` specific operations of ARM and ARM64. * Handle flambda-backend operations * Update cfg * Iprobe_is_enabled is pure * Exhaustive match * Fix after rebase * Fix arm64 after rebase and cleanup Co-authored-by: Xavier Leroy <[email protected]> Co-authored-by: Xavier Leroy <[email protected]>
fe8a98b flambda-backend: Save Mach as Cfg after Selection (ocaml#624) 2b205d8 flambda-backend: Clean up algorithms (ocaml#611) 524f0b4 flambda-backend: Initial refactoring of To_cmm (ocaml#619) 0bf75de flambda-backend: Refactor and correct the "is pure" and "can raise" (port upstream PR#10354 and PR#10387) (ocaml#555) d234bfd flambda-backend: Cpp mangling is now a configuration option (ocaml#614) 20fc614 flambda-backend: Check that stack frames are not too large (ocaml#10085) (ocaml#561) 5fc2e95 flambda-backend: Allow CSE of immutable loads across stores (port upstream PR#9562) (ocaml#562) 2a650de flambda-backend: Backport commit fc95347 from trunk (ocaml#584) 31651b8 flambda-backend: Improved ARM64 code generation (port upstream PR#9937) (ocaml#556) f0b6d68 flambda-backend: Simplify processing and remove dead code (error paths) in asmlink (port upstream PR#9943) (ocaml#557) 90c6746 flambda-backend: Improve code-generation for inlined comparisons (port upstream PR#10228) (ocaml#563) git-subtree-dir: ocaml git-subtree-split: fe8a98b
…port upstream PR#10354 and PR#10387) (ocaml#555) * Refactor Proc.op_is_pure and fix Mach.operation_can_raise These two predicates test properties of Mach operations. For the standard operations, the results are independent of the target platform. For the platform-specific operations, results vary. The "is pure" predicate was implemented in a platform-specific file, $ARCH/proc.ml. The treatment of standard operations was duplicated. The "can raise" predicate was implemented in a platform-independent file, mach.ml. All specific operations were assumed not to raise, which is incorrect for ARM and ARM64 and their "shiftcheckbound" specific operation. (See ocaml#10339.) This commit refactors the two predicates as follows: - `Arch.operation_is_pure` and `Arch.operation_can_raise` predicates are added to each platform description file. They deal with specific operations only. - `Mach.operation_is_pure` was added and `Mach.operation_can_raise` was fixed to implement the test for standard operations and delegate to the Arch functions for specific operations. * Fix handling of exception-raising specific operations during spilling The ARM and ARM64 architectures have `Ishiftcheckbound` specific instructions that can raise an Invalid_argument exception. This exception can, in turn, be handled in the same function (see PR#10339 for an example). However, these specific instructions were not taken into account during insertion of spill code in asmcomp/spill.ml. The consequence is a variable that is reloaded from stack in the exception handler, yet has not been spilled to stack before. This commit fixes the issue by using the recently-fixed `Mach.operation_can_raise` predicate to handle operations during spill code insertion, instead of an ad-hoc pattern-matching. Fixes: PR#10339 * Fix handling of exception-raising specific operations during liveness analysis (ocaml#10387) This is a follow-up to 15e6354 and ocaml#10354. Use the `Mach.operation_can_raise` predicate instead of an ad-hoc pattern matching to spot operations where the variables live at entry to the enclosing exception handler must also be live. The ad-hoc pattern matching was missing the `Ishiftcheckbound` specific operations of ARM and ARM64. * Handle flambda-backend operations * Update cfg * Iprobe_is_enabled is pure * Exhaustive match * Fix after rebase * Fix arm64 after rebase and cleanup Co-authored-by: Xavier Leroy <[email protected]> Co-authored-by: Xavier Leroy <[email protected]>
It has been known for some time that great care has to be taken when using Obj.magic to ensure that, at join points in the control flow graph, a register "known to hold only immediates" does not end up being used to hold a value that might be a pointer. If this happens, it will not be registered as a GC root, which can cause it to become invalid. (The typical fix is something like using None instead of "unit".)
After having fixed a piece of code having that problem this morning it became apparent that the following function, which does not use Obj.magic, suffers from the same problem:
If the function "f" is inlined into another function then the situation above can arise, as exhibited in the test case on this pull request, which segfaults.
The only reasonable solution it would appear at present is to ensure that at join points we respect the register typing. This may produce slightly worse code in some cases. For such cases the correct solution to solve this problem is probably the propagation of more information about what values might be held in which identifiers right through from the type checker.
I think this is probably serious enough to warrant putting in 4.03.