-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Fix delivery of effect-related exceptions, take 2 (#12486) #12535
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
FYI, elsewhere with @dustanddreams we've refined the reproducer initially mentioned in #12486 (comment) Running with a reduced heap is sufficient to segfault reliably (5/5 runs on my local machine): (you don't even need the seed |
|
I've built a local opam switch with frame pointers enabled and can confirm that 10/10 runs of both |
|
@jmid can you also test with |
This is with the following @OlivierNicole @fabbing - I am |
I gave it a try but both |
runtime/amd64.S
Outdated
| jmp *(%rbx) | ||
| 2: TSAN_ENTER_FUNCTION(0) | ||
| 2: ENTER_FUNCTION | ||
| TSAN_SAVE_CALLER_REGS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lack of TSAN_SAVE/RESTORE_CALLER_REGS here was deliberate for performance reasons: we found out taht saving too many unnecessary registers would hurt performance with TSan.
Considering that what we need to do here is almost exactly what is done in caml_c_call prologue, and contrary to what I said over there, I suggest we go the same way you did with #12530 and rather jmp to caml_c_call prologue with the GCALL label.
There will be a small performance cost, and hopefully this won't happen often, but also has the big benefit of unifying the amd64 code with s390x.
@@ -1178,12 +1178,8 @@ CFI_STARTPROC
UPDATE_BASE_POINTER(%rcx)
SWITCH_OCAML_STACKS
jmp *(%rbx)
-2: ENTER_FUNCTION
- TSAN_SAVE_CALLER_REGS
- TSAN_ENTER_FUNCTION(0)
- TSAN_RESTORE_CALLER_REGS
- LEA_VAR(caml_raise_continuation_already_resumed, %rax)
- jmp LBL(caml_c_call)
+2: LEA_VAR(caml_raise_continuation_already_resumed, %rax)
+ jmp GCALL(caml_c_call)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that there used to be TSAN_ENTER_FUNCTION(0) in this code path, I only added ENTER_FUNCTION to set up the stack frame correctly and TSAN_{SAVE,RESTORE}_CALLER_REGS to wrap the TSAN_ENTER_FUNCTION call.
However I notice that caml_ml_array_bound_error also uses TSAN_ENTER_FUNCTION without the SAVE/RESTORE dance.
I am pushing a new version of this code which mimics caml_ml_array_bound_error in this case.
|
If we want to add a regression test case I've cooked up the following, which crashes my local open Effect
type _ t += Yield : unit t
let rec burn l =
if List.hd l > 12 then ()
else
burn (l @ l |> List.map (fun x -> x + 1))
let foo l =
burn l;
perform Yield
let bar i = foo [i]
let _ =
for _ = 1 to 10_000 do
try bar 8
with Unhandled _ -> ()
done |
gasche
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that the change makes the code strictly more correct, so I am approving. (It only makes a different with frame pointers enabled.)
I think it would be useful if the TSan people could leave an explicit trace in the code of the register-saving decisions: it would have made the work on this PR easier, and it may show up again in the future. (It will also be useful documentation if someone less knowledgeable tries to port the TSan runtime support to another backend.)
|
@dustanddreams could you add the testcase proposed by @jmid (it's a good testcase because it runs instantly) to the testsuite, for example as a new program (To run a new test file you can just run |
Sure, let me know if the file looks good to you. |
|
We've been able to reproduce the crash on |
| @@ -0,0 +1,37 @@ | |||
| (* TEST | |||
| frame_pointers; | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we really want to restrict the test to only setups with frame pointers enabled? The current fix is for a bug that only occurs with frame pointers, but I assume that the test may have caught the non-frame-pointers-specific s390x issue, and could be helpful in the test harness of new backends.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(To clarify: my gut feeling is that we do not want the frame_pointers condition here, and I am waiting to see what @dustanddreams thinks before merging, with or without the condition.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you're right; although the code was only incorrect with frame pointers enabled, this was not the case on s390x. But then, should it really belong to the tests/frame-pointers directory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But then, should it really belong to the tests/frame-pointers directory?
Honestly I don't really care, please choose the test directory that you prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to effects, then.
I agree; I intend to do it in our fix PR. |
|
We may have found one of the issues, the one involving the combination of |
This repairs operation when the compiler is built with --enable-frame-pointers and either Effect.Unhandled or Effect.Continuation_already_resumed needs to be raised.
|
As a change for the fp and TSAN mode, it seems better to me to cherry-pick it to 5.1 (if there are no objections from reviewers) in prevision of the upcoming 5.1 release. |
|
That makes sense. Another fix for TSan + fp is #12561. |
|
TSan won't be part of new 5.1.x, will it? |
|
Ah, yes, it had slipped my mind that TSan won’t be in 5.1.1. What @fabbing said. |
Fix delivery of effect-related exceptions, take 2 (ocaml#12486) (cherry picked from commit 01f737a)
This is similar to #12530, but on amd64, in order to repair operation when the runtime is built with
--enable-frame-pointers.Tail calls to
caml_c_callare adjusted to always haveENTER_FUNCTIONas well as TSan instrumentation, so that the stack layout matches what is expected.