Today we observed another segfault while running multicoretests.
This one is triggered on s390x using OCaml from the 5.1+trunk and 5.2+trunk opam packages (these follow the 5.1 and trunk branches AFAIU). The test in question does not involve parallelism; the crash is triggered while stress testing Effects.
A branch that reproduces the crash is available here: https://github.com/ocaml-multicore/multicoretests/commits/s390x-crash-repro
The test crashes in 10 out of 10 runs on both 5.1 and trunk, so the issue seems deterministic.
The file consists of 4 positive tests and 4 negative ones and the crash consistently occurs in the 2nd negative test:
### OCaml runtime: debug mode ###
random seed: 260017153
generated error fail pass / total time test name
[ ] 0 0 0 0 / 20000 0.0s Lin DSL ref int test with Effect
[ ] 0 0 0 0 / 20000 0.0s Lin DSL ref int test with Effect (generating)
[✓] 20000 0 0 20000 / 20000 0.5s Lin DSL ref int test with Effect
[ ] 0 0 0 0 / 20000 0.0s Lin DSL ref int64 test with Effect
[✓] 20000 0 0 20000 / 20000 0.5s Lin DSL ref int64 test with Effect
[ ] 0 0 0 0 / 20000 0.0s Lin DSL CList int test with Effect
[✓] 20000 0 0 20000 / 20000 0.6s Lin DSL CList int test with Effect
[ ] 0 0 0 0 / 20000 0.0s Lin DSL CList int64 test with Effect
[✓] 20000 0 0 20000 / 20000 0.7s Lin DSL CList int64 test with Effect
[ ] 0 0 0 0 / 20000 0.0s negative Lin DSL ref int test with Effect
[✓] 1 0 1 0 / 20000 0.0s negative Lin DSL ref int test with Effect
[ ] 0 0 0 0 / 20000 0.0s negative Lin DSL ref int64 test with Effect[00] file runtime/fiber.c; line 250 ### Assertion failed: d
File "src/neg_tests/dune", line 17, characters 7-27:
17 | (name lin_tests_dsl_effect)
^^^^^^^^^^^^^^^^^^^^
(cd _build/default/src/neg_tests && ./lin_tests_dsl_effect.exe --verbose)
Command got signal ABRT.
The negative Lin DSL ref int64 test with Effect is just expected to raise an Unhandled exception and report it through the test runner, like the previous negative Lin DSL ref int test with Effect.
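For readers less familiar with effects, here is a minimal sketch of that expected behaviour, assuming "Unhandled" refers to the standard library's `Effect.Unhandled` exception (OCaml 5.0 or later). The effect names `Handled_eff` and `Stray_eff` and the helper `run_with_handler` are hypothetical and are not taken from the multicoretests sources:

```ocaml
open Effect
open Effect.Deep

(* Hypothetical effects, for illustration only *)
type _ Effect.t += Handled_eff : unit Effect.t
type _ Effect.t += Stray_eff : unit Effect.t

(* Run [f] under a handler that only knows about [Handled_eff];
   every other effect is passed on to the enclosing handler, if any *)
let run_with_handler (f : unit -> string) : string =
  try_with f ()
    { effc = (fun (type a) (eff : a Effect.t) ->
        match eff with
        | Handled_eff ->
          Some (fun (k : (a, _) continuation) -> continue k ())
        | _ -> None) }

(* [Handled_eff] is resumed by the handler above; [Stray_eff] has no
   handler anywhere, so the runtime raises [Effect.Unhandled] at the
   [perform] site and the exception propagates out like any other *)
let outcome =
  match
    run_with_handler (fun () ->
        perform Handled_eff;
        perform Stray_eff;
        "completed")
  with
  | s -> s
  | exception Effect.Unhandled _ -> "Unhandled"

let () = print_endline outcome
```

This prints `Unhandled`: the exception is caught and reported rather than crashing the process, which is what the second negative test should also do.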
Above I've run the test under the debug runtime, where it consistently aborts while failing this assertion:
runtime/fiber.c, line 250 in be72b7b:
CAMLassert(d);
This could indicate an issue related to s390x frame descriptors.
If I comment out the first 4 tests the crash no longer happens, so I suspect these need to run to bring the heap to a particular shape. Commenting out the last two tests also makes the crash go away.
To recreate the issue:
- clone and checkout the above repo branch
- opam install dune qcheck-core
- dune build @ci -j1 --no-buffer