
s390x segfault/abort on 5.1 and trunk when running Effect test #12486

@jmid

Today we observed another segfault while running multicoretests.
This one is triggered on s390x using OCaml from the 5.1+trunk and 5.2+trunk opam packages (these follow the 5.1 and trunk branches AFAIU). The test in question does not involve parallelism, but is triggered while stress testing Effects.

A reproducible branch is available here: https://github.com/ocaml-multicore/multicoretests/commits/s390x-crash-repro
The test crashes 10/10 on both 5.1 and trunk so the issue seems deterministic.
The file consists of 4 positive tests and 4 negative ones and the crash consistently occurs in the 2nd negative test:

### OCaml runtime: debug mode ###
random seed: 260017153
generated error  fail  pass / total     time test name

[ ]     0     0     0     0 / 20000     0.0s Lin DSL ref int test with Effect
[ ]     0     0     0     0 / 20000     0.0s Lin DSL ref int test with Effect (generating)
[✓] 20000     0     0 20000 / 20000     0.5s Lin DSL ref int test with Effect

[ ]     0     0     0     0 / 20000     0.0s Lin DSL ref int64 test with Effect
[✓] 20000     0     0 20000 / 20000     0.5s Lin DSL ref int64 test with Effect

[ ]     0     0     0     0 / 20000     0.0s Lin DSL CList int test with Effect
[✓] 20000     0     0 20000 / 20000     0.6s Lin DSL CList int test with Effect

[ ]     0     0     0     0 / 20000     0.0s Lin DSL CList int64 test with Effect
[✓] 20000     0     0 20000 / 20000     0.7s Lin DSL CList int64 test with Effect

[ ]     0     0     0     0 / 20000     0.0s negative Lin DSL ref int test with Effect
[✓]     1     0     1     0 / 20000     0.0s negative Lin DSL ref int test with Effect

[ ]     0     0     0     0 / 20000     0.0s negative Lin DSL ref int64 test with Effect[00] file runtime/fiber.c; line 250 ### Assertion failed: d
File "src/neg_tests/dune", line 17, characters 7-27:
17 |  (name lin_tests_dsl_effect)
            ^^^^^^^^^^^^^^^^^^^^
(cd _build/default/src/neg_tests && ./lin_tests_dsl_effect.exe --verbose)
Command got signal ABRT.

The test "negative Lin DSL ref int64 test with Effect" is only expected to raise an Unhandled exception and report it through the test runner, just like the preceding "negative Lin DSL ref int test with Effect".
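For readers less familiar with OCaml 5 effects, here is a minimal sketch of what "raising an Unhandled exception" means at the language level. It is not the multicoretests code: the effect `Ping` and both function names are hypothetical and exist only for this illustration. It assumes OCaml 5.0 or later, where `Effect` is part of the standard library.

```ocaml
(* Hypothetical effect, declared only for this illustration. *)
type _ Effect.t += Ping : int Effect.t

(* Performing an effect with no enclosing handler raises Effect.Unhandled. *)
let perform_without_handler () : string =
  match Effect.perform Ping with
  | n -> Printf.sprintf "handled: %d" n
  | exception Effect.Unhandled _ -> "Unhandled"

(* For contrast: the same effect under a deep handler that resumes with 42. *)
let perform_with_handler () : string =
  Effect.Deep.try_with
    (fun () -> Printf.sprintf "handled: %d" (Effect.perform Ping))
    ()
    { Effect.Deep.effc =
        (fun (type a) (eff : a Effect.t) ->
          match eff with
          | Ping ->
              Some
                (fun (k : (a, _) Effect.Deep.continuation) ->
                  Effect.Deep.continue k 42)
          | _ -> None) }

let () =
  print_endline (perform_without_handler ());
  print_endline (perform_with_handler ())
```

On a working runtime the first call simply yields an ordinary exception that the caller can catch, which is the behaviour the negative tests rely on.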

Above I've run the test under the debug runtime, where it consistently aborts on this failing assertion (runtime/fiber.c, line 250):

CAMLassert(d);

This could indicate an issue related to s390x frame descriptors.

If I comment out the first 4 tests, the crash no longer happens, so I suspect they need to run to bring the heap into a particular shape. Commenting out the last two tests also makes the crash disappear.

To recreate the issue:

  • clone and checkout the above repo branch
  • opam install dune qcheck-core
  • dune build @ci -j1 --no-buffer
