Description
The Gc torture test has reached the point where it can trigger a crashing Gc bug in the bytecode interpreter. The crash triggers on 5.2.0, on the 5.3 branch, and on trunk. I have not tried 5.0.0 or 5.1.1 yet.
The test itself triggers list/string/array/bigarray allocations in both the parent domain and in two child domains and checks consistency of the results from Gc.quick_stat, etc. In between test iterations, we explicitly call Gc.major in single-domain mode in an attempt to reset the heap to a reasonable state. The test also uses a single call to a stub at start-up to support testing with OCAMLRUNPARAM="s=4096" set, but that can be removed if unused.
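To give a rough idea of the shape of one iteration, here is a minimal hand-written sketch. It is not the actual test code from multicoretests (which is generated from a QCheck STM model). The names `allocate_some`, `stats_consistent`, and `iteration` are hypothetical, and the consistency checks are only simple non-negativity checks chosen for illustration. It assumes OCaml 5.

```ocaml
(* Hypothetical sketch of one test iteration, not the real test. *)

(* Allocate a list, a string, an array, and a bigarray of size [n];
   return the sum of their sizes so the result can be checked. *)
let allocate_some n =
  let l = List.init n (fun i -> i) in
  let s = String.make n 'a' in
  let a = Array.make n 0.5 in
  let b = Bigarray.Array1.create Bigarray.int Bigarray.c_layout n in
  Bigarray.Array1.fill b 1;
  List.length l + String.length s + Array.length a + Bigarray.Array1.dim b

(* Illustrative sanity checks on a Gc.quick_stat result. *)
let stats_consistent () =
  let st = Gc.quick_stat () in
  st.Gc.minor_words >= 0.
  && st.Gc.promoted_words >= 0.
  && st.Gc.major_words >= 0.
  && st.Gc.minor_collections >= 0
  && st.Gc.major_collections >= 0

let iteration n =
  let work () =
    let sum = allocate_some n in
    (sum, stats_consistent ())
  in
  (* Two child domains allocate concurrently with the parent. *)
  let d1 = Domain.spawn work in
  let d2 = Domain.spawn work in
  let parent = work () in
  let r1 = Domain.join d1 in
  let r2 = Domain.join d2 in
  (* Back in single-domain mode: force a major collection. *)
  Gc.major ();
  List.for_all (fun (sum, ok) -> sum = 4 * n && ok) [parent; r1; r2]

let () =
  for _ = 1 to 10 do
    assert (iteration 1_000)
  done
```

A sketch this small is unlikely to reproduce the crash by itself; it only illustrates the allocate / check / Gc.major pattern described above.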
This issue may or may not be the same bug observed in #13402.
As this test doesn't involve the Obj-using Domain.DLS I've gone for reporting it separately.
On GitHub Actions CI I've observed these crashes with Ubuntu and glibc. For an example 5.3 crash see here:
https://github.com/ocaml-multicore/multicoretests/actions/runs/11074833760/job/30774502500?pr=469
Locally it seems easier to reproduce with musl C though.
To reproduce:
$ sudo apt-get install musl-tools
$ opam switch create . --empty
$ opam install . --inplace-build ocaml-option-musl ocaml-option-bytecode-only
$ opam install qcheck-core
$ git clone -b gc-test-musl-bytecode-repro https://github.com/ocaml-multicore/multicoretests.git
$ cd multicoretests
$ OCAMLRUNPARAM="o=20" dune exec src/gc/stm_tests_impl.exe --profile=debug-runtime -- -v
Above I suggest running with a reduced space_overhead and the debug runtime, but the bug can also trigger without these.
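For reference, the `o` parameter in OCAMLRUNPARAM is the `space_overhead` setting of the `Gc` module. The sketch below shows what I'd expect to be a roughly equivalent setting made from inside a program. The helper name `with_space_overhead` is hypothetical and not part of the test suite, and unlike the environment variable this only takes effect from the point of the call rather than from start-up.

```ocaml
(* Hypothetical helper: run [f] with a temporary space_overhead,
   restoring the previous Gc settings afterwards. *)
let with_space_overhead percent f =
  let old = Gc.get () in
  Gc.set { old with Gc.space_overhead = percent };
  Fun.protect ~finally:(fun () -> Gc.set old) f

let () =
  with_space_overhead 20 (fun () ->
    (* The lowered setting is visible through Gc.get. *)
    assert ((Gc.get ()).Gc.space_overhead = 20);
    (* Some allocation work, just to exercise the Gc. *)
    ignore (List.init 100_000 (fun i -> i)))
```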
Here's the result of a fresh run on my Linux box:
$ OCAMLRUNPARAM="o=20" dune exec src/gc/stm_tests_impl.exe --profile=debug-runtime -- -v
### OCaml runtime: debug mode ###
### set OCAMLRUNPARAM=v=0 to silence this message
Page size: 4096
random seed: 57333975
generated error fail pass / total time test name
[ ] 874 0 0 874 / 1000 500.8s STM implicit Gc test parallelBus error (core dumped)
with the following, not-so-descriptive stack trace:
(gdb) thread apply all bt
Thread 3 (LWP 1044096):
#0 0x00007fa68ff14b8c in memset () from /lib/ld-musl-x86_64.so.1
Backtrace stopped: Cannot access memory at address 0x7fa67cec89a8
Thread 2 (LWP 956593):
#0 0x00007fa68ff14bf8 in memset () from /lib/ld-musl-x86_64.so.1
#1 0x00007fa68ff05d35 in ?? () from /lib/ld-musl-x86_64.so.1
#2 0x00007fa67fc52b74 in ?? ()
#3 0x0000000000000000 in ?? ()
Thread 1 (LWP 956567):
#0 caml_interprete (prog=<optimized out>, prog_size=<optimized out>) at runtime/interp.c:829
#1 0x000060172030b6fd in caml_startup_code_exn (pooling=0, argv=0x7ffff57a4598, section_table_size=3833, section_table=0x60172032b020 <caml_sections> "\204\225\246\276", data_size=22927, data=0x60172032bf20 <caml_data> "\204\225\246\276", code_size=539264, code=0x6017203318c0 <caml_code>) at runtime/startup_byt.c:659
#2 caml_startup_code_exn (code=0x6017203318c0 <caml_code>, code_size=539264, data=0x60172032bf20 <caml_data> "\204\225\246\276", data_size=22927, section_table=0x60172032b020 <caml_sections> "\204\225\246\276", section_table_size=3833, pooling=0, argv=0x7ffff57a4598) at runtime/startup_byt.c:592
#3 0x000060172030b742 in caml_startup_code (code=code@entry=0x6017203318c0 <caml_code>, code_size=code_size@entry=539264, data=data@entry=0x60172032bf20 <caml_data> "\204\225\246\276", data_size=data_size@entry=22927, section_table=section_table@entry=0x60172032b020 <caml_sections> "\204\225\246\276", section_table_size=section_table_size@entry=3833, pooling=0, argv=0x7ffff57a4598) at runtime/startup_byt.c:673
#4 0x00006017202d5052 in main (argc=<optimized out>, argv=<optimized out>) at camlprim.c:26518