
Merge of #13227 to trunk causes deadlocks on parallel Sys tests #13713

@jmid

The merge of #13227 seems to have introduced a regression that causes the parallel multicoretests STM tests for Sys to deadlock:

$ dune exec src/sys/stm_tests.exe -- -v
random seed: 437123728
generated error fail pass / total     time test name
[✓] 1000    0    0 1000 / 1000     1.9s STM Sys test sequential
[ ]   60    0    0   60 /  200     5.0s STM Sys test parallel

This test worked fine on the preceding merge commit from yesterday, 86470c2 (Merge pull request #13694 from mndrix/manual-hash-variant).

Attaching gdb to the hung process above reveals the following stack traces:

Thread 4 (Thread 0x7c874f400640 (LWP 31382) "stm_tests.exe"):
#0  futex_wait (private=0, expected=2, futex_word=0x569f07463f30) at ../sysdeps/nptl/futex-internal.h:146
#1  __GI___lll_lock_wait (futex=futex@entry=0x569f07463f30, private=0) at ./nptl/lowlevellock.c:49
#2  0x00007c8765a98002 in lll_mutex_lock_optimized (mutex=0x569f07463f30) at ./nptl/pthread_mutex_lock.c:48
#3  ___pthread_mutex_lock (mutex=mutex@entry=0x569f07463f30) at ./nptl/pthread_mutex_lock.c:93
#4  0x0000569eedb75841 in caml_plat_lock_blocking (m=0x569f07463f30) at runtime/caml/platform.h:455
#5  backup_thread_func (v=0x569f07463e90) at runtime/domain.c:1088
#6  0x00007c8765a94ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#7  0x00007c8765b26850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 3 (Thread 0x7c8754e00640 (LWP 31381) "stm_tests.exe"):
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x0000569eedb8f0e5 in caml_plat_futex_wait (undesired=2, ftx=0x569eedc72ae0 <stw_request>) at runtime/platform.c:307
#2  latchlike_wait (contested=2, unreleased=1, ftx=0x569eedc72ae0 <stw_request>) at runtime/platform.c:347
#3  caml_plat_latch_wait (latch=latch@entry=0x569eedc72ae0 <stw_request>) at runtime/platform.c:351
#4  0x0000569eedb7441e in caml_plat_barrier_wait (barrier=0x569eedc72ae0 <stw_request>) at runtime/caml/platform.h:320
#5  stw_wait_for_running (domain=0x7c8750002b80) at runtime/domain.c:1482
#6  stw_api_barrier (domain=domain@entry=0x7c8750002b80) at runtime/domain.c:1492
#7  0x0000569eedb75b68 in caml_try_run_on_all_domains_with_spin_work (sync=sync@entry=1, handler=handler@entry=0x569eedb8dbb0 <caml_stw_empty_minor_heap>, data=data@entry=0x0, leader_setup=leader_setup@entry=0x569eedb8c760 <caml_empty_minor_heap_setup>, enter_spin_callback=enter_spin_callback@entry=0x569eedb8c790 <caml_do_opportunistic_major_slice>, enter_spin_data=enter_spin_data@entry=0x0) at runtime/domain.c:1710
#8  0x0000569eedb8dd02 in caml_try_empty_minor_heap_on_all_domains () at runtime/minor_gc.c:859
#9  caml_empty_minor_heaps_once () at runtime/minor_gc.c:882
#10 0x0000569eedb762c2 in caml_domain_terminate (last=last@entry=false) at runtime/domain.c:2033
#11 0x0000569eedb76754 in domain_thread_func (v=<optimized out>) at runtime/domain.c:1267
#12 0x00007c8765a94ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#13 0x00007c8765b26850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 2 (Thread 0x7c874fe00640 (LWP 22201) "stm_tests.exe"):
#0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x569f07463e74) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common (cancel=true, private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x569f07463e74) at ./nptl/futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x569f07463e74, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
#3  0x00007c8765a93a41 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x569f07463e20, cond=0x569f07463e48) at ./nptl/pthread_cond_wait.c:503
#4  ___pthread_cond_wait (cond=cond@entry=0x569f07463e48, mutex=mutex@entry=0x569f07463e20) at ./nptl/pthread_cond_wait.c:627
#5  0x0000569eedb8ef0d in caml_plat_wait (cond=cond@entry=0x569f07463e48, mut=mut@entry=0x569f07463e20) at runtime/platform.c:146
#6  0x0000569eedb7594f in backup_thread_func (v=0x569f07463d80) at runtime/domain.c:1091
#7  0x00007c8765a94ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#8  0x00007c8765b26850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 1 (Thread 0x7c8765d03740 (LWP 20146) "stm_tests.exe"):
#0  futex_wait (private=0, expected=2, futex_word=0x569f074bc530) at ../sysdeps/nptl/futex-internal.h:146
#1  __GI___lll_lock_wait (futex=futex@entry=0x569f074bc530, private=0) at ./nptl/lowlevellock.c:49
#2  0x00007c8765a972a3 in __pthread_mutex_cond_lock (mutex=mutex@entry=0x569f074bc530) at ../nptl/pthread_mutex_lock.c:93
#3  0x00007c8765a93934 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x569f074bc530, cond=0x569f074cee20) at ./nptl/pthread_cond_wait.c:616
#4  ___pthread_cond_wait (cond=cond@entry=0x569f074cee20, mutex=mutex@entry=0x569f074bc530) at ./nptl/pthread_cond_wait.c:627
#5  0x0000569eedb9539e in sync_condvar_wait (m=0x569f074bc530, c=0x569f074cee20) at runtime/sync_posix.h:116
#6  caml_ml_condition_wait (wcond=<optimized out>, wmut=<optimized out>) at runtime/sync.c:193
#7  <signal handler called>
#8  0x0000569eedb2d3be in camlStdlib__Domain$loop_752 () at domain.ml:292
#9  0x0000569eedb2c4ce in camlStdlib__Mutex$protect_277 () at mutex.ml:28
#10 0x0000569eedb2d35c in camlStdlib__Domain$join_749 () at domain.ml:297
#11 0x0000569eedab3bcf in camlSTM_domain$run_par_660 () at lib/STM_domain.ml:31
#12 0x0000569eedab3e44 in camlSTM_domain$agree_prop_par_715 () at lib/STM_domain.ml:39
#13 0x0000569eedab9b6a in camlUtil$fun_2471 () at lib/util.ml:5
#14 0x0000569eedacf4c0 in camlQCheck2$loop_3994 () at src/core/QCheck2.ml:1644
#15 0x0000569eedacf404 in camlQCheck2$run_law_3989 () at src/core/QCheck2.ml:1649
#16 0x0000569eedad01b1 in camlQCheck2$check_state_input_4057 () at src/core/QCheck2.ml:1771
#17 0x0000569eedad060e in camlQCheck2$check_cell_inner_9734 () at src/core/QCheck2.ml:1841
#18 0x0000569eedac1ad6 in camlQCheck_base_runner$aux_map_1444 () at src/runner/QCheck_base_runner.ml:422
#19 0x0000569eedb09333 in camlStdlib__List$map_334 () at list.ml:87
#20 0x0000569eedac13f3 in camlQCheck_base_runner$run_tests_inner_2444 () at src/runner/QCheck_base_runner.ml:431
#21 0x0000569eedac215d in camlQCheck_base_runner$run_tests_main_inner_3052 () at src/runner/QCheck_base_runner.ml:475
#22 0x0000569eedab348f in camlDune__exe__Stm_tests$entry () at src/sys/stm_tests.ml:329
#23 0x0000569eedaaa3f7 in caml_program ()
#24 <signal handler called>
#25 0x0000569eedb9a196 in caml_startup_common (pooling=<optimized out>, argv=0x7ffe7c62ec68) at runtime/startup_nat.c:127
#26 caml_startup_common (argv=0x7ffe7c62ec68, pooling=<optimized out>) at runtime/startup_nat.c:86
#27 0x0000569eedb9a20f in caml_startup_exn (argv=<optimized out>) at runtime/startup_nat.c:134
#28 caml_startup (argv=<optimized out>) at runtime/startup_nat.c:139
#29 caml_main (argv=<optimized out>) at runtime/startup_nat.c:146
#30 0x0000569eedaa9f72 in main (argc=<optimized out>, argv=<optimized out>) at runtime/main.c:37

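To recreate the issue (the first two commands are run from a checkout of the OCaml compiler sources that includes the merge of #13227):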

$ opam switch create . --empty
$ opam pin add -k path --inplace-build ocaml-variants.5.4.0+trunk-with-13227 .
$ opam install qcheck-core
$ git clone git@github.com:ocaml-multicore/multicoretests.git
$ cd multicoretests
$ dune exec src/sys/stm_tests.exe -- -v
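Because the failure shows up as a hang rather than a crash, it can help to automate what was done by hand above: run the test under a time limit and, if it is still alive afterwards, attach gdb to dump every thread's backtrace before killing it. The following is a minimal sketch, assuming Linux with bash and (optionally) gdb with permission to ptrace the process; `run_with_watchdog` is a hypothetical helper name, not part of multicoretests, and the exit code 124 is chosen only to mimic GNU `timeout`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of multicoretests): run a command and, if it
# is still alive after LIMIT seconds, dump all thread backtraces with gdb and
# kill it. Returns the command's own status, or 124 if the watchdog fired.
run_with_watchdog() {
  local limit="$1"; shift
  local marker
  marker="$(mktemp)"                  # becomes non-empty if the watchdog fires

  "$@" &                              # the command under test
  local pid=$!

  # Watchdog: sleep, then inspect and kill the command if it is still running.
  (
    sleep "$limit" </dev/null >/dev/null 2>&1
    if kill -0 "$pid" 2>/dev/null; then
      echo "HANG: pid $pid still running after ${limit}s" | tee "$marker"
      if command -v gdb >/dev/null 2>&1; then
        # Same kind of dump as in this report: every thread's backtrace.
        gdb -p "$pid" -batch -ex "thread apply all bt" || true
      fi
      kill "$pid" 2>/dev/null
    fi
  ) &
  local watchdog=$!

  wait "$pid"                         # the parent shell reaps the command itself
  local status=$?
  kill "$watchdog" 2>/dev/null        # command finished first: cancel the watchdog
  wait "$watchdog" 2>/dev/null

  if [ -s "$marker" ]; then status=124; fi
  rm -f "$marker"
  return "$status"
}
```

For example, after `dune build src/sys/stm_tests.exe`, one could run `run_with_watchdog 600 ./_build/default/src/sys/stm_tests.exe -v` (the 600 s limit is an arbitrary choice). Invoking the built executable directly rather than through `dune exec` ensures that `$!` refers to the test binary itself, so gdb attaches to the process that actually hangs.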

CC: @gadmm @gasche
