Segfault on MacOSX with trunk #11226

@jmid

I've been chasing a segfault that is triggered on MacOSX. To set up and reproduce:

I can reproduce it fairly consistently (about 9 out of 10 runs) by running the following:

$ dune exec src/lazy/lazy_lin_test.exe -- -v -s 249901845
random seed: 249901845
generated error fail pass / total     time test name
[ ]   19    0    0   19 /  100    22.4s Linearizable lazy test with DomainSegmentation fault: 11
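
To put a number on "about 9 out of 10", a small loop can rerun the command and count the runs that die with SIGSEGV. This is only a sketch: `count_segfaults` is a hypothetical helper, not part of the test suite, and it assumes the shell reports a crashed child as exit status 139 (128 + signal 11), which is the usual convention but depends on the command being the shell's direct child.

```shell
#!/bin/sh
# Hypothetical helper: run a command N times and count the runs killed by SIGSEGV.
# Assumption: death by signal shows up as exit status 128 + signal number (139 for SIGSEGV).
count_segfaults() {
  runs=$1
  shift
  crashes=0
  i=0
  while [ "$i" -lt "$runs" ]; do
    # Discard the program's output; only the exit status matters here.
    "$@" >/dev/null 2>&1
    status=$?
    if [ "$status" -eq 139 ]; then
      crashes=$((crashes + 1))
    fi
    i=$((i + 1))
  done
  echo "$crashes/$runs runs ended with SIGSEGV"
}

# Usage with the command from this report:
# count_segfaults 10 dune exec src/lazy/lazy_lin_test.exe -- -v -s 249901845
```

The same helper can be pointed at the reduced executable to compare how often each variant crashes.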

An attempt at reducing the problem is also available. It does not crash as consistently, but the code is a bit simpler and has fewer dependencies:

$ dune exec src/lazy/lazy_lin_reduced.exe
0 t                                 
1 t
2 t
3 t
4 t
5 t
6 CamlinternalLazy.Undefined
7 CamlinternalLazy.Undefined
8 Segmentation fault: 11

What (I think) I know so far:

First, here is the output of an lldb run without the debug runtime, which stops with an EXC_BAD_ACCESS:

$ lldb _build/default/src/lazy/lazy_lin_reduced.exe
(lldb) target create "_build/default/src/lazy/lazy_lin_reduced.exe"
Current executable set to '/Users/jmi/software/ocaml-04-28-2022-11213/multicoretests/_build/default/src/lazy/lazy_lin_reduced.exe' (x86_64).
(lldb) run
Process 1503 launched: '/Users/jmi/software/ocaml-04-28-2022-11213/multicoretests/_build/default/src/lazy/lazy_lin_reduced.exe' (x86_64)
0 t
1 t
2 t
3 t
4 t
5 t
6 CamlinternalLazy.Undefined
7 CamlinternalLazy.Undefined
8 Process 1503 stopped
* thread #3, name = 'Domain3', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x00000001000b7cf4 lazy_lin_reduced.exe`caml_c_call + 4
lazy_lin_reduced.exe`caml_c_call:
->  0x1000b7cf4 <+4>:  movq   %rsp, (%r10)
    0x1000b7cf7 <+7>:  movq   0x30(%r14), %r11
    0x1000b7cfb <+11>: movq   %rsp, 0x8(%r11)
    0x1000b7cff <+15>: movq   %r10, (%r11)
Target 0: (lazy_lin_reduced.exe) stopped.
(lldb) bt all
lazy_lin_reduced.exe was compiled with optimization - stepping may behave oddly; variables may not be available.
  thread #1, name = 'Domain0', queue = 'com.apple.main-thread'
    frame #0: 0x00007fff20608cce libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007fff2063be49 libsystem_pthread.dylib`_pthread_cond_wait + 1298
    frame #2: 0x00000001000b2fd8 lazy_lin_reduced.exe`caml_ml_condition_wait [inlined] sync_condvar_wait(c=0x0000000100515290, m=0x0000000100515250) at sync_posix.h:122:10 [opt]
    frame #3: 0x00000001000b2fcd lazy_lin_reduced.exe`caml_ml_condition_wait(wcond=<unavailable>, wmut=<unavailable>) at sync.c:172:13 [opt]
    frame #4: 0x00000001000b7d0b lazy_lin_reduced.exe`caml_c_call + 27
    frame #5: 0x00000001000556bc lazy_lin_reduced.exe`camlStdlib__Domain__loop_718 + 44
    frame #6: 0x000000010005565d lazy_lin_reduced.exe`camlStdlib__Domain__join_713 + 141
    frame #7: 0x000000010000837f lazy_lin_reduced.exe`camlDune__exe__Lazy_lin_reduced__lin_prop_domain_754 + 287
    frame #8: 0x000000010000907f lazy_lin_reduced.exe`camlUtil__repeat_268 + 95
    frame #9: 0x0000000100008522 lazy_lin_reduced.exe`camlDune__exe__Lazy_lin_reduced__exec_test_802 + 146
    frame #10: 0x000000010003b228 lazy_lin_reduced.exe`camlStdlib__List__map_483 + 56
    frame #11: 0x000000010003b23f lazy_lin_reduced.exe`camlStdlib__List__map_483 + 79
    frame #12: 0x000000010003b23f lazy_lin_reduced.exe`camlStdlib__List__map_483 + 79
    frame #13: 0x000000010003b23f lazy_lin_reduced.exe`camlStdlib__List__map_483 + 79
    frame #14: 0x000000010003b23f lazy_lin_reduced.exe`camlStdlib__List__map_483 + 79
    frame #15: 0x000000010003b23f lazy_lin_reduced.exe`camlStdlib__List__map_483 + 79
    frame #16: 0x000000010003b23f lazy_lin_reduced.exe`camlStdlib__List__map_483 + 79
    frame #17: 0x000000010003b23f lazy_lin_reduced.exe`camlStdlib__List__map_483 + 79
    frame #18: 0x000000010003b23f lazy_lin_reduced.exe`camlStdlib__List__map_483 + 79
    frame #19: 0x0000000100008fec lazy_lin_reduced.exe`camlDune__exe__Lazy_lin_reduced__entry + 2012
    frame #20: 0x0000000100002b8b lazy_lin_reduced.exe`caml_program + 747
    frame #21: 0x00000001000b7dc4 lazy_lin_reduced.exe`caml_start_program + 112
    frame #22: 0x00000001000b760b lazy_lin_reduced.exe`caml_main [inlined] caml_startup(argv=<unavailable>) at startup_nat.c:136:7 [opt]
    frame #23: 0x00000001000b7604 lazy_lin_reduced.exe`caml_main(argv=<unavailable>) at startup_nat.c:142:3 [opt]
    frame #24: 0x00000001000a787c lazy_lin_reduced.exe`main(argc=<unavailable>, argv=<unavailable>) at main.c:37:3 [opt]
    frame #25: 0x00007fff20656f3d libdyld.dylib`start + 1
  thread #2, name = 'Backup0'
    frame #0: 0x00000001000af98e lazy_lin_reduced.exe`pool_sweep(local=<unavailable>, plist=<unavailable>, sz=1, release_to_global_pool=1) at shared_heap.c:457:31 [opt]
    frame #1: 0x00000001000af524 lazy_lin_reduced.exe`caml_sweep(local=0x0000000111008200, work=512) at shared_heap.c:545:7 [opt]
    frame #2: 0x00000001000a89f0 lazy_lin_reduced.exe`major_collection_slice(howmuch=<unavailable>, participant_count=0, barrier_participants=0x0000000000000000, mode=Slice_opportunistic) at major_gc.c:1208:14 [opt]
    frame #3: 0x0000000100094e98 lazy_lin_reduced.exe`handle_incoming at domain.c:1248:9 [opt]
    frame #4: 0x0000000100094e59 lazy_lin_reduced.exe`handle_incoming(s=<unavailable>) at domain.c:305:5 [opt]
    frame #5: 0x00000001000970e2 lazy_lin_reduced.exe`backup_thread_func [inlined] caml_handle_incoming_interrupts at domain.c:318:3 [opt]
    frame #6: 0x00000001000970cd lazy_lin_reduced.exe`backup_thread_func(v=0x000000010014a810) at domain.c:956:13 [opt]
    frame #7: 0x00007fff2063b8fc libsystem_pthread.dylib`_pthread_start + 224
    frame #8: 0x00007fff20637443 libsystem_pthread.dylib`thread_start + 15
* thread #3, name = 'Domain3', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x00000001000b7cf4 lazy_lin_reduced.exe`caml_c_call + 4
    frame #1: 0x000000010007beef lazy_lin_reduced.exe`camlStdlib__Format__buffered_out_flush_1279 + 111
    frame #2: 0x000000010007f83e lazy_lin_reduced.exe`camlStdlib__Format__flush_standard_formatters_2002 + 62
    frame #3: 0x0000000100055139 lazy_lin_reduced.exe`camlStdlib__Domain__new_exit_673 + 41
    frame #4: 0x00000001000554a7 lazy_lin_reduced.exe`camlStdlib__Domain__body_706 + 135
    frame #5: 0x00000001000b7dc4 lazy_lin_reduced.exe`caml_start_program + 112
    frame #6: 0x000000010009364e lazy_lin_reduced.exe`caml_callback_exn(closure=<unavailable>, arg=1) at callback.c:169:1 [opt]
    frame #7: 0x0000000100093af9 lazy_lin_reduced.exe`caml_callback(closure=<unavailable>, arg=1) at callback.c:253:34 [opt]
    frame #8: 0x0000000100096151 lazy_lin_reduced.exe`domain_thread_func(v=<unavailable>) at domain.c:1085:5 [opt]
    frame #9: 0x00007fff2063b8fc libsystem_pthread.dylib`_pthread_start + 224
    frame #10: 0x00007fff20637443 libsystem_pthread.dylib`thread_start + 15
  thread #4, name = 'Backup3'
    frame #0: 0x00007fff206084ba libsystem_kernel.dylib`__psynch_mutexwait + 10
    frame #1: 0x00007fff206392ab libsystem_pthread.dylib`_pthread_mutex_firstfit_lock_wait + 76
    frame #2: 0x00007fff20637192 libsystem_pthread.dylib`_pthread_mutex_firstfit_lock_slow + 204
    frame #3: 0x0000000100097078 lazy_lin_reduced.exe`backup_thread_func [inlined] caml_plat_lock(m=0x000000010014aca8) at platform.h:144:21 [opt]
    frame #4: 0x0000000100097070 lazy_lin_reduced.exe`backup_thread_func(v=0x000000010014abe8) at domain.c:975:9 [opt]
    frame #5: 0x00007fff2063b8fc libsystem_pthread.dylib`_pthread_start + 224
    frame #6: 0x00007fff20637443 libsystem_pthread.dylib`thread_start + 15
  thread #5, name = 'Domain2'
    frame #0: 0x00000001000969ca lazy_lin_reduced.exe`caml_try_run_on_all_domains_with_spin_work [inlined] caml_wait_interrupt_serviced at domain.c:342:14 [opt]
    frame #1: 0x00000001000969b7 lazy_lin_reduced.exe`caml_try_run_on_all_domains_with_spin_work(handler=(lazy_lin_reduced.exe`caml_stw_empty_minor_heap at minor_gc.c:721), data=<unavailable>, leader_setup=<unavailable>, enter_spin_callback=<unavailable>, enter_spin_data=0x0000000000000000) at domain.c:1429:5 [opt]
    frame #2: 0x00000001000acc1d lazy_lin_reduced.exe`caml_empty_minor_heaps_once [inlined] caml_try_stw_empty_minor_heap_on_all_domains at minor_gc.c:758:10 [opt]
    frame #3: 0x00000001000acbf1 lazy_lin_reduced.exe`caml_empty_minor_heaps_once at minor_gc.c:778:5 [opt]
    frame #4: 0x00000001000961d8 lazy_lin_reduced.exe`domain_thread_func [inlined] domain_terminate at domain.c:1654:5 [opt]
    frame #5: 0x0000000100096151 lazy_lin_reduced.exe`domain_thread_func(v=<unavailable>) at domain.c:1086:5 [opt]
    frame #6: 0x00007fff2063b8fc libsystem_pthread.dylib`_pthread_start + 224
    frame #7: 0x00007fff20637443 libsystem_pthread.dylib`thread_start + 15
  thread #6, name = 'Backup2'
    frame #0: 0x00007fff206084ba libsystem_kernel.dylib`__psynch_mutexwait + 10
    frame #1: 0x00007fff206392ab libsystem_pthread.dylib`_pthread_mutex_firstfit_lock_wait + 76
    frame #2: 0x00007fff20637192 libsystem_pthread.dylib`_pthread_mutex_firstfit_lock_slow + 204
    frame #3: 0x0000000100097078 lazy_lin_reduced.exe`backup_thread_func [inlined] caml_plat_lock(m=0x000000010014ab60) at platform.h:144:21 [opt]
    frame #4: 0x0000000100097070 lazy_lin_reduced.exe`backup_thread_func(v=0x000000010014aaa0) at domain.c:975:9 [opt]
    frame #5: 0x00007fff2063b8fc libsystem_pthread.dylib`_pthread_start + 224
    frame #6: 0x00007fff20637443 libsystem_pthread.dylib`thread_start + 15
(lldb) 

And here is another run with the debug runtime, which stops with an EXC_BREAKPOINT:

$ lldb _build/default/src/lazy/lazy_lin_reduced.exe
(lldb) target create "_build/default/src/lazy/lazy_lin_reduced.exe"
Current executable set to '/Users/jmi/software/ocaml-04-28-2022-11213/multicoretests/_build/default/src/lazy/lazy_lin_reduced.exe' (x86_64).
(lldb) run
Process 1548 launched: '/Users/jmi/software/ocaml-04-28-2022-11213/multicoretests/_build/default/src/lazy/lazy_lin_reduced.exe' (x86_64)
### OCaml runtime: debug mode ###
0 t
1 t
2 t
3 t
4 t
5 t
6 CamlinternalLazy.Undefined
7 CamlinternalLazy.Undefined
8 Process 1548 stopped
* thread #5, name = 'Domain3', stop reason = EXC_BREAKPOINT (code=EXC_I386_BPT, subcode=0x0)
    frame #0: 0x00000001000b9cf8 lazy_lin_reduced.exe`caml_c_call + 48
lazy_lin_reduced.exe`caml_c_call:
->  0x1000b9cf8 <+48>: movq   0x20(%r14), %r11
    0x1000b9cfc <+52>: movq   (%r11), %r11
    0x1000b9cff <+55>: cmpq   %r11, 0x8(%rsp)
    0x1000b9d04 <+60>: je     0x1000b9d07               ; <+63>
Target 0: (lazy_lin_reduced.exe) stopped.
(lldb) bt all
lazy_lin_reduced.exe was compiled with optimization - stepping may behave oddly; variables may not be available.
  thread #1, name = 'Domain0', queue = 'com.apple.main-thread'
    frame #0: 0x00000001000b1380 lazy_lin_reduced.exe`pool_sweep(local=<unavailable>, plist=<unavailable>, sz=2, release_to_global_pool=1) at shared_heap.c:0:5 [opt]
    frame #1: 0x00000001000b0d94 lazy_lin_reduced.exe`caml_sweep(local=0x0000000111808200, work=512) at shared_heap.c:545:7 [opt]
    frame #2: 0x00000001000a8e00 lazy_lin_reduced.exe`major_collection_slice(howmuch=<unavailable>, participant_count=0, barrier_participants=0x0000000000000000, mode=Slice_opportunistic) at major_gc.c:1208:14 [opt]
    frame #3: 0x0000000100094e18 lazy_lin_reduced.exe`handle_incoming at domain.c:1248:9 [opt]
    frame #4: 0x0000000100094dd6 lazy_lin_reduced.exe`handle_incoming(s=<unavailable>) at domain.c:305:5 [opt]
    frame #5: 0x0000000100097145 lazy_lin_reduced.exe`caml_handle_gc_interrupt [inlined] caml_handle_incoming_interrupts at domain.c:318:3 [opt]
    frame #6: 0x0000000100097130 lazy_lin_reduced.exe`caml_handle_gc_interrupt at domain.c:1531:5 [opt]
    frame #7: 0x00000001000b2649 lazy_lin_reduced.exe`caml_process_pending_actions at signals.c:236:3 [opt]
    frame #8: 0x00000001000b9785 lazy_lin_reduced.exe`caml_garbage_collection at signals_nat.c:104:7 [opt]
    frame #9: 0x00000001000b9ba1 lazy_lin_reduced.exe`caml_call_gc + 241
    frame #10: 0x00000001000552cd lazy_lin_reduced.exe`camlStdlib__Domain__join_713 + 173
    frame #11: 0x0000000100007fdd lazy_lin_reduced.exe`camlDune__exe__Lazy_lin_reduced__lin_prop_domain_754 + 301
    frame #12: 0x0000000100008ccf lazy_lin_reduced.exe`camlUtil__repeat_268 + 95
    frame #13: 0x0000000100008172 lazy_lin_reduced.exe`camlDune__exe__Lazy_lin_reduced__exec_test_802 + 146
    frame #14: 0x000000010003ae78 lazy_lin_reduced.exe`camlStdlib__List__map_483 + 56
    frame #15: 0x000000010003ae8f lazy_lin_reduced.exe`camlStdlib__List__map_483 + 79
    frame #16: 0x000000010003ae8f lazy_lin_reduced.exe`camlStdlib__List__map_483 + 79
    frame #17: 0x000000010003ae8f lazy_lin_reduced.exe`camlStdlib__List__map_483 + 79
    frame #18: 0x000000010003ae8f lazy_lin_reduced.exe`camlStdlib__List__map_483 + 79
    frame #19: 0x000000010003ae8f lazy_lin_reduced.exe`camlStdlib__List__map_483 + 79
    frame #20: 0x000000010003ae8f lazy_lin_reduced.exe`camlStdlib__List__map_483 + 79
    frame #21: 0x000000010003ae8f lazy_lin_reduced.exe`camlStdlib__List__map_483 + 79
    frame #22: 0x000000010003ae8f lazy_lin_reduced.exe`camlStdlib__List__map_483 + 79
    frame #23: 0x0000000100008c3c lazy_lin_reduced.exe`camlDune__exe__Lazy_lin_reduced__entry + 2012
    frame #24: 0x0000000100002610 lazy_lin_reduced.exe`caml_program + 752
    frame #25: 0x00000001000b9e02 lazy_lin_reduced.exe`caml_start_program + 150
    frame #26: 0x00000001000b955b lazy_lin_reduced.exe`caml_main [inlined] caml_startup(argv=<unavailable>) at startup_nat.c:136:7 [opt]
    frame #27: 0x00000001000b9554 lazy_lin_reduced.exe`caml_main(argv=<unavailable>) at startup_nat.c:142:3 [opt]
    frame #28: 0x00000001000a794c lazy_lin_reduced.exe`main(argc=<unavailable>, argv=<unavailable>) at main.c:37:3 [opt]
    frame #29: 0x00007fff20656f3d libdyld.dylib`start + 1
  thread #2, name = 'Backup0'
    frame #0: 0x00007fff206084ba libsystem_kernel.dylib`__psynch_mutexwait + 10
    frame #1: 0x00007fff206392ab libsystem_pthread.dylib`_pthread_mutex_firstfit_lock_wait + 76
    frame #2: 0x00007fff20637192 libsystem_pthread.dylib`_pthread_mutex_firstfit_lock_slow + 204
    frame #3: 0x0000000100097848 lazy_lin_reduced.exe`backup_thread_func [inlined] caml_plat_lock(m=0x000000010014e9f0) at platform.h:144:21 [opt]
    frame #4: 0x0000000100097839 lazy_lin_reduced.exe`backup_thread_func(v=0x000000010014e930) at domain.c:975:9 [opt]
    frame #5: 0x00007fff2063b8fc libsystem_pthread.dylib`_pthread_start + 224
    frame #6: 0x00007fff20637443 libsystem_pthread.dylib`thread_start + 15
  thread #3, name = 'Domain2'
    frame #0: 0x0000000100096eea lazy_lin_reduced.exe`caml_try_run_on_all_domains_with_spin_work [inlined] caml_wait_interrupt_serviced at domain.c:342:14 [opt]
    frame #1: 0x0000000100096ed7 lazy_lin_reduced.exe`caml_try_run_on_all_domains_with_spin_work(handler=(lazy_lin_reduced.exe`caml_stw_empty_minor_heap at minor_gc.c:721), data=<unavailable>, leader_setup=<unavailable>, enter_spin_callback=<unavailable>, enter_spin_data=0x00000001000adce0) at domain.c:1429:5 [opt]
    frame #2: 0x00000001000add86 lazy_lin_reduced.exe`caml_empty_minor_heaps_once [inlined] caml_try_stw_empty_minor_heap_on_all_domains at minor_gc.c:758:10 [opt]
    frame #3: 0x00000001000add5a lazy_lin_reduced.exe`caml_empty_minor_heaps_once at minor_gc.c:778:5 [opt]
    frame #4: 0x0000000100096434 lazy_lin_reduced.exe`domain_thread_func [inlined] domain_terminate at domain.c:1654:5 [opt]
    frame #5: 0x00000001000963ac lazy_lin_reduced.exe`domain_thread_func(v=<unavailable>) at domain.c:1086:5 [opt]
    frame #6: 0x00007fff2063b8fc libsystem_pthread.dylib`_pthread_start + 224
    frame #7: 0x00007fff20637443 libsystem_pthread.dylib`thread_start + 15
  thread #4, name = 'Backup2'
    frame #0: 0x00007fff206084ba libsystem_kernel.dylib`__psynch_mutexwait + 10
    frame #1: 0x00007fff206392ab libsystem_pthread.dylib`_pthread_mutex_firstfit_lock_wait + 76
    frame #2: 0x00007fff20637192 libsystem_pthread.dylib`_pthread_mutex_firstfit_lock_slow + 204
    frame #3: 0x0000000100097848 lazy_lin_reduced.exe`backup_thread_func [inlined] caml_plat_lock(m=0x000000010014ec80) at platform.h:144:21 [opt]
    frame #4: 0x0000000100097839 lazy_lin_reduced.exe`backup_thread_func(v=0x000000010014ebc0) at domain.c:975:9 [opt]
    frame #5: 0x00007fff2063b8fc libsystem_pthread.dylib`_pthread_start + 224
    frame #6: 0x00007fff20637443 libsystem_pthread.dylib`thread_start + 15
* thread #5, name = 'Domain3', stop reason = EXC_BREAKPOINT (code=EXC_I386_BPT, subcode=0x0)
  * frame #0: 0x00000001000b9cf8 lazy_lin_reduced.exe`caml_c_call + 48
    frame #1: 0x000000010002f69f lazy_lin_reduced.exe`camlStdlib__output_substring_258 + 79
    frame #2: 0x000000010007bb2e lazy_lin_reduced.exe`camlStdlib__Format__buffered_out_flush_1279 + 94
    frame #3: 0x000000010007f48e lazy_lin_reduced.exe`camlStdlib__Format__flush_standard_formatters_2002 + 62
    frame #4: 0x0000000100054d89 lazy_lin_reduced.exe`camlStdlib__Domain__new_exit_673 + 41
    frame #5: 0x00000001000550f7 lazy_lin_reduced.exe`camlStdlib__Domain__body_706 + 135
    frame #6: 0x00000001000b9e02 lazy_lin_reduced.exe`caml_start_program + 150
    frame #7: 0x000000010009334f lazy_lin_reduced.exe`caml_callback_exn(closure=<unavailable>, arg=1) at callback.c:169:1 [opt]
    frame #8: 0x00000001000938f9 lazy_lin_reduced.exe`caml_callback(closure=<unavailable>, arg=1) at callback.c:253:34 [opt]
    frame #9: 0x00000001000963ac lazy_lin_reduced.exe`domain_thread_func(v=<unavailable>) at domain.c:1085:5 [opt]
    frame #10: 0x00007fff2063b8fc libsystem_pthread.dylib`_pthread_start + 224
    frame #11: 0x00007fff20637443 libsystem_pthread.dylib`thread_start + 15
  thread #6, name = 'Backup3'
    frame #0: 0x00007fff206084ba libsystem_kernel.dylib`__psynch_mutexwait + 10
    frame #1: 0x00007fff206392ab libsystem_pthread.dylib`_pthread_mutex_firstfit_lock_wait + 76
    frame #2: 0x00007fff20637192 libsystem_pthread.dylib`_pthread_mutex_firstfit_lock_slow + 204
    frame #3: 0x0000000100097848 lazy_lin_reduced.exe`backup_thread_func [inlined] caml_plat_lock(m=0x000000010014edc8) at platform.h:144:21 [opt]
    frame #4: 0x0000000100097839 lazy_lin_reduced.exe`backup_thread_func(v=0x000000010014ed08) at domain.c:975:9 [opt]
    frame #5: 0x00007fff2063b8fc libsystem_pthread.dylib`_pthread_start + 224
    frame #6: 0x00007fff20637443 libsystem_pthread.dylib`thread_start + 15
(lldb) 
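
Since the crash does not happen on every run, it may help to collect backtraces without typing at the prompt each time. The sketch below wraps lldb's batch mode; `capture_backtraces` and the `LLDB` override variable are hypothetical names, and it assumes an lldb that accepts `--batch`, `-o` (command to run after loading the target) and `-k` (command to run only if the process crashes).

```shell
#!/bin/sh
# Hypothetical wrapper: run an executable under lldb non-interactively and,
# if it crashes, print the backtraces of all threads before quitting.
# Assumption: the LLDB variable only exists here to allow swapping the debugger binary.
capture_backtraces() {
  exe=$1
  # -o run       : start the process once the target is loaded
  # -k 'bt all'  : on a crash, dump every thread's backtrace
  # -k quit      : then leave lldb instead of dropping to the prompt
  "${LLDB:-lldb}" --batch -o run -k 'bt all' -k quit -- "$exe"
}

# Usage with the reduced executable from this report:
# capture_backtraces _build/default/src/lazy/lazy_lin_reduced.exe > crash.log 2>&1
```

Runs that finish cleanly produce no backtrace, so the wrapper can be called in a loop until a crashing run is captured.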
