Skip to content

Update boehmgc-coroutine-sp-fallback.diff for darwin#6987

Merged
edolstra merged 1 commit intoNixOS:masterfrom
matthewbauer:update-boehmgc-coroutine-sp-fallback-for-darwin
Sep 2, 2022
Merged

Update boehmgc-coroutine-sp-fallback.diff for darwin#6987
edolstra merged 1 commit intoNixOS:masterfrom
matthewbauer:update-boehmgc-coroutine-sp-fallback-for-darwin

Conversation

@matthewbauer
Copy link
Member

@matthewbauer matthewbauer commented Sep 1, 2022

The darwin_stop_world implementation is slightly different. sp goes to
altstack_lo instead of lo in this case. Assuming that is an
implementation detail.

But the fix is the same, when we detect alstack_lo outside of the
expected stack range, we reset it to hi - stack_limit.

Here stack_limit is calculated with pthread_get_stacksize_np since
that is the BSD equivalent to pthread_attr_getstacksize.

Fixes #4919

cc @roberth @edolstra

The darwin_stop_world implementation is slightly different. sp goes to
altstack_lo instead of lo in this case. Assuming that is an
implementation detail.

But the fix is the same, when we detect alstack_lo outside of the
expected stack range, we reset it to hi - stack_limit.

Here stack_limit is calculated with pthread_get_stacksize_np since
that is the BSD equivalent to pthread_attr_getstacksize.
@edolstra
Copy link
Member

edolstra commented Sep 1, 2022

Isn't this potentially bad for performance, since (if I read it correctly) it causes Boehm to scan the entire stack? Especially on 64-bit systems, the stack can potentially be very large, and reading it can cause uninitialized parts of the stack to become committed. (Nix uses a 64 MB stack, see src/nix/main.cc.)

But maybe it's not too bad.

@matthewbauer
Copy link
Member Author

matthewbauer commented Sep 1, 2022

Isn't this potentially bad for performance, since (if I read it correctly) it causes Boehm to scan the entire stack? Especially on 64-bit systems, the stack can potentially be very large, and reading it can cause uninitialized parts of the stack to become committed. (Nix uses a 64 MB stack, see src/nix/main.cc.)

Yeah maybe need to clarify with Robert on that, not really sure of the ramifications. I think this is definitely not an ideal solution since it relies on some guesswork of where the stack is. But it definitely beats the segfault you'd otherwise get!

@roberth
Copy link
Member

roberth commented Sep 1, 2022

since (if I read it correctly) it causes Boehm to scan the entire stack?

You've read correctly, but this penalty is only paid when GC happens while a coroutine is active, which seems to be only during filterSource, which might be rare.
This a known flaw and it's been present in linux builds since we have the current gc/coroutine integration #4944.

slightly different. sp goes to
altstack_lo instead of lo in this case. Assuming that is an
implementation detail.

The altstack is for signal handling, so I'm almost certain that this is a wrong assumption.

On reading the file more thoroughly it seems that we might need to support segmented stacks on darwin? I'm not sure, because that would also affect our stack protection page solution (detectStackOverflow() and such). If we need it, we'll have to replace the simple "put the stack in a gc region" by a solution that hooks into the gc and calls GC_push_all_stack_sections like crystal-lang anyway.

@roberth
Copy link
Member

roberth commented Sep 1, 2022

It seems that things can be made simpler by having coroutine building blocks directly in bdwgc. I've (re)started the discussion here bdwgc/bdwgc#362 (comment).
We might want to pursue this PR's quick-ish "good enough" solution in the meanwhile, so as not to leave macOS users behind.

Another potentially "good enough" idea is to disable GC while any co-routine exists. This only works well if the number of co-routines reaches 0 quickly and often. Afaik this may be the case, but would need to be confirmed.

@matthewbauer
Copy link
Member Author

The altstack is for signal handling, so I'm almost certain that this is a wrong assumption.

Yeah, not sure what is going on. What gave me that impression was this block, which is the only place paltstack_lo is set:

https://github.com/ivmai/bdwgc/blob/af1eede4362a53a32b25c4063df6c2ccc8ed489b/darwin_stop_world.c#L328-L339

(and noticing that the callstack from the segfaults involves the if (altstack_log) condition) Perhaps it doesn't really matter for the purpose of the patch though.

@edolstra edolstra merged commit a9af12e into NixOS:master Sep 2, 2022
Minion3665 pushed a commit to Minion3665/nix that referenced this pull request Feb 23, 2023
…ine-sp-fallback-for-darwin

Update boehmgc-coroutine-sp-fallback.diff for darwin
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Random segfaults with aarch64-darwin on flakes based project

3 participants