addToStore(): Evaluate on the main stack#11152
Merged
edolstra merged 5 commits intoNixOS:masterfrom Aug 20, 2024
Merged
Conversation
This was referenced Jul 22, 2024
|
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: |
This hopefully avoids the need for all our Boehm GC coroutine workarounds, since the GC roots will be on the main stack of the thread. Fixes NixOS#11141.
Since we're not doing this anymore.
431377e to
b8684eb
Compare
Member
Author
|
Team discussion:
|
tomberek
approved these changes
Aug 17, 2024
Contributor
There was a problem hiding this comment.
# started on Sat Aug 17 00:36:50 2024
Performance counter stats for './run-result.sh' (4 runs):
4,747.09 msec task-clock:u # 0.998 CPUs utilized ( +- 0.11% )
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
190,199 page-faults:u # 40.066 K/sec ( +- 0.00% )
12,047,705,665 cycles:u # 2.538 GHz ( +- 0.16% )
22,541,252,252 instructions:u # 1.87 insn per cycle ( +- 0.02% )
4,413,428,462 branches:u # 929.712 M/sec ( +- 0.02% )
74,194,689 branch-misses:u # 1.68% of all branches ( +- 0.12% )
TopdownL1 # 16.2 % tma_backend_bound
# 41.4 % tma_bad_speculation
# 15.8 % tma_frontend_bound
# 26.6 % tma_retiring ( +- 0.20% )
# Table of individual measurements:
4.75286 (-0.00515) #
4.74741 (-0.01060) #
4.76610 (+0.00809) #
4.76568 (+0.00767) #
# Final result:
4.75801 +- 0.00468 seconds time elapsed ( +- 0.10% )
# started on Sat Aug 17 00:36:05 2024
Performance counter stats for './run-stable.sh' (4 runs):
5,041.44 msec task-clock:u # 0.998 CPUs utilized ( +- 0.44% )
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
189,840 page-faults:u # 37.656 K/sec ( +- 0.01% )
12,857,410,437 cycles:u # 2.550 GHz ( +- 0.38% )
25,654,027,530 instructions:u # 2.00 insn per cycle ( +- 0.04% )
4,689,079,121 branches:u # 930.106 M/sec ( +- 0.03% )
79,691,381 branch-misses:u # 1.70% of all branches ( +- 0.23% )
TopdownL1 # 17.2 % tma_backend_bound
# 37.0 % tma_bad_speculation
# 14.9 % tma_frontend_bound
# 30.9 % tma_retiring ( +- 0.14% )
# Table of individual measurements:
5.1002 (+0.0494) #
5.0718 (+0.0210) #
5.0440 (-0.0069) #
4.9874 (-0.0635) #
# Final result:
5.0508 +- 0.0241 seconds time elapsed ( +- 0.48% )
Contributor
|
This does make evaluation with GC slightly worse, but greatly simplifies implementation. Performance cost seems low. I could not update this branch, but have a rebase at: https://github.com/tomberek/nix/tree/flip-coroutines |
Member
Author
|
I also measured and got a slight (not statistically significant) speedup: |
|
Backport failed because this pull request contains merge commits. You can either backport this pull request manually, or configure the action to skip merge commits. |
edolstra
added a commit
to DeterminateSystems/nix-src
that referenced
this pull request
Aug 23, 2024
Member
|
Does this mean we can get rid of 5740924 |
edolstra
added a commit
to DeterminateSystems/nix-src
that referenced
this pull request
Aug 6, 2025
This was removed in NixOS#11152. However, we need it for the multi-threaded evaluator, because otherwise Boehm GC will crash while scanning the thread stack: #0 GC_push_all_eager (bottom=<optimized out>, top=<optimized out>) at extra/../mark.c:1488 #1 0x00007ffff74691d5 in GC_push_all_stack_sections (lo=<optimized out>, hi=<optimized out>, traced_stack_sect=0x0) at extra/../mark_rts.c:704 #2 GC_push_all_stacks () at extra/../pthread_stop_world.c:876 #3 GC_default_push_other_roots () at extra/../os_dep.c:2893 #4 0x00007ffff746235c in GC_mark_some (cold_gc_frame=0x7ffee8ecaa50 "`\304G\367\377\177") at extra/../mark.c:374 #5 0x00007ffff7465a8d in GC_stopped_mark (stop_func=stop_func@entry=0x7ffff7453c80 <GC_never_stop_func>) at extra/../alloc.c:875 #6 0x00007ffff7466724 in GC_try_to_collect_inner (stop_func=0x7ffff7453c80 <GC_never_stop_func>) at extra/../alloc.c:624 #7 0x00007ffff7466a22 in GC_collect_or_expand (needed_blocks=needed_blocks@entry=1, ignore_off_page=ignore_off_page@entry=0, retry=retry@entry=0) at extra/../alloc.c:1688 #8 0x00007ffff746878f in GC_allocobj (gran=<optimized out>, kind=<optimized out>) at extra/../alloc.c:1798 #9 GC_generic_malloc_inner (lb=<optimized out>, k=k@entry=1) at extra/../malloc.c:193 #10 0x00007ffff746cd40 in GC_generic_malloc_many (lb=<optimized out>, k=<optimized out>, result=<optimized out>) at extra/../mallocx.c:477 #11 0x00007ffff746cf35 in GC_malloc_kind (bytes=120, kind=1) at extra/../thread_local_alloc.c:187 #12 0x00007ffff796ede5 in nix::allocBytes (n=<optimized out>, n=<optimized out>) at ../src/libexpr/include/nix/expr/eval-inline.hh:19 This is because it will use the stack pointer of the coroutine, so it will scan a region of memory that doesn't exist, e.g. Stack for thread 0x7ffea4ff96c0 is [0x7ffe80197af0w,0x7ffea4ffa000) (where 0x7ffe80197af0w is the sp of the coroutine and 0x7ffea4ffa000 is the base of the thread stack). We don't scan coroutine stacks, because currently they don't have GC roots (there is no evaluation happening in coroutines). So there is currently no need to restore the other parts of the original patch, such as BoehmGCStackAllocator.
edolstra
added a commit
that referenced
this pull request
Aug 7, 2025
This was removed in #11152. However, we need it for the multi-threaded evaluator, because otherwise Boehm GC will crash while scanning the thread stack: #0 GC_push_all_eager (bottom=<optimized out>, top=<optimized out>) at extra/../mark.c:1488 #1 0x00007ffff74691d5 in GC_push_all_stack_sections (lo=<optimized out>, hi=<optimized out>, traced_stack_sect=0x0) at extra/../mark_rts.c:704 #2 GC_push_all_stacks () at extra/../pthread_stop_world.c:876 #3 GC_default_push_other_roots () at extra/../os_dep.c:2893 #4 0x00007ffff746235c in GC_mark_some (cold_gc_frame=0x7ffee8ecaa50 "`\304G\367\377\177") at extra/../mark.c:374 #5 0x00007ffff7465a8d in GC_stopped_mark (stop_func=stop_func@entry=0x7ffff7453c80 <GC_never_stop_func>) at extra/../alloc.c:875 #6 0x00007ffff7466724 in GC_try_to_collect_inner (stop_func=0x7ffff7453c80 <GC_never_stop_func>) at extra/../alloc.c:624 #7 0x00007ffff7466a22 in GC_collect_or_expand (needed_blocks=needed_blocks@entry=1, ignore_off_page=ignore_off_page@entry=0, retry=retry@entry=0) at extra/../alloc.c:1688 #8 0x00007ffff746878f in GC_allocobj (gran=<optimized out>, kind=<optimized out>) at extra/../alloc.c:1798 #9 GC_generic_malloc_inner (lb=<optimized out>, k=k@entry=1) at extra/../malloc.c:193 #10 0x00007ffff746cd40 in GC_generic_malloc_many (lb=<optimized out>, k=<optimized out>, result=<optimized out>) at extra/../mallocx.c:477 #11 0x00007ffff746cf35 in GC_malloc_kind (bytes=120, kind=1) at extra/../thread_local_alloc.c:187 #12 0x00007ffff796ede5 in nix::allocBytes (n=<optimized out>, n=<optimized out>) at ../src/libexpr/include/nix/expr/eval-inline.hh:19 This is because it will use the stack pointer of the coroutine, so it will scan a region of memory that doesn't exist, e.g. Stack for thread 0x7ffea4ff96c0 is [0x7ffe80197af0w,0x7ffea4ffa000) (where 0x7ffe80197af0w is the sp of the coroutine and 0x7ffea4ffa000 is the base of the thread stack). We don't scan coroutine stacks, because currently they don't have GC roots (there is no evaluation happening in coroutines). So there is currently no need to restore the other parts of the original patch, such as BoehmGCStackAllocator.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation
Switching from
sinkToSource()tosourceToSink()causes the path filter evaluation to happen on the thread's main stack rather than on the coroutine stack. This should avoid the need for our Boehm GC coroutine workarounds, since the GC roots will be on the main stack of the thread.Fixes #11141.
Context
Priorities and Process
Add 👍 to pull requests you find important.
The Nix maintainer team uses a GitHub project board to schedule and track reviews.