assertion in build_runner runStepNames regarding memory_blocked_steps #30742

Closed
opened 2026-01-08 00:56:30 +01:00 by andrewrk · 3 comments
Owner

spotted in the master branch CI run for 335c0fcba1 on aarch64-macos-debug:

+ stage3-debug/bin/zig build test docs --maxrss 10737418240 --zig-lib-dir /Users/zigci/.cache/act/5f632b7ac4d4a44e/hostexecutor/build-debug/../lib -Denable-macos-sdk -Dstatic-llvm -Dskip-non-native --search-prefix /Users/zigci/zig+llvm+lld+clang-aarch64-macos-none-0.16.0-dev.104+689461e31 --test-timeout 2m
/Users/zigci/.cache/act/5f632b7ac4d4a44e/hostexecutor/lib/std/debug.zig:419:14: 0x102882193 in assert (build)
    if (!ok) unreachable; // assertion failure
             ^
/Users/zigci/.cache/act/5f632b7ac4d4a44e/hostexecutor/lib/compiler/build_runner.zig:770:11: 0x1029873e3 in runStepNames (build)
    assert(run.memory_blocked_steps.items.len == 0);
          ^
/Users/zigci/.cache/act/5f632b7ac4d4a44e/hostexecutor/lib/compiler/build_runner.zig:575:25: 0x10299697f in main (build)
        try runStepNames(
                        ^
/Users/zigci/.cache/act/5f632b7ac4d4a44e/hostexecutor/lib/std/start.zig:676:88: 0x1029a30af in callMain (build)
    if (fn_info.params[0].type.? == std.process.Init.Minimal) return wrapMain(root.main(.{
                                                                                       ^
???:?:?: 0x183e7eb97 in start (/usr/lib/dyld)
error: the following build command terminated with signal ABRT:
/Users/zigci/.cache/act/5f632b7ac4d4a44e/hostexecutor/zig-local-cache/o/45acbbe25c19555aa2cb33fa4f5d8cf8/build /Users/zigci/.cache/act/5f632b7ac4d4a44e/hostexecutor/build-debug/stage3-debug/bin/zig /Users/zigci/.cache/act/5f632b7ac4d4a44e/hostexecutor/lib /Users/zigci/.cache/act/5f632b7ac4d4a44e/hostexecutor /Users/zigci/.cache/act/5f632b7ac4d4a44e/hostexecutor/zig-local-cache /Users/zigci/.cache/act/5f632b7ac4d4a44e/hostexecutor/zig-global-cache --seed 0x2f98f99b -Z78233c99509a0524 test docs --maxrss 10737418240 -Denable-macos-sdk -Dstatic-llvm -Dskip-non-native --search-prefix /Users/zigci/zig+llvm+lld+clang-aarch64-macos-none-0.16.0-dev.104+689461e31 --test-timeout 2m
andrewrk added this to the Urgent milestone 2026-01-08 00:56:30 +01:00
Owner

Worth noting that this isn't a (recent) regression; we first became aware of it in October. See https://github.com/ziglang/zig/pull/25585 for a diff which makes it reproduce consistently in CI (the first commit of that PR). Alex and I weren't able to figure out the bug when we last looked into this.

(I thought there was an issue open for it on GitHub, but I can't find it, so I'm guessing we forgot to open one!)


I think I have a repro for this: four modules, each with one sleeping test. A, B, and D have a max_rss budget of 40M; C doesn't have any (pun intended) and is marked with a star in the logs.

The test runner runs with -j4 --maxrss 60M. Note: I edited the thread IDs for readability; the originals are in the details.
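
As a rough illustration of why these numbers force contention: assuming the admission check is the `new_claimed_rss > run.max_rss` comparison visible in the diff further down, only one of A, B, and D can hold the budget at a time, so the other two end up memory blocked. The helper name `fits` is hypothetical, and 60M is treated as 60_000_000 bytes here (the exact interpretation of the suffix does not change the outcome).

```zig
const std = @import("std");

/// Hypothetical helper mirroring the `new_claimed_rss > run.max_rss` check:
/// a step may start only if its budget still fits under the global limit.
fn fits(claimed_rss: u64, step_max_rss: u64, max_rss: u64) bool {
    return claimed_rss + step_max_rss <= max_rss;
}

pub fn main() void {
    const budget: u64 = 60_000_000; // --maxrss 60M (assumed to mean 60_000_000)
    const step: u64 = 40_000_000; // max_rss of A, B, and D

    // The first 40M step fits into an empty budget.
    std.debug.assert(fits(0, step, budget));
    // A second 40M step would need 80M in total, so it gets memory blocked.
    std.debug.assert(!fits(step, step, budget));
}
```

C is not covered by this check at all: with max_rss set to 0 it takes the other branch in workerMakeOneStep and never enters the memory-blocked queue.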

Clean run

➜  repro git:(938efe4aab) ✗ just test 60M
../build-debug/stage3-debug/bin/zig build test -j4 --maxrss 60M --zig-lib-dir ../lib
[1][D ] schedule: success
[2][A ] schedule: fail - memory blocked
[3][C*] schedule: success
[2][C*] schedule: fail - taken
[2][B ] schedule: fail - memory blocked
[4][B ] schedule: fail - memory blocked
[1][D ] run: finish
[1][D ] queue: memory blocked, blocked=[A, B, B]
[1][A ] schedule: success
[1][A ] run: finish
[1][A ] queue: memory blocked, blocked=[B, B]
[1][B ] schedule: success
[1][B ] run: finish
[1][B ] queue: memory blocked, blocked=[B]
[1][B ] schedule: fail - taken
[3][C*] run: finish
n: 0
Original log
➜  repro git:(938efe4aab) ✗ just test 60M
../build-debug/stage3-debug/bin/zig build test -j4 --maxrss 60M --zig-lib-dir ../lib
[4353137][D ] schedule: success
[4353139][A ] schedule: fail - memory blocked
[4353138][C*] schedule: success
[4353139][C*] schedule: fail - taken
[4353139][B ] schedule: fail - memory blocked
[4353135][B ] schedule: fail - memory blocked
[4353137][D ] run: finish
[4353137][D ] queue: memory blocked, blocked=[A, B, B]
[4353137][A ] schedule: success
[4353137][A ] run: finish
[4353137][A ] queue: memory blocked, blocked=[B, B]
[4353137][B ] schedule: success
[4353137][B ] run: finish
[4353137][B ] queue: memory blocked, blocked=[B]
[4353137][B ] schedule: fail - taken
[4353138][C*] run: finish
n: 0

Failed run

➜  repro git:(938efe4aab) ✗ just test 60M
../build-debug/stage3-debug/bin/zig build test -j4 --maxrss 60M --zig-lib-dir ../lib
[1][B ] schedule: success
[2][A ] schedule: fail - memory blocked
[3][A ] schedule: fail - memory blocked
[4][C*] schedule: success
[2][D ] schedule: fail - memory blocked
[1][B ] run: finish
[1][B ] queue: memory blocked, blocked=[A, A, D]
[1][A ] schedule: success
[1][A ] run: finish
[1][A ] queue: memory blocked, blocked=[A, D]
[1][A ] schedule: fail - taken
[4][C*] run: finish
n: 1
thread 3 panic: reached unreachable code
/Users/path/Documents/projects/zig/lib/std/debug.zig:419:14: 0x104812193 in assert (build)
    if (!ok) unreachable; // assertion failure
             ^
/Users/path/Documents/projects/zig/lib/compiler/build_runner.zig:771:11: 0x1049173fb in runStepNames (build)
    assert(run.memory_blocked_steps.items.len == 0);
          ^
/Users/path/Documents/projects/zig/lib/compiler/build_runner.zig:575:25: 0x104926a93 in main (build)
        try runStepNames(
                        ^
/Users/path/Documents/projects/zig/lib/std/start.zig:676:88: 0x104933177 in callMain (build)
    if (fn_info.params[0].type.? == std.process.Init.Minimal) return wrapMain(root.main(.{
                                                                                       ^
???:?:?: 0x190f0dd53 in start (/usr/lib/dyld)
error: the following build command terminated with signal ABRT:
.zig-cache/o/c8f33cfeae6261df515b5d98ff84e13f/build /Users/path/Documents/projects/zig/build-debug/stage3-debug/bin/zig /Users/path/Documents/projects/zig/lib /Users/path/Documents/projects/zig/repro .zig-cache /Users/path/.cache/zig --seed 0xdabd06a7 -Z3277c8208c2690b9 test -j4 --maxrss 60M
error: Recipe `test` failed on line 2 with exit code 1
Original log
➜  repro git:(938efe4aab) ✗ just test 60M
../build-debug/stage3-debug/bin/zig build test -j4 --maxrss 60M --zig-lib-dir ../lib
[4353366][B ] schedule: success
[4353367][A ] schedule: fail - memory blocked
[4353364][A ] schedule: fail - memory blocked
[4353368][C*] schedule: success
[4353367][D ] schedule: fail - memory blocked
[4353366][B ] run: finish
[4353366][B ] queue: memory blocked, blocked=[A, A, D]
[4353366][A ] schedule: success
[4353366][A ] run: finish
[4353366][A ] queue: memory blocked, blocked=[A, D]
[4353366][A ] schedule: fail - taken
[4353368][C*] run: finish
n: 1
thread 4353364 panic: reached unreachable code
/Users/path/Documents/projects/zig/lib/std/debug.zig:419:14: 0x104812193 in assert (build)
    if (!ok) unreachable; // assertion failure
             ^
/Users/path/Documents/projects/zig/lib/compiler/build_runner.zig:771:11: 0x1049173fb in runStepNames (build)
    assert(run.memory_blocked_steps.items.len == 0);
          ^
/Users/path/Documents/projects/zig/lib/compiler/build_runner.zig:575:25: 0x104926a93 in main (build)
        try runStepNames(
                        ^
/Users/path/Documents/projects/zig/lib/std/start.zig:676:88: 0x104933177 in callMain (build)
    if (fn_info.params[0].type.? == std.process.Init.Minimal) return wrapMain(root.main(.{
                                                                                       ^
???:?:?: 0x190f0dd53 in start (/usr/lib/dyld)
error: the following build command terminated with signal ABRT:
.zig-cache/o/c8f33cfeae6261df515b5d98ff84e13f/build /Users/path/Documents/projects/zig/build-debug/stage3-debug/bin/zig /Users/path/Documents/projects/zig/lib /Users/path/Documents/projects/zig/repro .zig-cache /Users/path/.cache/zig --seed 0xdabd06a7 -Z3277c8208c2690b9 test -j4 --maxrss 60M
error: Recipe `test` failed on line 2 with exit code 1

Diffs

Debug messages in build_runner.zig
diff --git a/lib/compiler/build_runner.zig b/lib/compiler/build_runner.zig
index 752f1fdae3..3e81c32bb5 100644
--- a/lib/compiler/build_runner.zig
+++ b/lib/compiler/build_runner.zig
@@ -767,6 +767,7 @@ fn runStepNames(
         try group.await(io);
     }
 
+    std.debug.print("n: {d}\n", .{run.memory_blocked_steps.items.len});
     assert(run.memory_blocked_steps.items.len == 0);
 
     var test_pass_count: usize = 0;
@@ -1340,12 +1341,16 @@ fn workerMakeOneStep(
 
         // Avoid running steps twice.
         if (s.state != .precheck_done) {
+            if (s.name.len <= 2)
+                std.debug.print("[{any}][{s}{s}] schedule: fail - taken\n", .{ std.Thread.getCurrentId(), s.name, if (s.max_rss != 0) " " else "*" });
             // Another worker got the job.
             return;
         }
 
         const new_claimed_rss = run.claimed_rss + s.max_rss;
         if (new_claimed_rss > run.max_rss) {
+            if (s.name.len <= 2)
+                std.debug.print("[{any}][{s}{s}] schedule: fail - memory blocked\n", .{ std.Thread.getCurrentId(), s.name, if (s.max_rss != 0) " " else "*" });
             // Running this step right now could possibly exceed the allotted RSS.
             // Add this step to the queue of memory-blocked steps.
             run.memory_blocked_steps.append(gpa, s) catch @panic("OOM");
@@ -1357,10 +1362,14 @@ fn workerMakeOneStep(
     } else {
         // Avoid running steps twice.
         if (@cmpxchgStrong(Step.State, &s.state, .precheck_done, .running, .seq_cst, .seq_cst) != null) {
+            if (s.name.len <= 2)
+                std.debug.print("[{any}][{s}{s}] schedule: fail - taken\n", .{ std.Thread.getCurrentId(), s.name, if (s.max_rss != 0) " " else "*" });
             // Another worker got the job.
             return;
         }
     }
+    if (s.name.len <= 2)
+        std.debug.print("[{any}][{s}{s}] schedule: success\n", .{ std.Thread.getCurrentId(), s.name, if (s.max_rss != 0) " " else "*" });
 
     const sub_prog_node = prog_node.start(s.name, 0);
     defer sub_prog_node.end();
@@ -1387,6 +1396,9 @@ fn workerMakeOneStep(
         printErrorMessages(gpa, s, .{}, stderr.terminal(), run.error_style, run.multiline_errors) catch {};
     }
 
+    if (s.name.len <= 2)
+        std.debug.print("[{any}][{s}{s}] run: finish\n", .{ std.Thread.getCurrentId(), s.name, if (s.max_rss != 0) " " else "*" });
+
     handle_result: {
         if (make_result) |_| {
             @atomicStore(Step.State, &s.state, .success, .seq_cst);
@@ -1419,6 +1431,13 @@ fn workerMakeOneStep(
         {
             run.max_rss_mutex.lockUncancelable(io);
             defer run.max_rss_mutex.unlock(io);
+            if (s.name.len <= 2) {
+                std.debug.print("[{any}][{s}{s}] queue: memory blocked, blocked=[", .{ std.Thread.getCurrentId(), s.name, if (s.max_rss != 0) " " else "*" });
+                for (run.memory_blocked_steps.items, 0..) |dep, i| {
+                    if (i == 0) std.debug.print("{s}", .{dep.name}) else std.debug.print(", {s}", .{dep.name});
+                }
+                std.debug.print("]\n", .{});
+            }
 
             dispatch_deps.ensureUnusedCapacity(gpa, run.memory_blocked_steps.items.len) catch @panic("OOM");
Repro code
### ./Justfile ###
test maxrss="60M":
    ../build-debug/stage3-debug/bin/zig build test -j4 --maxrss {{maxrss}} --zig-lib-dir ../lib
### ./build.zig ###
const std = @import("std");

pub fn build(b: *std.Build) void {
    const target = b.standardTargetOptions(.{});
    const optimize = b.standardOptimizeOption(.{});

    const test_a = addTestFile(b, target, optimize, "test_a.zig");
    const test_b = addTestFile(b, target, optimize, "test_b.zig");
    const test_c = addTestFile(b, target, optimize, "test_c.zig");
    const test_d = addTestFile(b, target, optimize, "test_d.zig");

    const run_a = b.addRunArtifact(test_a);
    const run_b = b.addRunArtifact(test_b);
    const run_c = b.addRunArtifact(test_c);
    const run_d = b.addRunArtifact(test_d);

    const alloc = b.allocator;
    run_a.step.max_rss = 40_000_000;
    run_a.step.name = alloc.dupe(u8, "A") catch @panic("OOM");
    run_b.step.max_rss = 40_000_000;
    run_b.step.name = alloc.dupe(u8, "B") catch @panic("OOM");
    run_c.step.max_rss = 0;
    run_c.step.name = alloc.dupe(u8, "C") catch @panic("OOM");
    run_d.step.max_rss = 40_000_000;
    run_d.step.name = alloc.dupe(u8, "D") catch @panic("OOM");

    const test_step = b.step("test", "Run all tests");
    test_step.dependOn(&run_a.step);
    test_step.dependOn(&run_b.step);
    test_step.dependOn(&run_c.step);
    test_step.dependOn(&run_d.step);
}

const Compile = std.Build.Step.Compile;
const Target = std.Build.ResolvedTarget;
const Optimize = std.builtin.OptimizeMode;
fn addTestFile(b: *std.Build, target: Target, optimize: Optimize, path: []const u8) *std.Build.Step.Compile {
    return b.addTest(.{
        .root_module = b.createModule(.{
            .root_source_file = b.path(path),
            .target = target,
            .optimize = optimize,
        }),
    });
}
### ./test_c.zig ###
const std = @import("std");

test "C" {
    const io = std.testing.io;
    try io.sleep(.fromSeconds(2), .awake);
}
### ./test_b.zig ###
const std = @import("std");

test "B" {
    const io = std.testing.io;
    try io.sleep(.fromMilliseconds(500), .awake);
}
### ./test_a.zig ###
const std = @import("std");

test "A" {
    const io = std.testing.io;
    try io.sleep(.fromMilliseconds(300), .awake);
}
### ./test_d.zig ###
const std = @import("std");

test "D" {
    const io = std.testing.io;
    try io.sleep(.fromMilliseconds(800), .awake);
}
Owner

Oh! Thank you @jgonet, that's extremely helpful. I see exactly what's going on now. workerMakeOneStep gets called for the same step twice (normal and expected). Both of those calls enter the if (s.max_rss != 0) block (one after the other due to the mutex), so the same step gets added to memory_blocked_steps twice. The first entry is popped and runs the step; when the step completes, it pops the next item from memory_blocked_steps, which is the same step again. That second task notices the step is already done and returns immediately, so the other steps in memory_blocked_steps are never run.

A quick hacky fix is doable, but I suspect there's a much better way to structure this logic; I'll tinker with it and see where I end up.
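
The sequence described above can be replayed with a small single-threaded model. This is only a sketch under stated assumptions, not the real build runner: `strandedSteps`, `contains`, and the `dedupe` flag are hypothetical names, each dispatched step is assumed to run to completion before the next queue entry is examined, and the queue contents are copied from the "blocked=[...]" lines of the two logs. The `dedupe` flag illustrates one possible quick workaround (not appending a step that is already queued); it is not necessarily the fix that was merged.

```zig
const std = @import("std");

const max_entries = 16; // arbitrary capacity for this sketch

/// Hypothetical helper: true if `value` already occurs in `items`.
fn contains(items: []const usize, value: usize) bool {
    for (items) |item| {
        if (item == value) return true;
    }
    return false;
}

/// Simplified model of the memory-blocked queue. `step_rss[i]` is the budget
/// of step i, `blocked_init` lists step indices in the order they were
/// appended (duplicates allowed), and `running` is the step that currently
/// holds the budget. Returns how many entries are still queued once nothing
/// more can be dispatched.
fn strandedSteps(
    step_rss: []const u64,
    blocked_init: []const usize,
    running: usize,
    budget: u64,
    dedupe: bool,
) usize {
    std.debug.assert(step_rss.len <= max_entries and blocked_init.len <= max_entries);

    var done = [_]bool{false} ** max_entries;
    var blocked: [max_entries]usize = undefined;
    var blocked_len: usize = 0;
    for (blocked_init) |entry| {
        // Assumed workaround: skip a step that is already in the queue.
        if (dedupe and contains(blocked[0..blocked_len], entry)) continue;
        blocked[blocked_len] = entry;
        blocked_len += 1;
    }

    var work: [max_entries]usize = undefined; // dispatched entries awaiting a worker
    var work_len: usize = 0;
    var work_next: usize = 0;
    var claimed: u64 = step_rss[running];
    var finishing: ?usize = running;

    while (true) {
        if (finishing) |finished| {
            // The step completes, returns its memory, and dispatches every
            // queued entry that fits into the remaining budget.
            done[finished] = true;
            claimed -= step_rss[finished];
            var remaining = budget - claimed;
            var kept: usize = 0;
            for (blocked[0..blocked_len]) |dep| {
                if (step_rss[dep] <= remaining) {
                    remaining -= step_rss[dep];
                    work[work_len] = dep;
                    work_len += 1;
                } else {
                    blocked[kept] = dep;
                    kept += 1;
                }
            }
            blocked_len = kept;
            finishing = null;
        }
        if (work_next == work_len) break;
        const next = work[work_next];
        work_next += 1;
        if (done[next]) continue; // "schedule: fail - taken": nothing is released
        claimed += step_rss[next];
        finishing = next;
    }
    return blocked_len;
}

pub fn main() void {
    // Indices: 0 = A, 1 = B, 2 = D, each with a 40M budget as in the repro.
    // C has max_rss == 0, never enters the queue, and is left out.
    const rss = [_]u64{ 40_000_000, 40_000_000, 40_000_000 };
    const budget: u64 = 60_000_000;

    const clean = strandedSteps(&rss, &.{ 0, 1, 1 }, 2, budget, false); // blocked=[A, B, B], D running
    const failed = strandedSteps(&rss, &.{ 0, 0, 2 }, 1, budget, false); // blocked=[A, A, D], B running
    const deduped = strandedSteps(&rss, &.{ 0, 0, 2 }, 1, budget, true);
    // Prints "clean: 0, failed: 1, deduped: 0"
    std.debug.print("clean: {d}, failed: {d}, deduped: {d}\n", .{ clean, failed, deduped });
}
```

In the clean run the duplicate (B) happens to be the last entry, so nothing is queued behind it and the queue drains. In the failed run the duplicate (A) sits ahead of D; the duplicate consumes the only wake-up without releasing any memory, which leaves D queued and trips the assertion with n: 1.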

mlugg closed this issue 2026-01-09 03:16:48 +01:00
alexrp modified the milestone from Urgent to 0.16.0 2026-01-11 06:09:02 +01:00