Skip to content

Revert "macos: use posix_spawn instead of fork"#3107

Merged
cjihrig merged 1 commit intolibuv:v1.xfrom
cjihrig:revert
Feb 12, 2021
Merged

Revert "macos: use posix_spawn instead of fork"#3107
cjihrig merged 1 commit intolibuv:v1.xfrom
cjihrig:revert

Conversation

@cjihrig
Copy link
Copy Markdown
Contributor

@cjihrig cjihrig commented Feb 12, 2021

This reverts commit 39968db.

This commit caused a number of failures in the Node.js test suite
while attempting to release libuv 1.41.0.

Node.js integration CI: https://ci.nodejs.org/view/libuv/job/libuv-in-node/175/. The previously observed macOS failures seem to be fixed by this revert.

Copy link
Copy Markdown
Member

@bnoordhuis bnoordhuis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regrettable but it is what it is. Let's revert and revisit.

Copy link
Copy Markdown
Member

@vtjnash vtjnash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunate that jpcanepa is on vacation now, but I expect he'll dig into the findings soon after he's back. And better to get the release done. Sorry it delayed it so much without effect!

@cjihrig
Copy link
Copy Markdown
Contributor Author

cjihrig commented Feb 12, 2021

A rare green libuv CI run 😄 : https://ci.nodejs.org/view/libuv/job/libuv-test-commit/2125/

This reverts commit 39968db.

This commit caused a number of failures in the Node.js test suite
while attempting to release libuv 1.41.0.

Refs: libuv#3086
Refs: libuv#3064
PR-URL: libuv#3107
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Jameson Nash <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
@cjihrig cjihrig merged commit 217fdf4 into libuv:v1.x Feb 12, 2021
@cjihrig cjihrig deleted the revert branch February 12, 2021 22:05
@vtjnash
Copy link
Copy Markdown
Member

vtjnash commented Feb 12, 2021

Some day I'm hoping we could install rr on these Jenkins workers, and then any linux failure would be trivial to reproduce, diagnose, and fix. We've been using it on our buildbots for JuliaLang for the past couple of months, and it has helped enormously at driving down our rare failure rate. The author also has some advance integration tooling for working with javascript, which might be interesting to the nodejs folks too. I don't really know who to contact for installing and setting this up however.

@cjihrig
Copy link
Copy Markdown
Contributor Author

cjihrig commented Feb 12, 2021

If you open an issue in https://github.com/nodejs/build they should be able to help (assuming they think it's a good idea). cc: @richardlau

@richardlau
Copy link
Copy Markdown
Member

Assuming that it is https://github.com/rr-debugger/rr that looks to be Linux only and the issue with this one was a broken macOS CI?

Feel free to open an issue in build if you feel rr would be useful -- it would be helpful to also explain how it would be used (e.g. would we need to upload new artifacts (the recordings) and how much space those are likely to need) bearing in mind that most users who are able to start jobs in Jenkins don't have permissions to log into the machines themselves (temporary access can be granted on request).

If it's Linux another option would be GitHub actions.

@deepak1556
Copy link
Copy Markdown

@cjihrig just catching up on this, can you point out which tests in Node.js suite were failing ?

Electron backported the posix_spawn change to its stable release lines last week, would be good to know why it should be reverted. Thanks!

@cjihrig
Copy link
Copy Markdown
Contributor Author

cjihrig commented Feb 18, 2021

Here are the failures: https://ci.nodejs.org/job/node-test-commit-osx/38851/. Please note that the CI results are not persisted permanently, so I don't know how much longer that link will be valid.

For posterity, the failures are:

test.parallel/test-child-process-cwd
test.parallel/test-child-process-exec-error
test.parallel/test-child-process-promisified
test.parallel/test-child-process-spawn-error
test.parallel/test-child-process-spawnsync-validation-errors
test.parallel/test-heapsnapshot-near-heap-limit-worker
test.parallel/test-child-process-exec-error
test.parallel/test-child-process-promisified
test.parallel/test-child-process-spawn-error

test-heapsnapshot-near-heap-limit-worker may not be related though.

@deepak1556
Copy link
Copy Markdown

Looked into the failing tests there were 3 different types, summarizing what I have found so far.

  1. The following tests were failing when they hit a runtime errors but libuv was closing the pipe that didn't allow for further processing in https://github.com/nodejs/node/blob/master/lib/internal/child_process.js#L444-L456
test.parallel/test-child-process-cwd
test.parallel/test-child-process-exec-error
test.parallel/test-child-process-promisified
test.parallel/test-child-process-spawn-error

I tried to match the previous behavior for these cases and it seems to fix the above tests

diff --git a/deps/uv/src/unix/process.c b/deps/uv/src/unix/process.c
index b4e44b049f..abc83e24c9 100644
--- a/deps/uv/src/unix/process.c
+++ b/deps/uv/src/unix/process.c
@@ -970,7 +970,12 @@ int uv_spawn(uv_loop_t* loop,
 
   /* Spawn the child */
   err = uv__spawn_and_init_child(options, stdio_count, pipes, signal_pipe[1], &pid);
-  if (err != 0) {
+  if (err != 0 &&
+      !(err == UV_EACCES ||
+       err == UV_EAGAIN ||
+       err == UV_EMFILE ||
+       err == UV_ENFILE ||
+       err == UV_ENOENT)) {
     uv_rwlock_wrunlock(&loop->cloexec_lock);
     uv__close(signal_pipe[0]);
     uv__close(signal_pipe[1]);
@@ -982,7 +987,7 @@ int uv_spawn(uv_loop_t* loop,
   uv__close(signal_pipe[1]);
 
   process->status = 0;
-  exec_errorno = 0;
+  exec_errorno = err;
   do
     r = read(signal_pipe[0], &exec_errorno, sizeof(exec_errorno));
   while (r == -1 && errno == EINTR);
  1. test.parallel/test-child-process-stdio was failing with EINTR at the posix_spawn call, I don't know what causes this but adding a retry logic fixes this case.
diff --git a/deps/uv/src/unix/process.c b/deps/uv/src/unix/process.c
index b4e44b049f..241272a806 100644
--- a/deps/uv/src/unix/process.c
+++ b/deps/uv/src/unix/process.c
@@ -772,7 +772,11 @@ int uv__spawn_resolve_and_spawn(const uv_process_options_t* options,
     /* Try to spawn the new process file. If it fails with ENOENT, the
      * new process file is not in this PATH entry, continue with the next
      * PATH entry. */
-    err = posix_spawn(pid, b, actions, attrs, options->args, env);
+    int max_retries = 4;
+    int retries = 0;
+    do
+      err = posix_spawn(pid, b, actions, attrs, options->args, env);
+    while (err == EINTR && ++retries < max_retries);
  1. test.parallel/test-child-process-spawnsync-validation-errors this is another interesting one, it was failing with EPERM whenever a test tried to set uid or gid for the child process. A minimal repro at https://gist.github.com/deepak1556/7b02c18eeede9cdcd5da55439e909e1b also validates this. Looking at the posix_spawn code, whenever posix_cred_info is filled by posix_spawnattr_set_uid_np, posix_spawnattr_set_gid_np, posix_spawnattr_set_groups_np it seems to trigger the following check
//https://github.com/apple/darwin-xnu/blob/8f02f2a044b9bb1ad951987ef5bab20ec9486310/bsd/kern/kern_exec.c#L3231

if (!kauth_cred_issuser(kauth_cred_get())) {
				error = EPERM;
				goto bad;
}

and kauth_cred_issuer seems to do the following check

// https://github.com/apple/darwin-xnu/blob/8f02f2a044b9bb1ad951987ef5bab20ec9486310/bsd/kern/kern_credential.c#L3500
/*
 * kauth_cred_issuser
 *
 * Description:	Fast replacement for issuser()
 *
 * Parameters:	cred				Credential to check for super
 *						user privileges
 *
 * Returns:	0				Not super user
 *		!0				Is super user
 *
 * Notes:	This function uses a magic number which is not a manifest
 *		constant; this is bad practice.
 */

Does this mean a super user needs to call the process triggering posix_spawn to set UID/GID bits ?

@vtjnash
Copy link
Copy Markdown
Member

vtjnash commented Feb 20, 2021

Yes, setuid / setgid should require being root, uid, or euid already, or fail with EPERM.

@vtjnash
Copy link
Copy Markdown
Member

vtjnash commented Feb 20, 2021

posix_spawn is not supposed to return EINTR, but if it was spawning multiple processes, we're probably getting many SIGCHLD notifications, so I wouldn't be surprised if we did get them here

@vtjnash
Copy link
Copy Markdown
Member

vtjnash commented Feb 20, 2021

Yeah, rr is a wish list feature, but still a bit hard to set up for CI, so while it would be great, it isn't something I'm fully prepared yet to champion. I just mention it because it makes (linux) debugging so much easier for particularly types of bugs, so I think that everyone should learn about it!

In libuv, there are still some frequent linux failures which keep it from being consistently green, though I'd fixed many of them this past year, so it is sometimes green. Unfortunately using rr also depends on a particular CPU feature, and many VM hosts don't enable that feature, including probably github actions (rr-debugger/rr#2402 (comment)). For configuration, yes, those would need to be temporarily made available somewhere for download. They're fairly compact, as they only need to include a copy of all files read and a bit of metadata.

@deepak1556
Copy link
Copy Markdown

deepak1556 commented Feb 20, 2021

Yes, setuid / setgid should require being root, uid, or euid already, or fail with EPERM.

The test here https://github.com/nodejs/node/blob/master/test/parallel/test-child-process-spawnsync-validation-errors.js#L60 is trying to set uid of parent process, should that fail with EPERM ? It looks the check in kern_exec.c was only added in Catalina and higher which seem to require being root to make any uid changes even if the uid is same as the parent process...

but if it was spawning multiple processes, we're probably getting many SIGCHLD notifications, so I wouldn't be surprised if we did get them here

Yup that seems to be the case as the relevant test cases https://github.com/nodejs/node/blob/master/test/parallel/test-child-process-stdio.js in Node.js involve the PATH resolution stage https://github.com/descriptinc/libuv/blob/39968db6434f16f9fbd3c78f6c7392fd7ae9074e/src/unix/process.c#L693-L719 in the posix_spawn implementation.

@jpcanepa
Copy link
Copy Markdown
Contributor

Unfortunate that jpcanepa is on vacation now, but I expect he'll dig into the findings soon after he's back. And better to get the release done. Sorry it delayed it so much without effect!

I'm back. I see @vtjnash has done a pretty good rundown already, but I'll try look on my end too.

@vtjnash
Copy link
Copy Markdown
Member

vtjnash commented Jul 26, 2021

For EINTR, I notice that is observed elsewhere also (https://reviews.llvm.org/D61096), though it potentially also should go away with #3251 (which blocks signals)

vtjnash added a commit to vtjnash/libuv that referenced this pull request Jul 26, 2021
vtjnash added a commit to vtjnash/libuv that referenced this pull request Jul 26, 2021
This kernel function is not permitted unless the process is setuid root,
so disable this syscall. Falling back to fork/exec should be okay for
the rare cases that the user decides they need to do setuid(getuid()) or
setuid(geteuid()) for the child.

Refs: libuv#3107 (comment)
vtjnash added a commit to vtjnash/libuv that referenced this pull request Jan 31, 2022
vtjnash added a commit to vtjnash/libuv that referenced this pull request Jan 31, 2022
This kernel function is not permitted unless the process is setuid root,
so disable this syscall. Falling back to fork/exec should be okay for
the rare cases that the user decides they need to do setuid(getuid()) or
setuid(geteuid()) for the child.

Refs: libuv#3107 (comment)
vtjnash added a commit to vtjnash/libuv that referenced this pull request Feb 1, 2022
vtjnash added a commit to vtjnash/libuv that referenced this pull request Feb 1, 2022
This kernel function is not permitted unless the process is setuid root,
so disable this syscall. Falling back to fork/exec should be okay for
the rare cases that the user decides they need to do setuid(getuid()) or
setuid(geteuid()) for the child.

Refs: libuv#3107 (comment)
vtjnash added a commit to vtjnash/libuv that referenced this pull request Feb 23, 2022
vtjnash added a commit to vtjnash/libuv that referenced this pull request Feb 23, 2022
This kernel function is not permitted unless the process is setuid root,
so disable this syscall. Falling back to fork/exec should be okay for
the rare cases that the user decides they need to do setuid(getuid()) or
setuid(geteuid()) for the child.

Refs: libuv#3107 (comment)
vtjnash added a commit that referenced this pull request Mar 2, 2022
Fixes: #3050
Refs: #3086
Refs: #3064
Refs: #3107
Refs: #3064

This reverts commit 217fdf4, then fixes
several issues with it:

* remove error fast-cleanup code that triggers a nodejs bug

Refs: #3107 (comment)

* protect posix_spawn from EINTR

This is not a documented valid error, but seems to have been observed.

* ignore setuid/setgid syscall

This kernel function is not permitted unless the process is setuid root,
so disable this syscall. Falling back to fork/exec should be okay for
the rare cases that the user decides they need to do setuid(getuid()) or
setuid(geteuid()) for the child.

Refs: #3107 (comment)

* improve posix_spawn path search

Ports the improvements in musl back to this function

* fix some additional problems and formatting issues

We previously might fail to start a watcher, in rare failure cases,
resulting in a zombie that we would fail to kill. Also avoid creating
the signal-pipe unless required (addresses a review comment from Apple)

* fix fd->fd mapping reuse

There was a chance that when duplicating the fd's into stdio_count+fd we
might be closing a currently opened fd with that value.
vtjnash added a commit to vtjnash/libuv that referenced this pull request Mar 3, 2022
several issues with it:

* remove error fast-cleanup code that triggers a nodejs bug

Refs: libuv#3107 (comment)

* protect posix_spawn from EINTR

This is not a documented valid error, but seems to have been observed.

* ignore setuid/setgid syscall

This kernel function is not permitted unless the process is setuid root,
so disable this syscall. Falling back to fork/exec should be okay for
the rare cases that the user decides they need to do setuid(getuid()) or
setuid(geteuid()) for the child.

Refs: libuv#3107 (comment)

* improve posix_spawn path search

Ports the improvements in musl back to this function

* fix some additional problems and formatting issues

We previously might fail to start a watcher, in rare failure cases,
resulting in a zombie that we would fail to kill. Also avoid creating
the signal-pipe unless required (addresses a review comment from Apple)

* fix fd->fd mapping reuse
JeffroMF pushed a commit to JeffroMF/libuv that referenced this pull request May 16, 2022
This reverts commit 39968db.

This commit caused a number of failures in the Node.js test suite
while attempting to release libuv 1.41.0.

Refs: libuv#3086
Refs: libuv#3064
PR-URL: libuv#3107
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Jameson Nash <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
JeffroMF pushed a commit to JeffroMF/libuv that referenced this pull request May 16, 2022
Fixes: libuv#3050
Refs: libuv#3086
Refs: libuv#3064
Refs: libuv#3107
Refs: libuv#3064

This reverts commit 217fdf4, then fixes
several issues with it:

* remove error fast-cleanup code that triggers a nodejs bug

Refs: libuv#3107 (comment)

* protect posix_spawn from EINTR

This is not a documented valid error, but seems to have been observed.

* ignore setuid/setgid syscall

This kernel function is not permitted unless the process is setuid root,
so disable this syscall. Falling back to fork/exec should be okay for
the rare cases that the user decides they need to do setuid(getuid()) or
setuid(geteuid()) for the child.

Refs: libuv#3107 (comment)

* improve posix_spawn path search

Ports the improvements in musl back to this function

* fix some additional problems and formatting issues

We previously might fail to start a watcher, in rare failure cases,
resulting in a zombie that we would fail to kill. Also avoid creating
the signal-pipe unless required (addresses a review comment from Apple)

* fix fd->fd mapping reuse

There was a chance that when duplicating the fd's into stdio_count+fd we
might be closing a currently opened fd with that value.
liujinye-sys pushed a commit to open-vela/apps_system_libuv that referenced this pull request Jul 23, 2025
This reverts commit 39968db.

This commit caused a number of failures in the Node.js test suite
while attempting to release libuv 1.41.0.

Refs: libuv/libuv#3086
Refs: libuv/libuv#3064
PR-URL: libuv/libuv#3107
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Jameson Nash <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
liujinye-sys pushed a commit to open-vela/apps_system_libuv that referenced this pull request Jul 23, 2025
Fixes: libuv/libuv#3050
Refs: libuv/libuv#3086
Refs: libuv/libuv#3064
Refs: libuv/libuv#3107
Refs: libuv/libuv#3064

This reverts commit 217fdf4, then fixes
several issues with it:

* remove error fast-cleanup code that triggers a nodejs bug

Refs: libuv/libuv#3107 (comment)

* protect posix_spawn from EINTR

This is not a documented valid error, but seems to have been observed.

* ignore setuid/setgid syscall

This kernel function is not permitted unless the process is setuid root,
so disable this syscall. Falling back to fork/exec should be okay for
the rare cases that the user decides they need to do setuid(getuid()) or
setuid(geteuid()) for the child.

Refs: libuv/libuv#3107 (comment)

* improve posix_spawn path search

Ports the improvements in musl back to this function

* fix some additional problems and formatting issues

We previously might fail to start a watcher, in rare failure cases,
resulting in a zombie that we would fail to kill. Also avoid creating
the signal-pipe unless required (addresses a review comment from Apple)

* fix fd->fd mapping reuse

There was a chance that when duplicating the fd's into stdio_count+fd we
might be closing a currently opened fd with that value.
liujinye-sys pushed a commit to open-vela/apps_system_libuv that referenced this pull request Dec 16, 2025
This reverts commit 39968db.

This commit caused a number of failures in the Node.js test suite
while attempting to release libuv 1.41.0.

Refs: libuv/libuv#3086
Refs: libuv/libuv#3064
PR-URL: libuv/libuv#3107
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Jameson Nash <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
liujinye-sys pushed a commit to open-vela/apps_system_libuv that referenced this pull request Dec 16, 2025
Fixes: libuv/libuv#3050
Refs: libuv/libuv#3086
Refs: libuv/libuv#3064
Refs: libuv/libuv#3107
Refs: libuv/libuv#3064

This reverts commit 217fdf4, then fixes
several issues with it:

* remove error fast-cleanup code that triggers a nodejs bug

Refs: libuv/libuv#3107 (comment)

* protect posix_spawn from EINTR

This is not a documented valid error, but seems to have been observed.

* ignore setuid/setgid syscall

This kernel function is not permitted unless the process is setuid root,
so disable this syscall. Falling back to fork/exec should be okay for
the rare cases that the user decides they need to do setuid(getuid()) or
setuid(geteuid()) for the child.

Refs: libuv/libuv#3107 (comment)

* improve posix_spawn path search

Ports the improvements in musl back to this function

* fix some additional problems and formatting issues

We previously might fail to start a watcher, in rare failure cases,
resulting in a zombie that we would fail to kill. Also avoid creating
the signal-pipe unless required (addresses a review comment from Apple)

* fix fd->fd mapping reuse

There was a chance that when duplicating the fd's into stdio_count+fd we
might be closing a currently opened fd with that value.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants