Skip to content

Conversation

@gireeshpunathil
Copy link
Contributor

On AIX we were seing failures in the Node test:

simple/test-child-process-execsync.js

where the error was ENOTCONN. We identified that the
ENOTCONN a result of the call to uv_spawn in libuv.

When uv_spawn is called the child is fork/exec'd and the
parent waits for the child to terminate. Once the child
terminates it gathers the child's execution result. As
part of this process shutdown is called on the streams
which connect it to the child. At the time the parent
calls shutdown() the child will have already closed
its side of the streams. The call to shutdown is in
uv__drain() within src/unix/stream.c

Based on c test cases we ran, when run on AIX calling
shutdown on a stream for which the other side has been
closed results in the error ENOTCONN being returned.
When running the same test case on linux no error
is returned.

It appears that in uv_drain within src/unix/stream.c
the intention is to ensure we can no longer write to
the stream. If the other side is already closed this
will be the case so we believe the best course of
action is to simply ignore the ENOTCONN error in
uv__drain. And hence this pull request.

This change does not introduce any new failures when
running the tests on AIX. It also does not fix any
existing failures, but it does allow the node test
simple/test-child-process-execsync.js to pass.

shutdown on sockets whose peer is closed causes ENOTCONN error in AIX.
As we are shutting down the write part of the socket anyways, this
error is of no consequence. Don't propagate, ignore instead.
@mhdawson
Copy link
Contributor

I believe Gireesh meat to say in the last paragraph that it lets the simple/test-child-process-execsync.js pass.

@gireeshpunathil
Copy link
Contributor Author

Yes! added a wrong test case name by mistake. Corrected now.

@bnoordhuis
Copy link
Member

I'm willing to believe that Linux is a bit more forgiving than AIX but what I don't understand is why the test isn't failing on BSD-like platforms.

From your description of the issue it sounds like AIX has the same shutdown semantics as the BSDs and OS X: shutdown() on one half of a socket pair when the other end is closed fails with ENOTCONN.

I don't want to be difficult but I have a feeling the real issue may be something else. Maybe a timing issue in the test?

@gireeshpunathil
Copy link
Contributor Author

@bnoordhuis, thanks for your comments. In OS X as well I am able to observe the
ENOTCONN issue for this test case, though with less frequency than in AIX case.

Here are the version details:

Darwin Kernel Version 12.5.0
node version: v0.12.0
uv version: 1.0.2

However, I see the failure pattern changes over time - probably due to the load pattern
change in the system. If I add a small delay (100 ms) in uv__drain before the shutdown()
call, the failure becomes consistent.

@bnoordhuis
Copy link
Member

@gireeshpunathil Do you also see it with io.js? The libuv that ships with node.js v0.12 is pretty old by now and I wouldn't rule out that v0.12 itself has a couple of bugs that have been fixed in io.js.

@gireeshpunathil
Copy link
Contributor Author

@bnoordhuis , I ran the test on the latest io.js (v1.5.2) on OS X, and found that the issue is present there too.

@bnoordhuis
Copy link
Member

@gireeshpunathil Thanks, I've been able to confirm the issue. I'll take a look.

@bnoordhuis
Copy link
Member

Continues in nodejs/node#1214. I don't think handling it in libuv is the right approach because it can mask genuine errors but it's regrettable that platform differences leak through to libuv users. I'm going to tentatively close this.

@bnoordhuis bnoordhuis closed this Mar 19, 2015
@saghul
Copy link
Member

saghul commented Mar 19, 2015

bnoordhuis added a commit to bnoordhuis/io.js that referenced this pull request Mar 28, 2015
On AIX, OS X and the BSDs, calling shutdown() on one end of a pipe
when the other end has closed the connection fails with ENOTCONN.

The sequential/test-child-process-execsync test failed sporadically
because of a race between the parent and the child where one closed
its end of the pipe before the other got around to calling shutdown()
on its end of the pipe.

Libuv is not the right place to handle that because it can't tell if
the ENOTCONN error is genuine but io.js can.

Refs: libuv/libuv#268
PR-URL: nodejs#1214
Reviewed-By: Bert Belder <[email protected]>
mhdawson pushed a commit to nodejs/node-v0.x-archive that referenced this pull request Apr 13, 2015
This is a backport of ea37ac0

Original commit message:

  On AIX, OS X and the BSDs, calling shutdown() on one end of a pipe
  when the other end has closed the connection fails with ENOTCONN.

  The sequential/test-child-process-execsync test failed sporadically
  because of a race between the parent and the child where one closed
  its end of the pipe before the other got around to calling shutdown()
  on its end of the pipe.

  Libuv is not the right place to handle that because it can't tell if
  the ENOTCONN error is genuine but io.js can.

  Refs: libuv/libuv#268
  PR-URL: iojs#1214
  Reviewed-By: Bert Belder <[email protected]>

Fixes: #9444.

Reviewed-By: Julien Gilli <[email protected]>
PR-URL: #14480
joefitzgerald added a commit to joefitzgerald/go-plus that referenced this pull request Apr 21, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants