
Conversation

@helsaawy
Contributor

@helsaawy helsaawy commented Feb 14, 2022

Currently, when exec-ing into a WCOW container with a stdin pipe via crictl (or into the shim via shimdiag exec), internal/cmd:Cmd.Wait() closes the underlying process and returns to internal/cmd:ExecInUvm() without waiting on or stopping the background goroutine that copies data from the upstream stdin pipe (which is exposed to containerd) to the process's stdin (via relayIO). ExecInUvm then closes the upstream std IO pipes. (Note: this differs from exec.Cmd, which does not finish waiting until stdin is either closed or yields an EOF on a read.)

This yields several errors, one from the relayIO function:

Message: 	Cmd IO relay complete
error: 	 	file has already been closed
file: 	 	stdin

And one from the background goroutines, when they attempt to close their respective IO pipes:

Message:	failed to close Cmd stdin

With the error on hypervisor-isolated containers:

error:		close hvsock [...]: shutdown: An operation was attempted on something that is not a socket.

And on process-isolated containers:

error: 		hcs::Process::CloseStdin [...]: hcsshim: the handle has already been closed

This PR solves several issues that raise those errors.

  1. Currently, if the exec exits without the caller first closing stdin, the IO relay for stdin is guaranteed to raise the `file has already been closed` error, which winio raises when attempting to read from a closed pipe. internal/cmd:relayIO was modified to ignore file-closed errors from the io.Copy on stdin, but not from stdout and stderr, since those closing is likely anomalous.
  2. Since there is no guarantee as to the ordering between cmd.Wait calling process.Close and the relayIO goroutines calling process.CloseStdin/out/err, the close IO operations now ignore `file has already been closed` errors.
  3. hcs.ErrAlreadyClosed from process.Close is now also ignored, as cmd.Wait() may call it twice.

internal/gcs:Process was updated to match internal/hcs:Process and internal/jobcontainers:Process:

  1. .CloseStdin/out/err functions return hcs.ErrAlreadyClosed if the process was already closed;
  2. .CloseStdin first calls CloseWrite to send an EOF, and then closes the IO channel; and
  3. .Close zeros out its gc connection to prevent future accidental uses, and protects gc with a RWLock so closing does not occur while other operations that require the guest connection are executing.

For internal/hcs:Process, if process.CloseStdin is called after the process exits but before it is closed, it sends an HcsModifyProcess request, which fails with:

error: 	The process has already exited.
name: 	HcsModifyProcess
settings: 
{
	"Operation": 	"CloseHandle",
	"CloseHandle": 	{"Handle": "StdIn"}
}

Now it checks if the process exited before sending the close handle request.
Additionally, process.CloseStdin returned hcs.ErrAlreadyClosed when called on a closed process, but CloseStdout and CloseStderr did not. The latter two were updated to match CloseStdin and jobcontainers:Process.

Finally, a flag was added to internal/cmd:Cmd to close the upstream Stdin IO channel after the process exits but before Cmd.Wait() waits on the IO operations. This lets callers guarantee that the upstream stdin is closed and that the stdin io.Copy operations end cleanly after the process exits.

Note: it is impossible to guarantee interrupting the io.Copy operation without closing the upstream pipe, as it can block indefinitely on the syscall to read from the upstream stdin pipe. Therefore, only closing or sending an EOF (via .CloseWrite(), which is not available on the default, non-message-based winio Pipes) will end the goroutine copying data from that pipe.

Tests were updated to account for new functionality.

Signed-off-by: Hamza El-Saawy [email protected]

@helsaawy helsaawy marked this pull request as ready for review February 15, 2022 17:55
@helsaawy helsaawy requested a review from a team as a code owner February 15, 2022 17:55
Comment on lines 334 to 357
if err == nil {
err = ioErr
}


Should we maybe wrap the errors together instead?

Contributor Author

@helsaawy helsaawy Feb 22, 2022


as in?

if err != nil {
  err = fmt.Errorf("multiple errors during IO copy: %w; %w", err, cerr)
}

@helsaawy helsaawy force-pushed the he/iorelay branch 2 times, most recently from 146556a to 74c5101 Compare February 22, 2022 20:15
@helsaawy helsaawy assigned helsaawy and katiewasnothere and unassigned helsaawy Mar 30, 2022
Comment on lines +346 to +290
tr := time.NewTimer(250 * time.Millisecond) // give the cmd a chance to finish running
defer tr.Stop()
select {
case err := <-done:
if err != nil {
t.Fatalf("cmd run failed: %v", err)
}
t.Fatal("command should have blocked indefinitely")
case <-tr.C:


Maybe instead of this, you could just check that you can still write to stdout?

Contributor Author


from the process itself, or in the test function itself?

Comment on lines +216 to +224
// Even if `stdin` is closed, the runtime can block indefinitely on reading
// c.Stdin, so the only reliable way to unblock this is with:
// c.Stdin.CloseWrite() (if it implements it) or c.Stdin.Close().
// However, we are only passed the Reader end of Stdin, and closing the
// upstream c.Stdin breaks from the behavior that os/exec.Cmd implements.


What is this comment trying to justify?

Contributor Author


It expands on the original comment with more info, and gives context as to what is happening.
I didn't change anything here, so no justification.

@katiewasnothere

To be clear, with this PR, we'd never log isIOChannelClosedErr errors?

@helsaawy
Contributor Author

helsaawy commented May 2, 2022

To be clear, with this PR, we'd never log isIOChannelClosedErr errors?

For the upstream and processStdin, we never log it, even if it is closed during io.Copy. For the process std IOs, we don't log it if it arises when we try to close them after the io.Copy finishes, since the calls to close the individual IO channels may come after process.Close happens and closes them.

@helsaawy
Contributor Author

rebased to fix merge conflicts, removed test/vendor directory

if serr := p.stderr.Close(); serr != nil {
log.G(ctx).WithError(serr).Warn("close stderr failed")
if err == nil {
err = serr
Contributor

@dcantah dcantah Aug 4, 2022


We don't actually do anything with these assignments right? After all of these close, whether successful or not, we just return nil on line 173.

Contributor


Or is it some weird Go named return value shenanigans that I'm forgetting

Contributor Author


We use it to set the span status, but I don't know if we should even bother to return stdIO close errors (or log them in the span).
hcs\process.Close just ignores stdIO errors.
We probably should standardize how we do closes ....

Contributor

@dcantah dcantah Aug 5, 2022


Ahh, for the span, okay. And yes, agreed. It's frustrating to go look at how each type of container we support does something, and they all have their own little quirks or ways of handling things.


When the function returns though, that value gets set to the "err" value of the function. So then the span would still be getting set to nil error. Right?

Contributor Author

@helsaawy helsaawy Sep 17, 2022


oh right, I missed that ...
I guess either we return an error for std IO close, even though we chug through,
or we ignore it completely,
or we do some more naming shenanigans to get it to work.
Preferences?


p.gcLock.RLock()
defer p.gcLock.RUnlock()
if p.gc == nil {
Contributor


Was this before the introduction of our friend p.gcClosed()?

Contributor Author


gcClosed RLocks, and I don't think RWMutex is re-entrant (the docs say "It should not be used for recursive read locking"), so I would need to rewrite it as:

if p.gcClosed() { 		
	return hcs.ErrAlreadyClosed
}
p.gcLock.RLock()
defer p.gcLock.RUnlock()
// ...

But that allows a race condition where another caller closes gc in between the check and the RPC call.

Contributor Author


As a side note, I considered:

func (p *Process) gcClosed() bool {
	p.gcLock.RLock()
	defer p.gcLock.RUnlock()
	return p.gcclosed()
}

// unsafe: requires holding p.gcLock
func (p *Process) gcclosed() bool { return p.gc == nil }

but that seems .... excessive

Comment on lines +413 to +420
if err == nil {
var resultJSON string
resultJSON, err = vmcompute.HcsModifyProcess(ctx, process.handle, string(b))
events := processHcsResult(ctx, resultJSON)
if err != nil {
err = makeProcessError(process, operation, err, events)
}
}
Contributor


Is this just so we always call process.stdin.Close() instead of early returning if the marshal fails?

Contributor Author


yeah, it feels weird to give up on closing stdin on the host side just because the marshaling failed, especially since the marshaling is an internal detail and the error is salvageable.

@helsaawy
Contributor Author

helsaawy commented Aug 8, 2022

rebased to update CI pipeline

Updating cmd to wait on all relayIO operations, and using CloseStdin to
signal closure of the process.

`hcs.Process` would attempt to send `ModifyProcessRequest` to close
stdin even after the process exits.
It was also inconsistent about returning errors on already-closed IO
handles.

Changed `gcs` Process to mirror `hcs.Process` and `jobcontainer.Process`:
`.CloseStd*` now return errors if the process was already closed,
`.Close()` zeros out the `gc` connection, and `.CloseStdin` both performs a
`CloseWrite` and closes the channel.

Signed-off-by: Hamza El-Saawy <[email protected]>
Fixed a bug with how stdin streams were closed for processes.
Removed an ineffectual assignment to ctx in c.Wait().

Signed-off-by: Hamza El-Saawy <[email protected]>
// Use skipErr to ignore, and not log, errors that it returns true for: passing in
// `isIOChannelClosedErr` will ignore errors raised by reading from or writing to
// a closed Reader or Writer, respectively. It can be `nil`.
func relayIO(w io.Writer, r io.Reader, log *logrus.Entry, name string, skipErr func(error) bool) (int64, error) {

@katiewasnothere katiewasnothere Sep 16, 2022


I'm kind of torn here. I wonder if we should just remove the logging in relayIO entirely and just log in the caller. Thoughts? That would avoid this weird "skipErr" func. In which case this would essentially just be io.Copy. What are the downsides?

Contributor Author


honestly, none. I didn't want to remove it since I didn't know the history, but it can easily be inlined into Cmd.Start()


I say go for it then haha

