
Conversation

@helsaawy
Contributor

@helsaawy helsaawy commented Feb 14, 2022

Currently, when exec-ing into a WCOW container with a stdin pipe via crictl (or into the shim via shimdiag exec), internal/cmd:Cmd.Wait() closes the underlying process and returns to internal/cmd:ExecInUvm() without waiting on or stopping the background goroutine that copies data from the upstream stdin pipe (which is exposed to containerd) to the process's stdin (via relayIO). ExecInUvm then closes the upstream std IO pipes. (Note: this differs from exec.Cmd, which does not finish waiting until stdin is either closed or yields an EOF on a read.)

This yields several errors, one from the relayIO function:

Message: 	Cmd IO relay complete
error: 	 	file has already been closed
file: 	 	stdin

And one from the background goroutines, when they attempt to close their respective IO pipes:

Message:	failed to close Cmd stdin

With the error on hypervisor-isolated containers:

error:		close hvsock [...]: shutdown: An operation was attempted on something that is not a socket.

And on process-isolated containers:

error: 		hcs::Process::CloseStdin [...]: hcsshim: the handle has already been closed

This PR solves several issues that raise those errors.

  1. Currently, if the exec exits without the caller first closing stdin, the IO relay for stdin is guaranteed to raise the `file has already been closed` error, which winio raises when attempting to read from a closed pipe. internal/cmd:relayIO was modified to ignore file-closed errors from the io.Copy on stdin, but not from stdout and stderr, since those closing is likely anomalous.
  2. Since there is no guarantee as to the ordering between cmd.Wait calling process.Close and the relayIO goroutines calling process.CloseStdin/out/err, the close IO operations now ignore `file has already been closed` errors.
  3. hcs.ErrAlreadyClosed from process.Close is now also ignored, as cmd.Wait() may call it twice.

internal/gcs:Process was updated to match internal/hcs:Process and internal/jobcontainers:Process:

  1. .CloseStdin/out/err functions return hcs.ErrAlreadyClosed if the process was already closed;
  2. .CloseStdin first calls CloseWrite to send an EOF, and then closes the IO channel; and
  3. .Close zeros out its gc connection to prevent future accidental uses, and protects gc with a RWLock so closing does not occur while other operations that require the guest connection are executing.

For internal/hcs:Process, if process.CloseStdin is called after the process exits but before it is closed, it sends an HcsModifyProcess request, which fails with:

error: 	The process has already exited.
name: 	HcsModifyProcess
settings: 
{
	"Operation": 	"CloseHandle",
	"CloseHandle": 	{"Handle": "StdIn"}
}

Now it checks if the process exited before sending the close handle request.
Additionally, process.CloseStdin returned hcs.ErrAlreadyClosed when called on a closed process, but CloseStdout and CloseStderr did not. The latter two were updated to match CloseStdin and jobcontainers:Process.

Finally, a flag was added to internal/cmd:Cmd to close the upstream Stdin IO channel after the process exits but before Cmd.Wait() waits on the IO operations. This lets callers guarantee that the upstream stdin is closed and that the stdin io.Copy operations end cleanly after the process exits.

Note: it is impossible to guarantee interrupting the io.Copy operation without closing the upstream pipe, as it can block indefinitely on the syscall to read from the upstream stdin pipe. Therefore, only closing or sending an EOF (via .CloseWrite(), which is not available on the default, non-message-based winio Pipes) will end the goroutine copying data from that pipe.

Tests were updated to account for new functionality.

Signed-off-by: Hamza El-Saawy [email protected]

@helsaawy helsaawy marked this pull request as ready for review February 15, 2022 17:55
@helsaawy helsaawy requested a review from a team as a code owner February 15, 2022 17:55
Comment on lines 334 to 357
if err == nil {
err = ioErr
}


Should we maybe wrap the errors together instead?

Contributor Author

@helsaawy helsaawy Feb 22, 2022


as in?

if err != nil {
  err = fmt.Errorf("multiple errors during IO copy: %w; %w", err, cerr)
}

@helsaawy helsaawy force-pushed the he/iorelay branch 2 times, most recently from 146556a to 74c5101 Compare February 22, 2022 20:15
@helsaawy helsaawy assigned helsaawy and katiewasnothere and unassigned helsaawy Mar 30, 2022
Comment on lines +346 to +290
tr := time.NewTimer(250 * time.Millisecond) // give the cmd a chance to finish running
defer tr.Stop()
select {
case err := <-done:
if err != nil {
t.Fatalf("cmd run failed: %v", err)
}
t.Fatal("command should have blocked indefinitely")
case <-tr.C:


Maybe instead of this, you could just check that you can still write to stdout?

Contributor Author


from the process itself, or in the test function itself?

Comment on lines +216 to +224
// Even if `stdin` is closed, the runtime can block indefinitely on reading
// c.Stdin, so the only reliable way to unblock this is with:
// c.Stdin.CloseWrite() (if it implements it) or c.Stdin.Close().
// However, we are only passed the Reader end of Stdin, and closing the
// upstream c.Stdin breaks from the behavior that os/exec.Cmd implements.


What is this comment trying to justify?

Contributor Author


It expands on the original comment with more info, and gives context as to what is happening.
I didn't change anything here, so no justification.

@katiewasnothere

To be clear, with this PR, we'd never log isIOChannelClosedErr errors?

@helsaawy
Contributor Author

helsaawy commented May 2, 2022

To be clear, with this PR, we'd never log isIOChannelClosedErr errors?

For the upstream and processStdin, we never log it, even if it is closed during io.Copy. For the process std IOs, we don't log it if it arises when we try to close them after the io.Copy finishes, since the calls to close the individual IO channels may come after process.Close happens and closes them.

@helsaawy
Contributor Author

rebased to fix merge conflicts, removed test/vendor directory

if serr := p.stderr.Close(); serr != nil {
log.G(ctx).WithError(serr).Warn("close stderr failed")
if err == nil {
err = serr
Contributor

@dcantah dcantah Aug 4, 2022


We don't actually do anything with these assignments right? After all of these close, whether successful or not, we just return nil on line 173.

Contributor


Or is it some weird Go named return value shenanigans that I'm forgetting

Contributor Author


We use it to set the span status, but I don't know if we should even bother to return stdIO close errors (or log them in the span).
hcs\process.Close just ignores stdIO errors.
We probably should standardize how we do closes ....

Contributor

@dcantah dcantah Aug 5, 2022


Ahh, for the span, okay. And yes, agreed. It's frustrating to go look at how each type of container we support does something, and they all have their own little quirks or ways of handling things.


When the function returns though, that value gets set to the "err" value of the function. So then the span would still be getting set to nil error. Right?

Contributor Author

@helsaawy helsaawy Sep 17, 2022


oh right, I missed that ...
I guess either we return an error for std IO close, even though we chug through,
or we ignore it completely,
or we do some more naming shenanigans to get it to work.
Preferences?


p.gcLock.RLock()
defer p.gcLock.RUnlock()
if p.gc == nil {
Contributor


Was this before the introduction of our friend p.gcClosed()?

Contributor Author


gcClosed RLocks, and I don't think RWMutex is re-entrant (the docs say "It should not be used for recursive read locking"), so I would need to rewrite it as:

if p.gcClosed() { 		
	return hcs.ErrAlreadyClosed
}
p.gcLock.RLock()
defer p.gcLock.RUnlock()
// ...

But that allows a race condition where another caller closes gc in between the check and the RPC call.

Contributor Author


As a side note, I considered:

func (p *Process) gcClosed() bool {
	p.gcLock.RLock()
	defer p.gcLock.RUnlock()
	return p.gcclosed()
}

// unsafe: requires holding p.gcLock
func (p *Process) gcclosed() bool { return p.gc == nil }

but that seems .... excessive

Comment on lines +413 to +420
if err == nil {
var resultJSON string
resultJSON, err = vmcompute.HcsModifyProcess(ctx, process.handle, string(b))
events := processHcsResult(ctx, resultJSON)
if err != nil {
err = makeProcessError(process, operation, err, events)
}
}
Contributor


Is this just so we always call process.stdin.Close() instead of early returning if the marshal fails?

Contributor Author


yeah, it feels weird to give up on closing stdin on the host side just because the marshaling failed, especially since the marshaling is an internal detail and the error is salvageable.

@helsaawy
Contributor Author

helsaawy commented Aug 8, 2022

rebased to update CI pipeline

Updating cmd to wait on all relayIO operations, and using CloseStdin to
signal closure of the process.

`hcs.Process` would attempt to send `ModifyProcessRequest` to close
stdin even after the process exits.
It was also inconsistent about returning errors on already-closed IO
handles.

Changed `gcs` Process to mirror `hcs.Process` and `jobcontainer.Process`:
`.CloseStd*` now return errors if the process was already closed,
`.Close()` zeros out the `gc` connection, and `.CloseStdin` both performs a
`CloseWrite` and closes the channel.

Signed-off-by: Hamza El-Saawy <[email protected]>
Fixed a bug with how stdin streams were closed for processes.
Removed an ineffectual assignment to ctx in c.Wait().

Signed-off-by: Hamza El-Saawy <[email protected]>
// Use skipErr to ignore, and not log, errors that it returns true for: passing in
// `isIOChannelClosedErr` will ignore errors raised by reading from or writing to
// a closed Reader or Writer, respectively. It can be `nil`.
func relayIO(w io.Writer, r io.Reader, log *logrus.Entry, name string, skipErr func(error) bool) (int64, error) {

@katiewasnothere katiewasnothere Sep 16, 2022


I'm kind of torn here. I wonder if we should just remove the logging in relayIO entirely and just log in the caller. Thoughts? That would avoid this weird "skipErr" func. In which case this would essentially just be io.Copy. What are the downsides?

Contributor Author


honestly, none. I didn't want to remove it since I didn't know the history, but it can easily be inlined into Cmd.Start()


I say go for it then haha

