internal/cmd IO-related bugs #1296
Conversation
| if err == nil {
|     err = ioErr
| }
Should we maybe wrap the errors together instead?
as in?
if err != nil {
    err = fmt.Errorf("multiple errors during IO copy: %w; %w", err, cerr)
}
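A minimal sketch of combining both errors so neither is dropped, assuming Go 1.20+ (needed for errors.Join and for multiple %w verbs in fmt.Errorf):

package main

import (
	"errors"
	"fmt"
	"io"
)

// combineErrs keeps both the io.Copy error and the close error visible to
// errors.Is / errors.As, instead of letting one silently overwrite the other.
func combineErrs(copyErr, closeErr error) error {
	if copyErr == nil {
		return closeErr
	}
	if closeErr == nil {
		return copyErr
	}
	return fmt.Errorf("multiple errors during IO copy: %w", errors.Join(copyErr, closeErr))
}

func main() {
	err := combineErrs(io.ErrClosedPipe, errors.New("close stderr failed"))
	fmt.Println(err)                              // both messages; errors.Join separates them with a newline
	fmt.Println(errors.Is(err, io.ErrClosedPipe)) // true
}

With multiple %w verbs the same effect can be written directly as fmt.Errorf("multiple errors during IO copy: %w; %w", err, cerr).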
| tr := time.NewTimer(250 * time.Millisecond) // give the cmd a chance to finish running
| defer tr.Stop()
| select {
| case err := <-done:
|     if err != nil {
|         t.Fatalf("cmd run failed: %v", err)
|     }
|     t.Fatal("command should have blocked indefinitely")
| case <-tr.C:
Maybe instead of this, you could just check that you can still write to stdout?
from the process itself, or in the test function itself?
| // Even if `stdin` is closed, the runtime can block indefinitely on reading
| // c.Stdin, so the only reliable way to unblock this is with:
| // c.Stdin.CloseWrite() (if it implements it) or c.Stdin.Close().
| // However, we are only passed the Reader end of Stdin, and closing the
| // upstream c.Stdin breaks the functionality that os/exec.Cmd implements.
What is this comment trying to justify?
It expands on the original comment with more info, and gives context as to what is happening.
I didn't change anything here, so no justification.
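A minimal sketch of the situation that comment describes, with hypothetical names: CloseWrite is used when the pipe supports half-close (the message-mode winio pipes mentioned later in this PR), otherwise the only way to interrupt a Read parked in a syscall is to close the pipe outright.

package main

import (
	"io"
	"os"
)

// closeWriter is implemented by pipes that support half-close, i.e. sending an
// EOF to the reader without tearing the pipe down.
type closeWriter interface {
	CloseWrite() error
}

// unblockStdinRelay (hypothetical helper, not the hcsshim implementation)
// signals EOF to whatever goroutine is blocked reading from stdin: CloseWrite
// if available, otherwise a full Close.
func unblockStdinRelay(stdin io.Reader) error {
	if cw, ok := stdin.(closeWriter); ok {
		return cw.CloseWrite()
	}
	if c, ok := stdin.(io.Closer); ok {
		return c.Close()
	}
	return nil
}

func main() {
	r, w, _ := os.Pipe()
	go func() { _, _ = io.Copy(io.Discard, r) }() // stand-in for the stdin relay goroutine

	// *os.File has no CloseWrite, so this falls back to Close, which is what
	// unblocks the relay's pending Read.
	_ = unblockStdinRelay(r)
	_ = w.Close()
}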
To be clear, with this PR, we'd never log |
For the upstream and process |
rebase to fix merge conflicts, remove |
| if serr := p.stderr.Close(); serr != nil {
|     log.G(ctx).WithError(serr).Warn("close stderr failed")
|     if err == nil {
|         err = serr
We don't actually do anything with these assignments, right? After all of these close, whether successful or not, we just return nil on line 173.
Or is it some weird Go named-return-value shenanigans that I'm forgetting?
We use it to set the span status, but I don't know if we should even bother to return std IO close errors (or log them in the span).
hcs\process.Close just ignores std IO errors.
We probably should standardize how we do closes ....
Ahh, for the span, okay. And yes, agreed. It's frustrating to go look at how each type of container we support does something, and they all have their own little quirks or ways of handling things.
When the function returns though, that value gets set to the "err" value of the function. So then the span would still be getting set to nil error. Right?
Oh right, I missed that ...
I guess either we return an error for std IO closes, even though we chug through,
or we ignore it completely,
or we do some more naming shenanigans to get it to work.
Preferences?
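A minimal sketch of the named-return option, with hypothetical type and field names, assuming the repo's internal/log package as used in the quoted diff: the first stdio close error survives into the function's return value, which is what the span-status code would then see.

package cmdsketch

import (
	"context"
	"io"

	"github.com/Microsoft/hcsshim/internal/log"
)

type process struct {
	stdin, stdout, stderr io.Closer
}

// Close closes all stdio handles, logging each failure but continuing on, and
// returns the first failure via the named return value so that a deferred
// span-recording helper (not shown) observes a non-nil error.
func (p *process) Close(ctx context.Context) (err error) {
	for name, c := range map[string]io.Closer{
		"stdin":  p.stdin,
		"stdout": p.stdout,
		"stderr": p.stderr,
	} {
		if c == nil {
			continue
		}
		if cerr := c.Close(); cerr != nil {
			log.G(ctx).WithError(cerr).Warnf("close %s failed", name)
			if err == nil {
				err = cerr // first error wins and becomes the return/span status
			}
		}
	}
	return err
}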
| p.gcLock.RLock()
| defer p.gcLock.RUnlock()
| if p.gc == nil {
Was this before the introduction of our friend p.gcClosed()?
gcClosed RLocks, and I don't think RWMutex is re-entrant (the docs say "It should not be used for recursive read locking"), so I would need to rewrite it as:
if p.gcClosed() {
    return hcs.ErrAlreadyClosed
}
p.gcLock.RLock()
defer p.gcLock.RUnlock()
// ...
But that allows a race condition where another process closes gc in between the check and the RPC call.
As a side note, I considered:
func (p *Process) gcClosed() bool {
    p.gcLock.RLock()
    defer p.gcLock.RUnlock()
    return p.gcclosed()
}

// unsafe: requires holding p.gcLock
func (p *Process) gcclosed() bool { return p.gc == nil }
but that seems ... excessive.
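For comparison, a minimal sketch (hypothetical names) of the shape the quoted code already has: the read lock is held across both the nil check and the guest call, so a concurrent Close, which takes the write lock before nil-ing out gc, cannot slip in between the two.

package gcsketch

import (
	"context"
	"errors"
	"sync"
)

// errAlreadyClosed stands in for hcs.ErrAlreadyClosed.
var errAlreadyClosed = errors.New("already closed")

// guestConn stands in for the gc bridge connection.
type guestConn struct{}

func (g *guestConn) signalProcess(ctx context.Context) error { return nil }

type Process struct {
	gcLock sync.RWMutex
	gc     *guestConn // nil once Close has run
}

// Signal holds the read lock across the closed check and the RPC, avoiding
// both recursive RLock-ing and the check-then-act race described above.
func (p *Process) Signal(ctx context.Context) error {
	p.gcLock.RLock()
	defer p.gcLock.RUnlock()
	if p.gc == nil {
		return errAlreadyClosed
	}
	return p.gc.signalProcess(ctx)
}

// Close takes the write lock and nils out gc so later calls observe the
// closed state; real code would also close the connection and IO channels.
func (p *Process) Close() error {
	p.gcLock.Lock()
	defer p.gcLock.Unlock()
	p.gc = nil
	return nil
}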
| if err == nil {
|     var resultJSON string
|     resultJSON, err = vmcompute.HcsModifyProcess(ctx, process.handle, string(b))
|     events := processHcsResult(ctx, resultJSON)
|     if err != nil {
|         err = makeProcessError(process, operation, err, events)
|     }
| }
Is this just so we always call process.stdin.Close() instead of early returning if the marshal fails?
Yeah, it feels weird to give up on closing stdin on the host side just because the marshaling failed, especially since the marshaling is an internal detail and the error is salvageable.
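A minimal sketch of that flow with hypothetical request and helper names (the real call is vmcompute.HcsModifyProcess, as in the quoted diff): the marshal error is kept as the result, but the local stdin close still runs rather than returning early.

package hcssketch

import (
	"context"
	"encoding/json"
	"io"
)

// modifyRequest and sendModifyProcess are hypothetical stand-ins for the HCS
// close-stdin request and the vmcompute.HcsModifyProcess call.
type modifyRequest struct {
	Operation string `json:"Operation"`
}

func sendModifyProcess(ctx context.Context, settings string) error { return nil }

// closeStdin asks the host to close its side of stdin and then closes the
// local pipe. A marshal failure is recorded, but it does not short-circuit
// the local close: giving up on closing stdin because an internal
// serialization step failed would leak the pipe.
func closeStdin(ctx context.Context, stdin io.Closer) error {
	b, err := json.Marshal(modifyRequest{Operation: "CloseHandle"})
	if err == nil {
		err = sendModifyProcess(ctx, string(b))
	}
	if cerr := stdin.Close(); cerr != nil && err == nil {
		err = cerr
	}
	return err
}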
rebased to update CI pipeline
Updating cmd to wait on all relayIO operations, and using CloseStdin to signal closure of the process. `hcs.Process` would attempt to send `ModifyProcessRequest` to close stdin even after the process exits, and was also inconsistent about returning an error on already-closed IO handles. Changed `gcs` Process to mirror `hcs.Process` and `jobcontainer.Process`: `.CloseStd*` now return errors if the process was already closed, `.Close` zeros out the `gc` connection, and `.CloseStdin` both performs a `CloseWrite` and closes the channel. Signed-off-by: Hamza El-Saawy <[email protected]>
Fixed a bug with how stdin streams were closed for processes. Removed an ineffectual assignment to ctx in c.Wait(). Signed-off-by: Hamza El-Saawy <[email protected]>
| // Use skipErr to ignore and not log errors that it returns true for: passing in
| // `isIOChannelClosedErr` will ignore errors raised by reading from or writing to
| // a closed Reader or Writer, respectively. It can be `nil`.
| func relayIO(w io.Writer, r io.Reader, log *logrus.Entry, name string, skipErr func(error) bool) (int64, error) {
I'm kind of torn here. I wonder if we should just remove the logging in relayIO entirely and just log in the caller. Thoughts? That would avoid this weird "skipErr" func. In which case this would essentially just be io.Copy. What are the downsides?
Honestly, none. I didn't want to remove it since I didn't know the history, but it can easily be inlined into Cmd.Start().
I say go for it then haha
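A rough sketch of what that inlining could look like, with hypothetical plumbing: the copy and the logging both live at the call site, and the expected closed-pipe errors are filtered right there instead of through a skipErr parameter.

package cmdsketch

import (
	"errors"
	"io"
	"os"
	"sync"

	"github.com/sirupsen/logrus"
)

// isIOChannelClosedErr is a stand-in for the predicate referenced in the diff;
// here it only recognizes the generic closed-file/closed-pipe sentinels.
func isIOChannelClosedErr(err error) bool {
	return errors.Is(err, os.ErrClosed) || errors.Is(err, io.ErrClosedPipe)
}

// startStdinRelay is roughly what Cmd.Start could do inline: copy upstream
// stdin into the process and log any unexpected failure here, so relayIO no
// longer needs its own logging or a skipErr callback. The WaitGroup lets
// Cmd.Wait join on the relay before returning.
func startStdinRelay(wg *sync.WaitGroup, dst io.WriteCloser, src io.Reader, entry *logrus.Entry) {
	wg.Add(1)
	go func() {
		defer wg.Done()
		n, err := io.Copy(dst, src)
		if err != nil && !isIOChannelClosedErr(err) {
			entry.WithError(err).WithField("bytes", n).Warn("stdin relay failed")
		}
		_ = dst.Close() // signal EOF to the process once upstream stdin is exhausted
	}()
}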
Currently, when exec-ing into a WCOW container with a `stdin` pipe via `crictl` (or the shim via `shimdiag exec`), `internal/cmd:Cmd.Wait()` closes the underlying process and returns to `internal/cmd:ExecInUvm()` without waiting on or closing the goroutine that copies data from the upstream `stdin` pipe (which is exposed to containerd) to the process's `stdin` (via `relayIO`) in the background. `ExecInUvm` then closes the upstream std IO pipes. (Note: this is different from `exec.Cmd`, which does not finish `Wait`ing until `stdin` is either closed or yields an `EOF` on a read.)

This yields several errors: one from the `relayIO` function, and one from the background goroutines when they attempt to close their respective IO pipes, with the error differing between hypervisor-isolated and process-isolated containers.
This PR solves several issues that raise those errors.
Currently, if the exec exits without the caller first closing `stdin`, the IO relay for `stdin` is guaranteed to raise the `file has already been closed` error, which `winio` raises when attempting to read from a closed pipe. `internal/cmd:relayIO` was modified to ignore file-closed errors from the `io.Copy` on `stdin`, but not from `stdout` and `stderr`, since those pipes closing is likely anomalous.
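A minimal sketch of what such a filter can look like, assuming go-winio's exported ErrFileClosed sentinel (whose text is the `file has already been closed` message above); the actual predicate in this PR is `isIOChannelClosedErr`.

package cmdsketch

import (
	"errors"
	"os"

	"github.com/Microsoft/go-winio"
)

// isClosedPipeErr reports whether err is the expected result of touching a
// pipe that was already closed, e.g. the stdin relay reading after the exec
// has exited. Hypothetical helper shown for illustration only.
func isClosedPipeErr(err error) bool {
	return errors.Is(err, winio.ErrFileClosed) || errors.Is(err, os.ErrClosed)
}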
Since there is no guarantee as to the ordering between `cmd.Wait` calling `process.Close` and the `relayIO` goroutines calling `process.CloseStdin/out/err`, the close IO operations now ignore `file has already been closed` errors. `hcs.ErrAlreadyClosed` errors from `process.Close` are now also ignored, as `cmd.Wait()` may call it twice.

`internal/gcs:Process` was updated to match `internal/hcs:Process` and `internal/jobcontainers:Process`: the `.CloseStdin/out/err` functions return `hcs.ErrAlreadyClosed` if the process was already closed; `.CloseStdin` first calls `CloseWrite` to send an `EOF`, and then closes the IO channel; and `.Close` zeros out its `gc` connection to prevent future accidental uses, and protects `gc` with a `RWLock` so that closing does not occur while other operations that require the guest connection are executing.
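A minimal sketch of that `CloseStdin` ordering, with hypothetical channel and field names:

package procsketch

import (
	"context"
	"errors"
	"io"
	"sync"
)

// halfCloser stands in for an IO channel that supports half-close, such as
// the guest connection's stdin channel.
type halfCloser interface {
	CloseWrite() error
	io.Closer
}

var errAlreadyClosed = errors.New("already closed") // stand-in for hcs.ErrAlreadyClosed

type Process struct {
	mu     sync.Mutex
	closed bool // set by Close (not shown)
	stdin  halfCloser
}

// CloseStdin sends an EOF to the process by half-closing the stdin channel,
// then closes the channel itself. If the Process was already closed, it
// returns the already-closed sentinel instead of touching the channel.
func (p *Process) CloseStdin(_ context.Context) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.closed {
		return errAlreadyClosed
	}
	if err := p.stdin.CloseWrite(); err != nil {
		return err
	}
	return p.stdin.Close()
}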
For `internal/hcs:Process`, if `process.CloseStdin` is called after the process exits but before it is closed, it sends an `HcsModifyProcess` request, which errors. Now it checks whether the process has exited before sending the close-handle request.
Additionally, it returned `hcs.ErrAlreadyClosed` if `process.CloseStdin` was called on a closed process, but did not do the same for `CloseStdout` and `CloseStderr`. The latter two were updated to match `CloseStdin` and `jobcontainers:Process`.
Finally, a flag was added to `internal/cmd:Cmd` to close the upstream `Stdin` IO channel after the process exits but before `Cmd.Wait()` waits on the IO operations. This allows callers to close the upstream `stdin` and guarantee that the `stdin` `io.Copy` operations end safely and properly after the process exits.
Note: it is impossible to guarantee interrupting the `io.Copy` operation without closing the upstream pipe, as it can block indefinitely on the syscall to read from the upstream `stdin` pipe. Therefore, only closing the pipe or sending an EOF (via `.CloseWrite()`, which is not available on the default, non-message-based `winio` pipes) will end the goroutine copying data from that pipe.

Tests were updated to account for the new functionality.
Signed-off-by: Hamza El-Saawy [email protected]