Skip to content

Add wait API endpoint for waiting on process exit#1525

Merged
stevvooe merged 1 commit intocontainerd:masterfrom
crosbymichael:shim-hang
Sep 22, 2017
Merged

Add wait API endpoint for waiting on process exit#1525
stevvooe merged 1 commit intocontainerd:masterfrom
crosbymichael:shim-hang

Conversation

@crosbymichael
Copy link
Copy Markdown
Member

@crosbymichael crosbymichael commented Sep 19, 2017

While running stress tests there would always be one or two shims that hang. This was nothing wrong in the shim and the shims were properly sending the exit events. However, the client would miss the event because of the small race where you connect to the event server and start a goroutine.

This adds a Wait endpoint that blocks until the process exits. If the process has already exited when you call Wait it will immediately return the exit status and timestamp.

This moves much of the wait logic from the client to the server to properly handle this.

Nothing changes from the client POV so updating to the new version will just work for existing clients.

Signed-off-by: Michael Crosby [email protected]

@mlaventure
Copy link
Copy Markdown
Contributor

mlaventure commented Sep 19, 2017

@crosbymichael you forgot to implement it for windows :trollface:

@crosbymichael
Copy link
Copy Markdown
Member Author

@mlaventure i know, waiting on feedback on the overall design

Comment thread linux/shim/service.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's wrong with log.G()?

Comment thread linux/shim/service.go Outdated
Copy link
Copy Markdown
Contributor

@mlaventure mlaventure Sep 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need for all those calls to be fully synchronous with each others?

@Random-Liu
Copy link
Copy Markdown
Member

Random-Liu commented Sep 19, 2017

the client would miss the event because of the small race where you connect to the event server and start a goroutine.

@crosbymichael Hm, how could this happen?
If I remember correctly, we subscribe event first, and then start container, right?
Is it because event may loss after subscription? Or subscribe is not synchronized operation, after subscribe, the event channel is not ready yet?

@crosbymichael
Copy link
Copy Markdown
Member Author

@Random-Liu its because of the go runtime and we cannot make sure the connection is blocking waiting for the event before calling start.

@Random-Liu
Copy link
Copy Markdown
Member

its because of the go runtime and we cannot make sure the connection is blocking waiting for the event before calling start.

Hm. If so, it seems that we have to implement this on the server side.

BTW, is there any golang or other issue for this behavior? I'd like to understand more.

@stevvooe
Copy link
Copy Markdown
Member

LGTM on the API

@crosbymichael
Copy link
Copy Markdown
Member Author

@mlaventure windows fixed

@codecov-io
Copy link
Copy Markdown

codecov-io commented Sep 19, 2017

Codecov Report

Merging #1525 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #1525   +/-   ##
=======================================
  Coverage   42.61%   42.61%           
=======================================
  Files          24       24           
  Lines        3318     3318           
=======================================
  Hits         1414     1414           
  Misses       1581     1581           
  Partials      323      323

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2f5dda6...d67763d. Read the comment docs.

@crosbymichael
Copy link
Copy Markdown
Member Author

rebased

@crosbymichael crosbymichael added this to the containerd 1.0.0 milestone Sep 21, 2017
@stevvooe
Copy link
Copy Markdown
Member

LGTM

1 similar comment
@dmcgowan
Copy link
Copy Markdown
Member

LGTM

@stevvooe stevvooe merged commit fe9d6a4 into containerd:master Sep 22, 2017
@crosbymichael crosbymichael deleted the shim-hang branch September 22, 2017 14:05
mauriciovasquezbernal pushed a commit to kinvolk/containerd that referenced this pull request Nov 13, 2020
…ndbox-container

Forcibly remove sandbox container
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants