Skip to content

Fix checkpoint's exiting semantics.#37360

Merged
thaJeztah merged 1 commit intomoby:masterfrom
bjbroder:checkpoint-exit
Jul 26, 2018
Merged

Fix checkpoint's exiting semantics.#37360
thaJeztah merged 1 commit intomoby:masterfrom
bjbroder:checkpoint-exit

Conversation

@bjbroder
Copy link
Copy Markdown
Contributor

Previously, dockerd would always ask containerd to pass --leave-running
to runc/runsc, ignoring the exit boolean value. Hence, even docker checkpoint create --leave-running=false ... would not stop the
container.

Signed-off-by: Brielle Broder [email protected]

@hugelgupf
Copy link
Copy Markdown

cc @hugelgupf @nlacasse

@hugelgupf
Copy link
Copy Markdown

hugelgupf commented Jun 28, 2018

janky seems to be failing for no good reason?

01:07:11 W: Failed to fetch http://cdn-fastly.deb.debian.org/debian/dists/stretch/InRelease  Could not resolve 'cdn-fastly.deb.debian.org'

@hugelgupf
Copy link
Copy Markdown

friendly ping

@hugelgupf
Copy link
Copy Markdown

cc @cpuguy83 maybe?

@cpuguy83
Copy link
Copy Markdown
Member

cpuguy83 commented Jul 9, 2018

Can we add an integration test for this?

@hugelgupf
Copy link
Copy Markdown

@cpuguy83 I'd like to, but we've actually had trouble getting this to work with runc (we're using runsc instead). Are there existing integration tests somewhere that we can build off of?

@cpuguy83
Copy link
Copy Markdown
Member

cpuguy83 commented Jul 9, 2018

Is this the error?

docker-runc did not terminate sucessfully: CRIU version check failed: write unixpacket @->@: write: broken pipe path= /var/run/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/56affd8f7a2433caefd9e787c4248f6064df70ea774ad9c30ce6ad16ef4efd23/criu-dump.log: unknown error_type="*errors.errorString" module=api

@cpuguy83
Copy link
Copy Markdown
Member

cpuguy83 commented Jul 9, 2018

ping @kolyshkin

@avagin
Copy link
Copy Markdown
Contributor

avagin commented Jul 9, 2018

Actually, there is one more problem. Docker (containerd) has to call "runc checkpoint" with "--empty-ns network" to dump a docker container.

I have a patch which fixes both these issues:

avagin/docker-ce@911c89f#diff-e1f8834158a9117cafe1744dc2c7adb2

But it depends on containerd changes:
containerd/containerd#2425

@kolyshkin
Copy link
Copy Markdown
Contributor

I like @avagin approach better (i.e. appending to options if necessary).

@hugelgupf
Copy link
Copy Markdown

hugelgupf commented Jul 10, 2018

Sure, we can append to options.

Having to call with an empty NS is specific to runc. runsc does not need to be called with that. Can this be part of something runc-specific, or can this be done in an API-agnostic way?

@avagin
Copy link
Copy Markdown
Contributor

avagin commented Jul 10, 2018

@hugelgupf Here are changes which resolve the issue with an empty NS:
opencontainers/runc#1840

@codecov
Copy link
Copy Markdown

codecov Bot commented Jul 10, 2018

Codecov Report

Merging #37360 into master will decrease coverage by 0.02%.
The diff coverage is 0%.

@@            Coverage Diff             @@
##           master   #37360      +/-   ##
==========================================
- Coverage   34.95%   34.92%   -0.03%     
==========================================
  Files         610      610              
  Lines       44886    44873      -13     
==========================================
- Hits        15690    15673      -17     
- Misses      27077    27081       +4     
  Partials     2119     2119

@bjbroder
Copy link
Copy Markdown
Contributor Author

@kolyshkin @avagin cpuguy wanted an integration test, but it seems like there are none for moby checkpoint/restore already? Is there anything y'all have we can contribute to?

@hugelgupf
Copy link
Copy Markdown

Ping

@avagin
Copy link
Copy Markdown
Contributor

avagin commented Jul 19, 2018

LGTM

@bjbroder
Copy link
Copy Markdown
Contributor Author

Now that it's lgtm'd, how do I get this merged?

Copy link
Copy Markdown
Member

@cpuguy83 cpuguy83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown
Member

@vdemeester vdemeester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🐸
cc @mlaventure @thaJeztah

Previously, dockerd would always ask containerd to pass --leave-running
to runc/runsc, ignoring the exit boolean value. Hence, even `docker
checkpoint create --leave-running=false ...` would not stop the
container.

Signed-off-by: Brielle Broder <[email protected]>
Copy link
Copy Markdown
Contributor

@mlaventure mlaventure left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

z failure is unrelated

@tswift242
Copy link
Copy Markdown
Contributor

@bjbroder This restores the behavior of making --leave-running configurable, but is the default still true? The default used to be false.

@bjbroder
Copy link
Copy Markdown
Contributor Author

bjbroder commented Jul 27, 2018

@tswift242 The default for --leave-running was, and still is, false. Previously, both the default (false) and an explicit set of the flag to false would cause the process to incorrectly continue running after checkpointing because "exit" had not been implemented appropriately, even though the leave-running flag is correctly set.

The leave-running flag's behavior can be seen in:
docker-ce/components/cli/cli/command/checkpoint/create.go

@tswift242
Copy link
Copy Markdown
Contributor

Gotcha. Thanks for the reply and the fix to this issue!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants