-
Notifications
You must be signed in to change notification settings - Fork 2.2k
libcontainer/container_linux: Consider process state (running, zombie, etc.) in runType #1489
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
So we can extract more than the start time with a single read. Signed-off-by: W. Trevor King <[email protected]>
107bc2c to
407e33a
Compare
|
Hooray, Travis is happy :). Can someone more familiar with the checkpoint tooling comment on whether the appropriate post-checkpoint state is |
|
@wking Thank you for the patch. I looked at it without details and it looks good for me. I'm going to review it carefully a bit later.
I am not sure that I understand what do you mean here. Could you elaborate? |
|
My reproducer for the problem from is still working. You need to stop the init process, I use gdb for that, and then start tests. |
|
Oops, I forgot to compile runc. Actually everything works as expected with this patch. |
|
@wking you changes in checkpoint_test.go looks correct.
container.Checkpoint(preDumpOpts) doesn't stop a container, this call enables memory tracker and dumps memory for processes without stopping them. Then we call container.Checkpoint() in a second time and this call stops processes and dumps them. The big part of memory is already dumped, so we have to dump only changed memory. This optimization allows us to reduce a downtime of a container, when it is migrated from one host to another host: https://criu.org/Iterative_migration |
And convert the various start-time properties from strings to uint64s. This removes all internal consumers of the deprecated GetProcessStartTime function. Signed-off-by: W. Trevor King <[email protected]>
And Stat_t.PID and Stat_t.Name while we're at it. Then use the new .State property in runType to distinguish between running and zombie/dead processes, since kill(2) does not [1]. With this change we no longer claim Running status for zombie/dead processes. I've also removed the kill(2) call from runType. It was originally added in 13841ef (new-api: return the Running state only if the init process is alive, 2014-12-23), but we've been accessing /proc/[pid]/stat since 14e95b2 (Make state detection precise, 2016-07-05, opencontainers#930), and with the /stat access the kill(2) check is redundant. I also don't see much point to the previously-separate doesInitProcessExist, so I've inlined that logic in runType. It would be nice to distinguish between "/proc/[pid]/stat doesn't exist" and errors parsing its contents, but I've skipped that for the moment. The Running -> Stopped change in checkpoint_test.go is because the post-checkpoint process is a zombie, and with this commit zombie processes are Stopped (and no longer Running). [1]: opencontainers#1483 (comment) Signed-off-by: W. Trevor King <[email protected]>
| exist, err := c.doesInitProcessExist(pid) | ||
| if !exist || err != nil { | ||
| return Stopped, err | ||
| if stat.StartTime != c.initProcessStartTime || stat.State == system.Zombie || stat.State == system.Dead { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be a test with the state set as a Zombie or Dead to make sure that it gets picked off? Apologies if I missed it, and not a hold up for this PR, suggestion for later if missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be a test with the state set as a Zombie or Dead to make sure that it gets picked off?
I'm not sure what you mean. Can you give more detail on the case you feel is not covered?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be a test with the state set as a Zombie or Dead to make sure that it gets picked off?
Ah, I think you mean a unit test where we run this on a dead/zombie process. The checkpoint_test.go change is doing that already, although there's too much going on there for it to be a unit test. If you have another way to manufacture a zombie process for testing, I'm happy to add a more focused test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was indeed thinking of a unit test, trying to ensure more code coverage. I unfortunately don't have a good way to emulate a zombie/dead process. I was just thinking of faking it out in a test to change the state of the process. I didn't see that happening for Dead/Zombie in the in the checkpoint_tests.go in particular, only that the process wasn't Stopped. Anywho, no need to address now, especially with this PR, just a point to mull.
|
LGTM, shoulda done this when I commented earlier today. |
As @avagin pointed out, we're currently failing to distinguish between running (stopped, disk-sleeping, etc.) and dead/zombie processes when determining the container state. This PR rerolls our runType approach to make that distinction.
Now that we need two values from
/proc/[pid]/stat, I've rerolledlibcontainer/system/procto use aStat_tobject (like thesys/x/unixStat_tAPI). Then we can get both the start-time and process state from the same filesystem read.Additional details in the commit messages for the curious.