runtime v2: Close platform in runc shim's Shutdown method.#3896
runtime v2: Close platform in runc shim's Shutdown method.#3896crosbymichael merged 1 commit intocontainerd:masterfrom
Conversation
|
Build succeeded.
|
|
Hmm, I'm sure you are definitely correcting an important problem, but looks like we will have to figure out what criu doesn't like about this flow of operations: |
|
@estesp Just to understand the tests a bit better, it looks like those same failures occurred in both the run with |
|
@estesp I looked into the failure a bit more, I am able to reproduce it locally when running against containerd master w/out the changes in this PR. I'm not sure if there's an easier way, but I reproduced by starting a go docker container: and then running the equivalent of what travis does AFAICT: Then when I run the integ tests, I get the same CRIU test failures There are a few extra errors (perhaps because I didn't replicate the travis env perfectly or container-in-container issues), but those CRIU errors are consistent and look to be the same as what this PR is getting. It's very strange though because I can see recent builds of containerd's master branch all succeeded... Any ideas? Is it possible it's the failures are unrelated to the code in this PR? |
33ad230 to
51f941e
Compare
|
Build succeeded.
|
|
Tried an empty force-push to re-run the tests, but got the exact same results. One other thing I just noticed that may or may not be relevant, the most recent build of containerd's master that succeeded was using a travis worker with kernel version When I reproduced the criu test failures locally on master branch, it was on a machine with kernel Is it possible for the containerd maintainers to retry builds of master in travis to see if a travis worker kernel upgrade could be interfering with criu tests? |
|
@sipsma you are correct; it is a timing issue with you happening to catch a Travis kernel upgrade that no other PR or branch run has been on! A draft PR I just created hit the issue without any code changes. Now testing against a later criu build in case there are fixes for a 5.0 kernel issue. |
Previously, the platform was closed as part of the Delete method when the process was an init for a task and there were no more tasks after its deletion. This can create problems if another task is created within the shim right after the delete runs, which results in the platform being closed but the shim continuing to run. This change moves closing the platform to the Shutdown method after the shim's context is canceled, which ensures the platform is only closed once the shim is sure its done servicing containers. Signed-off-by: Erik Sipsma <[email protected]>
51f941e to
fbd46d7
Compare
|
Build succeeded.
|
Codecov Report
@@ Coverage Diff @@
## master #3896 +/- ##
==========================================
+ Coverage 38% 42.47% +4.46%
==========================================
Files 88 130 +42
Lines 12326 14716 +2390
==========================================
+ Hits 4685 6251 +1566
- Misses 6994 7544 +550
- Partials 647 921 +274
Continue to review full report at Codecov.
|
|
LGTM |
|
Cherry picked to |
Previously, the platform was closed as part of the Delete method when the
process was an init for a task and there were no more tasks after its deletion.
This can create problems if another task is created within the shim right after
the delete runs, which results in the platform being closed but the shim
continuing to run.
This change moves closing the platform to the Shutdown method after the shim's
context is canceled, which ensures the platform is only closed once the shim
is sure its done servicing containers.
Signed-off-by: Erik Sipsma [email protected]
Fixes #3895