Use ttrpc OnClose to fix restored dead shims#2109
Use ttrpc OnClose to fix restored dead shims#2109stevvooe merged 3 commits intocontainerd:masterfrom
Conversation
Contains the OnClose method for the client Signed-off-by: Michael Crosby <[email protected]>
When restoring a task make sure that dead shims are handled. Fixes containerd#2078 Signed-off-by: Michael Crosby <[email protected]>
|
Verified this manually |
Codecov Report
@@ Coverage Diff @@
## master #2109 +/- ##
=======================================
Coverage 45.29% 45.29%
=======================================
Files 95 95
Lines 9461 9461
=======================================
Hits 4285 4285
Misses 4464 4464
Partials 712 712
Continue to review full report at Codecov.
|
| return nil, nil, errors.Wrap(err, "failed to set OOM Score on shim") | ||
| } | ||
| c, clo, err := WithConnect(address)(ctx, config) | ||
| c, clo, err := WithConnect(address, func() {})(ctx, config) |
There was a problem hiding this comment.
could we just pass nil instead? I feel it'd be more natural and expected
There was a problem hiding this comment.
nil would cause a panic. no-ops work well here
There was a problem hiding this comment.
It's just a matter of checking it before setting it though.
I just think it'd be more idiomatic.
| "id": id, | ||
| "namespace": ns, | ||
| }).Error("connecting to shim") | ||
| err := r.cleanupAfterDeadShim(ctx, bundle, ns, id, pid) |
There was a problem hiding this comment.
missing a task.Delete() here.
If it's not done, re-running the same container (with the same name) will fail until containerd is restarted (at which point it'll delete the task when the plugin loads)
There was a problem hiding this comment.
but we don't have a task at this point. Shouldn't r.terminate do the same thing?
There was a problem hiding this comment.
That why I moved the task adding in my orig PR.
The task.Delete() call is after the call to cleanupAfterDeadShim if it doesn't error out. I think it is an issue now, because task with a dead shim can no longer be listed as when a list of task is requested we now try to get its state from the shim. The original behavior left it to the user to invoke the Delete call, but that's no longer possible.
It may be better to automatically delete the task in r.terminate. But the proper event would need to be generated too (realized that wasn't the case in my orig PR either).
|
@mlaventure I added another commit to address this. let me know what you think |
mlaventure
left a comment
There was a problem hiding this comment.
Ok, this should work.
I've tested it locally too and it works with the latest change.
Would have to design a regression test for it though. can do it in a follow up PR sometime this week I think.
|
oh, it doesn't build on windows :) |
Signed-off-by: Michael Crosby <[email protected]>
17bea95 to
95d4f53
Compare
|
Fixed |
|
LGTM |
Closes #2083
Fixes #2078
This is an alternative to #2083 that does not have a blocking wait