Skip to content

Commit 1ac97c2

Browse files
committed
*: properly shutdown non-groupable shims to prevent resource leaks
Previously, to address issue containerd#11708, PR containerd#11793 changed containerd to always invoke the shim binary to establish shim connections, rather than reusing the sandbox shim. However, this change did not ensure that the Shutdown API was called to stop the shim process. Starting with containerd v2.0.0, the Shutdown API is only invoked for sandbox containers (when container.SandboxID is empty). This approach works for groupable shims, where multiple containers share a single socket address and only require a single Shutdown call. However, for non-groupable shims, each container requires its own Shutdown call during cleanup to avoid leaking shim processes. Additionally, PR containerd#11793 introduced a corner case during upgrades: - T1: An old container-shim-runc-v2 (<=v1.7.X) is running for pod A. - T2: containerd is upgraded to v2.X.Y. - T3: A new container A-C1 is created in pod A using the new shim-runc-v2 binary. - T4: bootstrap.json indicates version:3 protocol, but it is downgraded to version:2 in memory. - T5: containerd is restarted. - T6: containerd fails to connect to A-C1. - T7: The A-C1 container is left in EXITED status in the CRI plugin. To address this, ensure that loadShimTask downgrades to version:2 if necessary, and always invoke the Shutdown API for each non-groupable shim during cleanup to prevent resource leaks and handle upgrade scenarios correctly. (Introduced by containerd#11793) Signed-off-by: Wei Fu <[email protected]>
1 parent f6724ac commit 1ac97c2

4 files changed

Lines changed: 220 additions & 26 deletions

File tree

core/runtime/v2/shim.go

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -556,6 +556,11 @@ func (s *shimTask) delete(ctx context.Context, sandboxed bool, removeTask func(c
556556
removeTask(ctx, s.ID())
557557
}
558558

559+
const supportSandboxAPIVersion = 3
560+
if _, apiVer := s.ShimInstance.Endpoint(); apiVer < supportSandboxAPIVersion {
561+
sandboxed = false
562+
}
563+
559564
// Don't shutdown sandbox as there may be other containers running.
560565
// Let controller decide when to shutdown.
561566
if !sandboxed {

core/runtime/v2/shim_load.go

Lines changed: 20 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -205,7 +205,26 @@ func loadShimTask(ctx context.Context, bundle *Bundle, onClose func()) (_ *shimT
205205
defer cancel()
206206

207207
if _, err := s.PID(ctx); err != nil {
208-
return nil, err
208+
if !errdefs.IsNotImplemented(err) {
209+
return nil, err
210+
}
211+
212+
downgrader, ok := shim.(clientVersionDowngrader)
213+
if ok {
214+
if derr := downgrader.Downgrade(); derr == nil {
215+
log.G(ctx).WithError(err).WithField("id", shim.ID()).
216+
Warning("failed to call task.PID, downgrading client API version to try again")
217+
218+
s, err = newShimTask(shim)
219+
if err != nil {
220+
return nil, fmt.Errorf("failed to create shim task after downgrading: %w", err)
221+
}
222+
_, err = s.PID(ctx)
223+
}
224+
}
225+
if err != nil {
226+
return nil, err
227+
}
209228
}
210229
return s, nil
211230
}

integration/issue10467_linux_test.go

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -66,7 +66,8 @@ func TestIssue10467(t *testing.T) {
6666
})
6767

6868
t.Log("Prepare pods for current release")
69-
upgradeCaseFunc, hookFunc := shouldManipulateContainersInPodAfterUpgrade("")(t, 2, previousProc.criRuntimeService(t), previousProc.criImageService(t))
69+
upgradeCaseFuncs, hookFunc := shouldManipulateContainersInPodAfterUpgrade("")(t, 2, previousProc.criRuntimeService(t), previousProc.criImageService(t))
70+
upgradeCaseFunc := upgradeCaseFuncs[0]
7071
needToCleanup = false
7172
require.Nil(t, hookFunc)
7273

0 commit comments

Comments
 (0)