runtime: fix shutdown runc v2 service #4988

Iceber · 2021-02-02T07:48:11Z

@crosbymichael
Even if the service is no longer in use after Shutdown, I think it is important to unlock service.mu

If os.Exit is called, it may be possible not to unlock service.mu
https://github.com/containerd/containerd/pull/3204/files#diff-784b061d666f1368d1c3be590b14a7bdc559e199b0855349b993a395bbd2cdb5L573

Signed-off-by: IceberGu <[email protected]>

k8s-ci-robot · 2021-02-02T07:48:19Z

Hi @Iceber. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

theopenlab-ci · 2021-02-02T07:58:21Z

Build succeeded.

containerd-build-arm64 : SUCCESS in 8m 38s (non-voting)

estesp

LGTM

@corhere

After pr containerd#8617, create handler of containerd-shim-runc-v2 will call handleStarted() to record the init process and handle its exit. Init process wouldn't quit so early in normal circumstances. But if this screnario occurs, handleStarted() will call handleProcessExit(), which will cause deadlock because create() had acquired s.mu, and handleProcessExit() will try to lock it again. For some historical reasons, the use of s.mu is a bit confusing, sometimes used to protect s.containers only, and sometimes used as a mutex for some functions. According to analysis of @corhere in containerd#9103, we can know that: Locking s.mu for Create() handler is introduced in containerd#1452, which was a solution for the missed early-exits problem: Create() holds the mutex to block checkProcesses() from handling exit events until after Create() has added the container process to the s.processes map. This locking logic was copied into the v2 shim implementation. containerd#8617 solves the same problem using a strategy that does not rely on mutual exclusion between container create and exit-event handling. As for Shutdown(), the implementation was added in containerd#3004. And in this initial implementation, the mutex is locked to safely access s.containers, it is not unlocked because nothing matters after os.Exit(0). Then ae87730 changed Shutdown() to return instead of exiting, followed by containerd#4988 to unlock upon return. If the intention is to block containers from being created while the shim's task service is shutting down, locking a mutex is a poor solution since it makes the create requests hang instead of returning an error, and is ineffective after Shutdown() returns as the mutex is unlocked. If Create() or handleProcessExit() enters again after Shutdown() unlocking s.mu and returning, shim service will panic by sending to a closed channel. So I remove the unnecessary lock guards in service, and rename mu to containersMu to make it clear that it is used to protect s.containers only. Fix: containerd#9103 Signed-off-by: Chen Yiyang <[email protected]>

@corhere

After pr containerd#8617, create handler of containerd-shim-runc-v2 will call handleStarted() to record the init process and handle its exit. Init process wouldn't quit so early in normal circumstances. But if this screnario occurs, handleStarted() will call handleProcessExit(), which will cause deadlock because create() had acquired s.mu, and handleProcessExit() will try to lock it again. For some historical reasons, the use of s.mu is a bit confusing, sometimes used to protect s.containers only, and sometimes used as a mutex for some functions. According to analysis of @corhere in containerd#9103, we can know that: Locking s.mu for Create() handler is introduced in containerd#1452, which was a solution for the missed early-exits problem: Create() holds the mutex to block checkProcesses() from handling exit events until after Create() has added the container process to the s.processes map. This locking logic was copied into the v2 shim implementation. containerd#8617 solves the same problem using a strategy that does not rely on mutual exclusion between container create and exit-event handling. As for Shutdown(), the implementation was added in containerd#3004. And in this initial implementation, the mutex is locked to safely access s.containers, it is not unlocked because nothing matters after os.Exit(0). Then ae87730 changed Shutdown() to return instead of exiting, followed by containerd#4988 to unlock upon return. If the intention is to block containers from being created while the shim's task service is shutting down, locking a mutex is a poor solution since it makes the create requests hang instead of returning an error, and is ineffective after Shutdown() returns as the mutex is unlocked. If Create() or handleProcessExit() enters again after Shutdown() unlocking s.mu and returning, shim service will panic by sending to a closed channel. So I remove the unnecessary lock guards in service, and rename mu to containersMu to make it clear that it is used to protect s.containers only. Fix: containerd#9103 Signed-off-by: Chen Yiyang <[email protected]>

runtime: fix shutdown runc v2 service

b458583

Signed-off-by: IceberGu <[email protected]>

k8s-ci-robot added the needs-ok-to-test label Feb 2, 2021

AkihiroSuda approved these changes Feb 2, 2021

View reviewed changes

AkihiroSuda added cherry-pick/1.3.x labels Feb 2, 2021

estesp approved these changes Feb 2, 2021

View reviewed changes

estesp merged commit f07e181 into containerd:master Feb 2, 2021

Iceber deleted the fix-runc-v2-service branch February 2, 2021 14:57

This was referenced Feb 3, 2021

[release/1.3] runtime: fix shutdown runc v2 service #4992

Merged

[release/1.4] runtime: fix shutdown runc v2 service #4993

Merged

thaJeztah added cherry-picked/1.3.x cherry-picked/1.4.x PR commits are cherry picked into the release/1.4 branch and removed cherry-pick/1.3.x labels Feb 4, 2021

corhere mentioned this pull request Sep 27, 2023

containerd-shim-runc-v2: avoid potential deadlock in create handler #9104

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

runtime: fix shutdown runc v2 service #4988

runtime: fix shutdown runc v2 service #4988

Uh oh!

Iceber commented Feb 2, 2021 •

edited

Loading

Uh oh!

k8s-ci-robot commented Feb 2, 2021

Uh oh!

theopenlab-ci bot commented Feb 2, 2021

Uh oh!

estesp left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

runtime: fix shutdown runc v2 service #4988

runtime: fix shutdown runc v2 service #4988

Uh oh!

Conversation

Iceber commented Feb 2, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

k8s-ci-robot commented Feb 2, 2021

Uh oh!

theopenlab-ci bot commented Feb 2, 2021

Uh oh!

estesp left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Iceber commented Feb 2, 2021 •

edited

Loading