Skip to content

Conversation

@Iceber
Copy link
Member

@Iceber Iceber commented Feb 2, 2021

@crosbymichael
Even if the service is no longer in use after Shutdown, I think it is important to unlock service.mu

If os.Exit is called, it may be possible not to unlock service.mu
https://github.com/containerd/containerd/pull/3204/files#diff-784b061d666f1368d1c3be590b14a7bdc559e199b0855349b993a395bbd2cdb5L573

@k8s-ci-robot
Copy link

Hi @Iceber. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@theopenlab-ci
Copy link

theopenlab-ci bot commented Feb 2, 2021

Build succeeded.

Copy link
Member

@estesp estesp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@estesp estesp merged commit f07e181 into containerd:master Feb 2, 2021
@Iceber Iceber deleted the fix-runc-v2-service branch February 2, 2021 14:57
@thaJeztah thaJeztah added cherry-picked/1.3.x cherry-picked/1.4.x PR commits are cherry picked into the release/1.4 branch and removed cherry-pick/1.3.x labels Feb 4, 2021
cyyzero added a commit to cyyzero/containerd that referenced this pull request Oct 1, 2023
After pr containerd#8617, create handler of containerd-shim-runc-v2 will
call handleStarted() to record the init process and handle its exit.
Init process wouldn't quit so early in normal circumstances. But if
this screnario occurs, handleStarted() will call
handleProcessExit(), which will cause deadlock because create() had
acquired s.mu, and handleProcessExit() will try to lock it again.

For some historical reasons, the use of s.mu is a bit confusing,
sometimes used to protect s.containers only, and sometimes used as a
mutex for some functions.

According to analysis of @corhere in containerd#9103, we can know that:

Locking s.mu for Create() handler is introduced in containerd#1452, which was a
solution for the missed early-exits problem: Create() holds the mutex to
block checkProcesses() from handling exit events until after Create()
has added the container process to the s.processes map. This locking
logic was copied into the v2 shim implementation. containerd#8617 solves the same
problem using a strategy that does not rely on mutual exclusion between
container create and exit-event handling.

As for Shutdown(), the implementation was added in containerd#3004. And in this
initial implementation, the mutex is locked to safely access
s.containers, it is not unlocked because nothing matters after
os.Exit(0). Then ae87730 changed Shutdown() to return instead of
exiting, followed by containerd#4988 to unlock upon return. If the intention is
to block containers from being created while the shim's task service is
shutting down, locking a mutex is a poor solution since it makes the
create requests hang instead of returning an error, and is ineffective
after Shutdown() returns as the mutex is unlocked.

If Create() or handleProcessExit() enters again after Shutdown()
unlocking s.mu and returning, shim service will panic by sending to
a closed channel.

So I remove the unnecessary lock guards in service, and rename mu to
containersMu to make it clear that it is used to protect s.containers
only.

Fix: containerd#9103
Signed-off-by: Chen Yiyang <[email protected]>
cyyzero added a commit to cyyzero/containerd that referenced this pull request Oct 1, 2023
After pr containerd#8617, create handler of containerd-shim-runc-v2 will
call handleStarted() to record the init process and handle its exit.
Init process wouldn't quit so early in normal circumstances. But if
this screnario occurs, handleStarted() will call
handleProcessExit(), which will cause deadlock because create() had
acquired s.mu, and handleProcessExit() will try to lock it again.

For some historical reasons, the use of s.mu is a bit confusing,
sometimes used to protect s.containers only, and sometimes used as a
mutex for some functions.

According to analysis of @corhere in containerd#9103, we can know that:

Locking s.mu for Create() handler is introduced in containerd#1452, which was a
solution for the missed early-exits problem: Create() holds the mutex to
block checkProcesses() from handling exit events until after Create()
has added the container process to the s.processes map. This locking
logic was copied into the v2 shim implementation. containerd#8617 solves the same
problem using a strategy that does not rely on mutual exclusion between
container create and exit-event handling.

As for Shutdown(), the implementation was added in containerd#3004. And in this
initial implementation, the mutex is locked to safely access
s.containers, it is not unlocked because nothing matters after
os.Exit(0). Then ae87730 changed Shutdown() to return instead of
exiting, followed by containerd#4988 to unlock upon return. If the intention is
to block containers from being created while the shim's task service is
shutting down, locking a mutex is a poor solution since it makes the
create requests hang instead of returning an error, and is ineffective
after Shutdown() returns as the mutex is unlocked.

If Create() or handleProcessExit() enters again after Shutdown()
unlocking s.mu and returning, shim service will panic by sending to
a closed channel.

So I remove the unnecessary lock guards in service, and rename mu to
containersMu to make it clear that it is used to protect s.containers
only.

Fix: containerd#9103
Signed-off-by: Chen Yiyang <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cherry-picked/1.4.x PR commits are cherry picked into the release/1.4 branch needs-ok-to-test

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants