-
Notifications
You must be signed in to change notification settings - Fork 3.8k
runtime: fix shutdown runc v2 service #4988
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: IceberGu <[email protected]>
|
Hi @Iceber. Thanks for your PR. I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
Build succeeded.
|
estesp
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
After pr containerd#8617, create handler of containerd-shim-runc-v2 will call handleStarted() to record the init process and handle its exit. Init process wouldn't quit so early in normal circumstances. But if this screnario occurs, handleStarted() will call handleProcessExit(), which will cause deadlock because create() had acquired s.mu, and handleProcessExit() will try to lock it again. For some historical reasons, the use of s.mu is a bit confusing, sometimes used to protect s.containers only, and sometimes used as a mutex for some functions. According to analysis of @corhere in containerd#9103, we can know that: Locking s.mu for Create() handler is introduced in containerd#1452, which was a solution for the missed early-exits problem: Create() holds the mutex to block checkProcesses() from handling exit events until after Create() has added the container process to the s.processes map. This locking logic was copied into the v2 shim implementation. containerd#8617 solves the same problem using a strategy that does not rely on mutual exclusion between container create and exit-event handling. As for Shutdown(), the implementation was added in containerd#3004. And in this initial implementation, the mutex is locked to safely access s.containers, it is not unlocked because nothing matters after os.Exit(0). Then ae87730 changed Shutdown() to return instead of exiting, followed by containerd#4988 to unlock upon return. If the intention is to block containers from being created while the shim's task service is shutting down, locking a mutex is a poor solution since it makes the create requests hang instead of returning an error, and is ineffective after Shutdown() returns as the mutex is unlocked. If Create() or handleProcessExit() enters again after Shutdown() unlocking s.mu and returning, shim service will panic by sending to a closed channel. So I remove the unnecessary lock guards in service, and rename mu to containersMu to make it clear that it is used to protect s.containers only. Fix: containerd#9103 Signed-off-by: Chen Yiyang <[email protected]>
After pr containerd#8617, create handler of containerd-shim-runc-v2 will call handleStarted() to record the init process and handle its exit. Init process wouldn't quit so early in normal circumstances. But if this screnario occurs, handleStarted() will call handleProcessExit(), which will cause deadlock because create() had acquired s.mu, and handleProcessExit() will try to lock it again. For some historical reasons, the use of s.mu is a bit confusing, sometimes used to protect s.containers only, and sometimes used as a mutex for some functions. According to analysis of @corhere in containerd#9103, we can know that: Locking s.mu for Create() handler is introduced in containerd#1452, which was a solution for the missed early-exits problem: Create() holds the mutex to block checkProcesses() from handling exit events until after Create() has added the container process to the s.processes map. This locking logic was copied into the v2 shim implementation. containerd#8617 solves the same problem using a strategy that does not rely on mutual exclusion between container create and exit-event handling. As for Shutdown(), the implementation was added in containerd#3004. And in this initial implementation, the mutex is locked to safely access s.containers, it is not unlocked because nothing matters after os.Exit(0). Then ae87730 changed Shutdown() to return instead of exiting, followed by containerd#4988 to unlock upon return. If the intention is to block containers from being created while the shim's task service is shutting down, locking a mutex is a poor solution since it makes the create requests hang instead of returning an error, and is ineffective after Shutdown() returns as the mutex is unlocked. If Create() or handleProcessExit() enters again after Shutdown() unlocking s.mu and returning, shim service will panic by sending to a closed channel. So I remove the unnecessary lock guards in service, and rename mu to containersMu to make it clear that it is used to protect s.containers only. Fix: containerd#9103 Signed-off-by: Chen Yiyang <[email protected]>
@crosbymichael
Even if the service is no longer in use after
Shutdown, I think it is important to unlockservice.muIf
os.Exitis called, it may be possible not to unlockservice.muhttps://github.com/containerd/containerd/pull/3204/files#diff-784b061d666f1368d1c3be590b14a7bdc559e199b0855349b993a395bbd2cdb5L573