io.containerd.runc.v2 with Container Groups #3004
Conversation
Thanks for working on this!

some lint fails
Force-pushed 02b08f0 to 047772f
runtime/v2/runc/container.go
Outdated
container.go consumes the task API within the package. How about handling the not-found error in this Process function? I think it is part of the Process function; if we separate it, every caller must check the value before use.
Force-pushed 7b5d7d6 to f74d71a
Codecov Report

```diff
@@            Coverage Diff             @@
##           master    #3004      +/-   ##
==========================================
- Coverage   43.96%   43.87%    -0.09%
==========================================
  Files         102      102
  Lines       10881    10903       +22
==========================================
  Hits         4784     4784
- Misses       5362     5384       +22
  Partials      735      735
```
Out of curiosity, how does … The Kubernetes model requires two annotations, one to describe the …
Force-pushed f74d71a to 2001104
@jterry75 there is no need to know the namespace owner; it just launches the grouped containers under the same shim. It does not mess with namespaces.

Oh, I see. So they may "run in the same shim" but not necessarily in the same namespaces (pid, network, etc.). This is really just a feature to allow grouping tasks into a single shim even if the tasks are unrelated. I like it!

Yes, that is correct. Containers in a group can share the same lifecycle, and we don't need multiple shims running to service these containers, so we save CPU and memory.

Any more comments or questions on this?

How long will we continue to support shim v2 runc v1 (and shim v1)?
Signed-off-by: Michael Crosby <[email protected]>
Force-pushed 2001104 to 8a56943
@AkihiroSuda addressed your feedback. I don't see an issue supporting v1 for a while.
Force-pushed 8a56943 to 8d5e8e5
Signed-off-by: Michael Crosby <[email protected]>
Force-pushed 8d5e8e5 to 84a2471
dmcgowan
left a comment
LGTM
@Random-Liu @crosbymichael That being said, I think that's great, as it will help the whole container community move to v2 as the de facto standard, right?

@sboeuf A per-pod shim is not required for runc, but it is still good to have. It should save a bunch of Go runtime memory. :)

@Random-Liu agreed, but it brings the regular container stack very close to what we have with Kata Containers, and that's great for maintaining good compatibility with the stack. And by the way, I had missed the implementation of …

@sboeuf it happened at the very start of the shim v2 work ;)

@crosbymichael oh okay, I completely missed that... Well, this brings me to another question: do you expect that runc-v2 will become the new standard at some point?

@crosbymichael all you should care about is the v2 shim API, nothing else

@crosbymichael That's good, that's all I wanted to hear 👍
After PR containerd#8617, the create handler of containerd-shim-runc-v2 calls handleStarted() to record the init process and handle its exit. The init process wouldn't quit so early in normal circumstances, but if this scenario occurs, handleStarted() will call handleProcessExit(), which causes a deadlock: create() has acquired s.mu, and handleProcessExit() will try to lock it again.

For historical reasons, the use of s.mu is a bit confusing: sometimes it protects s.containers only, and sometimes it serves as a mutex for whole functions. From the analysis by @corhere in containerd#9103, we know that:

Locking s.mu in the Create() handler was introduced in containerd#1452 as a solution for the missed early-exits problem: Create() holds the mutex to block checkProcesses() from handling exit events until after Create() has added the container process to the s.processes map. This locking logic was copied into the v2 shim implementation. containerd#8617 solves the same problem using a strategy that does not rely on mutual exclusion between container creation and exit-event handling.

As for Shutdown(), the implementation was added in containerd#3004. In that initial implementation the mutex is locked to safely access s.containers; it is never unlocked because nothing matters after os.Exit(0). Then ae87730 changed Shutdown() to return instead of exiting, followed by containerd#4988 to unlock upon return. If the intention is to block containers from being created while the shim's task service is shutting down, locking a mutex is a poor solution, since it makes create requests hang instead of returning an error, and it is ineffective after Shutdown() returns, as the mutex is unlocked. If Create() or handleProcessExit() runs again after Shutdown() has unlocked s.mu and returned, the shim service will panic by sending on a closed channel.

So I remove the unnecessary lock guards in the service, and rename mu to containersMu to make it clear that it protects s.containers only.

Fix: containerd#9103
Signed-off-by: Chen Yiyang <[email protected]>
History
We added the ability to group containers under a single shim after the initial integration, so existing runc containers use one shim per container.
Groups
The grouping feature in this shim allows for using `Annotations` on the runtime spec to group multiple containers and only launch one shim for the group. Two ways of grouping are supported, using the following labels:

- `io.containerd.runc.v2.group=<group>`
- `io.kubernetes.cri.sandbox-id=<group>`

Using the sandbox-id label, this shim will automatically group containers in the same k8s pod into one shim.
Examples
Place two containers in a group, resulting in one shim being the parent of the `redis-server` and `sh` processes.

Commands:
Output: