
Conversation

@crosbymichael
Member
The ability to group containers under a single shim was added after the initial integration, so until now existing runc containers used one shim per container.

Groups

The grouping feature in this shim uses annotations on the runtime spec to group multiple containers and launch only one shim for the whole group.

Two ways of grouping are supported using the following labels:

  • io.containerd.runc.v2.group=<group>
  • io.kubernetes.cri.sandbox-id=<group>

Using the sandbox-id label, this shim will automatically group containers in the same k8s pod into one shim.
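
For illustration, here is a minimal sketch of how a group could be chosen from the runtime-spec annotations. This is not the shim's actual code; the function name groupID and the exact fallback order are assumptions. The idea is that an explicit group label wins, containers in the same Kubernetes pod fall back to the sandbox ID, and ungrouped containers use their own ID.

```go
package main

import "fmt"

// groupID sketches how a shim group might be selected from runtime-spec
// annotations: an explicit group label wins, containers in the same k8s pod
// fall back to the sandbox ID, and ungrouped containers use their own ID.
func groupID(containerID string, annotations map[string]string) string {
	for _, key := range []string{
		"io.containerd.runc.v2.group",
		"io.kubernetes.cri.sandbox-id",
	} {
		if v, ok := annotations[key]; ok && v != "" {
			return v
		}
	}
	return containerID
}

func main() {
	fmt.Println(groupID("demo1", map[string]string{"io.containerd.runc.v2.group": "test"}))   // test
	fmt.Println(groupID("web", map[string]string{"io.kubernetes.cri.sandbox-id": "pod-123"})) // pod-123
	fmt.Println(groupID("solo", nil))                                                         // solo
}
```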

Examples

Place two containers in a group, resulting in one shim being the parent of the redis-server and sh processes.

Commands:

> sudo ctr run --label io.containerd.runc.v2.group=test   --rm --runtime io.containerd.runc.v2  docker.io/library/redis:alpine demo1

> sudo ctr run -t --label io.containerd.runc.v2.group=test   --rm --runtime io.containerd.runc.v2  docker.io/library/redis:alpine demo2 sh

Output:

containerd-shim(182608)─┬─redis-server(182631)
                        └─sh(183307)
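
For programmatic use, a rough sketch with the containerd Go client is shown below. It assumes the grouping key can be set as an annotation on the runtime spec via oci.WithAnnotations; the socket path, namespace, snapshot names, and container IDs are illustrative and not part of this PR.

```go
package main

import (
	"context"
	"log"

	"github.com/containerd/containerd"
	"github.com/containerd/containerd/namespaces"
	"github.com/containerd/containerd/oci"
)

func main() {
	// Connect to containerd (path assumed to be the default socket).
	client, err := containerd.New("/run/containerd/containerd.sock")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	ctx := namespaces.WithNamespace(context.Background(), "default")

	image, err := client.Pull(ctx, "docker.io/library/redis:alpine", containerd.WithPullUnpack)
	if err != nil {
		log.Fatal(err)
	}

	// Both containers carry the same group annotation, so the runc v2 shim
	// should place them under a single shim process.
	group := map[string]string{"io.containerd.runc.v2.group": "test"}

	for _, id := range []string{"demo1", "demo2"} {
		_, err := client.NewContainer(ctx, id,
			containerd.WithNewSnapshot(id+"-snap", image),
			containerd.WithRuntime("io.containerd.runc.v2", nil),
			containerd.WithNewSpec(
				oci.WithImageConfig(image),
				oci.WithAnnotations(group),
			),
		)
		if err != nil {
			log.Fatal(err)
		}
	}
}
```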

@crosbymichael crosbymichael added this to the 1.3 milestone Feb 11, 2019
@Random-Liu
Member

Thanks for working on this!

@ehazlett
Member

Some lint failures:

runtime/v2/shim/util.go:155:5:warning: exported var ErrNoAddress should have comment or be unexported (golint)
runtime/v2/shim/util.go:157:1:warning: exported function ReadAddress should have comment or be unexported (golint)
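
For reference, golint only wants doc comments on exported identifiers. A hedged sketch of what the fix might look like is below; the actual definitions in runtime/v2/shim/util.go may differ.

```go
package shim

import (
	"errors"
	"io/ioutil"
)

// ErrNoAddress is returned when the address file has no content.
var ErrNoAddress = errors.New("no shim address")

// ReadAddress returns the shim's socket address stored at path.
func ReadAddress(path string) (string, error) {
	data, err := ioutil.ReadFile(path)
	if err != nil {
		return "", err
	}
	if len(data) == 0 {
		return "", ErrNoAddress
	}
	return string(data), nil
}
```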

Member

container.go consumes the task API in this package. How about handling the not-found error inside this Process function? I think it is part of the Process function's responsibility; if we separate it, the caller must check the value before use.
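
To illustrate the suggestion with stand-in types (this is not the PR's actual code), handling the lookup failure inside the function means callers get a typed not-found error instead of having to nil-check the returned value:

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound mirrors containerd's errdefs.ErrNotFound; illustrative only.
var ErrNotFound = errors.New("not found")

// Process is a stand-in for the shim's process type.
type Process struct{ ID string }

// Container is a stand-in holding exec processes keyed by ID.
type Container struct{ processes map[string]*Process }

// Process returns the exec process for id, or a not-found error, so callers
// never have to nil-check the returned value (the point raised in review).
func (c *Container) Process(id string) (*Process, error) {
	p, ok := c.processes[id]
	if !ok {
		return nil, fmt.Errorf("process %s: %w", id, ErrNotFound)
	}
	return p, nil
}

func main() {
	c := &Container{processes: map[string]*Process{"init": {ID: "init"}}}
	if _, err := c.Process("missing"); errors.Is(err, ErrNotFound) {
		fmt.Println("not-found handled inside Process:", err)
	}
}
```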

@crosbymichael crosbymichael force-pushed the multi-shim branch 6 times, most recently from 7b5d7d6 to f74d71a Compare February 12, 2019 20:36
@codecov-io

codecov-io commented Feb 12, 2019

Codecov Report

Merging #3004 into master will decrease coverage by 0.08%.
The diff coverage is 0%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3004      +/-   ##
==========================================
- Coverage   43.96%   43.87%   -0.09%     
==========================================
  Files         102      102              
  Lines       10881    10903      +22     
==========================================
  Hits         4784     4784              
- Misses       5362     5384      +22     
  Partials      735      735
Flag       Coverage Δ
#linux     47.48% <0%> (-0.1%) ⬇️
#windows   41.08% <0%> (-0.1%) ⬇️

Impacted Files             Coverage Δ
runtime/v2/shim/util.go    0% <0%> (ø) ⬆️
oci/spec_opts.go           26.74% <0%> (-0.33%) ⬇️

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 31438b6...84a2471. Read the comment docs.

@jterry75
Contributor

Out of curiosity, how does io.containerd.runc.v2.group=<group> know which one is the namespace owner?

The Kubernetes model requires two annotations: one to describe the io.kubernetes.cri.sandbox-id and one to describe the io.kubernetes.cri.container-type. The correct grouping for Kubernetes would be to allow a SINGLE io.kubernetes.cri.container-type=="sandbox" and MULTIPLE io.kubernetes.cri.container-type=="container" where all io.kubernetes.cri.sandbox-id values are the same, correct?

@Ace-Tang
Contributor

io.containerd.runc.v2 and io.containerd.runc.v1 both use the shim v2 interface. Do we need to document their behavior somewhere (v1 is one shim per container, v2 is one shim per group of containers) to make this clearer for users?

@crosbymichael
Member Author

@jterry75 there is no need to know the namespace owner; it just launches the grouped containers under the same shim. It does not mess with namespaces.

@jterry75
Contributor

Oh I see. So they may "run in the same shim" but not necessarily in the same namespace (pid, network, etc.). This is really just a feature to allow grouping tasks into a single shim even if the tasks are unrelated. I like it!

@crosbymichael
Member Author

Yes, that is correct. Containers in a group can share the same lifecycle, and we don't need multiple shims running to service these containers. So we save CPU and memory for these containers.

@crosbymichael
Member Author

Any more comments or questions on this?

@AkihiroSuda
Member

How long will we continue to support shim v2 runc v1 (and shim v1)?

@crosbymichael
Member Author

@AkihiroSuda addressed your feedback. I don't see an issue supporting v1 for a while.

Signed-off-by: Michael Crosby <[email protected]>
Member

@dmcgowan dmcgowan left a comment


LGTM

@crosbymichael crosbymichael merged commit 0769763 into containerd:master Feb 21, 2019
@crosbymichael crosbymichael deleted the multi-shim branch February 21, 2019 20:40
@sboeuf

sboeuf commented Feb 22, 2019

@Random-Liu @crosbymichael
Hi! I thought, based on the previous comment here, that you were not planning to implement runc support for v2?

That being said, I think that's great, as it will help the whole container community move to v2 as the de facto standard, right?

@Random-Liu
Member

@sboeuf containerd-shim-runc-v1 is already an implementation of shim v2. This PR is just a new implementation with per-pod shim support.

Per-pod shim is not required for runc, but still good to have. It should save a bunch of Go runtime memory. :)

@sboeuf

sboeuf commented Feb 22, 2019

@Random-Liu agreed, but it brings the regular container stack very close to what we have with Kata Containers, which is great for maintaining good compatibility with the stack.
Glad to see this happening.

And btw I had missed the implementation of containerd-shim-runc-v1 for shim-v2. When did this happen?

@crosbymichael
Member Author

@sboeuf it happened at the very start of the shim v2 work ;)

@sboeuf

sboeuf commented Feb 25, 2019

@crosbymichael oh okay, I completely missed that... Well, this brings me to another question: do you expect that runc-v2 will become the new standard at some point?
I'm asking because that's my hope, so that Kata Containers would not have to track potential modifications to runc-v1 and instead would only need to make sure that it fits the containerd-shim-v2 API.

@crosbymichael
Member Author

@sboeuf all you should care about is the v2 shim API, nothing else.

@sboeuf

sboeuf commented Feb 25, 2019

@crosbymichael That's good, that's all I wanted to hear 👍

cyyzero added a commit to cyyzero/containerd that referenced this pull request Oct 1, 2023
After PR containerd#8617, the create handler of containerd-shim-runc-v2 calls
handleStarted() to record the init process and handle its exit. The init
process wouldn't quit this early under normal circumstances, but if that
scenario occurs, handleStarted() calls handleProcessExit(), which causes a
deadlock: create() has already acquired s.mu, and handleProcessExit() tries
to lock it again.

For historical reasons, the use of s.mu is a bit confusing: it is sometimes
used to protect s.containers only, and sometimes used as a mutex around
whole functions.

According to the analysis by @corhere in containerd#9103:

Locking s.mu in the Create() handler was introduced in containerd#1452 as a
solution for the missed early-exits problem: Create() holds the mutex to
block checkProcesses() from handling exit events until after Create() has
added the container process to the s.processes map. This locking logic was
copied into the v2 shim implementation. containerd#8617 solves the same
problem with a strategy that does not rely on mutual exclusion between
container creation and exit-event handling.

As for Shutdown(), the implementation was added in containerd#3004. In that
initial implementation, the mutex is locked to safely access s.containers
and is never unlocked, because nothing matters after os.Exit(0). Then
ae87730 changed Shutdown() to return instead of exiting, followed by
containerd#4988 to unlock upon return. If the intention is to block
containers from being created while the shim's task service is shutting
down, locking a mutex is a poor solution: it makes the create requests hang
instead of returning an error, and it is ineffective after Shutdown()
returns because the mutex is unlocked.

If Create() or handleProcessExit() runs again after Shutdown() has unlocked
s.mu and returned, the shim service will panic by sending to a closed
channel.

So I remove the unnecessary lock guards in the service and rename mu to
containersMu to make it clear that it protects s.containers only.

Fix: containerd#9103
Signed-off-by: Chen Yiyang <[email protected]>
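
To make the deadlock concrete, here is a minimal self-contained illustration using stand-in types (not the shim's code) of a create path that holds a non-reentrant sync.Mutex and then calls an exit handler that locks the same mutex again:

```go
package main

import (
	"fmt"
	"sync"
)

type service struct {
	mu         sync.Mutex
	containers map[string]string
}

func (s *service) handleProcessExit(id string) {
	s.mu.Lock() // blocks forever if the caller already holds s.mu
	defer s.mu.Unlock()
	delete(s.containers, id)
	fmt.Println("handled exit for", id)
}

func (s *service) create(id string, exitedEarly bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.containers[id] = "created"
	if exitedEarly {
		// Re-locking the same mutex on this goroutine never succeeds; the Go
		// runtime typically aborts with a "goroutines are asleep" deadlock error.
		s.handleProcessExit(id)
	}
}

func main() {
	s := &service{containers: map[string]string{}}
	s.create("demo", true)
}
```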