
Conversation

@crosbymichael
Member
The ability to group containers under a single shim was added after the initial integration, so until now existing runc containers used one shim per container.

Groups

The grouping feature in this shim uses annotations on the runtime spec to group multiple containers and launch only one shim for the whole group.

Two ways of grouping are supported using the following labels:

  • io.containerd.runc.v2.group=<group>
  • io.kubernetes.cri.sandbox-id=<group>

Using the sandbox-id label, this shim will automatically group containers in the same k8s pod into one shim.
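
For illustration, here is a minimal sketch of how a group could be chosen from the runtime-spec annotations. This is not the shim's actual code; the function name groupID and the exact fallback order are assumptions. The idea is that an explicit group label wins, containers in the same Kubernetes pod fall back to the sandbox ID, and ungrouped containers use their own ID.

```go
package main

import "fmt"

// groupID sketches how a shim group might be selected from runtime-spec
// annotations: an explicit group label wins, containers in the same k8s pod
// fall back to the sandbox ID, and ungrouped containers use their own ID.
func groupID(containerID string, annotations map[string]string) string {
	for _, key := range []string{
		"io.containerd.runc.v2.group",
		"io.kubernetes.cri.sandbox-id",
	} {
		if v, ok := annotations[key]; ok && v != "" {
			return v
		}
	}
	return containerID
}

func main() {
	fmt.Println(groupID("demo1", map[string]string{"io.containerd.runc.v2.group": "test"}))   // test
	fmt.Println(groupID("web", map[string]string{"io.kubernetes.cri.sandbox-id": "pod-123"})) // pod-123
	fmt.Println(groupID("solo", nil))                                                         // solo
}
```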

Examples

Place two containers in a group, resulting in one shim being the parent of the redis-server and sh processes.

Commands:

> sudo ctr run --label io.containerd.runc.v2.group=test   --rm --runtime io.containerd.runc.v2  docker.io/library/redis:alpine demo1

> sudo ctr run -t --label io.containerd.runc.v2.group=test   --rm --runtime io.containerd.runc.v2  docker.io/library/redis:alpine demo2 sh

Output:

containerd-shim(182608)─┬─redis-server(182631)
                        └─sh(183307)
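
For programmatic use, a rough sketch with the containerd Go client is shown below. It assumes the grouping key can be set as an annotation on the runtime spec via oci.WithAnnotations; the socket path, namespace, snapshot names, and container IDs are illustrative and not part of this PR.

```go
package main

import (
	"context"
	"log"

	"github.com/containerd/containerd"
	"github.com/containerd/containerd/namespaces"
	"github.com/containerd/containerd/oci"
)

func main() {
	// Connect to containerd (path assumed to be the default socket).
	client, err := containerd.New("/run/containerd/containerd.sock")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	ctx := namespaces.WithNamespace(context.Background(), "default")

	image, err := client.Pull(ctx, "docker.io/library/redis:alpine", containerd.WithPullUnpack)
	if err != nil {
		log.Fatal(err)
	}

	// Both containers carry the same group annotation, so the runc v2 shim
	// should place them under a single shim process.
	group := map[string]string{"io.containerd.runc.v2.group": "test"}

	for _, id := range []string{"demo1", "demo2"} {
		_, err := client.NewContainer(ctx, id,
			containerd.WithNewSnapshot(id+"-snap", image),
			containerd.WithRuntime("io.containerd.runc.v2", nil),
			containerd.WithNewSpec(
				oci.WithImageConfig(image),
				oci.WithAnnotations(group),
			),
		)
		if err != nil {
			log.Fatal(err)
		}
	}
}
```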

@crosbymichael crosbymichael added this to the 1.3 milestone Feb 11, 2019
@Random-Liu
Member

Thanks for working on this!

@ehazlett
Member

Some lint failures:

runtime/v2/shim/util.go:155:5:warning: exported var ErrNoAddress should have comment or be unexported (golint)
runtime/v2/shim/util.go:157:1:warning: exported function ReadAddress should have comment or be unexported (golint)
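
For reference, golint only wants doc comments on exported identifiers. A hedged sketch of what the fix might look like is below; the actual definitions in runtime/v2/shim/util.go may differ.

```go
package shim

import (
	"errors"
	"io/ioutil"
)

// ErrNoAddress is returned when the address file has no content.
var ErrNoAddress = errors.New("no shim address")

// ReadAddress returns the shim's socket address stored at path.
func ReadAddress(path string) (string, error) {
	data, err := ioutil.ReadFile(path)
	if err != nil {
		return "", err
	}
	if len(data) == 0 {
		return "", ErrNoAddress
	}
	return string(data), nil
}
```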

Member

container.go consumes the task API in this package. How about handling the not-found error inside this Process function? I think it is part of the Process function's responsibility; if we separate it, the caller must check the value before use.
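
To illustrate the suggestion with stand-in types (this is not the PR's actual code), handling the lookup failure inside the function means callers get a typed not-found error instead of having to nil-check the returned value:

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound mirrors containerd's errdefs.ErrNotFound; illustrative only.
var ErrNotFound = errors.New("not found")

// Process is a stand-in for the shim's process type.
type Process struct{ ID string }

// Container is a stand-in holding exec processes keyed by ID.
type Container struct{ processes map[string]*Process }

// Process returns the exec process for id, or a not-found error, so callers
// never have to nil-check the returned value (the point raised in review).
func (c *Container) Process(id string) (*Process, error) {
	p, ok := c.processes[id]
	if !ok {
		return nil, fmt.Errorf("process %s: %w", id, ErrNotFound)
	}
	return p, nil
}

func main() {
	c := &Container{processes: map[string]*Process{"init": {ID: "init"}}}
	if _, err := c.Process("missing"); errors.Is(err, ErrNotFound) {
		fmt.Println("not-found handled inside Process:", err)
	}
}
```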

@crosbymichael crosbymichael force-pushed the multi-shim branch 6 times, most recently from 7b5d7d6 to f74d71a Compare February 12, 2019 20:36
@codecov-io

codecov-io commented Feb 12, 2019

Codecov Report

Merging #3004 into master will decrease coverage by 0.08%.
The diff coverage is 0%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3004      +/-   ##
==========================================
- Coverage   43.96%   43.87%   -0.09%     
==========================================
  Files         102      102              
  Lines       10881    10903      +22     
==========================================
  Hits         4784     4784              
- Misses       5362     5384      +22     
  Partials      735      735
Flag       Coverage Δ
#linux     47.48% <0%> (-0.1%) ⬇️
#windows   41.08% <0%> (-0.1%) ⬇️

Impacted Files             Coverage Δ
runtime/v2/shim/util.go    0% <0%> (ø) ⬆️
oci/spec_opts.go           26.74% <0%> (-0.33%) ⬇️

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 31438b6...84a2471. Read the comment docs.

@jterry75
Contributor

Out of curiosity, how does io.containerd.runc.v2.group=<group> know which one is the namespace owner?

The Kubernetes model requires two annotations: one to describe the io.kubernetes.cri.sandbox-id and one to describe the io.kubernetes.cri.container-type. The correct grouping for Kubernetes would be to allow a SINGLE io.kubernetes.cri.container-type=="sandbox" and MULTIPLE io.kubernetes.cri.container-type=="container" where all io.kubernetes.cri.sandbox-id values are the same, correct?

@Ace-Tang
Contributor

io.containerd.runc.v2 and io.containerd.runc.v1 both use the shim v2 interface. Do we need to document their behavior somewhere (v1 is one shim per container, v2 is one shim per group of containers) to make this clearer for users?

@crosbymichael
Member Author

@jterry75 there is no need to know the namespace owner; it just launches the grouped containers under the same shim. It does not mess with namespaces.

@jterry75
Contributor

Oh I see. So they may "run in the same shim" but not necessarily in the same namespace (pid, network, etc.). This is really just a feature to allow grouping tasks into a single shim even if the tasks are unrelated. I like it!

@crosbymichael
Member Author

Yes, that is correct. Containers in a group can share the same lifecycle, and we don't need multiple shims running to service these containers. So we save CPU and memory for these containers.

@crosbymichael
Member Author

Any more comments or questions on this?

@AkihiroSuda
Member

How long will we continue to support shim v2 runc v1 (and shim v1)?

@crosbymichael
Member Author

@AkihiroSuda addressed your feedback. I don't see an issue supporting v1 for a while.

Signed-off-by: Michael Crosby <[email protected]>
Member

@dmcgowan dmcgowan left a comment


LGTM

@crosbymichael crosbymichael merged commit 0769763 into containerd:master Feb 21, 2019
@crosbymichael crosbymichael deleted the multi-shim branch February 21, 2019 20:40
@sboeuf

sboeuf commented Feb 22, 2019

@Random-Liu @crosbymichael
Hi! I thought, based on the previous comment here, that you were not planning to implement runc support for v2?

That being said, I think that's great, as it will help the whole container community move to v2 as the de facto standard, right?

@Random-Liu
Member

@sboeuf containerd-shim-runc-v1 is already an implementation of shim v2. This PR is just a new implementation with per-pod shim support.

Per-pod shim is not required for runc, but still good to have. It should save a bunch of Go runtime memory. :)

@sboeuf

sboeuf commented Feb 22, 2019

@Random-Liu agreed, but it brings the regular container stack very close to what we have with Kata Containers, which is great for maintaining good compatibility with the stack.
Glad to see this happening.

And btw I had missed the implementation of containerd-shim-runc-v1 for shim-v2. When did this happen?

@crosbymichael
Member Author

@sboeuf it happened at the very start of the shim v2 work ;)

@sboeuf

sboeuf commented Feb 25, 2019

@crosbymichael oh okay, I completely missed that... Well, this brings me to another question: do you expect that runc-v2 will become the new standard at some point?
I'm asking because that's my hope, so that Kata Containers would not have to track potential modifications to runc-v1 and instead would only need to make sure that it fits the containerd-shim-v2 API.

@crosbymichael
Member Author

@sboeuf all you should care about is the v2 shim API, nothing else.

@sboeuf

sboeuf commented Feb 25, 2019

@crosbymichael That's good, that's all I wanted to hear 👍

cyyzero added a commit to cyyzero/containerd that referenced this pull request Oct 1, 2023
After PR containerd#8617, the create handler of containerd-shim-runc-v2 calls
handleStarted() to record the init process and handle its exit. The init
process wouldn't quit this early under normal circumstances, but if that
scenario occurs, handleStarted() calls handleProcessExit(), which causes a
deadlock: create() has already acquired s.mu, and handleProcessExit() tries
to lock it again.

For historical reasons, the use of s.mu is a bit confusing: it is sometimes
used to protect s.containers only, and sometimes used as a mutex around
whole functions.

According to the analysis by @corhere in containerd#9103:

Locking s.mu in the Create() handler was introduced in containerd#1452 as a
solution for the missed early-exits problem: Create() holds the mutex to
block checkProcesses() from handling exit events until after Create() has
added the container process to the s.processes map. This locking logic was
copied into the v2 shim implementation. containerd#8617 solves the same
problem with a strategy that does not rely on mutual exclusion between
container creation and exit-event handling.

As for Shutdown(), the implementation was added in containerd#3004. In that
initial implementation, the mutex is locked to safely access s.containers
and is never unlocked, because nothing matters after os.Exit(0). Then
ae87730 changed Shutdown() to return instead of exiting, followed by
containerd#4988 to unlock upon return. If the intention is to block
containers from being created while the shim's task service is shutting
down, locking a mutex is a poor solution: it makes the create requests hang
instead of returning an error, and it is ineffective after Shutdown()
returns because the mutex is unlocked.

If Create() or handleProcessExit() runs again after Shutdown() has unlocked
s.mu and returned, the shim service will panic by sending to a closed
channel.

So I remove the unnecessary lock guards in the service and rename mu to
containersMu to make it clear that it protects s.containers only.

Fix: containerd#9103
Signed-off-by: Chen Yiyang <[email protected]>
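
To make the deadlock concrete, here is a minimal self-contained illustration using stand-in types (not the shim's code) of a create path that holds a non-reentrant sync.Mutex and then calls an exit handler that locks the same mutex again:

```go
package main

import (
	"fmt"
	"sync"
)

type service struct {
	mu         sync.Mutex
	containers map[string]string
}

func (s *service) handleProcessExit(id string) {
	s.mu.Lock() // blocks forever if the caller already holds s.mu
	defer s.mu.Unlock()
	delete(s.containers, id)
	fmt.Println("handled exit for", id)
}

func (s *service) create(id string, exitedEarly bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.containers[id] = "created"
	if exitedEarly {
		// Re-locking the same mutex on this goroutine never succeeds; the Go
		// runtime typically aborts with a "goroutines are asleep" deadlock error.
		s.handleProcessExit(id)
	}
}

func main() {
	s := &service{containers: map[string]string{}}
	s.create("demo", true)
}
```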