gVisor containerd shim incompatible w/ containerd 2.0 #11708

@zkoopmans

Description

Hey All,

gVisor project member here.

We're seeing issues on containerd 2.0 with our existing shim. The main method for our shim is here.

I observed the issue on a k8s cluster on GKE. kubectl delete operations hang indefinitely because containerd fails to delete the pause container. kubectl delete --force will delete the pod from k8s, but on the node the pause container (via a runsc sandbox) and the shim stay around. I'll post the debug logs here in a subsequent post.

Our shim does not implement the Manager interface, in case that is relevant.

Note that our shim is currently linked against containerd v1.6.36.

Steps to reproduce the issue

  1. Start a cluster w/ containerd 2.0
  2. Add gVisor as a runtime class (follow instructions here).
  3. Add a runtimeClass object to the cluster
  4. Update the containerd config.toml file per the above instructions.
  5. Run a hello-world pod on the cluster (kubectl apply -f hello.yaml).
  6. Delete the pod (kubectl delete pod/hello)
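For step 3, the RuntimeClass object is the standard one from the gVisor docs (the handler name is assumed to match the gvisor runtime entry in config.toml):

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: gvisor
```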

Describe the results you received and expected

Expected results: kubectl delete deletes the pod.

Actual results: the delete command hangs indefinitely. kubectl delete pod/hello --force will terminate, and the pod will no longer be registered in the cluster.

On the node, the runsc sandbox and the runsc shim processes remain for both instances (visible via ps -ef | grep runsc).

What version of containerd are you using?

containerd github.com/containerd/containerd/v2 2.0.0 207ad71

Any other relevant information

Simple hello-world pod that I used to debug:

apiVersion: v1
kind: Pod
metadata:
  name: hello
spec:
  runtimeClassName: gvisor
  restartPolicy: Never
  containers:
  - name: hello
    image: alpine
    args: ["echo", "hello" ]
  - name: pause
    image: registry.k8s.io/pause:latest

Show configuration if it is related to CRI plugin.

version = 2
required_plugins = ["io.containerd.grpc.v1.cri"]
# Kubernetes doesn't use containerd restart manager.
disabled_plugins = ["io.containerd.internal.v1.restart"]
oom_score = -999

[debug]
  level = "info"

[grpc]
  gid = 412

[plugins."io.containerd.grpc.v1.cri"]
  stream_server_address = "127.0.0.1"
  max_container_log_line_size = 262144
  sandbox_image = "us-central1-artifactregistry.gcr.io/gke-release/gke-release/pause:3.8@sha256:880e63f94b145e46f1b1082bb71b85e21f16b99b180b9996407d61240ceb9830"
  image_pull_progress_timeout = "5m"
[plugins."io.containerd.grpc.v1.cri".cni]
  bin_dir = "/home/kubernetes/bin"
  conf_dir = "/etc/cni/net.d"
  conf_template = "/home/containerd/cni.template"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
  endpoint = ["https://mirror.gcr.io","https://registry-1.docker.io"]
[metrics]
  address = "127.0.0.1:1338"
[plugins."io.containerd.grpc.v1.cri".containerd]
  default_runtime_name = "runc"
  discard_unpacked_layers = true
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.gvisor]
  runtime_type = "io.containerd.runsc.v1"
  pod_annotations = [ "dev.gvisor.*" ]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.gvisor.options]
  TypeUrl = "io.containerd.runsc.v1.options"
  ConfigPath = "/run/containerd/runsc/config.toml"

[plugins."io.containerd.internal.v1.opt"]
  path = "/home/containerd/opt/containerd"
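Note that the config above is in the version 2 schema; containerd 2.0 still accepts it via migration, but in the 2.0-native schema the CRI runtime entries move under a renamed plugin. Roughly (treat this as a sketch — exact names per the containerd 2.0 migration notes):

```toml
version = 3

[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.gvisor]
  runtime_type = 'io.containerd.runsc.v1'
  pod_annotations = ['dev.gvisor.*']

[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.gvisor.options]
  TypeUrl = 'io.containerd.runsc.v1.options'
  ConfigPath = '/run/containerd/runsc/config.toml'
```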
