Deadlock in pulling/unpacking image with containerd 1.3.0 #3816

@ungureanuvladvictor

Description

I'm running k8s 1.14.6 and recently upgraded containerd from 1.2.4 to 1.3.0. From time to time, when pods are scheduled on a node, the kubelet hangs waiting for containerd to pull an image. I took a goroutine dump (https://gist.github.com/ungureanuvladvictor/39a5e37754dc7969233288331232a258) while this was happening, and it shows that goroutine 40397 is stuck on a semacquire:

if err := eg.Wait(); err != nil {

This runs in a defer that got triggered via:

containerd/pull.go

Lines 88 to 91 in 36cf5b6

img, err := c.fetch(ctx, pullCtx, ref, 1)
if err != nil {
	return nil, err
}

I think the goroutine to focus on is actually goroutine 40508, which is stuck in a select statement at:

select {

So far we have observed this with only one container image, which has 19 layers. If it's useful I can share the manifest, but I need to anonymize it first since it contains sensitive info.

Let me know if this issue is better opened on containerd/cri.

Steps to reproduce the issue:
Unfortunately I do not have clear repro steps; I could not figure out how to trigger this.

Describe the results you received:
containerd gets stuck pulling/unpacking the image.

Describe the results you expected:
containerd pulls/unpacks the image successfully.

Output of containerd --version:

containerd github.com/containerd/containerd v1.3.0 36cf5b690dcc00ff0f34ff7799209050c3d0c59a

Any other relevant information:
Output of crictl info:

{
  "status": {
    "conditions": [
      {
        "type": "RuntimeReady",
        "status": true,
        "reason": "",
        "message": ""
      },
      {
        "type": "NetworkReady",
        "status": true,
        "reason": "",
        "message": ""
      }
    ]
  },
  "cniconfig": {
    "PluginDirs": [
      "/opt/cni/bin"
    ],
    "PluginConfDir": "/etc/cni/net.d",
    "PluginMaxConfNum": 1,
    "Prefix": "eth",
    "Networks": [
      {
        "Config": {
          "Name": "cni-loopback",
          "CNIVersion": "0.3.1",
          "Plugins": [
            {
              "Network": {
                "type": "loopback",
                "ipam": {},
                "dns": {}
              },
              "Source": "{\"type\":\"loopback\"}"
            }
          ],
          "Source": "{\n\"cniVersion\": \"0.3.1\",\n\"name\": \"cni-loopback\",\n\"plugins\": [{\n  \"type\": \"loopback\"\n}]\n}"
        },
        "IFName": "lo"
      },
      {
        "Config": {
          "Name": "cilium",
          "CNIVersion": "0.3.1",
          "Plugins": [
            {
              "Network": {
                "cniVersion": "0.3.1",
                "type": "cilium-cni",
                "ipam": {},
                "dns": {}
              },
              "Source": "{\"cniVersion\":\"0.3.1\",\"eni\":{\"delete-on-termination\":true,\"first-interface-index\":1,\"min-allocate\":13,\"pre-allocate\":1,\"security-groups\":[\"sg-051f6d0c957db0281\"],\"subnet-tags\":{\"kubernetes_kubelet\":\"true\"}},\"type\":\"cilium-cni\"}"
            }
          ],
          "Source": "{\n  \"cniVersion\":\"0.3.1\",\n  \"name\":\"cilium\",\n  \"plugins\": [\n    {\n      \"cniVersion\":\"0.3.1\",\n      \"type\":\"cilium-cni\",\n      \"eni\": {\n        \"pre-allocate\": 1,\n        \"min-allocate\": 13,\n        \"first-interface-index\":1,\n        \"security-groups\":[\n          \"sg-051f6d0c957db0281\"\n        ],\n        \"subnet-tags\":{\n          \"kubernetes_kubelet\":\"true\"\n        },\n        \"delete-on-termination\": true\n      }\n    }\n  ]\n}\n"
        },
        "IFName": "eth0"
      }
    ]
  },
  "config": {
    "containerd": {
      "snapshotter": "overlayfs",
      "defaultRuntimeName": "runc",
      "defaultRuntime": {
        "runtimeType": "io.containerd.runtime.v1.linux",
        "runtimeEngine": "",
        "PodAnnotations": null,
        "runtimeRoot": "",
        "options": null,
        "privileged_without_host_devices": false
      },
      "untrustedWorkloadRuntime": {
        "runtimeType": "",
        "runtimeEngine": "",
        "PodAnnotations": null,
        "runtimeRoot": "",
        "options": null,
        "privileged_without_host_devices": false
      },
      "runtimes": {
        "default": {
          "runtimeType": "io.containerd.runtime.v1.linux",
          "runtimeEngine": "",
          "PodAnnotations": null,
          "runtimeRoot": "",
          "options": null,
          "privileged_without_host_devices": false
        },
        "runc": {
          "runtimeType": "io.containerd.runc.v1",
          "runtimeEngine": "",
          "PodAnnotations": null,
          "runtimeRoot": "",
          "options": null,
          "privileged_without_host_devices": false
        }
      },
      "noPivot": false
    },
    "cni": {
      "binDir": "/opt/cni/bin",
      "confDir": "/etc/cni/net.d",
      "maxConfNum": 1,
      "confTemplate": ""
    },
    "registry": {
      "mirrors": {
        "docker.io": {
          "endpoint": [
            "https://REDACTED.COM"
          ]
        }
      },
      "configs": {
        "REDACTED.COM": {
          "auth": null,
          "tls": {
            "caFile": "/etc/ssl/ca.pem",
            "certFile": "/etc/ssl/cert.pem",
            "keyFile": "/etc/ssl/key.pem"
          }
        }
      },
      "auths": null
    },
    "disableTCPService": true,
    "streamServerAddress": "127.0.0.1",
    "streamServerPort": "0",
    "streamIdleTimeout": "4h0m0s",
    "enableSelinux": false,
    "sandboxImage": "pause:3.1.0",
    "statsCollectPeriod": 10,
    "systemdCgroup": false,
    "enableTLSStreaming": false,
    "x509KeyPairStreaming": {
      "tlsCertFile": "",
      "tlsKeyFile": ""
    },
    "maxContainerLogSize": 2621440,
    "disableCgroup": false,
    "disableApparmor": false,
    "restrictOOMScoreAdj": false,
    "maxConcurrentDownloads": 10,
    "disableProcMount": false,
    "containerdRootDir": "/var/lib/container-runtime/containerd",
    "containerdEndpoint": "/run/containerd/containerd.sock",
    "rootDir": "/var/lib/container-runtime/containerd/io.containerd.grpc.v1.cri",
    "stateDir": "/run/containerd/io.containerd.grpc.v1.cri"
  },
  "golang": "go1.12.10"
}
