Output of containerd --version:
containerd github.com/containerd/containerd v1.3.0 36cf5b690dcc00ff0f34ff7799209050c3d0c59a

Output of crictl info:
{
  "status": {
    "conditions": [
      {
        "type": "RuntimeReady",
        "status": true,
        "reason": "",
        "message": ""
      },
      {
        "type": "NetworkReady",
        "status": true,
        "reason": "",
        "message": ""
      }
    ]
  },
  "cniconfig": {
    "PluginDirs": [
      "/opt/cni/bin"
    ],
    "PluginConfDir": "/etc/cni/net.d",
    "PluginMaxConfNum": 1,
    "Prefix": "eth",
    "Networks": [
      {
        "Config": {
          "Name": "cni-loopback",
          "CNIVersion": "0.3.1",
          "Plugins": [
            {
              "Network": {
                "type": "loopback",
                "ipam": {},
                "dns": {}
              },
              "Source": "{\"type\":\"loopback\"}"
            }
          ],
          "Source": "{\n\"cniVersion\": \"0.3.1\",\n\"name\": \"cni-loopback\",\n\"plugins\": [{\n \"type\": \"loopback\"\n}]\n}"
        },
        "IFName": "lo"
      },
      {
        "Config": {
          "Name": "cilium",
          "CNIVersion": "0.3.1",
          "Plugins": [
            {
              "Network": {
                "cniVersion": "0.3.1",
                "type": "cilium-cni",
                "ipam": {},
                "dns": {}
              },
              "Source": "{\"cniVersion\":\"0.3.1\",\"eni\":{\"delete-on-termination\":true,\"first-interface-index\":1,\"min-allocate\":13,\"pre-allocate\":1,\"security-groups\":[\"sg-051f6d0c957db0281\"],\"subnet-tags\":{\"kubernetes_kubelet\":\"true\"}},\"type\":\"cilium-cni\"}"
            }
          ],
          "Source": "{\n \"cniVersion\":\"0.3.1\",\n \"name\":\"cilium\",\n \"plugins\": [\n {\n \"cniVersion\":\"0.3.1\",\n \"type\":\"cilium-cni\",\n \"eni\": {\n \"pre-allocate\": 1,\n \"min-allocate\": 13,\n \"first-interface-index\":1,\n \"security-groups\":[\n \"sg-051f6d0c957db0281\"\n ],\n \"subnet-tags\":{\n \"kubernetes_kubelet\":\"true\"\n },\n \"delete-on-termination\": true\n }\n }\n ]\n}\n"
        },
        "IFName": "eth0"
      }
    ]
  },
  "config": {
    "containerd": {
      "snapshotter": "overlayfs",
      "defaultRuntimeName": "runc",
      "defaultRuntime": {
        "runtimeType": "io.containerd.runtime.v1.linux",
        "runtimeEngine": "",
        "PodAnnotations": null,
        "runtimeRoot": "",
        "options": null,
        "privileged_without_host_devices": false
      },
      "untrustedWorkloadRuntime": {
        "runtimeType": "",
        "runtimeEngine": "",
        "PodAnnotations": null,
        "runtimeRoot": "",
        "options": null,
        "privileged_without_host_devices": false
      },
      "runtimes": {
        "default": {
          "runtimeType": "io.containerd.runtime.v1.linux",
          "runtimeEngine": "",
          "PodAnnotations": null,
          "runtimeRoot": "",
          "options": null,
          "privileged_without_host_devices": false
        },
        "runc": {
          "runtimeType": "io.containerd.runc.v1",
          "runtimeEngine": "",
          "PodAnnotations": null,
          "runtimeRoot": "",
          "options": null,
          "privileged_without_host_devices": false
        }
      },
      "noPivot": false
    },
    "cni": {
      "binDir": "/opt/cni/bin",
      "confDir": "/etc/cni/net.d",
      "maxConfNum": 1,
      "confTemplate": ""
    },
    "registry": {
      "mirrors": {
        "docker.io": {
          "endpoint": [
            "https://REDACTED.COM"
          ]
        }
      },
      "configs": {
        "REDACTED.COM": {
          "auth": null,
          "tls": {
            "caFile": "/etc/ssl/ca.pem",
            "certFile": "/etc/ssl/cert.pem",
            "keyFile": "/etc/ssl/key.pem"
          }
        }
      },
      "auths": null
    },
    "disableTCPService": true,
    "streamServerAddress": "127.0.0.1",
    "streamServerPort": "0",
    "streamIdleTimeout": "4h0m0s",
    "enableSelinux": false,
    "sandboxImage": "pause:3.1.0",
    "statsCollectPeriod": 10,
    "systemdCgroup": false,
    "enableTLSStreaming": false,
    "x509KeyPairStreaming": {
      "tlsCertFile": "",
      "tlsKeyFile": ""
    },
    "maxContainerLogSize": 2621440,
    "disableCgroup": false,
    "disableApparmor": false,
    "restrictOOMScoreAdj": false,
    "maxConcurrentDownloads": 10,
    "disableProcMount": false,
    "containerdRootDir": "/var/lib/container-runtime/containerd",
    "containerdEndpoint": "/run/containerd/containerd.sock",
    "rootDir": "/var/lib/container-runtime/containerd/io.containerd.grpc.v1.cri",
    "stateDir": "/run/containerd/io.containerd.grpc.v1.cri"
  },
  "golang": "go1.12.10"
}
Description
I'm running k8s 1.14.6 and recently upgraded to containerd 1.3.0 from 1.2.4. From time to time, when pods get scheduled on a node, I observe that the kubelet is waiting on containerd to pull an image. I took a goroutine dump (https://gist.github.com/ungureanuvladvictor/39a5e37754dc7969233288331232a258) while this was happening, and it shows the following:
goroutine 40397 is stuck in semacquire at containerd/pull.go, line 73 (commit 36cf5b6). This runs in a defer that was triggered via containerd/pull.go, lines 88 to 91 (commit 36cf5b6).
I think the actual goroutine to focus on is goroutine 40508, which is stuck in a select call at containerd/unpacker.go, line 106 (commit 36cf5b6).
So far we have only observed this with one container image, which has 19 layers. If it's useful I can share the manifest, but I need to anonymize it a bit first since it contains sensitive information.
Let me know if this issue would be better opened on containerd/cri.

Steps to reproduce the issue:
Unfortunately I do not have clear reproduction steps; I could not figure out how to trigger this.
Describe the results you received:
containerd gets stuck pulling/unpacking the image.
Describe the results you expected:
containerd pulls and unpacks the image successfully.
Output of containerd --version: see the top of this report (containerd v1.3.0, commit 36cf5b690dcc00ff0f34ff7799209050c3d0c59a).

Any other relevant information:
Output of crictl info: see the JSON at the top of this report.