Skip to content

kubectl exec or crictl exec cause goroutine and thus memory leak when using WebSocket connection #10568

@alfredkrohmer

Description

@alfredkrohmer

Description

Please also refer to this initial bug report in the Kubernetes repo: kubernetes/kubernetes#126608

kubectl has feature gate to run the exec command over WebSocket instead of Spdy that can be controlled via the environment variable KUBECTL_REMOTE_COMMAND_WEBSOCKETS and that's by default set to false (= Spdy) on kubectl < v1.30.0 and to true (= WebSocket) on kubectl >= v1.30.0.

crictl >= v1.30.0 has a command line flag --transport for the exec flag (still defaults to Spdy).

If either kubectl exec with KUBECTL_REMOTE_COMMAND_WEBSOCKETS enabled or crictl exec --transport websocket is used against containerd, a goroutine with the following stack trace is leaked on every execution of (kubectl|crictl) exec:

goroutine 82060 [chan receive]:
k8s.io/apiserver/pkg/util/wsstream.(*Conn).Open(0xc006ba0af0, {0x55fe91acc880?, 0xc0055d8ee0}, 0xc00a1bcb00)
	/builddir/build/BUILD/containerd-1.6.19-1.amzn2.0.1/src/github.com/containerd/containerd/vendor/k8s.io/apiserver/pkg/util/wsstream/conn.go:185 +0xc5
github.com/containerd/containerd/pkg/cri/streaming/remotecommand.createWebSocketStreams(0xc00a1bcb00?, {0x55fe91acc880, 0xc0055d8ee0}, 0xc0085c17f4, 0xd18c2e28000)
	/builddir/build/BUILD/containerd-1.6.19-1.amzn2.0.1/src/github.com/containerd/containerd/pkg/cri/streaming/remotecommand/websocket.go:114 +0x32c
github.com/containerd/containerd/pkg/cri/streaming/remotecommand.createStreams(0xc00a1bcb00, {0x55fe91acc880, 0xc0055d8ee0}, 0xc0085c1690?, {0x55fe926be380, 0x4, 0x4}, 0x203001?, 0xc006d8bb00?)
	/builddir/build/BUILD/containerd-1.6.19-1.amzn2.0.1/src/github.com/containerd/containerd/pkg/cri/streaming/remotecommand/httpstream.go:126 +0x9b
github.com/containerd/containerd/pkg/cri/streaming/remotecommand.ServeExec({0x55fe91acc880?, 0xc0055d8ee0?}, 0x6?, {0x55fe91ab5af8, 0xc0006b0ed0}, {0x0, 0x0}, {0x0, 0x0}, {0xc006b918c0, ...}, ...)
	/builddir/build/BUILD/containerd-1.6.19-1.amzn2.0.1/src/github.com/containerd/containerd/pkg/cri/streaming/remotecommand/exec.go:61 +0xc5
github.com/containerd/containerd/pkg/cri/streaming.(*server).serveExec(0xc00040a090, 0xc004cfa630, 0xc006be77a0)
	/builddir/build/BUILD/containerd-1.6.19-1.amzn2.0.1/src/github.com/containerd/containerd/pkg/cri/streaming/server.go:302 +0x19e
github.com/emicklei/go-restful.(*Container).dispatch(0xc00040a1b0, {0x55fe91acc880, 0xc0055d8ee0}, 0xc00a1bcb00)
	/builddir/build/BUILD/containerd-1.6.19-1.amzn2.0.1/src/github.com/containerd/containerd/vendor/github.com/emicklei/go-restful/container.go:288 +0x8c8
net/http.HandlerFunc.ServeHTTP(0x0?, {0x55fe91acc880?, 0xc0055d8ee0?}, 0x0?)
	/usr/lib/golang/src/net/http/server.go:2084 +0x2f
net/http.(*ServeMux).ServeHTTP(0x72?, {0x55fe91acc880, 0xc0055d8ee0}, 0xc00a1bcb00)
	/usr/lib/golang/src/net/http/server.go:2462 +0x149
github.com/emicklei/go-restful.(*Container).ServeHTTP(0x0?, {0x55fe91acc880?, 0xc0055d8ee0?}, 0xc0074c0000?)
	/builddir/build/BUILD/containerd-1.6.19-1.amzn2.0.1/src/github.com/containerd/containerd/vendor/github.com/emicklei/go-restful/container.go:303 +0x27
net/http.serverHandler.ServeHTTP({0xc004cfa510?}, {0x55fe91acc880, 0xc0055d8ee0}, 0xc00a1bcb00)
	/usr/lib/golang/src/net/http/server.go:2916 +0x43b
net/http.(*conn).serve(0xc0054b1a40, {0x55fe91acdbd8, 0xc003dc2420})
	/usr/lib/golang/src/net/http/server.go:1966 +0x5d7
created by net/http.(*Server).Serve
	/usr/lib/golang/src/net/http/server.go:3071 +0x4db

Notes:

Steps to reproduce the issue

  1. Configure containerd to enable the debug Unix socket.
  2. Run in a terminal: watch "ctr pprof goroutines | grep createWebSocketStreams | wc -l"
  3. Run in a second terminal on the same host:
    crictl exec --transport websocket <any container id> true
    
    or run against a Kubernetes pod running on that host (with kubectl >= v1.30.0):
    kubectl exec <pod name> -- true
    
  4. Watch the number from the watch command in step 2 increase.

Describe the results you received and expected

goroutines and thus memory shouldn't leak

What version of containerd are you using?

v1.6.19, v1.7.11

Any other relevant information

$ crictl info
WARN[0000] runtime connect using default endpoints: [unix:///var/run/dockershim.sock unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock unix:///var/run/cri-dockerd.sock]. As the default settings are now deprecated, you should set the endpoint instead. 
{
  "status": {
    "conditions": [
      {
        "type": "RuntimeReady",
        "status": true,
        "reason": "",
        "message": ""
      },
      {
        "type": "NetworkReady",
        "status": true,
        "reason": "",
        "message": ""
      }
    ]
  },
  "cniconfig": {
    "PluginDirs": [
      "/opt/cni/bin"
    ],
    "PluginConfDir": "/etc/cni/net.d",
    "PluginMaxConfNum": 1,
    "Prefix": "eth",
    "Networks": [
      {
        "Config": {
          "Name": "cni-loopback",
          "CNIVersion": "0.3.1",
          "Plugins": [
            {
              "Network": {
                "type": "loopback",
                "ipam": {},
                "dns": {}
              },
              "Source": "{\"type\":\"loopback\"}"
            }
          ],
          "Source": "{\n\"cniVersion\": \"0.3.1\",\n\"name\": \"cni-loopback\",\n\"plugins\": [{\n  \"type\": \"loopback\"\n}]\n}"
        },
        "IFName": "lo"
      },
      {
        "Config": {
          "Name": "aws-cni",
          "CNIVersion": "0.4.0",
          "Plugins": [
            {
              "Network": {
                "name": "aws-cni",
                "type": "aws-cni",
                "ipam": {},
                "dns": {}
              },
              "Source": "{\"mtu\":\"9001\",\"name\":\"aws-cni\",\"pluginLogFile\":\"/var/log/aws-routed-eni/plugin.log\",\"pluginLogLevel\":\"DEBUG\",\"podSGEnforcingMode\":\"strict\",\"type\":\"aws-cni\",\"vethPrefix\":\"eni\"}"
            },
            {
              "Network": {
                "name": "egress-cni",
                "type": "egress-cni",
                "ipam": {
                  "type": "host-local"
                },
                "dns": {}
              },
              "Source": "{\"enabled\":\"false\",\"ipam\":{\"dataDir\":\"/run/cni/v4pd/egress-v6-ipam\",\"ranges\":[[{\"subnet\":\"fd00::ac:00/118\"}]],\"routes\":[{\"dst\":\"::/0\"}],\"type\":\"host-local\"},\"mtu\":\"9001\",\"name\":\"egress-cni\",\"nodeIP\":\"\",\"pluginLogFile\":\"/var/log/aws-routed-eni/egress-v6-plugin.log\",\"pluginLogLevel\":\"DEBUG\",\"randomizeSNAT\":\"prng\",\"type\":\"egress-cni\"}"
            },
            {
              "Network": {
                "type": "portmap",
                "capabilities": {
                  "portMappings": true
                },
                "ipam": {},
                "dns": {}
              },
              "Source": "{\"capabilities\":{\"portMappings\":true},\"snat\":true,\"type\":\"portmap\"}"
            }
          ],
          "Source": "{\n  \"cniVersion\": \"0.4.0\",\n  \"name\": \"aws-cni\",\n  \"disableCheck\": true,\n  \"plugins\": [\n    {\n      \"name\": \"aws-cni\",\n      \"type\": \"aws-cni\",\n      \"vethPrefix\": \"eni\",\n      \"mtu\": \"9001\",\n      \"podSGEnforcingMode\": \"strict\",\n      \"pluginLogFile\": \"/var/log/aws-routed-eni/plugin.log\",\n      \"pluginLogLevel\": \"DEBUG\"\n    },\n    {\n      \"name\": \"egress-cni\",\n      \"type\": \"egress-cni\",\n      \"mtu\": \"9001\",\n      \"enabled\": \"false\",\n      \"randomizeSNAT\": \"prng\",\n      \"nodeIP\": \"\",\n      \"ipam\": {\n         \"type\": \"host-local\",\n         \"ranges\": [[{\"subnet\": \"fd00::ac:00/118\"}]],\n         \"routes\": [{\"dst\": \"::/0\"}],\n         \"dataDir\": \"/run/cni/v4pd/egress-v6-ipam\"\n      },\n      \"pluginLogFile\": \"/var/log/aws-routed-eni/egress-v6-plugin.log\",\n      \"pluginLogLevel\": \"DEBUG\"\n    },\n    {\n      \"type\": \"portmap\",\n      \"capabilities\": {\"portMappings\": true},\n      \"snat\": true\n    }\n  ]\n}\n"
        },
        "IFName": "eth0"
      }
    ]
  },
  "config": {
    "containerd": {
      "snapshotter": "overlayfs",
      "defaultRuntimeName": "runc",
      "defaultRuntime": {
        "runtimeType": "",
        "runtimePath": "",
        "runtimeEngine": "",
        "PodAnnotations": null,
        "ContainerAnnotations": null,
        "runtimeRoot": "",
        "options": null,
        "privileged_without_host_devices": false,
        "privileged_without_host_devices_all_devices_allowed": false,
        "baseRuntimeSpec": "",
        "cniConfDir": "",
        "cniMaxConfNum": 0,
        "snapshotter": "",
        "sandboxMode": ""
      },
      "untrustedWorkloadRuntime": {
        "runtimeType": "",
        "runtimePath": "",
        "runtimeEngine": "",
        "PodAnnotations": null,
        "ContainerAnnotations": null,
        "runtimeRoot": "",
        "options": null,
        "privileged_without_host_devices": false,
        "privileged_without_host_devices_all_devices_allowed": false,
        "baseRuntimeSpec": "",
        "cniConfDir": "",
        "cniMaxConfNum": 0,
        "snapshotter": "",
        "sandboxMode": ""
      },
      "runtimes": {
        "runc": {
          "runtimeType": "io.containerd.runc.v2",
          "runtimePath": "",
          "runtimeEngine": "",
          "PodAnnotations": null,
          "ContainerAnnotations": null,
          "runtimeRoot": "",
          "options": {
            "SystemdCgroup": true
          },
          "privileged_without_host_devices": false,
          "privileged_without_host_devices_all_devices_allowed": false,
          "baseRuntimeSpec": "",
          "cniConfDir": "",
          "cniMaxConfNum": 0,
          "snapshotter": "",
          "sandboxMode": "podsandbox"
        }
      },
      "noPivot": false,
      "disableSnapshotAnnotations": true,
      "discardUnpackedLayers": true,
      "ignoreBlockIONotEnabledErrors": false,
      "ignoreRdtNotEnabledErrors": false
    },
    "cni": {
      "binDir": "/opt/cni/bin",
      "confDir": "/etc/cni/net.d",
      "maxConfNum": 1,
      "setupSerially": false,
      "confTemplate": "",
      "ipPref": ""
    },
    "registry": {
      "configPath": "/etc/containerd/certs.d:/etc/docker/certs.d",
      "mirrors": null,
      "configs": null,
      "auths": null,
      "headers": null
    },
    "imageDecryption": {
      "keyModel": "node"
    },
    "disableTCPService": true,
    "streamServerAddress": "127.0.0.1",
    "streamServerPort": "0",
    "streamIdleTimeout": "4h0m0s",
    "enableSelinux": false,
    "selinuxCategoryRange": 1024,
    "sandboxImage": "602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/pause:3.5",
    "statsCollectPeriod": 10,
    "systemdCgroup": false,
    "enableTLSStreaming": false,
    "x509KeyPairStreaming": {
      "tlsCertFile": "",
      "tlsKeyFile": ""
    },
    "maxContainerLogSize": 16384,
    "disableCgroup": false,
    "disableApparmor": false,
    "restrictOOMScoreAdj": false,
    "maxConcurrentDownloads": 3,
    "disableProcMount": false,
    "unsetSeccompProfile": "",
    "tolerateMissingHugetlbController": true,
    "disableHugetlbController": true,
    "device_ownership_from_security_context": false,
    "ignoreImageDefinedVolumes": false,
    "netnsMountsUnderStateDir": false,
    "enableUnprivilegedPorts": false,
    "enableUnprivilegedICMP": false,
    "enableCDI": false,
    "cdiSpecDirs": [
      "/etc/cdi",
      "/var/run/cdi"
    ],
    "imagePullProgressTimeout": "5m0s",
    "drainExecSyncIOTimeout": "0s",
    "containerdRootDir": "/var/lib/containerd",
    "containerdEndpoint": "/run/containerd/containerd.sock",
    "rootDir": "/var/lib/containerd/io.containerd.grpc.v1.cri",
    "stateDir": "/run/containerd/io.containerd.grpc.v1.cri"
  },
  "golang": "go1.20.12",
  "lastCNILoadStatus": "OK",
  "lastCNILoadStatus.default": "OK"
}

Show configuration if it is related to CRI plugin.

$ cat /etc/containerd/config.toml
version = 2
root = "/var/lib/containerd"
state = "/run/containerd"
imports = ["/etc/containerd/config.d/*.toml"]

[grpc]
address = "/run/containerd/containerd.sock"

[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "runc"
discard_unpacked_layers = true

[plugins."io.containerd.grpc.v1.cri"]
sandbox_image = "602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/pause:3.5"

[plugins."io.containerd.grpc.v1.cri".registry]
config_path = "/etc/containerd/certs.d:/etc/docker/certs.d"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true

[plugins."io.containerd.grpc.v1.cri".cni]
bin_dir = "/opt/cni/bin"
conf_dir = "/etc/cni/net.d"

[debug]
  address = "/run/containerd/debug.sock"

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions