goroutine/fd leak still present in 27.0.3 #48236

@Yinette

Description

While troubleshooting a system crash I came across issue #45052, and the same problem appears to still be present in 27.0.3, though I could be wrong. I collected a goroutine pprof from the running dockerd while it had roughly 5,000 threads, and the profile looks at least related to the linked issue.

To dump the goroutines, I ran:

curl --unix-socket /var/run/docker.sock http://./debug/pprof/goroutine --output pprof-goroutines-dockerd.gz
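For anyone who wants to collect the same dump on a schedule rather than by hand, here is a minimal sketch of the same request using only the Python standard library. The class and function names (`UnixSocketHTTPConnection`, `fetch_debug_endpoint`, `save_goroutine_profile`) are made up for illustration; the socket path and endpoint are the ones from the curl command, and the caller is assumed to have permission to read the Docker socket.

```python
import http.client
import socket


class UnixSocketHTTPConnection(http.client.HTTPConnection):
    """HTTP connection that dials a Unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=10.0):
        # "localhost" is only used for the Host header; it is never resolved.
        super().__init__("localhost", timeout=timeout)
        self._socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self._socket_path)
        self.sock = sock


def fetch_debug_endpoint(socket_path, path):
    """Return the raw response body of a GET on `path` over the socket."""
    conn = UnixSocketHTTPConnection(socket_path)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        body = response.read()
        if response.status != 200:
            raise RuntimeError(f"GET {path} returned HTTP {response.status}")
        return body
    finally:
        conn.close()


def save_goroutine_profile(output_path, socket_path="/var/run/docker.sock"):
    """Write the goroutine profile to `output_path`; returns bytes written."""
    data = fetch_debug_endpoint(socket_path, "/debug/pprof/goroutine")
    with open(output_path, "wb") as handle:
        handle.write(data)
    return len(data)
```

Calling `save_goroutine_profile("pprof-goroutines-dockerd.gz")` should produce a file equivalent to the one written by curl, which can then be opened with `go tool pprof`.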

Reproduce

We have a set of containers that each claim a "device" from Redis to query. If every "device" already has a container assigned to process its data, the surplus containers exit gracefully with status 0 and are restarted to repeat the process.

In this case, dockerd spawns threads under the parent process at an exponential rate until it reaches the kernel's max-threads limit, which causes it to crash.
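To watch the growth before the crash, the thread count of dockerd can be compared against the kernel limit. The sketch below is only an illustration: it assumes the limit being hit is the system-wide `/proc/sys/kernel/threads-max` (the report does not say which limit it was; `pid_max`, a cgroup `pids.max`, or a systemd `TasksMax` setting could equally be the one involved), and the function names are hypothetical.

```python
import re
from pathlib import Path


def parse_thread_count(status_text):
    """Extract the value of the 'Threads:' field from /proc/<pid>/status."""
    match = re.search(r"^Threads:\s+(\d+)\s*$", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Threads:' line found in status text")
    return int(match.group(1))


def thread_usage(pid, proc_root="/proc"):
    """Return (threads, limit, fraction_used) for the given process.

    Assumption: the relevant limit is kernel.threads-max. Other limits
    may apply on a given host and are not checked here.
    """
    root = Path(proc_root)
    threads = parse_thread_count((root / str(pid) / "status").read_text())
    limit = int((root / "sys" / "kernel" / "threads-max").read_text().strip())
    return threads, limit, threads / limit
```

Running `thread_usage(<dockerd pid>)` periodically and logging the result would show whether the thread count climbs each time the surplus containers restart.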

Expected behavior

dockerd should not reach kernel process limits and crash.

docker version

Client: Docker Engine - Community
 Version:           27.0.3
 API version:       1.46
 Go version:        go1.21.11
 Git commit:        7d4bcd8
 Built:             Sat Jun 29 00:02:29 2024
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          27.0.3
  API version:      1.46 (minimum version 1.24)
  Go version:       go1.21.11
  Git commit:       662f78c
  Built:            Sat Jun 29 00:02:29 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.7.18
  GitCommit:        ae71819c4f5e67bb4d5ae76a6b735f29cc25774e
 runc:
  Version:          1.1.13
  GitCommit:        v1.1.13-0-g58aa920
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

docker info

Client: Docker Engine - Community
 Version:    27.0.3
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.15.1
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.28.1
    Path:     /usr/libexec/docker/cli-plugins/docker-compose
  scan: Docker Scan (Docker Inc.)
    Version:  v0.21.0
    Path:     /usr/libexec/docker/cli-plugins/docker-scan

Server:
 Containers: 19
  Running: 19
  Paused: 0
  Stopped: 0
 Images: 9
 Server Version: 27.0.3
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: local
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: runc io.containerd.runc.v2
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: ae71819c4f5e67bb4d5ae76a6b735f29cc25774e
 runc version: v1.1.13-0-g58aa920
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
 Kernel Version: 5.4.0-131-generic
 Operating System: Ubuntu 20.04.5 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 15.51GiB
 Name: [redacted]
 ID: 702f8011-d0a1-44ce-9087-f5ccf6a0b9f8
 Docker Root Dir: /usr/local/[redacted]/var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support

Additional Info

I was able to pull a goroutine pprof out of dockerd while it was running, which shows what the goroutines are stuck on:

(pprof) text
Showing nodes accounting for 10719, 100% of 10721 total
Dropped 130 nodes (cum <= 53)
Showing top 10 nodes out of 24
      flat  flat%   sum%        cum   cum%
      5487 51.18% 51.18%       5487 51.18%  runtime.gopark
      5232 48.80%   100%       5232 48.80%  syscall.Syscall6
         0     0%   100%       5220 48.69%  github.com/docker/docker/api/server/router/container.notifyClosed
         0     0%   100%       5220 48.69%  github.com/docker/docker/api/server/router/container.notifyClosed.func1
         0     0%   100%       5220 48.69%  github.com/docker/docker/daemon.(*Daemon).containerAttach.func2
         0     0%   100%       5220 48.69%  github.com/docker/docker/internal/unix_noeintr.EpollWait
         0     0%   100%         76  0.71%  github.com/docker/docker/pkg/ioutils.(*BytesPipe).Read
         0     0%   100%         76  0.71%  github.com/docker/docker/pkg/pools.Copy
         0     0%   100%       5220 48.69%  golang.org/x/sys/unix.EpollWait
         0     0%   100%       5220 48.69%  internal/poll.(*FD).RawControl
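In the table above, 5220 of the 10721 goroutines sit under `notifyClosed` and `EpollWait`. To compare dumps taken at different times, the `text` output can be parsed into rows. This is a rough sketch that assumes the column layout shown above (flat, flat%, sum%, cum, cum%, name); `parse_pprof_text` and `frames_at_least` are hypothetical helper names, not part of pprof.

```python
import re

# One data row of `pprof text` output: flat, flat%, sum%, cum, cum%, name.
ROW = re.compile(
    r"^\s*(\d+)\s+[\d.]+%\s+[\d.]+%\s+(\d+)\s+[\d.]+%\s+(\S.*)$"
)


def parse_pprof_text(report):
    """Return a list of (flat, cum, function_name) tuples.

    Header and summary lines do not match the row pattern and are skipped.
    """
    rows = []
    for line in report.splitlines():
        match = ROW.match(line)
        if match:
            flat, cum, name = match.groups()
            rows.append((int(flat), int(cum), name.strip()))
    return rows


def frames_at_least(rows, min_cum):
    """Names of frames whose cumulative goroutine count is >= min_cum."""
    return [name for _flat, cum, name in rows if cum >= min_cum]
```

Feeding successive dumps through `parse_pprof_text` and tracking the cumulative count for `container.notifyClosed` would show whether that number only ever grows as containers exit and restart.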
