Description
I noticed issue #45052 while troubleshooting a system crash, and the same problem appears to be present in 27.0.3, though I could be wrong. I collected a pprof goroutine profile from the running dockerd while it had roughly 5,000 threads, and it seems to be at least related to the linked issue.
To dump the goroutines, I ran: curl --unix-socket /var/run/docker.sock http://./debug/pprof/goroutine --output pprof-goroutines-dockerd.gz
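For anyone who wants to collect the same profile from a program instead of curl, here is a minimal Go sketch of the equivalent request. The socket path, profile name, and output file name come from the command above. The helper names (`profileURL`, `newUnixClient`, `dumpProfile`) are made up for illustration, and the `localhost` host is only a placeholder because the connection always goes to the Unix socket.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"net/http"
	"os"
)

// profileURL builds the URL for a named pprof profile. The host part is a
// placeholder: every connection is dialed to the Unix socket instead.
func profileURL(name string) string {
	return "http://localhost/debug/pprof/" + name
}

// newUnixClient returns an HTTP client whose connections are dialed to the
// given Unix socket rather than a TCP address.
func newUnixClient(socketPath string) *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				var d net.Dialer
				return d.DialContext(ctx, "unix", socketPath)
			},
		},
	}
}

// dumpProfile fetches one profile from the daemon and writes it to outPath.
func dumpProfile(socketPath, name, outPath string) error {
	resp, err := newUnixClient(socketPath).Get(profileURL(name))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status: %s", resp.Status)
	}
	out, err := os.Create(outPath)
	if err != nil {
		return err
	}
	defer out.Close()
	_, err = io.Copy(out, resp.Body)
	return err
}

func main() {
	// Socket path and output file name mirror the curl command in this report.
	err := dumpProfile("/var/run/docker.sock", "goroutine", "pprof-goroutines-dockerd.gz")
	if err != nil {
		fmt.Fprintln(os.Stderr, "dump failed:", err)
	}
}
```

The resulting file can be opened with `go tool pprof`, the same way as the curl output.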
Reproduce
We have a set of containers that each collect a "device" from Redis to query. If every "device" already has a container assigned to process its data, the surplus containers exit gracefully with code 0 and are restarted to repeat the process.
In this case, dockerd spawns threads exponentially under the parent process until it reaches the kernel's max-threads limit, causing it to crash.
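To watch the thread growth while reproducing, something like the following Go sketch could be used. It is only an assumption about how to measure this: it reads the `Threads:` field from `/proc/<pid>/status` and compares it with `/proc/sys/kernel/threads-max`, which I am assuming is the kernel limit being hit. The helper names are hypothetical, and passing the dockerd PID as the first argument is just one possible way to call it.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseThreads extracts the "Threads:" value from the contents of a
// /proc/<pid>/status file.
func parseThreads(status string) (int, error) {
	for _, line := range strings.Split(status, "\n") {
		if strings.HasPrefix(line, "Threads:") {
			value := strings.TrimSpace(strings.TrimPrefix(line, "Threads:"))
			return strconv.Atoi(value)
		}
	}
	return 0, fmt.Errorf("no Threads field found")
}

// readInt reads a file that holds a single integer, such as
// /proc/sys/kernel/threads-max.
func readInt(path string) (int, error) {
	b, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}
	return strconv.Atoi(strings.TrimSpace(string(b)))
}

func main() {
	// Hypothetical usage: pass the dockerd PID as the first argument.
	// Defaults to this process so the sketch runs on any Linux host.
	pid := "self"
	if len(os.Args) > 1 {
		pid = os.Args[1]
	}
	status, err := os.ReadFile("/proc/" + pid + "/status")
	if err != nil {
		fmt.Fprintln(os.Stderr, "read status:", err)
		return
	}
	threads, err := parseThreads(string(status))
	if err != nil {
		fmt.Fprintln(os.Stderr, "parse status:", err)
		return
	}
	limit, err := readInt("/proc/sys/kernel/threads-max")
	if err != nil {
		fmt.Fprintln(os.Stderr, "read threads-max:", err)
		return
	}
	fmt.Printf("threads=%d threads-max=%d\n", threads, limit)
}
```

Running it periodically against the dockerd PID should show whether the thread count keeps climbing as the surplus containers exit and restart.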
Expected behavior
dockerd should not reach kernel process limits and crash.
docker version
Client: Docker Engine - Community
Version: 27.0.3
API version: 1.46
Go version: go1.21.11
Git commit: 7d4bcd8
Built: Sat Jun 29 00:02:29 2024
OS/Arch: linux/amd64
Context: default
Server: Docker Engine - Community
Engine:
Version: 27.0.3
API version: 1.46 (minimum version 1.24)
Go version: go1.21.11
Git commit: 662f78c
Built: Sat Jun 29 00:02:29 2024
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.7.18
GitCommit: ae71819c4f5e67bb4d5ae76a6b735f29cc25774e
runc:
Version: 1.1.13
GitCommit: v1.1.13-0-g58aa920
docker-init:
Version: 0.19.0
GitCommit: de40ad0
docker info
Client: Docker Engine - Community
Version: 27.0.3
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc.)
Version: v0.15.1
Path: /usr/libexec/docker/cli-plugins/docker-buildx
compose: Docker Compose (Docker Inc.)
Version: v2.28.1
Path: /usr/libexec/docker/cli-plugins/docker-compose
scan: Docker Scan (Docker Inc.)
Version: v0.21.0
Path: /usr/libexec/docker/cli-plugins/docker-scan
Server:
Containers: 19
Running: 19
Paused: 0
Stopped: 0
Images: 9
Server Version: 27.0.3
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: true
userxattr: false
Logging Driver: local
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
Swarm: inactive
Runtimes: runc io.containerd.runc.v2
Default Runtime: runc
Init Binary: docker-init
containerd version: ae71819c4f5e67bb4d5ae76a6b735f29cc25774e
runc version: v1.1.13-0-g58aa920
init version: de40ad0
Security Options:
apparmor
seccomp
Profile: builtin
Kernel Version: 5.4.0-131-generic
Operating System: Ubuntu 20.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.51GiB
Name: [redacted]
ID: 702f8011-d0a1-44ce-9087-f5ccf6a0b9f8
Docker Root Dir: /usr/local/[redacted]/var/lib/docker
Debug Mode: false
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
Additional Info
I was able to pull pprof for goroutines out of dockerd while it was running and obtain information about what it's getting stuck on:

(pprof) text
Showing nodes accounting for 10719, 100% of 10721 total
Dropped 130 nodes (cum <= 53)
Showing top 10 nodes out of 24
      flat  flat%   sum%        cum   cum%
      5487 51.18% 51.18%       5487 51.18%  runtime.gopark
      5232 48.80%   100%       5232 48.80%  syscall.Syscall6
         0     0%   100%       5220 48.69%  github.com/docker/docker/api/server/router/container.notifyClosed
         0     0%   100%       5220 48.69%  github.com/docker/docker/api/server/router/container.notifyClosed.func1
         0     0%   100%       5220 48.69%  github.com/docker/docker/daemon.(*Daemon).containerAttach.func2
         0     0%   100%       5220 48.69%  github.com/docker/docker/internal/unix_noeintr.EpollWait
         0     0%   100%         76  0.71%  github.com/docker/docker/pkg/ioutils.(*BytesPipe).Read
         0     0%   100%         76  0.71%  github.com/docker/docker/pkg/pools.Copy
         0     0%   100%       5220 48.69%  golang.org/x/sys/unix.EpollWait
         0     0%   100%       5220 48.69%  internal/poll.(*FD).RawControl
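To cross-check the numbers above without pprof, a plain-text goroutine dump can be counted directly. The Go sketch below assumes the dump was fetched from the same endpoint with `?debug=2` appended (a standard `net/http/pprof` option; I have not confirmed that dockerd passes it through unchanged) and piped in on stdin. The function name `countGoroutinesWith` is made up for illustration.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

// countGoroutinesWith counts goroutine stacks in a debug=2 text dump whose
// frames mention needle. Stacks are separated by blank lines and each one
// begins with a "goroutine N [state]:" header.
func countGoroutinesWith(dump, needle string) (matched, total int) {
	for _, block := range strings.Split(dump, "\n\n") {
		block = strings.TrimSpace(block)
		if !strings.HasPrefix(block, "goroutine ") {
			continue // skip anything that is not a goroutine stack
		}
		total++
		if strings.Contains(block, needle) {
			matched++
		}
	}
	return matched, total
}

func main() {
	// Reads the text dump from stdin.
	dump, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, "read dump:", err)
		return
	}
	matched, total := countGoroutinesWith(string(dump), "container.notifyClosed")
	fmt.Printf("%d of %d goroutines include container.notifyClosed\n", matched, total)
}
```

If the count for `container.notifyClosed` keeps growing between dumps while the number of running containers stays constant, that would support the idea that these goroutines are never released.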