Daemon hangs after starting and stopping many containers #26251

@tswift242

Description

Output of docker version:

Client:
 Version:      1.12.1
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   23cf638
 Built:        Thu Aug 18 05:22:43 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.1
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   23cf638
 Built:        Thu Aug 18 05:22:43 2016
 OS/Arch:      linux/amd64

Output of docker info:

Containers: 43
 Running: 17
 Paused: 0
 Stopped: 26
Images: 21
Server Version: 1.12.1
Storage Driver: overlay
 Backing Filesystem: extfs
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge null host overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: apparmor
Kernel Version: 4.4.19-040419-generic
Operating System: Ubuntu 14.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 16
Total Memory: 120.1 GiB
Name: ip-10-97-1-130
ID: NB4F:S4ON:5TH6:64OM:SOG6:JUZ4:YVLY:X4C2:XIUE:YOCL:LKKM:32K2
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 191
 Goroutines: 353
 System Time: 2016-09-01T17:53:42.395022289Z
 EventsListeners: 0
Registry: https://index.docker.io/v1/
Insecure Registries:
 127.0.0.0/8

Additional environment details (AWS, VirtualBox, physical, etc.):

AWS EC2. We currently run 46 containers on a single EC2 instance and expect all 46 to be up at all times; we use upstart to start a new container whenever one stops. However, far fewer than 46 are currently up, and this appears to be a Docker issue.

Steps to reproduce the issue:

  1. Start and stop many containers on the same host; in particular, start dozens of containers concurrently.
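The reproduction step can be scripted roughly as in this minimal Python sketch, which assumes the Docker CLI is on PATH. The image (`busybox`), the `sleep 30` command, the round and container counts, and the published host ports starting at 10000 are all hypothetical choices for illustration, not the reporter's actual workload. Nothing runs on import; call `reproduce()` explicitly on a disposable test host.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical parameters for illustration; not taken from the reporter's setup.
IMAGE = "busybox"
BASE_PORT = 10000  # hypothetical host port range, one port per container


def run_args(index, image=IMAGE, base_port=BASE_PORT):
    """Build the argv for one short-lived container that publishes one host port."""
    return [
        "docker", "run", "-d",
        "--name", f"repro_{index}",
        "-p", f"{base_port + index}:80",
        image, "sleep", "30",
    ]


def start_one(index):
    """Start one container; return (index, exit code, stderr) for later inspection."""
    proc = subprocess.run(run_args(index), capture_output=True, text=True)
    return index, proc.returncode, proc.stderr.strip()


def remove_one(index):
    """Force-remove one container (stopping it first if it is still running)."""
    subprocess.run(["docker", "rm", "-f", f"repro_{index}"], capture_output=True)


def reproduce(rounds=10, count=50):
    """Repeatedly start `count` containers concurrently, then remove them concurrently."""
    for round_no in range(rounds):
        with ThreadPoolExecutor(max_workers=count) as pool:
            results = list(pool.map(start_one, range(count)))
        for index, code, err in results:
            if code != 0:
                # Failed starts surface the daemon's error text on stderr,
                # e.g. errors like the ones quoted in this issue.
                print(f"round {round_no} repro_{index}: exit {code}: {err}")
        with ThreadPoolExecutor(max_workers=count) as pool:
            list(pool.map(remove_one, range(count)))
```

Publishing one host port per container is deliberate: it exercises the bridge driver's port-mapping path, which is where the errors reported below originate.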

Describe the results you received:
Daemon hangs. Running "docker ps" has taken between 21 seconds and 8.5 minutes across the 10 trials I've done. "docker ps -a" shows many containers with the Dead status.
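Both symptoms can be measured with a small Python sketch like the one below, assuming the Docker CLI is on PATH. It times `docker ps` over several trials and counts containers whose status column reads `Dead`. The helper names and the use of `docker ps -a --format '{{.Status}}'` are assumptions for illustration; `run` and `clock` are injectable so the timing logic can be checked without a daemon.

```python
import subprocess
import time


def time_command(argv, trials, run=subprocess.run, clock=time.monotonic):
    """Run argv `trials` times; return each run's wall-clock duration in seconds."""
    durations = []
    for _ in range(trials):
        start = clock()
        run(argv, capture_output=True, text=True)
        durations.append(clock() - start)
    return durations


def count_dead(status_lines):
    """Count lines of `docker ps -a --format '{{.Status}}'` output that report Dead."""
    return sum(1 for line in status_lines if line.strip().startswith("Dead"))


def report(trials=10):
    """Print docker ps latency over several trials and the number of Dead containers."""
    durations = time_command(["docker", "ps"], trials)
    print(f"docker ps over {trials} trials: "
          f"min {min(durations):.1f}s, max {max(durations):.1f}s")
    # Plain (non-f) string, so the Go template braces reach docker unchanged.
    out = subprocess.run(
        ["docker", "ps", "-a", "--format", "{{.Status}}"],
        capture_output=True, text=True,
    ).stdout
    print(f"Dead containers: {count_dead(out.splitlines())}")
```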

Describe the results you expected:
Daemon should not hang even after starting and stopping many containers.

Additional information you deem important (e.g. issue happens only occasionally):
This EC2 instance has only been up for 17 hours.

I've attached the stack trace for the daemon. I'd like to note that even after I sent SIGUSR1 to the daemon and a new dockerd process came up, "docker ps" still takes a long time.

I'm also attaching the daemon logs with debug output. I can only attach a subset of the logs because the whole log file is over 100 MB, so I've removed a large block of lines from the middle. You'll see many containers being created, but none of them seem to come up or stay up. To make that clearer, I'm also attaching part of the host's syslog, where we log a line for every container that's created. The syslog shows 109 containers being created over 50 seconds, yet upstart logs only 9 container exits in the same span. It seems that all these docker run requests are being queued up and not acted on.
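The creation-versus-exit comparison above can be expressed as a simple count. This Python sketch assumes the log lines have already been restricted to the time window of interest; the marker strings are placeholders, since the exact wording of the host's syslog entries is not shown here.

```python
def creation_backlog(lines, created_marker, exited_marker, window_seconds):
    """Summarize container creations vs. exits in a window of log lines.

    `created_marker` and `exited_marker` are hypothetical placeholders for
    whatever text the host's jobs actually log.
    """
    created = sum(1 for line in lines if created_marker in line)
    exited = sum(1 for line in lines if exited_marker in line)
    return {
        "created": created,
        "exited": exited,
        "backlog": created - exited,
        "created_per_second": created / window_seconds,
    }
```

With the figures from the truncated syslog (109 created, 9 exited, 50 seconds), this gives a gap of 100 containers and roughly 2.2 creations per second.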

The first error I see in the docker daemon logs is: "Handler for POST /v1.24/containers/66d5297807977a7e95cbf57163c4ba3c1864dd1dd4afad2b0568caadb9576809/start returned error: driver failed programming external connectivity on endpoint stoic_brown_45 (d9cd0a967f51929b7b597c1084e3086186a758098563710692f903ad39983451): failed to update bridge endpoint d9cd0a9 to store: failed to update bridge store for object type *bridge.bridgeEndpoint: timeout"

This seems to be the cause of at least some of the subsequent bad behavior, such as "Bind for 0.0.0.0:10045 failed: port is already allocated".
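To check whether port conflicts cluster after the first bridge-store timeout, as suspected here, a log scan like this Python sketch could be used. The message fragments are copied from the two errors quoted above; the assumption that each error sits on a single log line, and the helper names, are illustrative.

```python
import re
from collections import Counter

# Fragments copied from the daemon errors quoted in this issue.
STORE_TIMEOUT = "failed to update bridge store"
PORT_CONFLICT = re.compile(r"Bind for ([\d.]+):(\d+) failed: port is already allocated")


def first_store_timeout(lines):
    """Return the index of the first line reporting a bridge store timeout, or None."""
    for i, line in enumerate(lines):
        if STORE_TIMEOUT in line and "timeout" in line:
            return i
    return None


def port_conflicts_after(lines, start):
    """Count 'port is already allocated' errors per host port from line `start` onward."""
    conflicts = Counter()
    for line in lines[start:]:
        match = PORT_CONFLICT.search(line)
        if match:
            conflicts[int(match.group(2))] += 1
    return conflicts
```

If most conflicts appear only after the first timeout, that is consistent with port mappings being left allocated when the endpoint update fails, though this scan alone cannot prove it.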

docker_stack_trace.txt
docker_truncated.txt
syslog_truncated.txt

Metadata

Labels

area/networking (Networking)
area/runtime (Runtime)
kind/bug (Bugs are bugs. The cause may or may not be known at triage time so debugging may be needed.)
priority/P1 (Important: P1 issues are a top priority and a must-have for the next release.)
version/1.12
