Description
Occasionally the Docker daemon stops responding to interactions with containers that have HEALTHCHECKs. The problem presents itself in several older versions of Docker, in the latest packaged versions on Ubuntu, and in v17.12.0-ce on Amazon Linux.
From our observations, this looks like a race condition that ends in a deadlock, which prevents further calls against the affected containers.
This issue may be related to #35933. In any case, I'm bisecting the releases (from https://github.com/docker/docker-ce) with this repro to narrow down the problem further.
Steps to reproduce the issue:
A repro case has been built and run against several versions of Docker, and it triggers the bug after a few rounds of execution (I recommend 10-20 rounds to tickle the bug). There is likely nothing special about using 2 containers, but that count has reliably triggered the bug in this test.
- Build a container image with a HEALTHCHECK defined (echo hello every 1s)
- Start 2 containers using the image
- Wait some time (10s in our test)
- Stop the containers
- Inspect the containers
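The steps above can be sketched as a single round in shell. This is only an approximation of the repro, not the actual repro runner: the `busybox` base image, the function names, and the `DOCKER` and `WAIT_SECONDS` variables are assumptions made for illustration, while the image tag, the `sleep 30m` command, and the 1s `echo hello` health check come from this report.

```shell
#!/bin/sh
# Sketch of one repro round. DOCKER can be overridden (e.g. for a dry run).
DOCKER="${DOCKER:-docker}"
IMAGE="docker-poke:healthchecks"

# Emit a minimal Dockerfile with a 1s health check.
# busybox is an assumed base image; any image that ships sh should do.
write_dockerfile() {
    cat <<'EOF'
FROM busybox
HEALTHCHECK --interval=1s CMD echo hello
EOF
}

# Build, start 2 containers, wait, stop, then inspect.
run_round() {
    write_dockerfile | $DOCKER build -t "$IMAGE" - || return 1
    c1=$($DOCKER run -d "$IMAGE" sh -c 'sleep 30m') || return 1
    c2=$($DOCKER run -d "$IMAGE" sh -c 'sleep 30m') || return 1
    sleep "${WAIT_SECONDS:-10}"
    $DOCKER stop "$c1" "$c2" || return 1
    # When the bug is hit, one of these last calls is expected to hang.
    $DOCKER inspect "$c1" "$c2" >/dev/null
}
```

Calling `run_round` in a loop for 10-20 rounds matches the recommendation above; a round that never returns is the symptom being chased.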
Describe the results you received:
Started containers appear to continue running and to be healthy despite being non-responsive.
ubuntu@ip-172-31-37-156:~$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
0cf518c205f7 docker-poke:healthchecks "sh -c 'sleep 30m'" 17 minutes ago Up 17 minutes (healthy) sad_hugle
ubuntu@ip-172-31-37-156:~$ docker inspect 0cf518c205f7
^C
Additionally, docker ps continues to report the container as up and running well past the point where its process exits (it was started with sleep 30m, so the process exits after 30m).
0cf518c205f7 docker-poke:healthchecks "sh -c 'sleep 30m'" 37 minutes ago Up 37 minutes (healthy) sad_hugle
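Since a hung `docker inspect` never returns on its own (it had to be interrupted with ^C above), a scripted check needs a time limit. A minimal sketch, assuming GNU coreutils `timeout` (which exits with status 124 when the limit is reached); the `inspect_hangs` name and the `DOCKER` and `INSPECT_TIMEOUT` variables are hypothetical:

```shell
#!/bin/sh
DOCKER="${DOCKER:-docker}"

# Succeeds (exit 0) only when `docker inspect` did not return within the limit.
inspect_hangs() {
    timeout "${INSPECT_TIMEOUT:-10}" $DOCKER inspect "$1" >/dev/null 2>&1
    # GNU timeout reports 124 when it had to kill the command.
    [ "$?" -eq 124 ]
}
```

For example, `inspect_hangs 0cf518c205f7 && echo "daemon is stuck on this container"` would flag the container shown above without blocking the terminal.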
Describe the results you expected:
I expected that I would be able to inspect this container.
docker inspect 0cf518c205f7
{
...
}
Additional information you deem important (e.g. issue happens only occasionally):
This issue is readily made apparent with a few concurrent runs, but otherwise lies dormant even with many serial runs.
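To illustrate what "concurrent runs" means here, the sketch below launches several copies of a command in parallel and waits for all of them. It is a generic helper, not the repro runner itself; the `run_concurrently` name and the `./repro-round.sh` script in the usage note are hypothetical.

```shell
#!/bin/sh
# Launch N copies of a command in parallel and wait for all of them.
# Returns non-zero if any copy failed.
run_concurrently() {
    n=$1
    shift
    pids=""
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" &
        pids="$pids $!"
        i=$((i + 1))
    done
    status=0
    for pid in $pids; do
        wait "$pid" || status=1
    done
    return "$status"
}
```

Something like `run_concurrently 4 ./repro-round.sh` would exercise the daemon with overlapping start/stop/inspect calls, which is the situation where the hang shows up; running the same script 4 times back to back would be the serial case that stays quiet.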
Output of docker version:
Client:
Version: 17.12.1-ce
API version: 1.35
Go version: go1.9.4
Git commit: 7390fc6
Built: Tue Feb 27 22:17:40 2018
OS/Arch: linux/amd64
Server:
Engine:
Version: 17.12.1-ce
API version: 1.35 (minimum version 1.12)
Go version: go1.9.4
Git commit: 7390fc6
Built: Tue Feb 27 22:16:13 2018
OS/Arch: linux/amd64
Experimental: false
Output of docker info:
Containers: 1
Running: 1
Paused: 0
Stopped: 0
Images: 5
Server Version: 17.12.1-ce
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 4
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9b55aab90508bd389d7654c4baf173a981477d55
runc version: 9f9c96235cc97674e935002fc3d78361b696a69e
init version: 949e6fa
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.4.0-1052-aws
Operating System: Ubuntu 16.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.625GiB
Name: ip-172-31-37-156
ID: L4W4:V4WA:OHSS:QTGL:DRJG:32GX:7DKK:FFLO:WKR2:IJYV:NKDG:GWRA
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
Additional environment details (AWS, VirtualBox, physical, etc.):
Ubuntu 16.04 on AWS EC2 using the repro runner
Thanks @samuelkarp!
| Package | Result |
|---|---|
| 17.09.1~ce-0~ubuntu | pass |
| 17.10.0~ce-0~ubuntu | pass |
| 17.11.0~ce-0~ubuntu | pass |
| 17.12.0~ce~rc1-0~ubuntu | fail |
| 17.12.0~ce-0~ubuntu | fail |
| 17.12.1~ce-0~ubuntu | fail |
| 18.01.0~ce-0~ubuntu | fail |
| 18.02.0~ce-0~ubuntu | fail |
| 18.03.0~ce~rc4-0~ubuntu | fail |
Amazon Linux on AWS EC2 using the repro runner
Thanks @jhaynes!
| Package | Result |
|---|---|
| 17.12.0-ce | fail |
| 17.09.1-ce | pass |