-
Notifications
You must be signed in to change notification settings - Fork 18.9k
Description
Description
I am working on an application that was implemented before the various docker system prune commands were available and therefore had manually implemented pruning from the client side. The logic is pretty straightforward:
- Request a list of all docker containers
- For each container, inspect it for more details
- Make a decision about deleting the container or not (e.g., status, error code, creation date)
This process fails with some frequency as there are times when attempting to inspect a container that was returned by list returns a 404 code. I've noticed that it consistently fails to inspect containers that are in a "created" state.
In the application, there is one thread that is regularly performing this prune process, and another thread that is launching containers by first creating the container and then starting it.
# Example container: aee19890438494eeecbf587b11221198ff199fe29437ecaa466c76156177b63b
# Thread 1 - Creating the container
Nov 22 13:29:42 host.domain.com dockerd[1766]: time="2022-11-22T13:29:42.814004115Z" level=debug msg="copying image data from aee19890438494eeecbf587b11221198ff199fe29437ecaa466c76156177b63b:/var, to f2044d2badae69af56bcc84b739b7e
Nov 22 13:29:43 host.domain.com dockerd[1766]: time="2022-11-22T13:29:43.064228098Z" level=debug msg="copying image data from aee19890438494eeecbf587b11221198ff199fe29437ecaa466c76156177b63b:/root, to 16bb75cc1a51a5cd69a371b9b6cda
Nov 22 13:29:43 host.domain.com dockerd[1766]: time="2022-11-22T13:29:43.064312480Z" level=debug msg="copying image data from aee19890438494eeecbf587b11221198ff199fe29437ecaa466c76156177b63b:/opt, to 844b97098daaf1040aa2cafcf1d3a4
Nov 22 13:29:44 host.domain.com dockerd[1766]: time="2022-11-22T13:29:44.381082225Z" level=debug msg="copying image data from aee19890438494eeecbf587b11221198ff199fe29437ecaa466c76156177b63b:/run, to 87c3f7a34557a0b08e3be1d9404943
Nov 22 13:29:44 host.domain.com dockerd[1766]: time="2022-11-22T13:29:44.381892484Z" level=debug msg="copying image data from aee19890438494eeecbf587b11221198ff199fe29437ecaa466c76156177b63b:/private, to 424c0af91b3c2cde0f7f44ab82
Nov 22 13:29:44 host.domain.com dockerd[1766]: time="2022-11-22T13:29:44.381916606Z" level=debug msg="copying image data from aee19890438494eeecbf587b11221198ff199fe29437ecaa466c76156177b63b:/tmp, to 5a77fcd42f70e14a8652a0a96df739
# Thread 2 - Prune process - List all of the containers and then inspect each container in a loop
Nov 22 13:29:47 host.domain.com dockerd[1766]: time="2022-11-22T13:29:47.211366766Z" level=debug msg="Calling GET /v1.23/containers/json?all=true"
# Thread 2 - Now returns a 404 error to the client - note these are subsequent calls in the logs
Nov 22 13:29:47 host.domain.com dockerd[1766]: time="2022-11-22T13:29:47.218430962Z" level=debug msg="Calling GET /v1.23/containers/aee19890438494eeecbf587b11221198ff199fe29437ecaa466c76156177b63b/json"
# Thread 2 - The container is in a very unexpected state - so try to delete it (also fails with a 404)
Nov 22 13:29:47 host.domain.com dockerd[1766]: time="2022-11-22T13:29:47.224637054Z" level=debug msg="Calling DELETE /v1.23/containers/aee19890438494eeecbf587b11221198ff199fe29437ecaa466c76156177b63b?v=true&force=true"
# Thread 2 - More pruning activity
Nov 22 13:29:47 host.domain.com dockerd[1766]: time="2022-11-22T13:29:47.230754029Z" level=debug msg="Calling GET /v1.23/volumes?filters=%7B%22dangling%22%3A%5B%22true%22%5D%7D"
Nov 22 13:29:47 host.domain.com dockerd[1766]: time="2022-11-22T13:29:47.230796460Z" level=debug msg=VolumeStore.Find ByType=service.andCombinator ByValue="[0x5581d5dc3500 false]"
# Thread 1 - Now the container gets started
Nov 22 13:29:50 host.domain.com dockerd[1766]: time="2022-11-22T13:29:50.190019073Z" level=debug msg="Calling POST /v1.23/containers/aee19890438494eeecbf587b11221198ff199fe29437ecaa466c76156177b63b/start"
Nov 22 13:29:50 host.domain.com dockerd[1766]: time="2022-11-22T13:29:50.190411705Z" level=warning msg="Specifying a kernel memory limit is deprecated and will be removed in a future release."
Nov 22 13:29:50 host.domain.com dockerd[1766]: time="2022-11-22T13:29:50.190692420Z" level=debug msg="container mounted via layerStore: &{/local/docker/200000.200000/overlay2/dac334114dcbd314826c1b4a2531b9036b8a7a6a399067c4333d85e
Nov 22 13:29:50 host.domain.com dockerd[1766]: time="2022-11-22T13:29:50.190905625Z" level=debug msg="Assigning addresses for endpoint confident_kilby's interface on network bridge"
Nov 22 13:29:50 host.domain.com dockerd[1766]: time="2022-11-22T13:29:50.190928606Z" level=debug msg="RequestAddress(LocalDefault/X.X.0.0/16/X.X.X.X/16, <nil>, map[])"
Nov 22 13:29:50 host.domain.com dockerd[1766]: time="2022-11-22T13:29:50.190957426Z" level=debug msg="Request address PoolID:X.X.X.X/16 App: ipam/default/data, ID: LocalDefault/X.X.X.X/16, DBIndex: 0x0, Bits: 65536, Unselect
Nov 22 13:29:50 host.domain.com dockerd[1766]: time="2022-11-22T13:29:50.206877322Z" level=debug msg="Assigning addresses for endpoint confident_kilby's interface on network bridge"
Nov 22 13:29:50 host.domain.com dockerd[1766]: time="2022-11-22T13:29:50.210671885Z" level=debug msg="Programming external connectivity on endpoint confident_kilby (acde711aa29fb7c1a507e54744e19051cfb3af21d0d43d2a234efbd58284ddb5)
Nov 22 13:29:50 host.domain.com dockerd[1766]: time="2022-11-22T13:29:50.211956905Z" level=debug msg="EnableService aee19890438494eeecbf587b11221198ff199fe29437ecaa466c76156177b63b START"
Nov 22 13:29:50 host.domain.com dockerd[1766]: time="2022-11-22T13:29:50.211970514Z" level=debug msg="EnableService aee19890438494eeecbf587b11221198ff199fe29437ecaa466c76156177b63b DONE"
Subsequent attempts to prune the container are successful. It seems perhaps that there is a small window of time where the container is in a state that makes it a candidate for list but not for inspect.
Reproduce
Happens randomly. I was unable to mimic this with CLI.
Expected behavior
Assuming no external modification between calls, all containers returned by list are available to inspect.
docker version
Client:
Version: 20.10.7
API version: 1.41
Go version: go1.15.14
Git commit: f0df350
Built: Wed Nov 17 03:05:36 2021
OS/Arch: linux/amd64
Context: default
Experimental: true
Server:
Engine:
Version: 20.10.7
API version: 1.41 (minimum version 1.12)
Go version: go1.15.14
Git commit: b0f5bc3
Built: Wed Nov 17 03:06:14 2021
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.6
GitCommit: d71fcd7d8303cbf684402823e425e9dd2e99285d
runc:
Version: 1.0.0
GitCommit: 84113eef6fc27af1b01b3181f31bbaf708715301
docker-init:
Version: 0.19.0
GitCommit: de40ad0docker info
Client:
Context: default
Debug Mode: false
Server:
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 15
Server Version: 20.10.7
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan (Redacted) null overlay
Log: (Redacted)
Swarm: inactive
Runtimes: runc io.containerd.runc.v2 io.containerd.runtime.v1.linux
Default Runtime: runc
Init Binary: docker-init
containerd version: d71fcd7d8303cbf684402823e425e9dd2e99285d
runc version: 84113eef6fc27af1b01b3181f31bbaf708715301
init version: de40ad0
Security Options: (Redacted)
Kernel Version: 5.10.144-111.639.amzn2int.x86_64
Operating System: Amazon Linux 2
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 30.84GiB
Name: host.domain.com
ID: (Redacted)
Docker Root Dir: /local/docker/200000.200000
Debug Mode: true
File Descriptors: 38
Goroutines: 48
System Time: 2022-11-22T21:01:45.968617867Z
EventsListeners: 0
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries: (Redacted)
Live Restore Enabled: falseAdditional Info
This is from the client perspective (docker-java):
# First inspect failure
2022-11-22T13:29:47.218Z [INFO] (Timer-0) Was going to remove container aee19890438494eeecbf587b11221198ff199fe29437ecaa466c76156177b63b but it no longer exists
2022-11-22T13:29:47.218Z [DEBUG] (Timer-0) Container aee19890438494eeecbf587b11221198ff199fe29437ecaa466c76156177b63b detailed error
com.github.dockerjava.api.exception.NotFoundException: Status 404: No such container: aee19890438494eeecbf587b11221198ff199fe29437ecaa466c76156177b63b
# Since inspect failed unexpectedly, things are weird, just try to delete the container
2022-11-22T13:29:47.224Z [WARN] (Timer-0) Container aee19890438494eeecbf587b11221198ff199fe29437ecaa466c76156177b63b cannot be force removed. Manual removal is required.
2022-11-22T13:29:47.224Z [DEBUG] (Timer-0) Container aee19890438494eeecbf587b11221198ff199fe29437ecaa466c76156177b63b detailed error
com.github.dockerjava.api.exception.NotFoundException: Status 404: No such container: aee19890438494eeecbf587b11221198ff199fe29437ecaa466c76156177b63b