Skip to content

List API includes partially-created containers which cannot be inspected #44512

@belugabehr

Description

@belugabehr

Description

I am working on an application that was implemented before the various docker system prune commands were available and therefore had manually implemented pruning from the client side. The logic is pretty straightforward:

  1. Request a list of all docker containers
  2. For each container, inspect it for more details
  3. Make a decision about deleting the container or not (e.g., status, error code, creation date)

This process fails with some frequency as there are times when attempting to inspect a container that was returned by list returns a 404 code. I've noticed that it consistently fails to inspect containers that are in a "created" state.

In the application, there is one thread that is regularly performing this prune process, and another thread that is launching containers by first creating the container and then starting it.

# Example container: aee19890438494eeecbf587b11221198ff199fe29437ecaa466c76156177b63b

# Thread 1 - Creating the container
Nov 22 13:29:42 host.domain.com dockerd[1766]: time="2022-11-22T13:29:42.814004115Z" level=debug msg="copying image data from aee19890438494eeecbf587b11221198ff199fe29437ecaa466c76156177b63b:/var, to f2044d2badae69af56bcc84b739b7e
Nov 22 13:29:43 host.domain.com dockerd[1766]: time="2022-11-22T13:29:43.064228098Z" level=debug msg="copying image data from aee19890438494eeecbf587b11221198ff199fe29437ecaa466c76156177b63b:/root, to 16bb75cc1a51a5cd69a371b9b6cda
Nov 22 13:29:43 host.domain.com dockerd[1766]: time="2022-11-22T13:29:43.064312480Z" level=debug msg="copying image data from aee19890438494eeecbf587b11221198ff199fe29437ecaa466c76156177b63b:/opt, to 844b97098daaf1040aa2cafcf1d3a4
Nov 22 13:29:44 host.domain.com dockerd[1766]: time="2022-11-22T13:29:44.381082225Z" level=debug msg="copying image data from aee19890438494eeecbf587b11221198ff199fe29437ecaa466c76156177b63b:/run, to 87c3f7a34557a0b08e3be1d9404943
Nov 22 13:29:44 host.domain.com dockerd[1766]: time="2022-11-22T13:29:44.381892484Z" level=debug msg="copying image data from aee19890438494eeecbf587b11221198ff199fe29437ecaa466c76156177b63b:/private, to 424c0af91b3c2cde0f7f44ab82
Nov 22 13:29:44 host.domain.com dockerd[1766]: time="2022-11-22T13:29:44.381916606Z" level=debug msg="copying image data from aee19890438494eeecbf587b11221198ff199fe29437ecaa466c76156177b63b:/tmp, to 5a77fcd42f70e14a8652a0a96df739

# Thread 2 - Prune process - List all of the containers and then inspect each container in a loop
Nov 22 13:29:47 host.domain.com dockerd[1766]: time="2022-11-22T13:29:47.211366766Z" level=debug msg="Calling GET /v1.23/containers/json?all=true"

# Thread 2 - Now returns a 404 error to the client - note these are subsequent calls in the logs
Nov 22 13:29:47 host.domain.com dockerd[1766]: time="2022-11-22T13:29:47.218430962Z" level=debug msg="Calling GET /v1.23/containers/aee19890438494eeecbf587b11221198ff199fe29437ecaa466c76156177b63b/json"

# Thread 2 - The container is in a very unexpected state - so try to delete it (also fails with a 404)
Nov 22 13:29:47 host.domain.com dockerd[1766]: time="2022-11-22T13:29:47.224637054Z" level=debug msg="Calling DELETE /v1.23/containers/aee19890438494eeecbf587b11221198ff199fe29437ecaa466c76156177b63b?v=true&force=true"

# Thread 2 - More pruning activity
Nov 22 13:29:47 host.domain.com dockerd[1766]: time="2022-11-22T13:29:47.230754029Z" level=debug msg="Calling GET /v1.23/volumes?filters=%7B%22dangling%22%3A%5B%22true%22%5D%7D"
Nov 22 13:29:47 host.domain.com dockerd[1766]: time="2022-11-22T13:29:47.230796460Z" level=debug msg=VolumeStore.Find ByType=service.andCombinator ByValue="[0x5581d5dc3500 false]"

# Thread 1 - Now the container gets started
Nov 22 13:29:50 host.domain.com dockerd[1766]: time="2022-11-22T13:29:50.190019073Z" level=debug msg="Calling POST /v1.23/containers/aee19890438494eeecbf587b11221198ff199fe29437ecaa466c76156177b63b/start"
Nov 22 13:29:50 host.domain.com dockerd[1766]: time="2022-11-22T13:29:50.190411705Z" level=warning msg="Specifying a kernel memory limit is deprecated and will be removed in a future release."
Nov 22 13:29:50 host.domain.com dockerd[1766]: time="2022-11-22T13:29:50.190692420Z" level=debug msg="container mounted via layerStore: &{/local/docker/200000.200000/overlay2/dac334114dcbd314826c1b4a2531b9036b8a7a6a399067c4333d85e
Nov 22 13:29:50 host.domain.com dockerd[1766]: time="2022-11-22T13:29:50.190905625Z" level=debug msg="Assigning addresses for endpoint confident_kilby's interface on network bridge"
Nov 22 13:29:50 host.domain.com dockerd[1766]: time="2022-11-22T13:29:50.190928606Z" level=debug msg="RequestAddress(LocalDefault/X.X.0.0/16/X.X.X.X/16, <nil>, map[])"
Nov 22 13:29:50 host.domain.com dockerd[1766]: time="2022-11-22T13:29:50.190957426Z" level=debug msg="Request address PoolID:X.X.X.X/16 App: ipam/default/data, ID: LocalDefault/X.X.X.X/16, DBIndex: 0x0, Bits: 65536, Unselect
Nov 22 13:29:50 host.domain.com dockerd[1766]: time="2022-11-22T13:29:50.206877322Z" level=debug msg="Assigning addresses for endpoint confident_kilby's interface on network bridge"
Nov 22 13:29:50 host.domain.com dockerd[1766]: time="2022-11-22T13:29:50.210671885Z" level=debug msg="Programming external connectivity on endpoint confident_kilby (acde711aa29fb7c1a507e54744e19051cfb3af21d0d43d2a234efbd58284ddb5)
Nov 22 13:29:50 host.domain.com dockerd[1766]: time="2022-11-22T13:29:50.211956905Z" level=debug msg="EnableService aee19890438494eeecbf587b11221198ff199fe29437ecaa466c76156177b63b START"
Nov 22 13:29:50 host.domain.com dockerd[1766]: time="2022-11-22T13:29:50.211970514Z" level=debug msg="EnableService aee19890438494eeecbf587b11221198ff199fe29437ecaa466c76156177b63b DONE"

Subsequent attempts to prune the container are successful. It seems perhaps that there is a small window of time where the container is in a state that makes it a candidate for list but not for inspect.

Reproduce

Happens randomly. I was unable to mimic this with CLI.

Expected behavior

Assuming no external modification between calls, all containers returned by list are available to inspect.

docker version

Client:
 Version:           20.10.7
 API version:       1.41
 Go version:        go1.15.14
 Git commit:        f0df350
 Built:             Wed Nov 17 03:05:36 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.7
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.15.14
  Git commit:       b0f5bc3
  Built:            Wed Nov 17 03:06:14 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.6
  GitCommit:        d71fcd7d8303cbf684402823e425e9dd2e99285d
 runc:
  Version:          1.0.0
  GitCommit:        84113eef6fc27af1b01b3181f31bbaf708715301
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

docker info

Client:
 Context:    default
 Debug Mode: false

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 15
 Server Version: 20.10.7
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan (Redacted) null overlay
  Log: (Redacted)
 Swarm: inactive
 Runtimes: runc io.containerd.runc.v2 io.containerd.runtime.v1.linux
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: d71fcd7d8303cbf684402823e425e9dd2e99285d
 runc version: 84113eef6fc27af1b01b3181f31bbaf708715301
 init version: de40ad0
 Security Options: (Redacted)
 Kernel Version: 5.10.144-111.639.amzn2int.x86_64
 Operating System: Amazon Linux 2
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 30.84GiB
 Name: host.domain.com
 ID: (Redacted)
 Docker Root Dir: /local/docker/200000.200000
 Debug Mode: true
  File Descriptors: 38
  Goroutines: 48
  System Time: 2022-11-22T21:01:45.968617867Z
  EventsListeners: 0
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries: (Redacted)
 Live Restore Enabled: false

Additional Info

This is from the client perspective (docker-java):

# First inspect failure
2022-11-22T13:29:47.218Z [INFO]   (Timer-0)  Was going to remove container aee19890438494eeecbf587b11221198ff199fe29437ecaa466c76156177b63b but it no longer exists
2022-11-22T13:29:47.218Z [DEBUG]  (Timer-0)  Container aee19890438494eeecbf587b11221198ff199fe29437ecaa466c76156177b63b detailed error
com.github.dockerjava.api.exception.NotFoundException: Status 404: No such container: aee19890438494eeecbf587b11221198ff199fe29437ecaa466c76156177b63b

# Since inspect failed unexpectedly, things are weird, just try to delete the container
2022-11-22T13:29:47.224Z [WARN]  (Timer-0)  Container aee19890438494eeecbf587b11221198ff199fe29437ecaa466c76156177b63b cannot be force removed. Manual removal is required.
2022-11-22T13:29:47.224Z [DEBUG] (Timer-0)  Container aee19890438494eeecbf587b11221198ff199fe29437ecaa466c76156177b63b detailed error
com.github.dockerjava.api.exception.NotFoundException: Status 404: No such container: aee19890438494eeecbf587b11221198ff199fe29437ecaa466c76156177b63b

Metadata

Metadata

Assignees

No one assigned

    Labels

    area/apiAPIkind/bugBugs are bugs. The cause may or may not be known at triage time so debugging may be needed.status/confirmed

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions