
Multiple parallel docker build runs leak disk space that can't be recovered (with reproduction) #46136

@intgr

Description

My CI machines, which run lots of docker build and docker run commands (often in parallel), keep running out of disk space.

I have found that when multiple docker build commands run in parallel, Docker loses track of some of the directories and files it creates under /var/lib/docker/overlay2. The issue does not occur when the build commands run sequentially (e.g. remove the trailing & in repro.sh).

After the builds finish, I run docker system prune -af --volumes to delete all build cache and artifacts, and docker system df confirms that no disk space should be in use. Despite this, the overlay2 directory grows with every run, without limit.
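The only difference between the leaking and non-leaking cases is whether the builds overlap in time. A minimal sketch of the two modes in POSIX sh follows; `run_parallel` and `run_sequential` are hypothetical helper names for illustration and are not taken from repro.sh.

```shell
# Hypothetical helpers (not from repro.sh): run a command N times, either
# concurrently (the case that leaks) or one after another (no leak observed).

# run_parallel N CMD [ARGS...]: start N copies in the background, then wait.
run_parallel() {
  n=$1
  shift
  i=0
  while [ "$i" -lt "$n" ]; do
    "$@" &            # trailing & backgrounds each run, so the runs overlap
    i=$((i + 1))
  done
  wait                # block until all background jobs of this shell exit
}

# run_sequential N CMD [ARGS...]: each run finishes before the next starts.
run_sequential() {
  n=$1
  shift
  i=0
  while [ "$i" -lt "$n" ]; do
    "$@"
    i=$((i + 1))
  done
}
```

For example, from a directory containing a Dockerfile, compare `run_parallel 4 docker build --no-cache .` with `run_sequential 4 docker build --no-cache .`. Note that `wait` with no arguments waits for every background job of the current shell, not only the ones started here.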

Reproduce

I have published a shell script and Dockerfile that systematically reproduces this issue at https://github.com/intgr/bug-reports/tree/main/docker-build-disk-space-leak

Run the ./repro.sh shell script multiple times and notice the overlay2 directory growing in size.

The script needs to run docker commands and uses sudo to monitor the size of the overlay2 directory.

It can be tested in a public playground such as https://labs.play-with-docker.com/.

git clone https://github.com/intgr/bug-reports
cd bug-reports/docker-build-disk-space-leak
./repro.sh
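The measurement that the script performs with sudo can be approximated as below. This is a sketch under assumptions, not the actual contents of repro.sh: `measure_dir` and the `SUDO` variable are hypothetical, and the output only imitates the shape of the "ACTUAL ..." lines in the example output (`du -sh` prints the size, a tab, then the path).

```shell
# Hypothetical helper, not the author's repro.sh: print the real on-disk size
# and the number of top-level entries of a directory.
# /var/lib/docker/overlay2 is readable only by root, so set SUDO=sudo there.
measure_dir() {
  dir=$1
  # du -sh prints "<size><TAB><path>"
  echo "ACTUAL disk space used: $(${SUDO:-} du -sh "$dir")"
  # ls -A lists dot-files but not . or ..; tr strips any padding from wc
  echo "ACTUAL number of items in $dir: $(${SUDO:-} ls -A "$dir" | wc -l | tr -d ' ')"
}
```

Usage against the Docker data root: `SUDO=sudo measure_dir /var/lib/docker/overlay2`.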

Example output when running the script

Notice that the ACTUAL number of items and disk space keep growing with every run of ./repro.sh, even though Docker reports 0 bytes in use.

$ ./repro.sh
BUILDING...
build done!

pruned everything. Docker THINKS this much disk space is in use:
TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE
Images          0         0         0B        0B
Containers      0         0         0B        0B
Local Volumes   0         0         0B        0B
Build Cache     0         0         0B        0B

ACTUAL disk space used: 7.4M	/var/lib/docker/overlay2            <-- !!!
ACTUAL number of items in /var/lib/docker/overlay2: 12              <-- !!!

$ ./repro.sh
BUILDING...
build done!

pruned everything. Docker THINKS this much disk space is in use:
TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE
Images          0         0         0B        0B
Containers      0         0         0B        0B
Local Volumes   0         0         0B        0B
Build Cache     0         0         0B        0B

ACTUAL disk space used: 7.5M	/var/lib/docker/overlay2            <-- !!!
ACTUAL number of items in /var/lib/docker/overlay2: 21              <-- !!!

$ ./repro.sh
BUILDING...
build done!

pruned everything. Docker THINKS this much disk space is in use:
TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE
Images          0         0         0B        0B
Containers      0         0         0B        0B
Local Volumes   0         0         0B        0B
Build Cache     0         0         0B        0B

ACTUAL disk space used: 7.6M	/var/lib/docker/overlay2            <-- !!!
ACTUAL number of items in /var/lib/docker/overlay2: 30              <-- !!!

Expected behavior

When I delete all containers, images, volumes, caches, and everything else, Docker's disk usage should return to roughly what it was after a clean installation.
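The expected behavior can be phrased as a check: record the number of entries in overlay2 on a freshly installed daemon, then compare against it after each build-and-prune cycle. A minimal sketch in POSIX sh, assuming hypothetical helpers `count_items` and `check_no_growth` that are not part of the original report or of Docker:

```shell
# Hypothetical CI guard: count top-level entries in a directory and fail if a
# later count exceeds a recorded baseline. Set SUDO=sudo for overlay2.
count_items() {
  ${SUDO:-} ls -A "$1" | wc -l | tr -d ' '
}

# check_no_growth DIR BASELINE: returns 0 if DIR has at most BASELINE entries.
check_no_growth() {
  now=$(count_items "$1")
  if [ "$now" -gt "$2" ]; then
    echo "leak suspected: $1 has $now entries, baseline was $2" >&2
    return 1
  fi
  return 0
}
```

For example, run `base=$(SUDO=sudo count_items /var/lib/docker/overlay2)` on a fresh daemon, and after the builds and `docker system prune -af --volumes`, run `SUDO=sudo check_no_growth /var/lib/docker/overlay2 "$base"`. The comparison here is strict; since the report only expects usage "near" a clean install, a small tolerance could be added.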

docker version

Client:
 Version:           24.0.2
 API version:       1.43
 Go version:        go1.20.4
 Git commit:        cb74dfc
 Built:             Thu May 25 21:50:49 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          24.0.2
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.4
  Git commit:       659604f
  Built:            Thu May 25 21:35:04 2023
  OS/Arch:          linux/amd64
  Experimental:     true
 containerd:
  Version:          v1.7.1
  GitCommit:        1677a17964311325ed1c31e2c0a3589ce6d5c30d
 runc:
  Version:          1.1.7
  GitCommit:        v1.1.7-0-g860f061
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

docker info

Client:
 Version:    24.0.2
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.10.5
    Path:     /usr/local/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.18.1
    Path:     /usr/local/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 0
 Server Version: 24.0.2
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc io.containerd.runc.v2
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 1677a17964311325ed1c31e2c0a3589ce6d5c30d
 runc version: v1.1.7-0-g860f061
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
 Kernel Version: 4.4.0-210-generic
 Operating System: Alpine Linux v3.18 (containerized)
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 31.42GiB
 Name: node2
 ID: 39aba340-eb4a-4ddf-b7bf-f2a4d3a52192
 Docker Root Dir: /var/lib/docker
 Debug Mode: true
  File Descriptors: 27
  Goroutines: 43
  System Time: 2023-08-01T15:48:58.988555479Z
  EventsListeners: 0
 Experimental: true
 Insecure Registries:
  127.0.0.1
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine

WARNING: API is accessible on http://0.0.0.0:2375 without encryption.
         Access to the remote API is equivalent to root access on the host. Refer
         to the 'Docker daemon attack surface' section in the documentation for
         more information: https://docs.docker.com/go/attack-surface/
WARNING: No swap limit support
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled

Additional Info

No response
