Description
My CI machines, which run lots of `docker build` and `docker run` commands, often in parallel, keep running out of disk space.
I have figured out that when multiple `docker build` commands run in parallel, Docker loses track of some directories and files it creates under `/var/lib/docker/overlay2`. The issue does not occur when the builds run sequentially (e.g. when the trailing `&` is removed in `repro.sh`).
Even after running `docker system prune -af --volumes` to delete all build cache and artifacts, and confirming with `docker system df` that no disk space should be in use, the size of Docker's `overlay2` directory grows every run with no limit.
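For illustration, this is the general shape of the pattern that triggers the leak (a minimal sketch only; the tag names and loop count here are made up, and the actual `repro.sh` in the repository below is authoritative):

```sh
#!/bin/sh
# Sketch: kick off several builds of the same Dockerfile in parallel.
# Removing the trailing '&' (building sequentially) avoids the leak.
for i in 1 2 3 4; do
    docker build --no-cache -t "leak-test-$i" . &
done
wait  # let all parallel builds finish

# Remove everything Docker tracks...
docker system prune -af --volumes
# ...yet /var/lib/docker/overlay2 still grows run after run.
```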
Reproduce
I have published a shell script and Dockerfile that systematically reproduce this issue at https://github.com/intgr/bug-reports/tree/main/docker-build-disk-space-leak
Run the `./repro.sh` shell script multiple times and notice the `overlay2` directory increasing in size.
The script needs to run `docker` commands and uses `sudo` to monitor the size of the `overlay2` directory.
It can be tested in the public playground at https://labs.play-with-docker.com/, for example.
```sh
git clone https://github.com/intgr/bug-reports
cd bug-reports/docker-build-disk-space-leak
./repro.sh
```
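The ACTUAL lines in the output below come from checks along these lines (an assumption on my part about the exact invocations; any equivalent `du`/`find` usage shows the same thing):

```sh
# What Docker believes is in use (all zeros after the prune)
docker system df

# What is actually on disk under the overlay2 storage directory
sudo du -sh /var/lib/docker/overlay2

# How many top-level entries (layer directories) remain in overlay2
sudo find /var/lib/docker/overlay2 -mindepth 1 -maxdepth 1 | wc -l
```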
Example output when running the script
Notice that the ACTUAL number of items and the disk space used keep growing with each run of `./repro.sh`, despite Docker reporting 0 bytes in use.
```
$ ./repro.sh
BUILDING...
build done!
pruned everything. Docker THINKS this much disk space is in use:
TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE
Images          0         0         0B        0B
Containers      0         0         0B        0B
Local Volumes   0         0         0B        0B
Build Cache     0         0         0B        0B
ACTUAL disk space used: 7.4M    /var/lib/docker/overlay2    <-- !!!
ACTUAL number of items in /var/lib/docker/overlay2: 12    <-- !!!

$ ./repro.sh
BUILDING...
build done!
pruned everything. Docker THINKS this much disk space is in use:
TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE
Images          0         0         0B        0B
Containers      0         0         0B        0B
Local Volumes   0         0         0B        0B
Build Cache     0         0         0B        0B
ACTUAL disk space used: 7.5M    /var/lib/docker/overlay2    <-- !!!
ACTUAL number of items in /var/lib/docker/overlay2: 21    <-- !!!

$ ./repro.sh
BUILDING...
build done!
pruned everything. Docker THINKS this much disk space is in use:
TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE
Images          0         0         0B        0B
Containers      0         0         0B        0B
Local Volumes   0         0         0B        0B
Build Cache     0         0         0B        0B
ACTUAL disk space used: 7.6M    /var/lib/docker/overlay2    <-- !!!
ACTUAL number of items in /var/lib/docker/overlay2: 30    <-- !!!
```
Expected behavior
When I delete all containers, all images, volumes, caches, everything, Docker's disk usage should return to near what it uses after a clean installation.
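Expressed as a concrete check (a sketch of the expectation only; the baseline capture and variable names are mine, not part of `repro.sh`):

```sh
# Baseline captured immediately after a clean Docker installation
baseline=$(sudo du -s /var/lib/docker/overlay2 | cut -f1)

# ... any number of builds happen here ...

# Remove everything Docker tracks
docker system prune -af --volumes

# Expectation: actual usage returns to (near) the clean-install baseline
after=$(sudo du -s /var/lib/docker/overlay2 | cut -f1)
echo "baseline: ${baseline}K, after prune: ${after}K"
```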
docker version
```
Client:
 Version:           24.0.2
 API version:       1.43
 Go version:        go1.20.4
 Git commit:        cb74dfc
 Built:             Thu May 25 21:50:49 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          24.0.2
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.4
  Git commit:       659604f
  Built:            Thu May 25 21:35:04 2023
  OS/Arch:          linux/amd64
  Experimental:     true
 containerd:
  Version:          v1.7.1
  GitCommit:        1677a17964311325ed1c31e2c0a3589ce6d5c30d
 runc:
  Version:          1.1.7
  GitCommit:        v1.1.7-0-g860f061
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
```
docker info
```
Client:
 Version:    24.0.2
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.10.5
    Path:     /usr/local/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.18.1
    Path:     /usr/local/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 0
 Server Version: 24.0.2
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc io.containerd.runc.v2
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 1677a17964311325ed1c31e2c0a3589ce6d5c30d
 runc version: v1.1.7-0-g860f061
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
 Kernel Version: 4.4.0-210-generic
 Operating System: Alpine Linux v3.18 (containerized)
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 31.42GiB
 Name: node2
 ID: 39aba340-eb4a-4ddf-b7bf-f2a4d3a52192
 Docker Root Dir: /var/lib/docker
 Debug Mode: true
  File Descriptors: 27
  Goroutines: 43
  System Time: 2023-08-01T15:48:58.988555479Z
  EventsListeners: 0
 Experimental: true
 Insecure Registries:
  127.0.0.1
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine

WARNING: API is accessible on http://0.0.0.0:2375 without encryption.
         Access to the remote API is equivalent to root access on the host. Refer
         to the 'Docker daemon attack surface' section in the documentation for
         more information: https://docs.docker.com/go/attack-surface/
WARNING: No swap limit support
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
```
Additional Info
No response