Description
I'm using a Swarm scheduler service that creates and disposes of one-shot services to execute scheduled tasks (like a swarm-cronjob). With different tasks scheduled at various intervals (e.g., once a day, every 5 minutes, etc.), I trigger around 200 tasks per day, in addition to my 50 regular services.
This setup has been working well for two years, and I’ve been satisfied with everything.
However, since a hardware update on the nodes in my Docker Swarm, I occasionally encounter container startup failures (about 1 or 2 failures per day out of 200 successes) when creating new services.
The Docker daemon error reads:

```
starting container failed: unable to remount dir as readonly: mount tmpfs:/var/lib/docker/containers/XXXXXX/mounts/secrets, flags: 0x21, data: uid=0,gid=0: device or resource busy
```
The services I’m running don’t use secrets but do use configs (though the source code indicates that secrets and configs are handled in the same way). From studying the source code, I understand the process is as follows:
`createSecretsDir`
* mkdir -p
* mount tmpfs with nodev, nosuid, noexec (a SHARED mount by default)

then `setupSecretDir`
* for each config:
** os.WriteFile (no SYNC is done)

then `remountSecretDir`
* mount -o remount,ro // this triggers the EBUSY error
What I’ve Tried So Far
- Retrying (`mount -o remount,ro`; if the error is EBUSY, sleep and retry) resolves the issue, but this is not an acceptable long-term solution.
- Running `lsof` after the EBUSY error gives me no additional information.
- I’m awaiting results from running `lsof` before the remount, but it’s been 3 days without a failure, likely because the extra delay releases whatever holds the lock in the meantime.
- I also tried switching the initial `createSecretDir`/`mount tmpfs` to PRIVATE propagation and then running a `mount.MakeShared` before the read-only toggle, but this also fails.
What I Think
Even though it might look like a race condition or a hardware error (CPU usage is fine, and hardware memory tests pass), I think calling os.WriteFile and then immediately toggling the mount to read-only with no flush/sync is a risky operation, and a sync/flush may be needed before the remount. However, I haven't managed to produce a patch along these lines, so I'm filing a ticket here.
Reproduce
docker service create (a service with configs)
Expected behavior
The service is created and the task runs OK.
docker version
Client: Docker Engine - Community
Version: 27.3.1
API version: 1.47
Go version: go1.22.7
Git commit: ce12230
Built: Fri Sep 20 11:41:11 2024
OS/Arch: linux/amd64
Context: default
Server: Docker Engine - Community
Engine:
Version: 27.3.1
API version: 1.47 (minimum version 1.24)
Go version: go1.22.7
Git commit: 41ca978
Built: Fri Sep 20 11:41:11 2024
OS/Arch: linux/amd64
Experimental: true
containerd:
Version: 1.7.22
GitCommit: 7f7fdf5fed64eb6a7caf99b3e12efcf9d60e311c
runc:
Version: 1.1.14
GitCommit: v1.1.14-0-g2c9f560
docker-init:
Version: 0.19.0
GitCommit: de40ad0
docker info
Client: Docker Engine - Community
Version: 27.3.1
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc.)
Version: v0.17.1
Path: /usr/libexec/docker/cli-plugins/docker-buildx
compose: Docker Compose (Docker Inc.)
Version: v2.29.7
Path: /usr/libexec/docker/cli-plugins/docker-compose
Server:
Containers: 143
Running: 68
Paused: 0
Stopped: 75
Images: 79
Server Version: 27.3.1
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: systemd
Cgroup Version: 2
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
Swarm: active
NodeID: w1rlrws3wg2dwhd9h9fqokbo5
Is Manager: false
Node Address: 10.34.0.15
Manager Addresses:
10.33.0.50:2377
10.33.0.51:2377
10.33.0.52:2377
Runtimes: io.containerd.runc.v2 runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 7f7fdf5fed64eb6a7caf99b3e12efcf9d60e311c
runc version: v1.1.14-0-g2c9f560
init version: de40ad0
Security Options:
apparmor
seccomp
Profile: builtin
cgroupns
Kernel Version: 6.1.0-26-amd64
Operating System: Debian GNU/Linux 12 (bookworm)
OSType: linux
Architecture: x86_64
CPUs: 12
Total Memory: 30.96GiB
Name: docker-metal-ns3241043
ID: c73f6d33-f9ea-4e87-809e-7f4f8f56fc1e
Docker Root Dir: /var/lib/docker
Debug Mode: false
Labels:
metal-cluster=true
Experimental: true
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Additional Info
starting container failed: unable to remount dir as readonly: mount tmpfs:/var/lib/docker/containers/XXXXXX/mounts/secrets, flags: 0x21, data: uid=0,gid=0: device or resource busy