
Create service fail / starting container failed: unable to remount [secrets] dir as readonly #48783


Description

I'm using a Swarm scheduler service that creates and disposes of one-shot services to execute scheduled tasks (like a swarm-cronjob). With different tasks scheduled at various intervals (e.g., once a day, every 5 minutes, etc.), I trigger around 200 tasks per day, in addition to my 50 regular services.

This setup has been working well for two years, and I’ve been satisfied with everything.

However, since a hardware update on the nodes in my Docker Swarm, I occasionally encounter container startup failures (about 1 or 2 failures per day out of 200 successes) when creating new services.

The Docker daemon error reads:

starting container failed: unable to remount dir as readonly: mount tmpfs:/var/lib/docker/containers/XXXXXX/mounts/secrets, flags: 0x21, data: uid=0,gid=0: device or resource busy

The services I’m running don’t use secrets but do use configs (though the source code indicates that secrets and configs are handled in the same way). From studying the source code, I understand the process is as follows:

createSecretsDir
  • mkdir -p
  • mount tmpfs (nodev, nosuid, noexec; a SHARED mount by default)
then setupSecretDir
  • for each config: os.WriteFile (no sync is done)
then remountSecretDir
  • mount -o remount,ro  ← this is what triggers the EBUSY error
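In plain mount(8) terms, the sequence is roughly the following (illustrative only, requires root; the container ID is elided as in the error message):

```shell
mkdir -p /var/lib/docker/containers/XXXXXX/mounts/secrets
mount -t tmpfs -o nodev,nosuid,noexec tmpfs /var/lib/docker/containers/XXXXXX/mounts/secrets
# ...config files are written into the directory (os.WriteFile, no fsync)...
mount -o remount,ro /var/lib/docker/containers/XXXXXX/mounts/secrets  # <- fails intermittently with EBUSY
```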

What I’ve Tried So Far

  • Retrying (mount -o remount,ro; if the error is EBUSY, sleep and retry) works around the issue, but it is not an acceptable long-term solution.
  • Running lsof after the EBUSY error gives me no additional information.
  • I'm waiting for results from running lsof just before the remount, but it has been 3 days without an error, likely because the extra delay lets whatever is holding the mount busy release it in the meantime.
  • I also tried switching the initial createSecretDir/mount tmpfs to PRIVATE propagation and then running mount.MakeShared before the read-only toggle, but this fails as well.

What I think

Even though it might look like a race condition or a hardware fault (CPU usage is fine, and hardware memory tests pass), I think that calling os.WriteFile and then immediately toggling the mount to read-only, with no flush/sync in between, is a risky operation. I believe a sync/flush may be needed here. However, I haven't yet managed to produce a working patch along these lines, so I'm filing a ticket here instead.

Reproduce

docker service create (a service with configs)
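A minimal reproducer might look like this (hypothetical names; requires an active swarm, and the failure is intermittent, so it may take many runs to trigger):

```shell
printf 'hello\n' | docker config create demo-config -
docker service create --name demo \
  --config source=demo-config,target=/demo-config \
  --restart-condition none \
  alpine:3 cat /demo-config
```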

Expected behavior

The service is created and the task runs successfully.

docker version

Client: Docker Engine - Community
 Version:           27.3.1
 API version:       1.47
 Go version:        go1.22.7
 Git commit:        ce12230
 Built:             Fri Sep 20 11:41:11 2024
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          27.3.1
  API version:      1.47 (minimum version 1.24)
  Go version:       go1.22.7
  Git commit:       41ca978
  Built:            Fri Sep 20 11:41:11 2024
  OS/Arch:          linux/amd64
  Experimental:     true
 containerd:
  Version:          1.7.22
  GitCommit:        7f7fdf5fed64eb6a7caf99b3e12efcf9d60e311c
 runc:
  Version:          1.1.14
  GitCommit:        v1.1.14-0-g2c9f560
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

docker info

Client: Docker Engine - Community
 Version:    27.3.1
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.17.1
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.29.7
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 143
  Running: 68
  Paused: 0
  Stopped: 75
 Images: 79
 Server Version: 27.3.1
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: active
  NodeID: w1rlrws3wg2dwhd9h9fqokbo5
  Is Manager: false
  Node Address: 10.34.0.15
  Manager Addresses:
   10.33.0.50:2377
   10.33.0.51:2377
   10.33.0.52:2377
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7f7fdf5fed64eb6a7caf99b3e12efcf9d60e311c
 runc version: v1.1.14-0-g2c9f560
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.1.0-26-amd64
 Operating System: Debian GNU/Linux 12 (bookworm)
 OSType: linux
 Architecture: x86_64
 CPUs: 12
 Total Memory: 30.96GiB
 Name: docker-metal-ns3241043
 ID: c73f6d33-f9ea-4e87-809e-7f4f8f56fc1e
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Labels:
  metal-cluster=true
 Experimental: true
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Additional Info


Metadata

Assignees: no one assigned
Labels: area/swarm, kind/bug, status/0-triage, version/27.3
Milestone: no milestone