
swarm mode: Manager(s) cascading failures after docker service scale and daemon running out of memory #24027


Description

@abronan

Output of docker version:

Client:
 Version:      1.12.0-dev
 API version:  1.25
 Go version:   go1.6.2
 Git commit:   cccfe63
 Built:        Mon Jun 27 17:46:02 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.0-dev
 API version:  1.25
 Go version:   go1.6.2
 Git commit:   cccfe63
 Built:        Mon Jun 27 17:46:02 2016
 OS/Arch:      linux/amd64

Output of docker info:

Containers: 241
 Running: 5
 Paused: 0
 Stopped: 236
Images: 1
Server Version: 1.12.0-dev
Storage Driver: overlay
 Backing Filesystem: extfs
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: host bridge overlay null
Swarm: active
 NodeID: bnpa9guxfqznkkp6g9qmtbiim
 IsManager: No
Runtimes: default
Default Runtime: default
Security Options: apparmor seccomp
Kernel Version: 4.4.0-22-generic
Operating System: Ubuntu 16.04 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 992.6 MiB
Name: node05
ID: VPNA:RUIV:3HML:CTPL:2ZHA:7FXL:IPY7:JZTK:FJYX:KM77:MHZ2:W22H
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
 127.0.0.0/8

Additional environment details (AWS, VirtualBox, physical, etc.):

Digital Ocean VMs:

3 Managers: 2 GB Memory / 40 GB Disk / Ubuntu 16.04 x64
3 Agents: 1 GB Memory / 30 GB Disk / Ubuntu 16.04 x64

Steps to reproduce the issue:

  1. Create the cluster (docker swarm init <...> / docker swarm join <...>)
  2. Create one service (redis for example)
  3. Scale to a ridiculous number of tasks with docker service scale redis=3000 (full command sequence below)
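
For reference, the full command sequence on a fresh cluster. The init/join arguments are left elided as in the steps above, and the exact service invocation (image and --name) is my assumption rather than taken from the original report:

    # form the cluster (arguments elided, as above)
    docker swarm init <...>        # on the first manager
    docker swarm join <...>        # on every other manager and agent

    # create a small service, then scale it far beyond what the nodes can hold
    docker service create --name redis redis
    docker service scale redis=3000

    # the task counter can be watched climbing with
    docker service ls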

Describe the results you received:

Managers panic one after the other until we lose the quorum.

What happens in order:

  • The leader schedules the tasks and we see the counter going up using docker service ls
  • The leader reaches the point where it has too many containers running and the daemon runs out of memory or file descriptors, and ultimately crashes.
  • Raft elects a new Leader amongst the remaining Managers, which picks up the scheduling work.
  • The same scenario repeats: the new Leader reaches the point where it has too many open files or runs out of memory because of the tasks it is running.
  • etc.
  • We lose the quorum and the cluster becomes unusable (a rough way to watch this happen is sketched below).
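
A rough way to observe the cascade from a node that is still up (standard commands, not output captured from the original incident):

    docker node ls          # MANAGER STATUS flips from Reachable/Leader to Unreachable as managers crash
    docker service ls       # the replica counter stalls once scheduling stops
    journalctl -u docker    # on a crashed manager (systemd hosts): out-of-memory and "too many open files" errors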

A single command triggered a chain reaction that can put the whole cluster out of use.

Describe the results you expected:

I expect the daemon to reserve enough resources for itself, and to stop scheduling more tasks on the Leader or the other Managers if doing so would put cluster stability in danger.
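
One knob that exists today and approximates this is a placement constraint that keeps tasks off the Managers entirely; a minimal sketch, assuming the node.role constraint is supported in this dev build (the service name and image match the repro sequence above, not the original report):

    # schedule redis tasks on worker/agent nodes only, never on managers
    docker service create --name redis --constraint 'node.role == worker' redis
    docker service scale redis=3000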

Further thoughts:

I'm not sure if there is any good solution for this, but at least we should keep the Managers safe.

Some proposals and actionable items:

  • We clearly document the behavior and warn users to reserve resources exclusively for the daemon so that it does not crash.
  • We document that if you want the set of Managers to stay "safe", you should opt them out of task scheduling by draining them, effectively turning off their Agent role (see the sketch after this list).
  • We make the scheduling decision finer grained and stop scaling, with a warning, when a node approaches its memory limit or maximum number of file descriptors. This gives the user a chance to correct the mistake and revert to a reasonable number of tasks. In this case we track the resources left at the daemon level and disable the Agent when a given threshold is reached (see the reservation sketch below).
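
For the second item, the existing drain mechanism already covers the manual side; a minimal sketch, run from any Manager, assuming hypothetical node names node01..node03 for the three Managers:

    # stop scheduling tasks onto the managers themselves
    docker node update --availability drain node01
    docker node update --availability drain node02
    docker node update --availability drain node03

    # verify: the managers should now report AVAILABILITY = Drain
    docker node ls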
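
For the third item, per-task memory reservations are the closest existing lever: when every task declares a reservation, the scheduler stops placing new tasks on a node once its memory is fully accounted for instead of overcommitting it. A sketch, assuming the --reserve-memory flag is available in this build; the 50mb figure is only an illustration:

    # each task reserves 50 MB, so a 1 GB agent tops out at roughly 20 tasks
    docker service create --name redis --reserve-memory 50mb redis
    docker service scale redis=3000   # excess tasks should stay pending rather than exhausting the nodes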

/cc @aluzzardi @aaronlehmann @stevvooe @icecrime @tiborvass
