Backoff time not honored by error message causes system log flood

**Description**

The `dockerd` process of a worker node in a docker swarm cluster filled my logs with a barrage of the following error:
`level=error msg="agent: session failed" backoff=8s error="no remote hosts provided" module=node/agent node.id=tzah7lertx0mdg9cudrmmjhcg`, completely ignoring the supposed backoff time specified in the message, eventually filling the disk and causing docker to crash (among other things).

**Steps to reproduce the issue:**
Not sure, it just started happening out of the blue yesterday night. Right before the errors, the logs indicate that a process (an application running in one of the containers) was killed due to OOM, but that might be unrelated.

**Describe the results you received:**
`dockerd` kept logging the message non-stop (~500 times per second), even though it apparently should have been using something like exponential back-off (the first message says `backoff=100ms`, the second `300ms`, then `700ms`, `1.5s`, `3.1s`, `6,3s`, and finally `8s`).

**Describe the results you expected:**
This error message is logged according to the backoff policy it is stating, instead of flooding the logs.

**Additional information you deem important (e.g. issue happens only occasionally):**
My swarm is setup with 1 manager (this is a just test environment) and 3 worker nodes. The manager is set to availability `Pause` so it doesn't run tasks, and all three worker nodes are set to `Active`. I just upgraded all of them to docker 18.03.0-ce yesterday night

Here's a screenshot of the logs, when the errors started happening:
![image](https://user-images.githubusercontent.com/716334/37846324-52b0f148-2e9b-11e8-87a1-9092f6b10e05.png)

**Output of `docker version`:**
In the worker node:
```
Client:
 Version:       18.03.0-ce
 API version:   1.37
 Go version:    go1.9.4
 Git commit:    0520e24
 Built: Wed Mar 21 23:10:01 2018
 OS/Arch:       linux/amd64
 Experimental:  false
 Orchestrator:  swarm

Server:
 Engine:
  Version:      18.03.0-ce
  API version:  1.37 (minimum version 1.12)
  Go version:   go1.9.4
  Git commit:   0520e24
  Built:        Wed Mar 21 23:08:31 2018
  OS/Arch:      linux/amd64
  Experimental: false
```

In the manager node:
```
Client:
 Version:       18.03.0-ce
 API version:   1.37
 Go version:    go1.9.4
 Git commit:    0520e24
 Built: Wed Mar 21 23:10:01 2018
 OS/Arch:       linux/amd64
 Experimental:  false
 Orchestrator:  swarm

Server:
 Engine:
  Version:      18.03.0-ce
  API version:  1.37 (minimum version 1.12)
  Go version:   go1.9.4
  Git commit:   0520e24
  Built:        Wed Mar 21 23:08:31 2018
  OS/Arch:      linux/amd64
  Experimental: false
```

**Output of `docker info`:**
In the worker node:
```
Containers: 269
 Running: 0
 Paused: 0
 Stopped: 269
Images: 217
Server Version: 18.03.0-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: pending
 NodeID: tzah7lertx0mdg9cudrmmjhcg
 Is Manager: false
 Node Address: 10.0.0.7
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: cfd04396dc68220d1cecbe686a6cc3aa5ce3667c
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.13.0-1011-azure
Operating System: Ubuntu 16.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 1.863GiB
Name: swarmm-agent-13957614000003
ID: N5OP:KAVU:DGGS:Y3D7:CTTZ:VXF7:A7OV:6CLA:ZNR5:CAWU:5BMM:DHMI
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support
```

In the manager node:
```
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 216
Server Version: 18.03.0-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: 09kaqkkjwsx4yq6ve9jxn8n3h
 Is Manager: true
 ClusterID: xozu5p2qf7kkyad3q9224schk
 Managers: 0
 Nodes: 4
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 172.16.0.5
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: cfd04396dc68220d1cecbe686a6cc3aa5ce3667c
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-116-generic
Operating System: Ubuntu 16.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 1.873GiB
Name: swarmm-master-13957614-0
ID: IMAS:YBUM:ILQH:6GEI:XVCC:QOWP:H6YI:JZ74:JONP:LKI6:YIGW:HPB2
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: alexvy86
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support
```

**Additional environment details (AWS, VirtualBox, physical, etc.):**
The manager and the nodes are Virtual Machine Scale Sets in Azure, originally deployed using [these instructions](https://azure.microsoft.com/en-us/resources/templates/101-acsengine-swarmmode/).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Backoff time not honored by error message causes system log flood #36683

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Backoff time not honored by error message causes system log flood #36683

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions