Description
When I try to add a node as a second manager to the swarm, either by joining it with the manager token:
docker swarm join --listen-addr THIS_NODE_IP:2377 --advertise-addr THIS_NODE_IP --token "MANAGER_TOKEN" MANAGER-1_NODE_IP:2377
or by promoting an existing worker:
(on worker "master-2") docker swarm join --listen-addr THIS_NODE_IP:2377 --advertise-addr THIS_NODE_IP --token "WORKER_TOKEN" MANAGER-1_NODE_IP:2377
>>> node has successfully joined the swarm
(on the swarm manager & leader) docker node promote master-2
I get the error "can't initialize raft node: rpc error: code = Unknown desc ..." (full message under Actual Result), and the swarm leader reports the node as Down.
Setup:
Docker 17.07.0-ce (Swarm mode), RHEL 7.3, kernel 4.13.3 (same behavior on 3.10), behind an HTTP proxy.
docker info output for each node:
docker info (swarm master-1)
root@master-1 # docker info
Containers: 3
Running: 3
Paused: 0
Stopped: 0
Images: 4
Server Version: 17.07.0-ce
Storage Driver: overlay
Backing Filesystem: extfs
Supports d_type: true
Logging Driver: journald
Cgroup Driver: cgroupfs
Plugins:
Volume: local (*using volume driver=local, type=nfs*)
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
NodeID: aw5r20gz2pr18m9yglk4qwjh4
Is Manager: true
ClusterID: b5gds92q3otj9ego2en95qm34
Managers: 1
Nodes: 1
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 3
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Force Rotate: 0
Root Rotation In Progress: false
Node Address: <manager-1-ip>
Manager Addresses:
<manager-1-ip>:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 3addd840653146c90a254301d6c3a663c7fd6429
runc version: 2d41c047c83e09a6d61d464906feb2a2f3c52aa4
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 4.13.3-1.el7.elrepo.x86_64
Operating System: Red Hat Enterprise Linux Server 7.3 (Maipo)
OSType: linux
Architecture: x86_64
Name: master-1
ID: DMJT:S6EF:LUAF:PQKX:I5R6:TU2W:JZRH:MVMM:YJC5:MC2K:SGSQ:PXKM
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Http Proxy: http://<proxy>
Https Proxy: http://<proxy>
No Proxy: <nodes-ips>,<master-1-ip>,<master-2-ip>,<proxy-ip>,127.0.0.1,localhost,sonar,jenkins
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
docker info (swarm master-2)
root@master-2 # docker info
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 0
Server Version: 17.07.0-ce
Storage Driver: overlay
Backing Filesystem: xfs
Supports d_type: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 3addd840653146c90a254301d6c3a663c7fd6429
runc version: 2d41c047c83e09a6d61d464906feb2a2f3c52aa4
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 3.10.0-514.el7.x86_64
Operating System: Red Hat Enterprise Linux Server 7.3 (Maipo)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 8GiB
Name: master-2
ID: EY7E:5F7I:OEDA:PBXW:ETM2:IQZU:USYK:6UQM:NGVT:HEFU:TJ3Q:QAUP
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Http Proxy: http://<proxy>
Https Proxy: http://<proxy>
No Proxy: <nodes-ips>,<master-1-ip>,<master-2-ip>,<proxy-ip>,127.0.0.1,localhost,sonar,jenkins
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
docker info (swarm node)
Containers: 5
Running: 5
Paused: 0
Stopped: 0
Images: 8
Server Version: 17.07.0-ce
Storage Driver: overlay
Backing Filesystem: extfs
Supports d_type: true
Logging Driver: journald
Cgroup Driver: cgroupfs
Plugins:
Volume: local (*using volume driver=local, type=nfs*)
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
NodeID: ph3qqr9ti750cc3m741rs1dsm
Is Manager: false
Node Address: <node-ip>
Manager Addresses:
<manager-ip>:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 3addd840653146c90a254301d6c3a663c7fd6429
runc version: 2d41c047c83e09a6d61d464906feb2a2f3c52aa4
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 4.13.3-1.el7.elrepo.x86_64
Operating System: Red Hat Enterprise Linux Server 7.3 (Maipo)
OSType: linux
Architecture: x86_64
Name: node1
ID: KR3X:353P:3XHC:Z7M3:ZXU3:W4NI:7N4O:VYMI:MTYR:D47M:UMT4:LKH5
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Http Proxy: http://<proxy>
Https Proxy: http://<proxy>
No Proxy: <nodes-ips>,<master-1-ip>,<master-2-ip>,<proxy-ip>,127.0.0.1,localhost,sonar,jenkins
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
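All three daemons run behind a proxy and report the same No Proxy list (with the real addresses replaced by placeholders above). As a quick sanity check, the hypothetical helper `in_no_proxy` below, a minimal POSIX sh sketch, tests whether an address appears as an exact entry in such a comma-separated list. It deliberately ignores the suffix-style matching that some proxy implementations also apply, so a miss here does not prove that the address would actually be proxied.

```shell
#!/bin/sh
# Hypothetical helper (not part of Docker): in_no_proxy ADDR LIST
# Returns 0 if ADDR is an exact entry of the comma-separated LIST, 1 otherwise.
# Exact match only; domain-suffix rules some proxy clients apply are ignored.
in_no_proxy() {
  # Drop blanks so "a, b" and "a,b" are treated the same.
  _np_list=$(printf '%s' "$2" | tr -d '[:space:]')
  # Wrap the list in commas so every entry is delimited on both sides.
  case ",${_np_list}," in
    *",$1,"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `in_no_proxy "$MASTER2_IP" "$NO_PROXY" && echo listed`, where `MASTER2_IP` is a placeholder for master-2's advertised address. On a systemd host, `systemctl show --property=Environment docker` prints the environment the daemon was started with, which is where these proxy variables are usually set.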
Steps to reproduce:
1. On master-1:
   docker swarm init
2. On master-1:
   docker swarm join-token worker
   On node-1:
   docker swarm join --listen-addr THIS_NODE_IP:2377 --advertise-addr THIS_NODE_IP --token "WORKER_TOKEN" MANAGER-1_NODE_IP:2377
3. On master-1:
   docker swarm join-token manager
   On master-2:
   docker swarm join --listen-addr THIS_NODE_IP:2377 --advertise-addr THIS_NODE_IP --token "MANAGER_TOKEN" MANAGER-1_NODE_IP:2377
4. Alternative to step 3 (join as a worker, then promote):
   On master-1:
   docker swarm join-token worker
   On master-2:
   docker swarm join --listen-addr THIS_NODE_IP:2377 --advertise-addr THIS_NODE_IP --token "WORKER_TOKEN" MANAGER-1_NODE_IP:2377
   On master-1:
   docker node promote master-2
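The reproduction steps can be scripted so that both join methods are easy to rerun. This is a hypothetical sketch, not a tested tool: it assumes passwordless root SSH from the machine running it to master-1, master-2 and node-1, and the variables `DRY_RUN`, `METHOD`, `MANAGER1_IP`, `MASTER2_IP` and `NODE1_IP` (plus the helper names `on`, `token` and `swarm_join`) are made up for illustration. By default it only prints the commands; `METHOD=manager` follows step 3, anything else follows step 4.

```shell
#!/bin/sh
# Hypothetical repro script. ASSUMPTIONS: passwordless root SSH to each host;
# the *_IP defaults are placeholders and must be replaced with real addresses.
set -eu

: "${DRY_RUN:=1}"                       # 1 = print commands only, 0 = run them via ssh
: "${METHOD:=promote}"                  # "manager" = step 3, otherwise step 4
: "${MANAGER1_IP:=MANAGER-1_NODE_IP}"
: "${MASTER2_IP:=MASTER-2_NODE_IP}"
: "${NODE1_IP:=NODE-1_NODE_IP}"

# Run a command on HOST over ssh, or just print it in dry-run mode.
on() {
  host=$1; shift
  if [ "$DRY_RUN" = 1 ]; then
    printf '[%s] %s\n' "$host" "$*"
  else
    ssh "root@$host" "$@"
  fi
}

# Print the join token for ROLE (worker|manager), fetched from master-1.
token() {
  if [ "$DRY_RUN" = 1 ]; then
    printf '<%s-token>\n' "$1"
  else
    on master-1 docker swarm join-token -q "$1"
  fi
}

# swarm_join HOST ADDR TOKEN: same join command as in the steps above.
swarm_join() {
  on "$1" docker swarm join --listen-addr "$2:2377" --advertise-addr "$2" \
    --token "$3" "$MANAGER1_IP:2377"
}

reproduce() {
  on master-1 docker swarm init
  swarm_join node-1 "$NODE1_IP" "$(token worker)"
  if [ "$METHOD" = manager ]; then
    # Step 3: join master-2 directly with the manager token.
    swarm_join master-2 "$MASTER2_IP" "$(token manager)"
  else
    # Step 4: join master-2 as a worker, then promote it from the leader.
    swarm_join master-2 "$MASTER2_IP" "$(token worker)"
    on master-1 docker node promote master-2
  fi
  on master-1 docker node ls
}

reproduce
```

A real run would look like `DRY_RUN=0 METHOD=manager MANAGER1_IP=... MASTER2_IP=... NODE1_IP=... sh repro.sh`, with the elided addresses filled in. Keeping dry-run as the default makes it safe to check the exact commands before touching the cluster.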
Expected Result:
The node should join the swarm as a manager, and docker node ls on the leader should show it as Reachable.
Actual Result:
With either method (joining with the manager token, or joining as a worker and then promoting), the node fails with: cluster exited with error: manager stopped: can't initialize raft node: rpc error: code = Unknown desc = could not connect to prospective new cluster member using its advertised address: rpc error: code = Unavailable desc = grpc: the connection is unavailable
docker node ls on the leader shows the candidate manager node with status Down.
Logs from master-2:
Sep 30 19:10:46 master-2 dockerd: time="2017-09-30T19:10:46.400925292+02:00" level=info msg="Stopping manager" module=node node.id=7r92yw3pcfcjy4f299dwfwy4l
Sep 30 19:10:46 master-2 dockerd: time="2017-09-30T19:10:46.401013876+02:00" level=info msg="Manager shut down" module=node node.id=7r92yw3pcfcjy4f299dwfwy4l
Sep 30 19:10:46 master-2 dockerd: time="2017-09-30T19:10:46.401086367+02:00" level=info msg="shutting down certificate renewal routine" module="node/tls" node.id=7r92yw3pcfcjy4f299dwfwy4l node.role=swarm-manager
Sep 30 19:10:46 master-2 dockerd: time="2017-09-30T19:10:46.401551984+02:00" level=error msg="cluster exited with error: manager stopped: can't initialize raft node: rpc error: code = Unknown desc = could not connect to prospective new cluster member using its advertised address: rpc error: code = Unavailable desc = grpc: the connection is unavailable"
Sep 30 19:10:46 master-2 dockerd: time="2017-09-30T19:10:46.401601032+02:00" level=warning msg="Restarting swarm in 0.20 seconds"
Network connectivity between the nodes has been checked (otherwise the rest of the swarm wouldn't work).
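The error text suggests that the leader tried to connect back to master-2 on its advertised address. Worker traffic on 2377 mostly flows from workers to managers, so a working swarm does not by itself show that this reverse direction is open. Below is a minimal sketch for checking it from master-1; the helper names `port_open` and `check_all` are hypothetical, and it assumes bash and coreutils `timeout` are installed (both ship with RHEL 7). It opens a raw TCP connection through bash's /dev/tcp, so no HTTP proxy setting is involved.

```shell
#!/bin/sh
# Hypothetical connectivity check (not a Docker tool).
# ASSUMPTIONS: bash and coreutils `timeout` are available.

# port_open HOST [PORT] [SECONDS]: exit 0 if a plain TCP connection succeeds.
# The inner bash opens fd 3 to HOST:PORT; timeout stops it if packets are
# silently dropped (e.g. by a firewall) instead of being refused.
port_open() {
  timeout "${3:-3}" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "${2:-2377}" 2>/dev/null
}

# check_all PORT HOST...: report each HOST:PORT, return 1 if any is unreachable.
check_all() {
  _port=$1; shift
  _rc=0
  for _host in "$@"; do
    if port_open "$_host" "$_port"; then
      printf 'OK      %s:%s\n' "$_host" "$_port"
    else
      printf 'FAILED  %s:%s\n' "$_host" "$_port"
      _rc=1
    fi
  done
  return "$_rc"
}
```

On master-1 this would be run as `check_all 2377 MASTER-2_NODE_IP` (placeholder address). A worker probably does not accept connections on 2377 until it starts as a manager, so the check is only meaningful while master-2 is attempting to join, or against a temporary listener on that port.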
Thank you in advance.