Description
In a swarm setup using overlay networks, idle connections between 2 services will end up in a broken state after 15 minutes.
The issue is related to the way Docker overlay networking routes packets: it first uses iptables to mark them, then uses IPVS to forward them to the right hosts. The default expiration for TCP connections in IPVS is 900 seconds (see `ipvsadm -l --timeout`), after which IPVS stops forwarding packets even though the TCP connection still exists. Any new packet on that connection is then sent to the service's virtual IP, which has no valid resolution, so the connection is stuck in limbo while the kernel keeps trying to resolve that virtual IP forever.
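To make the timing mismatch concrete, here is a minimal sketch that compares the IPVS TCP timeout with a TCP keepalive idle time. The sample string is an assumption about what `ipvsadm -l --timeout` prints, and the helper names `parse_ipvs_timeouts` and `keepalive_beats_ipvs` are hypothetical, introduced only for illustration. The value 7200 is the Linux default for `net.ipv4.tcp_keepalive_time`.

```python
import re


def parse_ipvs_timeouts(output):
    """Parse a line such as 'Timeout (tcp tcpfin udp): 900 120 300'.

    Returns a dict mapping each timeout name to its value in seconds.
    The line format is an assumption about ipvsadm's output.
    """
    match = re.search(r"Timeout \(([^)]*)\):\s*([\d\s]+)", output)
    if match is None:
        raise ValueError("unrecognized ipvsadm timeout output")
    names = match.group(1).split()
    values = [int(v) for v in match.group(2).split()]
    if len(names) != len(values):
        raise ValueError("timeout names and values do not line up")
    return dict(zip(names, values))


def keepalive_beats_ipvs(keepalive_time, ipvs_tcp_timeout):
    """True if the first keepalive probe is sent before the IPVS entry expires."""
    return keepalive_time < ipvs_tcp_timeout


# Assumed sample output, matching the 900-second default described above.
sample = "Timeout (tcp tcpfin udp): 900 120 300"
timeouts = parse_ipvs_timeouts(sample)

# Linux default keepalive idle time (7200 s) is far above the IPVS timeout.
print(keepalive_beats_ipvs(7200, timeouts["tcp"]))  # False
# A value below 900 s keeps the IPVS entry refreshed.
print(keepalive_beats_ipvs(600, timeouts["tcp"]))   # True
```

Note that the kernel only sends keepalive probes on sockets that have `SO_KEEPALIVE` enabled, so the comparison matters only for those connections.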
Steps to reproduce the issue:
- Start 2 services on the same network (on different hosts, though it should be reproducible even on a single host?)
- `docker exec` into both of them; in one, start an `nc` command in listen mode, and in the other, connect to that `nc` server using the service name DNS.
- Send a packet from the client to the server; everything is fine.
- Find your `netns` and find your connection by running `nsenter --net=2cc18e502f81 ipvsadm -lnc`
- Wait for the connection to expire and be removed from the list.
- Send another packet: nothing ever gets there and the connection doesn't time out; `tcpdump` shows lots of ARP packets going out.
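The send, idle, send-again part of these steps could also be scripted. The following is a rough sketch, not the procedure used above: it assumes the server echoes back whatever it receives (plain `nc` in listen mode does not), and the function name `probe_idle_connection` and its parameters are hypothetical.

```python
import socket
import time


def probe_idle_connection(host, port, idle_seconds, reply_timeout=10.0):
    """Send a message, stay idle, then send again on the same connection.

    Returns True if the second message is echoed back within
    reply_timeout seconds, False otherwise. Assumes an echo server.
    """
    with socket.create_connection((host, port), timeout=reply_timeout) as sock:
        # First exchange: expected to work while the IPVS entry is fresh.
        sock.sendall(b"first\n")
        if sock.recv(1024) != b"first\n":
            return False

        # Stay idle long enough for the IPVS connection entry to expire.
        time.sleep(idle_seconds)

        # Second exchange: sendall() only fills the local buffer, so the
        # missing reply is what reveals that packets are no longer forwarded.
        sock.sendall(b"second\n")
        try:
            return sock.recv(1024) == b"second\n"
        except socket.timeout:
            return False
```

For example, `probe_idle_connection("my-service", 9000, idle_seconds=960)` would idle just past the 900-second IPVS timeout; the service name and port here are placeholders.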
Describe the results you received:
Packet never reaches the target, kernel is stuck doing ARP requests over and over.
Describe the results you expected:
Either have the connection time out properly, or find a way to restore the routing in IPVS.
Additional information you deem important (e.g. issue happens only occasionally):
Currently this can be worked around by setting net.ipv4.tcp_keepalive_time to less than 900 seconds, so that keepalive probes are sent before the IPVS entry expires, but I'm not sure it's a valid way to deal with this. At the very least, this behavior should be documented.
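The same idea can be applied per socket instead of system-wide. This is a minimal sketch assuming a Linux host, since `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` are not available on every platform. The helper name `enable_keepalive` and the default values are illustrative choices, picked only so that the idle time stays below 900 seconds.

```python
import socket


def enable_keepalive(sock, idle=600, interval=60, probes=5):
    """Enable TCP keepalive on sock with an idle time below the IPVS timeout.

    idle:     seconds of inactivity before the first probe is sent
    interval: seconds between unanswered probes
    probes:   unanswered probes before the connection is dropped
    """
    # The kernel sends keepalive probes only when SO_KEEPALIVE is set.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket override of net.ipv4.tcp_keepalive_time (Linux).
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock
```

Whether set through the sysctl or per socket, the keepalive idle time only takes effect on connections that enable `SO_KEEPALIVE`, so applications that never set it would presumably still hit the 900-second expiry.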
Output of docker version:
Client:
Version: 1.13.1
API version: 1.26
Go version: go1.7.5
Git commit: 092cba3
Built: Wed Feb 8 06:38:28 2017
OS/Arch: linux/amd64
Server:
Version: 1.13.1
API version: 1.26 (minimum version 1.12)
Go version: go1.7.5
Git commit: 092cba3
Built: Wed Feb 8 06:38:28 2017
OS/Arch: linux/amd64
Experimental: false
Output of docker info:
Containers: 2
Running: 2
Paused: 0
Stopped: 0
Images: 2
Server Version: 1.13.1
Storage Driver: overlay
Backing Filesystem: xfs
Supports d_type: true
Logging Driver: fluentd
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Swarm: active
NodeID: l3e2evjei4cvcdgjqavtrztgo
Is Manager: false
Node Address: 172.24.0.100
Manager Addresses:
172.24.0.200:2377
172.24.0.50:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: aa8187dbd3b7ad67d8e5e3a15115d3eef43a7ed1
runc version: 9df8b306d01f59d3a8029be411de015b7304dd8f
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 3.10.0-514.2.2.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 1.796 GiB
Name: worker-1
ID: DR4G:LZEQ:YSQ7:CYTR:FAXW:ZNVJ:E4AZ:BX5L:QYYG:ZDY5:SO7U:TFZW
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: bridge-nf-call-ip6tables is disabled
Labels:
dawn.node.type=worker
dawn.node.subtype=app
Experimental: false
Insecure Registries:
172.24.0.50:5000
127.0.0.0/8
Live Restore Enabled: false
Additional environment details (AWS, VirtualBox, physical, etc.):
My current test setup is 5 Vagrant boxes (2 managers + 3 workers), but it should happen in any environment.