Hi everyone,
I'd like to share a similar issue, which was resolved by downgrading from 28.2.2 to 28.1.1. Special thanks to the helpful members of the Docker Discord community who pointed me to this issue!
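For anyone who needs the same workaround, the downgrade on Debian 12 looks roughly like this (a sketch; the exact package version string may differ in your apt repository, check apt-cache madison docker-ce first):
$ sudo apt-get install --allow-downgrades docker-ce=5:28.1.1-1~debian.12~bookworm docker-ce-cli=5:28.1.1-1~debian.12~bookworm
$ sudo apt-mark hold docker-ce docker-ce-cli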
Docker Swarm Overlay Network IP Exhaustion Due to Frequent Service Redeployments
Problem Description
The Docker Swarm overlay network becomes unstable and eventually breaks down due to IP address exhaustion when services are frequently redeployed. After approximately 24 hours of continuous redeployments, the internal container network degrades and then fails permanently. Subsequent attempts to remove and redeploy the affected services do not resolve the underlying network issue. The problem has not been observed on overlay networks whose services are not frequently redeployed.
Environment
Client: Docker Engine - Community
Version: 28.2.2
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc.)
Version: v0.24.0
Path: /usr/libexec/docker/cli-plugins/docker-buildx
compose: Docker Compose (Docker Inc.)
Version: v2.36.2
Path: /usr/libexec/docker/cli-plugins/docker-compose
Server:
Containers: 5
Running: 4
Paused: 0
Stopped: 1
Images: 8
Server Version: 28.2.2
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: systemd
Cgroup Version: 2
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
CDI spec directories:
/etc/cdi
/var/run/cdi
Swarm: active
NodeID: [REDACTED]
Is Manager: true
ClusterID: [REDACTED]
Managers: 3
Nodes: 6
Default Address Pool: 10.0.0.0/8
SubnetSize: 24
Data Path Port: 4789
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 10
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Force Rotate: 0
Autolock Managers: false
Root Rotation In Progress: false
Node Address: [REDACTED]
Manager Addresses:
[REDACTED]
[REDACTED]
[REDACTED]
Runtimes: io.containerd.runc.v2 runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 05044ec0a9a75232cad458027ca83437aae3f4da
runc version: v1.2.5-0-g59923ef
init version: de40ad0
Security Options:
apparmor
seccomp
Profile: builtin
cgroupns
Kernel Version: 6.1.0-37-amd64
Operating System: Debian GNU/Linux 12 (bookworm)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.823GiB
Name: [REDACTED]
ID: [REDACTED]
Docker Root Dir: /var/lib/docker
Debug Mode: false
Experimental: false
Insecure Registries:
::1/128
127.0.0.0/8
Live Restore Enabled: false
Default Address Pools:
Base: 172.16.0.0/12, Size: 20
Base: 172.32.0.0/16, Size: 24
Exclusions
Several potential causes have been investigated and ruled out:
- MTU fragmentation: The MTU was lowered from 1500 to 1450 and configured in Docker (see the sketch after this list).
- Firewall: Firewall rules were reviewed and confirmed correct. Public traffic runs on eth0 and private traffic on eth1.
- VIP: endpoint_mode was set to dnsrr for all services to rule out VIP-related complications.
- UDP checksum: UDP checksum offloading was disabled.
- Docker overlay encryption: Encryption was disabled on all Docker networks.
- Conntrack & ARP: Neither the conntrack table nor the ARP table was full:
# Conntrack
$ sysctl net.netfilter.nf_conntrack_max
net.netfilter.nf_conntrack_max = 65536
$ cat /proc/net/nf_conntrack | wc -l
65
# ARP
$ sysctl net.ipv4.neigh.default.gc_thresh1
net.ipv4.neigh.default.gc_thresh1 = 8192
$ sysctl net.ipv4.neigh.default.gc_thresh2
net.ipv4.neigh.default.gc_thresh2 = 49152
$ sysctl net.ipv4.neigh.default.gc_thresh3
net.ipv4.neigh.default.gc_thresh3 = 65536
$ ip neigh | wc -l
11
- IPv6 disabling: IPv6 was initially disabled but re-enabled due to potential dependencies.
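For reference, these settings were applied roughly as follows (a sketch; cluster_network and my_service are illustrative names, the flags are standard Docker CLI options):
# Overlay network created with a reduced MTU; encryption is simply not enabled:
$ docker network create --driver overlay --opt com.docker.network.driver.mtu=1450 cluster_network
# Switch a service from the default VIP mode to DNS round-robin:
$ docker service update --endpoint-mode dnsrr my_service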
Relevant Logs
The dockerd process continuously logs warnings about failing to create or delete neighbor and FDB entries on VXLAN interfaces. For example, specific internal IPs remained in use yet were unreachable from other containers on the same overlay network:
2025-06-15T21:32:22+02:00 bicapp-manager1 dockerd[650]: level=warning msg="Neighbor entry already present" ifc=vx-001007-krurm ip=10.0.4.182 mac="02:42:0a:00:04:b6" neigh="10.0.4.182 02:42:0a:00:04:b6"
2025-06-15T21:32:22+02:00 bicapp-manager1 dockerd[650]: level=warning msg="Peer add operation failed" error="could not add neighbor entry for nid:krurmdqxnvtrevgsv9y9xqktq eid:[REDACTED] into the sandbox:neighbor entry already exists for IP 10.0.4.182, mac 02:42:0a:00:04:b6, link vx-001007-krurm"
2025-06-15T21:32:22+02:00 bicapp-manager1 dockerd[650]: level=warning msg="Neighbor entry already present" ifc=vx-001002-dur6y ip=10.0.2.184 mac="02:42:0a:00:02:b8" neigh="10.0.2.184 02:42:0a:00:02:b8"
2025-06-15T21:32:22+02:00 bicapp-manager1 dockerd[650]: level=warning msg="Peer add operation failed" error="could not add neighbor entry for nid:dur6y38j182w2v5r9mgtblftd eid:[REDACTED] into the sandbox:neighbor entry already exists for IP 10.0.2.184, mac 02:42:0a:00:02:b8, link vx-001002-dur6y"
2025-06-15T21:32:33+02:00 bicapp-manager1 dockerd[650]: level=warning msg="Neighbor entry already present" ifc=vx-001007-krurm ip=10.0.4.183 mac="02:42:0a:00:04:b7" neigh="10.0.4.183 02:42:0a:00:04:b7"
2025-06-15T21:32:33+02:00 bicapp-manager1 dockerd[650]: level=warning msg="Peer add operation failed" error="could not add neighbor entry for nid:krurmdqxnvtrevgsv9y9xqktq eid:[REDACTED] into the sandbox:neighbor entry already exists for IP 10.0.4.183, mac 02:42:0a:00:04:b7, link vx-001007-krurm"
2025-06-15T21:32:33+02:00 bicapp-manager1 dockerd[650]: level=warning msg="Neighbor entry already present" ifc=vx-001002-dur6y ip=10.0.2.185 mac="02:42:0a:00:02:b9" neigh="10.0.2.185 02:42:0a:00:02:b9"
2025-06-15T21:32:33+02:00 bicapp-manager1 dockerd[650]: level=warning msg="Peer add operation failed" error="could not add neighbor entry for nid:dur6y38j182w2v5r9mgtblftd eid:[REDACTED] into the sandbox:neighbor entry already exists for IP 10.0.2.185, mac 02:42:0a:00:02:b9, link vx-001002-dur6y"
2025-06-15T21:32:36+02:00 bicapp-manager1 dockerd[650]: level=warning msg="rmServiceBinding [REDACTED] possible transient state ok:false entries:0 set:false "
2025-06-15T21:32:36+02:00 bicapp-manager1 dockerd[650]: level=warning msg="error deleting neighbor entry" error="no such file or directory" ifc=vx-001002-dur6y ip=[REDACTED] mac="02:42:0a:00:02:b5"
2025-06-15T21:32:36+02:00 bicapp-manager1 dockerd[650]: level=warning msg="Peer delete operation failed" error="could not delete fdb entry for nid:dur6y38j182w2v5r9mgtblftd eid:[REDACTED] into the sandbox:neighbor entry not found for IP [REDACTED], mac 02:42:0a:00:02:b5, link vx-001002-dur6y"
2025-06-15T21:32:36+02:00 bicapp-manager1 dockerd[650]: level=warning msg="error deleting neighbor entry" error="no such file or directory" ifc=vx-001007-krurm ip=[REDACTED] mac="02:42:0a:00:04:b3"
2025-06-15T21:32:36+02:00 bicapp-manager1 dockerd[650]: level=warning msg="Peer delete operation failed" error="could not delete fdb entry for nid:krurmdqxnvtrevgsv9y9xqktq eid:[REDACTED] into the sandbox:neighbor entry not found for IP [REDACTED], mac 02:42:0a:00:04:b3, link vx-001007-krurm"
2025-06-15T21:32:36+02:00 bicapp-manager1 dockerd[650]: level=warning msg="rmServiceBinding [REDACTED] possible transient state ok:false entries:0 set:false "
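On a systemd host such as this one, these warnings can be followed live along these lines (a sketch):
$ journalctl -u docker.service -f | grep -E 'Peer (add|delete) operation failed|Neighbor entry already present'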
Problem Identification
It appears that the Docker Swarm overlay network fails to properly tear down old network interfaces and release the associated IP addresses, particularly after frequent service redeployments. These "stale" or "ghost" IP addresses remain active within the subnet, eventually exhausting it. This was confirmed with nmap -sP scans:
# nmap -sP 10.0.4.194/24
Starting Nmap 7.80 ( https://nmap.org ) at 2025-06-18 10:55 UTC
Nmap scan report for 10.0.4.1
Host is up (0.000035s latency).
MAC Address: [REDACTED] (Unknown)
Nmap scan report for 10.0.4.2
Host is up (0.0000080s latency).
MAC Address: [REDACTED] (Unknown)
Nmap scan report for 10.0.4.3
Host is up (0.0000050s latency).
MAC Address: [REDACTED] (Unknown)
Nmap scan report for prod_service_name.1.[REDACTED].cluster_network (10.0.4.4)
Host is up (0.0000040s latency).
MAC Address: [REDACTED] (Unknown)
Nmap scan report for prod_service_name.2.[REDACTED].cluster_network (10.0.4.5)
Host is up (0.0000040s latency).
[...]
Nmap scan report for [REDACTED] (10.0.4.202)
Host is up.
Nmap done: 256 IP addresses (252 hosts up) scanned in 0.34 seconds <--- 😱😱😱
The scan revealed that the subnet was nearly full of "up" hosts, even though only a few legitimate containers were active. This indicates that the IPs of old, decommissioned containers are not being released, saturating the subnet. Once the subnet is saturated with these "ghost" IPs, Docker either struggles to allocate new, functional addresses or reuses IPs it wrongly believes are free, causing network communication failures for new or redeployed containers.
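One way to cross-check nmap's result against Docker's own view is to list the endpoints Docker still knows about on the network (a sketch; cluster_network is an illustrative name, and on a manager node --verbose additionally shows swarm service entries):
$ docker network inspect cluster_network --format '{{range $id, $c := .Containers}}{{$c.Name}} {{$c.IPv4Address}}{{"\n"}}{{end}}'
$ docker network inspect --verbose cluster_network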
Reproducibility
⚠️ Do NOT run these commands in a production environment as they will cause service downtime!
On any manager host (not a PROD server), start continuous redeployments of a service configured to use an overlay network (e.g., my_service):
$ while :; do docker service update my_service --force; done
It's crucial that the containers involved in the redeployments are on an overlay network and span different nodes.
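If no suitable service exists yet, a minimal setup for the repro could look like this (a sketch; the network name, service name, and image are illustrative):
$ docker network create --driver overlay --attachable repro_net
$ docker service create --name my_service --replicas 2 --network repro_net nginx:alpine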
Access a container that shares a common overlay network (e.g., a reverse proxy container connected to the same network as my_service):
$ docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
[REDACTED] my_service replicated 2/2 [REDACTED]
[REDACTED] my_other_service replicated 1/1 [REDACTED]
[REDACTED] common_proxy replicated 1/1 (max 1 per node) [REDACTED] <---
$ docker exec -it [COMMON_PROXY_CONTAINER_ID] sh
/ # apk update
[OUTPUT REDACTED]
/ # apk add nmap
[OUTPUT REDACTED]
/ # ping my_service
PING my_service (10.0.2.234): 56 data bytes
64 bytes from 10.0.2.234: seq=0 ttl=64 time=0.772 ms
64 bytes from 10.0.2.234: seq=1 ttl=64 time=0.509 ms
^C
--- my_service ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.509/0.640/0.772 ms
# ⚠️ Observe that the number of "up" hosts identified by nmap continuously increases.
# This indicates that old network interfaces are not being properly torn down!
/ # nmap -sP 10.0.2.0/24
[...]
Host is up.
Nmap done: 256 IP addresses (46 hosts up) scanned in 1.98 seconds
/ # nmap -sP 10.0.2.0/24
[...]
Host is up.
Nmap done: 256 IP addresses (53 hosts up) scanned in 1.93 seconds
# ✅ Initial network communication may still be uninterrupted:
/ # ping my_service
PING my_service (10.0.2.246): 56 data bytes
64 bytes from 10.0.2.246: seq=0 ttl=64 time=0.731 ms
64 bytes from 10.0.2.246: seq=1 ttl=64 time=0.500 ms
^C
--- my_service ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.500/0.615/0.731 ms
# ⌛ After several IP rotations and continued redeployments...
/ # nmap -sP 10.0.2.0/24
[...]
Nmap done: 256 IP addresses (67 hosts up) scanned in 1.96 seconds
# ⬆️ The count of "up" hosts remains very high (e.g., 67 "up" out of only 3 active containers).
# ⚠️ As the subnet becomes saturated, network communication begins to fail:
/ # nmap -sP 10.0.2.0/24
[...]
Nmap done: 256 IP addresses (252 hosts up) scanned in 27.55 seconds
/ # ping my_service
PING my_service (10.0.2.217): 56 data bytes
^C
--- my_service ping statistics ---
6 packets transmitted, 0 packets received, 100% packet loss
/ # ping my_service
PING my_service (10.0.2.221): 56 data bytes
64 bytes from 10.0.2.221: seq=0 ttl=64 time=0.682 ms
64 bytes from 10.0.2.221: seq=1 ttl=64 time=0.705 ms
64 bytes from 10.0.2.221: seq=2 ttl=64 time=0.573 ms
^C
--- my_service ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.573/0.653/0.705 ms
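To track the trend without reading the full output, the "Host is up" lines can simply be counted (run inside the same container):
/ # nmap -sP 10.0.2.0/24 | grep -c 'Host is up'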
Impact
- Existing, long-running containers (tasks) will remain available on their allocated IPs.
- Newly deployed containers on the affected overlay network will be unable to communicate reliably. This is because Docker attempts to allocate an IP address that it incorrectly believes has been released, leading to network failures for these new containers.
Related Issues
I've collected some related issues that might be connected:
Update: Nmap still shows that 252 hosts are active, even though only a few containers are using the overlay network. However, this doesn't cause any issues in Docker 28.1.1.
Originally posted by @bencurio in #50129 (comment)