Hi everyone,
I'd like to share a similar issue, which was resolved by downgrading from 28.2.2 to 28.1.1. Special thanks to the helpful members of the Docker Discord community who pointed me to this issue!
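For anyone who needs the same workaround, the downgrade on Debian 12 looks roughly like this (a sketch; the exact package version string may differ in your apt repository, check apt-cache madison docker-ce first):
$ sudo apt-get install --allow-downgrades docker-ce=5:28.1.1-1~debian.12~bookworm docker-ce-cli=5:28.1.1-1~debian.12~bookworm
$ sudo apt-mark hold docker-ce docker-ce-cli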
Docker Swarm Overlay Network IP Exhaustion Due to Frequent Service Redeployments
Problem Description
The Docker Swarm overlay network becomes unstable and eventually breaks down due to IP address exhaustion when services are frequently redeployed. After approximately 24 hours of continuous redeployments, the internal container network degrades and then fails permanently. Subsequent attempts to remove and redeploy the affected services do not resolve the underlying network issue. The problem has not been observed on overlay networks whose services are not frequently redeployed.
Environment
Client: Docker Engine - Community
Version: 28.2.2
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc.)
Version: v0.24.0
Path: /usr/libexec/docker/cli-plugins/docker-buildx
compose: Docker Compose (Docker Inc.)
Version: v2.36.2
Path: /usr/libexec/docker/cli-plugins/docker-compose
Server:
Containers: 5
Running: 4
Paused: 0
Stopped: 1
Images: 8
Server Version: 28.2.2
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: systemd
Cgroup Version: 2
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
CDI spec directories:
/etc/cdi
/var/run/cdi
Swarm: active
NodeID: [REDACTED]
Is Manager: true
ClusterID: [REDACTED]
Managers: 3
Nodes: 6
Default Address Pool: 10.0.0.0/8
SubnetSize: 24
Data Path Port: 4789
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 10
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Force Rotate: 0
Autolock Managers: false
Root Rotation In Progress: false
Node Address: [REDACTED]
Manager Addresses:
[REDACTED]
[REDACTED]
[REDACTED]
Runtimes: io.containerd.runc.v2 runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 05044ec0a9a75232cad458027ca83437aae3f4da
runc version: v1.2.5-0-g59923ef
init version: de40ad0
Security Options:
apparmor
seccomp
Profile: builtin
cgroupns
Kernel Version: 6.1.0-37-amd64
Operating System: Debian GNU/Linux 12 (bookworm)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.823GiB
Name: [REDACTED]
ID: [REDACTED]
Docker Root Dir: /var/lib/docker
Debug Mode: false
Experimental: false
Insecure Registries:
::1/128
127.0.0.0/8
Live Restore Enabled: false
Default Address Pools:
Base: 172.16.0.0/12, Size: 20
Base: 172.32.0.0/16, Size: 24
Exclusions
Several potential causes have been investigated and ruled out:
- MTU fragmentation: The MTU was lowered from 1500 to 1450 and configured in Docker (see the sketch after this list).
- Firewall: Firewall rules were reviewed and confirmed correct. Public traffic runs on eth0 and private traffic on eth1.
- VIP: endpoint_mode was set to dnsrr for all services to rule out VIP-related complications.
- UDP checksum: UDP checksum offloading was disabled.
- Docker overlay encryption: Encryption was disabled on all Docker networks.
- Conntrack & ARP: Neither the conntrack table nor the ARP table was full:
# Conntrack
$ sysctl net.netfilter.nf_conntrack_max
net.netfilter.nf_conntrack_max = 65536
$ cat /proc/net/nf_conntrack | wc -l
65
# ARP
$ sysctl net.ipv4.neigh.default.gc_thresh1
net.ipv4.neigh.default.gc_thresh1 = 8192
$ sysctl net.ipv4.neigh.default.gc_thresh2
net.ipv4.neigh.default.gc_thresh2 = 49152
$ sysctl net.ipv4.neigh.default.gc_thresh3
net.ipv4.neigh.default.gc_thresh3 = 65536
$ ip neigh | wc -l
11
- IPv6 disabling: IPv6 was initially disabled but re-enabled due to potential dependencies.
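For reference, these settings were applied roughly as follows (a sketch; cluster_network and my_service are illustrative names, the flags are standard Docker CLI options):
# Overlay network created with a reduced MTU; encryption is simply not enabled:
$ docker network create --driver overlay --opt com.docker.network.driver.mtu=1450 cluster_network
# Switch a service from the default VIP mode to DNS round-robin:
$ docker service update --endpoint-mode dnsrr my_service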
Relevant Logs
The dockerd process continuously logs warnings about failing to create or delete neighbor and FDB entries on VXLAN interfaces. For example, specific internal IPs remained in use yet were unreachable from other containers on the same overlay network:
2025-06-15T21:32:22+02:00 bicapp-manager1 dockerd[650]: level=warning msg="Neighbor entry already present" ifc=vx-001007-krurm ip=10.0.4.182 mac="02:42:0a:00:04:b6" neigh="10.0.4.182 02:42:0a:00:04:b6"
2025-06-15T21:32:22+02:00 bicapp-manager1 dockerd[650]: level=warning msg="Peer add operation failed" error="could not add neighbor entry for nid:krurmdqxnvtrevgsv9y9xqktq eid:[REDACTED] into the sandbox:neighbor entry already exists for IP 10.0.4.182, mac 02:42:0a:00:04:b6, link vx-001007-krurm"
2025-06-15T21:32:22+02:00 bicapp-manager1 dockerd[650]: level=warning msg="Neighbor entry already present" ifc=vx-001002-dur6y ip=10.0.2.184 mac="02:42:0a:00:02:b8" neigh="10.0.2.184 02:42:0a:00:02:b8"
2025-06-15T21:32:22+02:00 bicapp-manager1 dockerd[650]: level=warning msg="Peer add operation failed" error="could not add neighbor entry for nid:dur6y38j182w2v5r9mgtblftd eid:[REDACTED] into the sandbox:neighbor entry already exists for IP 10.0.2.184, mac 02:42:0a:00:02:b8, link vx-001002-dur6y"
2025-06-15T21:32:33+02:00 bicapp-manager1 dockerd[650]: level=warning msg="Neighbor entry already present" ifc=vx-001007-krurm ip=10.0.4.183 mac="02:42:0a:00:04:b7" neigh="10.0.4.183 02:42:0a:00:04:b7"
2025-06-15T21:32:33+02:00 bicapp-manager1 dockerd[650]: level=warning msg="Peer add operation failed" error="could not add neighbor entry for nid:krurmdqxnvtrevgsv9y9xqktq eid:[REDACTED] into the sandbox:neighbor entry already exists for IP 10.0.4.183, mac 02:42:0a:00:04:b7, link vx-001007-krurm"
2025-06-15T21:32:33+02:00 bicapp-manager1 dockerd[650]: level=warning msg="Neighbor entry already present" ifc=vx-001002-dur6y ip=10.0.2.185 mac="02:42:0a:00:02:b9" neigh="10.0.2.185 02:42:0a:00:02:b9"
2025-06-15T21:32:33+02:00 bicapp-manager1 dockerd[650]: level=warning msg="Peer add operation failed" error="could not add neighbor entry for nid:dur6y38j182w2v5r9mgtblftd eid:[REDACTED] into the sandbox:neighbor entry already exists for IP 10.0.2.185, mac 02:42:0a:00:02:b9, link vx-001002-dur6y"
2025-06-15T21:32:36+02:00 bicapp-manager1 dockerd[650]: level=warning msg="rmServiceBinding [REDACTED] possible transient state ok:false entries:0 set:false "
2025-06-15T21:32:36+02:00 bicapp-manager1 dockerd[650]: level=warning msg="error deleting neighbor entry" error="no such file or directory" ifc=vx-001002-dur6y ip=[REDACTED] mac="02:42:0a:00:02:b5"
2025-06-15T21:32:36+02:00 bicapp-manager1 dockerd[650]: level=warning msg="Peer delete operation failed" error="could not delete fdb entry for nid:dur6y38j182w2v5r9mgtblftd eid:[REDACTED] into the sandbox:neighbor entry not found for IP [REDACTED], mac 02:42:0a:00:02:b5, link vx-001002-dur6y"
2025-06-15T21:32:36+02:00 bicapp-manager1 dockerd[650]: level=warning msg="error deleting neighbor entry" error="no such file or directory" ifc=vx-001007-krurm ip=[REDACTED] mac="02:42:0a:00:04:b3"
2025-06-15T21:32:36+02:00 bicapp-manager1 dockerd[650]: level=warning msg="Peer delete operation failed" error="could not delete fdb entry for nid:krurmdqxnvtrevgsv9y9xqktq eid:[REDACTED] into the sandbox:neighbor entry not found for IP [REDACTED], mac 02:42:0a:00:04:b3, link vx-001007-krurm"
2025-06-15T21:32:36+02:00 bicapp-manager1 dockerd[650]: level=warning msg="rmServiceBinding [REDACTED] possible transient state ok:false entries:0 set:false "
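On a systemd host such as this one, these warnings can be followed live along these lines (a sketch):
$ journalctl -u docker.service -f | grep -E 'Peer (add|delete) operation failed|Neighbor entry already present'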
Problem Identification
It appears that the Docker Swarm overlay network fails to properly tear down old network interfaces and release the associated IP addresses, particularly after frequent service redeployments. These "stale" or "ghost" IP addresses remain active within the subnet, eventually exhausting it. This was confirmed with nmap -sP scans:
# nmap -sP 10.0.4.194/24
Starting Nmap 7.80 ( https://nmap.org ) at 2025-06-18 10:55 UTC
Nmap scan report for 10.0.4.1
Host is up (0.000035s latency).
MAC Address: [REDACTED] (Unknown)
Nmap scan report for 10.0.4.2
Host is up (0.0000080s latency).
MAC Address: [REDACTED] (Unknown)
Nmap scan report for 10.0.4.3
Host is up (0.0000050s latency).
MAC Address: [REDACTED] (Unknown)
Nmap scan report for prod_service_name.1.[REDACTED].cluster_network (10.0.4.4)
Host is up (0.0000040s latency).
MAC Address: [REDACTED] (Unknown)
Nmap scan report for prod_service_name.2.[REDACTED].cluster_network (10.0.4.5)
Host is up (0.0000040s latency).
[...]
Nmap scan report for [REDACTED] (10.0.4.202)
Host is up.
Nmap done: 256 IP addresses (252 hosts up) scanned in 0.34 seconds <--- 😱😱😱
The scan revealed that the subnet was nearly full of "up" hosts, even though only a few legitimate containers were active. This indicates that the IPs of old, decommissioned containers are not being released, saturating the subnet. Once the subnet is saturated with these "ghost" IPs, Docker either struggles to allocate new, functional addresses or reuses IPs it wrongly believes are free, causing network communication failures for new or redeployed containers.
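One way to cross-check nmap's result against Docker's own view is to list the endpoints Docker still knows about on the network (a sketch; cluster_network is an illustrative name, and on a manager node --verbose additionally shows swarm service entries):
$ docker network inspect cluster_network --format '{{range $id, $c := .Containers}}{{$c.Name}} {{$c.IPv4Address}}{{"\n"}}{{end}}'
$ docker network inspect --verbose cluster_network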
Reproducibility
⚠️ Do NOT run these commands in a production environment as they will cause service downtime!
On any manager host (not a PROD server), start continuous redeployments of a service configured to use an overlay network (e.g., my_service):
$ while :; do docker service update my_service --force; done
It's crucial that the containers involved in the redeployments are on an overlay network and span different nodes.
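If no suitable service exists yet, a minimal setup for the repro could look like this (a sketch; the network name, service name, and image are illustrative):
$ docker network create --driver overlay --attachable repro_net
$ docker service create --name my_service --replicas 2 --network repro_net nginx:alpine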
Access a container that shares a common overlay network (e.g., a reverse proxy container connected to the same network as my_service):
$ docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
[REDACTED] my_service replicated 2/2 [REDACTED]
[REDACTED] my_other_service replicated 1/1 [REDACTED]
[REDACTED] common_proxy replicated 1/1 (max 1 per node) [REDACTED] <---
$ docker exec -it [COMMON_PROXY_CONTAINER_ID] sh
/ # apk update
[OUTPUT REDACTED]
/ # apk add nmap
[OUTPUT REDACTED]
/ # ping my_service
PING my_service (10.0.2.234): 56 data bytes
64 bytes from 10.0.2.234: seq=0 ttl=64 time=0.772 ms
64 bytes from 10.0.2.234: seq=1 ttl=64 time=0.509 ms
^C
--- my_service ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.509/0.640/0.772 ms
# ⚠️ Observe that the number of "up" hosts identified by nmap continuously increases.
# This indicates that old network interfaces are not being properly torn down!
/ # nmap -sP 10.0.2.0/24
[...]
Host is up.
Nmap done: 256 IP addresses (46 hosts up) scanned in 1.98 seconds
/ # nmap -sP 10.0.2.0/24
[...]
Host is up.
Nmap done: 256 IP addresses (53 hosts up) scanned in 1.93 seconds
# ✅ Initial network communication may still be uninterrupted:
/ # ping my_service
PING my_service (10.0.2.246): 56 data bytes
64 bytes from 10.0.2.246: seq=0 ttl=64 time=0.731 ms
64 bytes from 10.0.2.246: seq=1 ttl=64 time=0.500 ms
^C
--- my_service ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.500/0.615/0.731 ms
# ⌛ After several IP rotations and continued redeployments...
/ # nmap -sP 10.0.2.0/24
[...]
Nmap done: 256 IP addresses (67 hosts up) scanned in 1.96 seconds
# ⬆️ The count of "up" hosts remains very high (e.g., 67 "up" out of only 3 active containers).
# ⚠️ As the subnet becomes saturated, network communication begins to fail:
/ # nmap -sP 10.0.2.0/24
[...]
Nmap done: 256 IP addresses (252 hosts up) scanned in 27.55 seconds
/ # ping my_service
PING my_service (10.0.2.217): 56 data bytes
^C
--- my_service ping statistics ---
6 packets transmitted, 0 packets received, 100% packet loss
/ # ping my_service
PING my_service (10.0.2.221): 56 data bytes
64 bytes from 10.0.2.221: seq=0 ttl=64 time=0.682 ms
64 bytes from 10.0.2.221: seq=1 ttl=64 time=0.705 ms
64 bytes from 10.0.2.221: seq=2 ttl=64 time=0.573 ms
^C
--- my_service ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.573/0.653/0.705 ms
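To track the trend without reading the full output, the "Host is up" lines can simply be counted (run inside the same container):
/ # nmap -sP 10.0.2.0/24 | grep -c 'Host is up'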
Impact
- Existing, long-running containers (tasks) will remain available on their allocated IPs.
- Newly deployed containers on the affected overlay network will be unable to communicate reliably. This is because Docker attempts to allocate an IP address that it incorrectly believes has been released, leading to network failures for these new containers.
Related Issues
I've collected some related issues that might be connected:
Update: Nmap still shows that 252 hosts are active, even though only a few containers are using the overlay network. However, this doesn't cause any issues in Docker 28.1.1.
Originally posted by @bencurio in #50129 (comment)