Containers Networking Fails after 2-4 days #7116

@Junto026

Description

I am pulling my hair out... any ideas for things I should look into would be helpful!! I am all out of ideas.

I have 17 containers running on a custom network (bridge driver). I point containers to my own DNS server. Everything runs great for a few days, but at some point I'll notice essentially all connections between containers on the network, and between the containers and the host OS, are failing. Restarting the docker engine clears the issue for another 2-4 days.

External connections to the internet (both incoming and outgoing) seem to be fine.

I've pored over the log files and cannot find a culprit.

  1. Ping between containers/host/internet all work fine
  2. DNS lookups to my DNS server (which is running on the same host, on the host OS) fail. DNS lookups to an external server (e.g. Google's) succeed. Here's an excerpt from dockerd.log:

[2023-12-14T20:21:43.728658844Z][dockerd][I] time="2023-12-14T20:21:43.727917261Z" level=error msg="[resolver] failed to query DNS server: 10.0.0.100:53, query: ;ports.ubuntu.com.\tIN\t A" error="read udp 192.168.32.103:53420->10.0.0.100:53: i/o timeout"
[2023-12-14T20:21:47.731933221Z][dockerd][I] time="2023-12-14T20:21:47.731412846Z" level=error msg="[resolver] failed to query DNS server: 10.0.0.100:53, query: ;ports.ubuntu.com.\tIN\t A" error="read udp 192.168.32.103:58153->10.0.0.100:53: i/o timeout"

The DNS queries do not reach the DNS software running on the host OS. I've restarted the DNS software and that does not fix the issue. The DNS server is responding to queries from all other devices on my network, so I'm certain the error is occurring somewhere within the Docker networking and/or host OS networking.
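To narrow down where the queries are lost, the same kind of UDP lookup can be sent by hand from inside a container, bypassing Docker's embedded resolver. The following is a minimal sketch, not a definitive diagnostic: `build_dns_query` and `probe_dns` are hypothetical helper names, the 4-second timeout is an assumption chosen to roughly match the spacing of the log entries above, and it only checks that some reply with the matching query ID comes back.

```python
import socket
import struct


def build_dns_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query packet for an A record (RFC 1035 layout)."""
    # Header: ID, flags (recursion desired), QDCOUNT=1, other counts 0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: length-prefixed labels terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)


def probe_dns(server: str, hostname: str, timeout: float = 4.0, port: int = 53) -> str:
    """Send one UDP query; return 'answered', 'timeout', or an error string."""
    query = build_dns_query(hostname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, port))
            data, _ = sock.recvfrom(512)
        except socket.timeout:
            return "timeout"
        except OSError as exc:
            return f"error: {exc}"
    # A genuine reply echoes the query ID in its first two bytes
    return "answered" if data[:2] == query[:2] else "unexpected reply"
```

Calling `probe_dns("10.0.0.100", "ports.ubuntu.com")` and then the same lookup against an external resolver from inside an affected container would show whether the timeout also occurs when the embedded resolver is not involved.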

  3. Websocket connections between containers on the same Docker network fail. Here's an example of one container trying to open a websocket to another:

UnboundLocalError: cannot access local variable 'r' where it is not associated with a value
2023-12-14 14:42:56 - ERROR :: CP Server Thread-12 : Failed to access uri endpoint /users/account. Connection error: HTTPSConnectionPool(host='plex.tv', port=443): Max retries exceeded with url: /users/account (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0xffff781e8790>: Failed to establish a new connection: [Errno -3] Try again'))
2023-12-14 14:42:56 - WARNING :: CP Server Thread-12 : Tautulli PlexTV :: Unable to parse XML for get_plexpass_status: 'NoneType' object has no attribute 'getElementsByTagName'.
2023-12-14 14:42:57 - DEBUG :: CP Server Thread-8 : IP Checker :: Resolved plex.private to 192.168.32.106.
2023-12-14 14:42:57 - INFO :: CP Server Thread-8 : Tautulli PlexTV :: Requesting resources for server...
2023-12-14 14:42:57 - INFO :: CP Server Thread-8 : Tautulli PlexTV :: Pinging Plex.tv to refresh token.
2023-12-14 14:43:02 - ERROR :: CP Server Thread-8 : Failed to access uri endpoint /api/v2/ping. Connection error: HTTPSConnectionPool(host='plex.tv', port=443): Max retries exceeded with url: /api/v2/ping (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0xffff7a04dc90>: Failed to establish a new connection: [Errno -3] Try again'))
2023-12-14 14:43:02 - WARNING :: CP Server Thread-8 : Tautulli PlexTV :: Unable to parse XML for ping: 'NoneType' object has no attribute 'getElementsByTagName'.
2023-12-14 14:43:04 - ERROR :: CP Server Thread-13 : Failed to access uri endpoint /api/downloads/1.json. Connection error: HTTPSConnectionPool(host='plex.tv', port=443): Max retries exceeded with url: /api/downloads/1.json (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0xffff78207850>: Failed to establish a new connection: [Errno -3] Try again'))
2023-12-14 14:43:04 - WARNING :: CP Server Thread-13 : Tautulli PlexTV :: Unable to load JSON for get_plex_updates: the JSON object must be str, bytes or bytearray, not NoneType
2023-12-14 14:43:04 - ERROR :: CP Server Thread-6 : WebUI :: /settings#tabs_tabs-plex_media_server : TypeError: undefined is not an object (evaluating 'downloads.computer[platform]'). (settings:5505)
2023-12-14 14:43:07 - ERROR :: CP Server Thread-8 : Failed to access uri endpoint /api/resources?includeHttps=1. Connection error: HTTPSConnectionPool(host='plex.tv', port=443): Max retries exceeded with url: /api/resources?includeHttps=1 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0xffff78207590>: Failed to establish a new connection: [Errno -3] Try again'))
2023-12-14 14:43:07 - WARNING :: CP Server Thread-8 : Tautulli PlexTV :: Unable to parse XML for get_server_urls: 'NoneType' object has no attribute 'getElementsByTagName'.
2023-12-14 14:43:12 - ERROR :: CP Server Thread-8 : Failed to access uri endpoint /users/account. Connection error: HTTPSConnectionPool(host='plex.tv', port=443): Max retries exceeded with url: /users/account (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0xffff792eb690>: Failed to establish a new connection: [Errno -3] Try again'))
2023-12-14 14:43:12 - WARNING :: CP Server Thread-8 : Tautulli PlexTV :: Unable to parse XML for get_plexpass_status: 'NoneType' object has no attribute 'getElementsByTagName'.
2023-12-14 14:43:12 - DEBUG :: CP Server Thread-8 : Testing secure websocket connection...
2023-12-14 14:43:17 - ERROR :: CP Server Thread-8 : Websocket connection test failed: [Errno -3] Try again
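The `[Errno -3] Try again` text in these entries looks like a name resolution error (`EAI_AGAIN`, a temporary resolver failure) rather than a refused or timed-out TCP connection, which would tie this symptom back to the DNS timeouts above. That reading is an assumption based on the error text, not something confirmed from the Tautulli source. A small sketch for separating the two cases, where `classify` is a hypothetical helper name:

```python
import socket


def classify(exc: OSError) -> str:
    """Sort a connection failure into a DNS problem or a transport problem."""
    if isinstance(exc, socket.gaierror):
        # getaddrinfo failed: the hostname never resolved to an address
        if exc.errno == socket.EAI_AGAIN:
            return "dns-temporary-failure"
        return "dns-error"
    if isinstance(exc, socket.timeout):
        # The name resolved, but the peer did not answer in time
        return "connect-timeout"
    # Any other OSError, e.g. connection refused or network unreachable
    return "connect-error"


def diagnose(host: str, port: int, timeout: float = 5.0) -> str:
    """Try to open a TCP connection and report which stage failed."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except OSError as exc:
        return classify(exc)
```

Running `diagnose("plex.tv", 443)` from inside an affected container would return `dns-temporary-failure` if the failure happens before any connection is attempted.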

Possible clues:

The Docker subnet defined in the desktop app settings is 192.168.65.0/24 (the default). I've noticed these errors repeating in init.log:

[2023-12-14T20:48:56.106136461Z][init][I] [2023-12-14T20:48:56.105950586Z][init][I] eth1: DHCPv4
[2023-12-14T20:48:59.118487629Z][init][I] [2023-12-14T20:48:59.118322379Z][init][I] eth1: DHCPv4 failure: timed out while listening for replies
[2023-12-14T20:49:00.122143379Z][init][I] [2023-12-14T20:49:00.122028296Z][init][I] eth1: DHCPv4
[2023-12-14T20:49:03.130142047Z][init][I] [2023-12-14T20:49:03.130094464Z][init][I] eth1: DHCPv4 failure: timed out while listening for replies

My DHCP server shows repeating DHCP requests coming from 192.168.65.3. unbolt.000-dhcpcd.log shows that IP address was offered from vpnkit, but I don't know exactly what is using that IP address or why it's sending DHCP messages to my DHCP server. I suspect this is not related to my issue but I can't find any other relevant errors in the docker logs.
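For reference, the addresses mentioned in this report can be checked against the Docker Desktop subnet with Python's `ipaddress` module. This is only a sketch to show which addresses fall inside 192.168.65.0/24; `in_subnet` is a hypothetical helper name, and the prefix length of the custom bridge network is not stated here, so it is left out.

```python
import ipaddress

# Subnet from the Docker Desktop settings quoted above
DESKTOP_SUBNET = ipaddress.ip_network("192.168.65.0/24")


def in_subnet(address: str, subnet: ipaddress.IPv4Network = DESKTOP_SUBNET) -> bool:
    """Return True if the address belongs to the given subnet."""
    return ipaddress.ip_address(address) in subnet


# Addresses taken from the logs in this report
for addr in ("192.168.65.3", "192.168.32.103", "10.0.0.100"):
    print(addr, in_subnet(addr))
```

Only 192.168.65.3, the source of the DHCP requests, is inside the Docker Desktop subnet; the container address from the resolver errors and the DNS server address are not.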

Reproduce

It will be difficult for others to reproduce. In my case, it takes letting a stable multi-container installation run for a period of days until it fails.

Expected behavior

Network connections between the containers and the host using the custom bridge network should remain stable.

docker version

Client:
 Cloud integration: v1.0.35+desktop.5
 Version:           24.0.6
 API version:       1.43
 Go version:        go1.20.7
 Git commit:        ed223bc
 Built:             Mon Sep  4 12:28:49 2023
 OS/Arch:           darwin/arm64
 Context:           desktop-linux

Server: Docker Desktop 4.25.2 (129061)
 Engine:
  Version:          24.0.6
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.7
  Git commit:       1a79695
  Built:            Mon Sep  4 12:31:36 2023
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.6.22
  GitCommit:        8165feabfdfe38c65b599c4993d227328c231fca
 runc:
  Version:          1.1.8
  GitCommit:        v1.1.8-0-g82f18fe
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

docker info

Client:
 Version:    24.0.6
 Context:    desktop-linux
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.11.2-desktop.5
    Path:     /Users/jacob/.docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.23.0-desktop.1
    Path:     /Users/jacob/.docker/cli-plugins/docker-compose
  dev: Docker Dev Environments (Docker Inc.)
    Version:  v0.1.0
    Path:     /Users/jacob/.docker/cli-plugins/docker-dev
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.20
    Path:     /Users/jacob/.docker/cli-plugins/docker-extension
  init: Creates Docker-related starter files for your project (Docker Inc.)
    Version:  v0.1.0-beta.9
    Path:     /Users/jacob/.docker/cli-plugins/docker-init
  sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc.)
    Version:  0.6.0
    Path:     /Users/jacob/.docker/cli-plugins/docker-sbom
  scan: Docker Scan (Docker Inc.)
    Version:  v0.26.0
    Path:     /Users/jacob/.docker/cli-plugins/docker-scan
  scout: Docker Scout (Docker Inc.)
    Version:  v1.0.9
    Path:     /Users/jacob/.docker/cli-plugins/docker-scout

Server:
 Containers: 17
  Running: 17
  Paused: 0
  Stopped: 0
 Images: 18
 Server Version: 24.0.6
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc io.containerd.runc.v2
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 8165feabfdfe38c65b599c4993d227328c231fca
 runc version: v1.1.8-0-g82f18fe
 init version: de40ad0
 Security Options:
  seccomp
   Profile: unconfined
  cgroupns
 Kernel Version: 6.4.16-linuxkit
 Operating System: Docker Desktop
 OSType: linux
 Architecture: aarch64
 CPUs: 8
 Total Memory: 8.729GiB
 Name: linuxkit-0e3b4880c424
 ID: 8998ca3f-fffc-43a8-8471-66e3ec92d688
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: http.docker.internal:3128
 HTTPS Proxy: http.docker.internal:3128
 No Proxy: hubproxy.docker.internal
 Experimental: false
 Insecure Registries:
  hubproxy.docker.internal:5555
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: daemon is not using the default seccomp profile

Diagnostics ID

C4EA34A4-CB8E-4BE0-81EE-9F972775E736/20231214205155

Additional Info

No response
