fix(systemd): start Docker engine *after* DNS resolution is ready #48812

Merged
thaJeztah merged 1 commit into moby:master from Octol1ttle:patch-1 on Nov 5, 2024

@Octol1ttle
Contributor

On systems using systemd to autostart Docker on boot, containers might encounter a problem where they will not have any DNS access until the container is restarted manually.

- What I did
Added an additional target for the Docker Engine service: nss-lookup.target. This target is reached when DNS resolution is ready (see https://wiki.archlinux.org/title/Systemd#Running_services_after_the_network_is_up, paragraph "If a service needs to perform DNS queries...")
- How I did it
Using a text editor.
- How to verify it
Check that containers launched during Docker Engine's systemd autostart can successfully perform DNS queries.
- Description for the changelog

Fix DNS queries failing when containers are launched via `systemd` autostart on boot

- A picture of a cute animal (not mandatory but encouraged)
image

On systems using systemd to autostart Docker on boot, containers might encounter a problem where they will not have any DNS access until the container is restarted manually. This PR fixes this issue by requiring that the Docker engine service starts after nss-lookup.target. This target is reached when DNS resolution is available. See https://wiki.archlinux.org/title/Systemd#Running_services_after_the_network_is_up (paragraph "If a service needs to perform DNS queries...")

Signed-off-by: Octol1ttle <[email protected]>
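
To make the ordering change concrete, here is a minimal Python sketch that checks whether a unit file's `[Unit]` section lists a given target in `After=`. The unit text and the `ordered_after` helper are hypothetical illustrations, not the contents of the real `docker.service` and not part of this PR. The parser is deliberately simplified: it ignores line continuations and drop-in overrides.

```python
def ordered_after(unit_text: str, target: str) -> bool:
    """Return True if the [Unit] section lists `target` in After=."""
    section = None
    after = set()
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section != "Unit" or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() != "After":
            continue
        if not value.strip():
            after.clear()  # an empty assignment resets the list
        else:
            after.update(value.split())
    return target in after


# Illustrative unit text only (assumption); not the real docker.service.
EXAMPLE_UNIT = """\
[Unit]
Description=Example engine service (hypothetical)
After=network-online.target nss-lookup.target
Wants=network-online.target

[Service]
ExecStart=/usr/bin/example-daemon
"""

if __name__ == "__main__":
    print(ordered_after(EXAMPLE_UNIT, "nss-lookup.target"))
```

On a real machine, the same function could be fed the text of the installed unit file to confirm that the ordering is present.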
Member

@thaJeztah thaJeztah left a comment

Thanks! LGTM from reading those docs, and systemd's own reference (nss-lookup.target). I could use some extra eyes to be sure though (we had some targets that were added at some point that caused severe delays when running in headless situations, but not sure if that applies here).

Looking at these, I'm also wondering if we should have nss-user-lookup.target (definitely not for this PR though).

@thaJeztah
Member

cc @tianon 🤗

@tianon
Member

tianon commented Nov 4, 2024

My reading is that this is technically a subset of network-online.target, especially for Docker's purposes. 🤔

From that perspective, it should be mostly harmless to add, but I do wonder how one might reproduce the issue this purportedly fixes? In other words, in what situations would network-online.target be satisfied (which we have in both After= and Wants=), but not have nss-lookup.target satisfied yet? (and such that a simple ordering relationship between us and nss-lookup.target "fixes" it)

@Octol1ttle
Contributor Author

Octol1ttle commented Nov 5, 2024

My theory is that this happens on systems where two different services are responsible for network connectivity and DNS configuration respectively. This is the case on our machine, where we (my team) use systemd-networkd for network and systemd-resolved for DNS.

We encountered this issue while using a WireGuard image: https://github.com/jordanpotter/docker-wireguard

Upon machine restart, WireGuard was unable to connect using its configuration, throwing "Failed to connect to external DNS server". Restarting the container manually would fix this. Adding nss-lookup.target to the Docker service file eliminated this issue completely.

@vvoland
Contributor

vvoland commented Nov 5, 2024

The failure is unrelated to this PR.

Opened a separate ticket: #48818

Member

@akerouanton akerouanton left a comment

Looking at the systemd repo, it seems there's no particular relationship between nss-lookup.target and network-online.target (see here). It's up to each distro / user to define which service should be started before nss-lookup.target. That's what systemd-resolved.service does.
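
The point about the two targets being unrelated can be illustrated with a toy model of ordering edges. This is only a sketch under stated assumptions: the edge sets are simplified and hypothetical, `guaranteed_after` is an illustrative helper, and the model assumes every unit listed is part of the same boot transaction. It does not reproduce how systemd actually schedules jobs.

```python
def guaranteed_after(unit, other, after):
    """True if `unit` is ordered after `other` via a chain of After= edges."""
    seen, stack = set(), [unit]
    while stack:
        current = stack.pop()
        for dep in after.get(current, ()):
            if dep == other:
                return True
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return False


# Assumed, simplified edges: "X": ["Y"] means X is ordered after Y.
BEFORE_FIX = {
    "docker.service": ["network-online.target"],
    "network-online.target": ["systemd-networkd-wait-online.service"],
    "nss-lookup.target": ["systemd-resolved.service"],
}

# Same edges, plus the ordering this PR adds to the engine unit.
AFTER_FIX = {
    **BEFORE_FIX,
    "docker.service": ["network-online.target", "nss-lookup.target"],
}

if __name__ == "__main__":
    for name, edges in (("before", BEFORE_FIX), ("after", AFTER_FIX)):
        ok = guaranteed_after("docker.service", "systemd-resolved.service", edges)
        print(name, "fix: ordered after systemd-resolved.service =", ok)
```

In this model, no chain of edges leads from the engine unit to the resolver service until the engine unit itself is ordered after nss-lookup.target.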

@thaJeztah
Member

> This is the case on our machine, where we (my team) use systemd-networkd for network and systemd-resolved for DNS.

I wonder if this would be worth bringing up to the systemd maintainers; my train of thought there is that systemd-resolved is part of systemd, and I wonder if it's within the line of expectations for other packages to have to update their systemd units depending on whether systemd-resolved is installed or not (i.e., should other units have the same expectation for DNS to be available in both situations?)

@thaJeztah
Member

In either case, I think this change is fine to bring in, so let me merge it

@thaJeztah thaJeztah merged commit e49bce5 into moby:master Nov 5, 2024
@thaJeztah thaJeztah added this to the 28.0.0 milestone Nov 5, 2024
@akerouanton
Member

> I wonder if this would be worth bringing up to the systemd maintainers; my train of thought there is that systemd-resolved is part of systemd, and I wonder if it's within the line of expectations for other packages to have to update their systemd units depending on whether systemd-resolved is installed or not (i.e., should other units have the same expectation for DNS to be available in both situations?)

When the Engine starts up, it starts all containers marked as --restart=always, and for every container started, it looks at /etc/resolv.conf to get the list of upstream resolvers the embedded resolver should forward to. Things are a bit special with systemd-resolved because it puts 127.0.0.53 in /etc/resolv.conf, and in that case we look at /run/systemd/resolve/resolv.conf to get the real upstream servers.

Since the engine unit wasn't targeting nss-lookup.target, there wasn't any guarantee that /etc/resolv.conf was modified by systemd-resolved by the time the Engine tried to start containers. So we'd use whatever upstream server was set in the pristine /etc/resolv.conf instead of what's configured through systemd-resolve CLI tool, or through /etc/systemd/resolved.conf.

So I think systemd-resolved only plays an indirect role in this issue. That is, if any other stub resolver was used, the same issue could happen.
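
The lookup described above can be sketched roughly as follows. This is a simplified Python illustration, not the Engine's actual Go code: the function names are hypothetical, and the only details taken from this comment are the two file paths and the 127.0.0.53 stub address.

```python
STUB_ADDRESS = "127.0.0.53"
ETC_RESOLV_CONF = "/etc/resolv.conf"
RESOLVED_RESOLV_CONF = "/run/systemd/resolve/resolv.conf"


def nameservers(resolv_conf_text):
    """Extract the addresses from `nameserver` lines, skipping comments."""
    servers = []
    for raw in resolv_conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def upstream_source(etc_text):
    """Pick the file whose nameservers should be used as upstreams."""
    if STUB_ADDRESS in nameservers(etc_text):
        # The stub resolver is configured: read the real upstream servers.
        return RESOLVED_RESOLV_CONF
    return ETC_RESOLV_CONF


if __name__ == "__main__":
    # 192.0.2.1 is a documentation-only address used as a placeholder.
    pristine = "nameserver 192.0.2.1\n"
    stub = "# managed by a stub resolver\nnameserver 127.0.0.53\n"
    print(upstream_source(pristine))
    print(upstream_source(stub))
```

If the engine reads the file before the stub resolver has rewritten it, the first branch is never taken and the pristine servers are used, which matches the failure mode described here.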