Skip to content

nix-daemon.socket.in: fix condition unit properties#6282

Closed
flokli wants to merge 1 commit intoNixOS:masterfrom
flokli:nix-daemon-socket-remove-condition-path-is-read-write
Closed

nix-daemon.socket.in: fix condition unit properties#6282
flokli wants to merge 1 commit intoNixOS:masterfrom
flokli:nix-daemon-socket-remove-condition-path-is-read-write

Conversation

@flokli
Copy link
Member

@flokli flokli commented Mar 17, 2022

nix-daemon.socket is used to socket-activate nix-daemon.service when
/nix/var/nix/daemon-socket/socket is accessed.

Having a ConditionPathIsReadWrite on the /nix/var/nix/daemon-socket
directory will cause systemd to just skip if it's not present yet.

[ 237.187747] systemd[1]: Nix Daemon Socket was skipped because of a failed condition check (ConditionPathIsReadWrite=/nix/var/nix/daemon-socket).

As it's the nix-daemon itself that creates this directory, we're in a
chicken-and-egg problem - as long as the folder isn't created,
nix-daemon won't start (as it's only socket-activated), and the socket
unit will get skipped, as the directory doesn't exist yet.

I think we don't actually want to skip starting the socket unit when the
directory doesn't exist yet.

This has surfaced in the systemd 250 bump (which probably did apply some
more rigid checks), where tests with containers that bind-mount
/nix/var/nix/daemon-socket started to fail.

@flokli flokli mentioned this pull request Mar 17, 2022
13 tasks
@edolstra
Copy link
Member

But if the socket is bind-mounted (like in a NixOS container), then we shouldn't start the daemon right?

@flokli
Copy link
Member Author

flokli commented Mar 18, 2022

We could probably skip nix-daemon.service and nix-daemon.socket if /nix/var/nix/daemon-socket/socket itself exist?

Another option would be to have a NixOS option to disable nix-daemon.{service,socket}, and default that to true in containers.

@flokli
Copy link
Member Author

flokli commented Mar 18, 2022

Yup, ConditionPathExists=!/nix/var/nix/daemon-socket/socket will accomplish what we want to do here.

On the host:

? nix-daemon.socket - Nix Daemon Socket
     Loaded: loaded (/etc/systemd/system/nix-daemon.socket; enabled; vendor preset: enabled)
    Drop-In: /nix/store/6wzxsisa5m6w1rrfzrg3wp1qzshlcg6i-system-units/nix-daemon.socket.d
             ??overrides.conf
     Active: active (listening) since Fri 2022-03-18 12:01:55 UTC; 5s ago
      Until: Fri 2022-03-18 12:01:55 UTC; 5s ago
   Triggers: ? nix-daemon.service
     Listen: /nix/var/nix/daemon-socket/socket (Stream)
     CGroup: /system.slice/nix-daemon.socket

In the container:

○ nix-daemon.socket - Nix Daemon Socket
     Loaded: loaded (/etc/systemd/system/nix-daemon.socket; enabled; vendor preset: enabled)
    Drop-In: /nix/store/f6ch4bc0szmyg6xzb6n0pazc0f08xjzx-system-units/nix-daemon.socket.d
             └─overrides.conf
     Active: inactive (dead)
   Triggers: ● nix-daemon.service
  Condition: start condition failed at Fri 2022-03-18 12:02:21 UTC; 2min 19s ago
             └─ ConditionPathExists=!/nix/var/nix/daemon-socket/socket was not met
     Listen: /nix/var/nix/daemon-socket/socket (Stream)

Mar 18 12:02:21 foo systemd[1]: Nix Daemon Socket was skipped because of a…ket).
Hint: Some lines were ellipsized, use -l to show in full.

I'll update this PR.

flokli added a commit to flokli/nix that referenced this pull request Mar 18, 2022
nix-daemon.socket is used to socket-activate nix-daemon.service when
/nix/var/nix/daemon-socket/socket is accessed.

Having a ConditionPathIsReadWrite on the /nix/var/nix/daemon-socket
directory will cause systemd to just skip if it's not present yet.

> [  237.187747] systemd[1]: Nix Daemon Socket was skipped because of a failed condition check (ConditionPathIsReadWrite=/nix/var/nix/daemon-socket).

As it's the nix-daemon itself that creates this directory, we're in a
chicken-and-egg problem - as long as the folder isn't created,
nix-daemon won't start (as it's only socket-activated), and the socket
unit will get skipped, as the directory doesn't exist yet.

As elaborated in
NixOS#6282 (comment), the
initial goal of this statement was to detect the
"/nix/var/nix/daemon-socket is bind-mounted into a container" situation.
In these cases, we want to skip starting nix-daemon.socket, as something
the host is providing the `socket` file in there.

This can be better described by skipping `nix-daemon.socket` if
`/nix/var/nix/daemon-socket/socket` already exists.

This has surfaced in the systemd 250 bump (which probably did apply some
more rigid checks), where tests with containers that bind-mount
/nix/var/nix/daemon-socket started to fail.
@flokli flokli force-pushed the nix-daemon-socket-remove-condition-path-is-read-write branch from 925703b to 633fbe1 Compare March 18, 2022 12:17
nix-daemon.socket is used to socket-activate nix-daemon.service when
/nix/var/nix/daemon-socket/socket is accessed.

Having a ConditionPathIsReadWrite on the /nix/var/nix/daemon-socket
directory will cause systemd to just skip if it's not present yet.

> [  237.187747] systemd[1]: Nix Daemon Socket was skipped because of a failed condition check (ConditionPathIsReadWrite=/nix/var/nix/daemon-socket).

As it's the nix-daemon itself that creates this directory, we're in a
chicken-and-egg problem - as long as the folder isn't created,
nix-daemon won't start (as it's only socket-activated), and the socket
unit will get skipped, as the directory doesn't exist yet.

As elaborated in
NixOS#6282 (comment), the
initial goal of this statement was to detect the
"/nix/var/nix/daemon-socket is bind-mounted into a container" situation.
In these cases, we want to skip starting nix-daemon.socket, as something
the host is providing the `socket` file in there.

This can be better described by skipping `nix-daemon.socket` if
`/nix/var/nix/daemon-socket/socket` already exists.

This has surfaced in the systemd 250 bump (which probably did apply some
more rigid checks), where tests with containers that bind-mount
/nix/var/nix/daemon-socket started to fail.
@flokli flokli force-pushed the nix-daemon-socket-remove-condition-path-is-read-write branch from 633fbe1 to 021753f Compare March 18, 2022 12:19
@flokli
Copy link
Member Author

flokli commented Mar 18, 2022

@edolstra I updated this PR, as well as the downstream nixpkgs workaround in NixOS/nixpkgs#164644. PTAL.

@flokli flokli changed the title nix-daemon.socket.in: drop ConditionPathIsReadWrite= statement nix-daemon.socket.in: fix condition unit properties Mar 18, 2022
@edolstra
Copy link
Member

ConditionPathExists=!@localstatedir@/nix/daemon-socket/socket

Doesn't this mean that if the service is ever abnormally killed leaving behind the socket, that it can't be restarted by systemd anymore?

@Ericson2314
Copy link
Member

^ Yes I was worried about that too. It seems like what we are doing is the common case, and systemd ought to support it. Maybe we can ask upstream what they recommend.

@flokli
Copy link
Member Author

flokli commented Mar 18, 2022

Doesn't this mean that if the service is ever abnormally killed leaving behind the socket, that it can't be restarted by systemd anymore?

Good catch! I quickly reproduced this, by a nix store ping from a regular user, sending a SIGKILL to the daemon, and checking the status of the units before and after.

The .socket unit won't fail if we crash the nix-daemon.service binary, but keep running. In case of a new client connection, systemd will trigger nix-daemon.service again, and successfully connect the client.

This means a manual restart of the .socket unit or .service isn't necessary.

However, of course we still want to be able to explicitly restart nix-daemon.socket, for example when switching to a new system config with a new version of Nix. There's RemoveOnStop=yes, which we could set in nix-daemon.socket unit file, which will cause systemd to remove the socket file when stopping/restarting the .socket unit, and cleaning it up when trying to start.

However, the Condition is probably checked before the cleanup is happening, so this can still break if for some reason there's a stale socket file around - for example, when unplugging the power and leaving the stale socket file around.

I'm not really sure how to fix this. It's probably too late to move this socket file to something tmpfs-backed.

@flokli
Copy link
Member Author

flokli commented Mar 18, 2022

To recap:

  • the refactoring in systemd now causes ConditionPathIsReadWrite=/nix/var/nix/daemon-socket to be not met if the folder doesn't exist yet. Which feels right. This is now causing nix-daemon on the host to get skipped. This will also cause nix-daemon on the host to get skipped if non-NixOS distros use this unit file as-is.
  • We want to skip starting nix-daemon inside containers. We previously bind-mounted the /nix/var/nix/daemon-socket directory read-only from the host.

The proposal from this PR tried to skip the socket unit from activation if the socket file (/nix/var/nix/daemon-socket/socket) is already present. However, it can cause some side-effects, and we might end up with cases where a stale socket file might cause skip of the .socket file.

Maybe the right way to fix this is letting systemd-tmpfiles create the folder on the host, by shipping a nixos-daemon.conf tmpfile containing d /nix/var/nix/daemon-socket 0755 root root - -?

We'd need to check how this effects read-only mountpoints in containers though.

What do you think?

@flokli
Copy link
Member Author

flokli commented Mar 18, 2022

Alternative PR is up at #6285

@Ericson2314
Copy link
Member

It seems like we should start with the alternative, then see, what, if anything is needed in addition?

@edolstra
Copy link
Member

We want to skip starting nix-daemon inside containers.

The problem is that we don't always want to skip starting nix-daemon in a container, but only if we're bind-mounting the host's Nix store. You can have containers with their own Nix store, which is why we can't use ConditionVirtualization.

@flokli
Copy link
Member Author

flokli commented Mar 18, 2022

We want to skip starting nix-daemon inside containers.

The problem is that we don't always want to skip starting nix-daemon in a container, but only if we're bind-mounting the host's Nix store. You can have containers with their own Nix store, which is why we can't use ConditionVirtualization.

Yes, this is addressed by #6285 - we keep the Condition statement, just ensure the directory is created. Let's close this PR in favor of #6285. @edolstra, can you take a look?

@Ericson2314
Copy link
Member

If you want to close this, I have perms to do that :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants