systemd version the issue has been seen with
255.2-3ubuntu2
Used distribution
Ubuntu Noble 24.04
Linux kernel version used
6.6.0-14-generic
CPU architectures issue was seen on
x86_64
Component
systemd, systemd-networkd, systemd-udevd
Expected behaviour you didn't see
When mounting a filesystem or creating a tmpfs mount, systemd should remain responsive and service important messages on its sockets, such as requests to restart services or renew DHCP leases, in a timely fashion.
Unexpected behaviour you saw
This issue was first seen on a cloud instance where containerd was mounting and unmounting a tmpfs filesystem every second or so while collecting telemetry.
The system had about 3000 stale mountpoints of the root filesystem under /home, left behind by a mountpoint leak in osqueryd. When the new tmpfs mounts were created, systemd ran at 100% CPU and was completely blocked. It appears to parse the udev entries for each of the 3000 mountpoints during every tmpfs mount, which takes considerable time. While systemd is blocked, you cannot restart or query the status of services, and the DHCP lease can expire without being renewed, leading to loss of network connectivity.
This is not a bug per se, given the large number of mounts involved; the issue is rather that systemd is blocked and messages sent to its sockets are not consumed, leading to problems such as DHCP lease expiry.
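For reference, the number of stale mountpoints can be counted directly from /proc/self/mountinfo; the snippet below is an illustrative check (matching mountpoints under /home, as in this incident), not part of the original report:

```shell
# Field 5 of /proc/self/mountinfo is the mount point; count entries
# under /home. Prints 0 when there are none.
awk '$5 ~ "^/home/" {n++} END {print n+0}' /proc/self/mountinfo
```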
Is it strictly necessary to parse all these udev entries for every tmpfs mount?
Would it be possible to optimise the mount path to scan the udev entries in chunks, breaking in between chunks so that systemd can service events such as DHCP lease renewal in a timely fashion, and then return to the ongoing mount?
Steps to reproduce the problem
Start a VM; the impact is easier to see if you select 1 vCPU.
Create a file, repro.sh, with the following contents:
#!/bin/bash
NUMDIRS=3000
DIRPREFIX="mnt"
NUM_MOUNTS=0
TDIR=""

cleanup() {
    for idx in $(seq 1 "$NUM_MOUNTS"); do
        DIRNAME="$DIRPREFIX/$idx"
        sudo umount "$DIRNAME"
        rmdir "$DIRNAME"
        echo "Removed $DIRNAME"
    done
    if [ -n "$TDIR" ] && [ -e "$TDIR" ]; then
        sudo umount "$TDIR"
    fi
    return
}
trap "cleanup" EXIT

for idx in $(seq 1 "$NUMDIRS"); do
    DIRNAME="$DIRPREFIX/$idx"
    mkdir -p "$DIRNAME"
    sudo mount /dev/sda1 "$DIRNAME"
    NUM_MOUNTS=$idx
    echo "Created $DIRNAME mount"
done

for idx in $(seq 1 1000); do
    TDIR=$(mktemp -d)
    sudo mount -o size=1k -t tmpfs tmpfs "$TDIR"
    sleep 1
    sudo umount "$TDIR"
done
Adjust /dev/sda1 to your system, e.g. /dev/vda1.
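If you are unsure which device backs the root filesystem, it can be read from /proc/mounts instead of guessing; this is an illustrative helper, not part of the original repro:

```shell
# Print the device currently mounted at / (the last matching entry
# wins if / has been over-mounted).
awk '$2 == "/" {dev=$1} END {print dev}' /proc/mounts
```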
Run repro.sh; it will create 3000 mounts of the primary filesystem and then,
once per second, create a tmpfs, wait a second, and unmount it.
$ chmod +x repro.sh
$ ./repro.sh
Created mnt/1
...
Once all 3000 mounts have been created, if you then restart systemd-networkd
to simulate a DHCP lease expiry, you can see that it blocks for 1.5 to
2.5 minutes, and during this time there is no DHCP lease and networking is down.
$ time sudo systemctl restart systemd-networkd
real 2m28.361s
user 0m0.004s
sys 0m0.000s
$ time sudo systemctl restart systemd-networkd
Failed to restart systemd-networkd.service: Transport endpoint is not connected
See system logs and 'systemctl status systemd-networkd.service' for details.
real 1m30.145s
user 0m0.002s
sys 0m0.003s
$ time sudo systemctl restart systemd-networkd
real 2m17.425s
user 0m0.001s
sys 0m0.005s
$ ping github.com
ping: github.com: Temporary failure in name resolution
$ ping github.com
ping: github.com: Temporary failure in name resolution
$ ping github.com
ping: github.com: Temporary failure in name resolution
Additional program output to the terminal or log subsystem illustrating the issue
Running strace during a mount shows many udev entries being parsed for each
mountpoint, which takes time.
19:55:06 newfstatat(AT_FDCWD, "/dev/sda1", {st_mode=S_IFBLK|0660, st_rdev=makedev(0x8, 0x1), ...}, 0) = 0 <0.000061>
19:55:06 openat(AT_FDCWD, "/", O_RDONLY|O_CLOEXEC|O_PATH|O_DIRECTORY) = 52 <0.000034>
19:55:06 openat(52, "sys", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 53 <0.000038>
19:55:06 newfstatat(53, "", {st_mode=S_IFDIR|0555, st_size=0, ...}, AT_EMPTY_PATH) = 0 <0.000034>
19:55:06 close(52) = 0 <0.000032>
19:55:06 openat(53, "dev", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 52 <0.000043>
19:55:06 newfstatat(52, "", {st_mode=S_IFDIR|0755, st_size=0, ...}, AT_EMPTY_PATH) = 0 <0.000222>
19:55:06 close(53) = 0 <0.000021>
19:55:06 openat(52, "block", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 53 <0.000042>
19:55:06 newfstatat(53, "", {st_mode=S_IFDIR|0755, st_size=0, ...}, AT_EMPTY_PATH) = 0 <0.000034>
19:55:06 close(52) = 0 <0.000030>
19:55:06 openat(53, "8:1", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 52 <0.000048>
19:55:06 newfstatat(52, "", {st_mode=S_IFLNK|0777, st_size=0, ...}, AT_EMPTY_PATH) = 0 <0.000032>
19:55:06 readlinkat(53, "8:1", "../../devices/pci0000:00/0000:00"..., 4096) = 86 <0.000037>
19:55:06 close(52) = 0 <0.000030>
19:55:06 openat(53, "..", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 52 <0.000037>
19:55:06 close(53) = 0 <0.000037>
19:55:06 openat(52, "..", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 53 <0.000179>
19:55:06 close(52) = 0 <0.000060>
19:55:06 openat(53, "devices", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 52 <0.000043>
19:55:06 newfstatat(52, "", {st_mode=S_IFDIR|0755, st_size=0, ...}, AT_EMPTY_PATH) = 0 <0.000033>
19:55:06 close(53) = 0 <0.000030>
19:55:06 openat(52, "pci0000:00", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 53 <0.000040>
19:55:06 newfstatat(53, "", {st_mode=S_IFDIR|0755, st_size=0, ...}, AT_EMPTY_PATH) = 0 <0.000032>
19:55:06 close(52) = 0 <0.000031>
19:55:06 openat(53, "0000:00:03.0", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 52 <0.000053>
19:55:06 newfstatat(52, "", {st_mode=S_IFDIR|0755, st_size=0, ...}, AT_EMPTY_PATH) = 0 <0.000032>
19:55:06 close(53) = 0 <0.000108>
19:55:06 openat(52, "virtio0", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 53 <0.000089>
19:55:06 newfstatat(53, "", {st_mode=S_IFDIR|0755, st_size=0, ...}, AT_EMPTY_PATH) = 0 <0.000064>
19:55:06 close(52) = 0 <0.000030>
19:55:06 openat(53, "host0", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 52 <0.000037>
19:55:06 newfstatat(52, "", {st_mode=S_IFDIR|0755, st_size=0, ...}, AT_EMPTY_PATH) = 0 <0.000029>
19:55:06 close(53) = 0 <0.000041>
19:55:06 openat(52, "target0:0:1", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 53 <0.000042>
19:55:06 newfstatat(53, "", {st_mode=S_IFDIR|0755, st_size=0, ...}, AT_EMPTY_PATH) = 0 <0.000027>
19:55:06 close(52) = 0 <0.000044>
19:55:06 openat(53, "0:0:1:0", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 52 <0.000044>
19:55:06 newfstatat(52, "", {st_mode=S_IFDIR|0755, st_size=0, ...}, AT_EMPTY_PATH) = 0 <0.000036>
19:55:06 close(53) = 0 <0.000028>
19:55:06 openat(52, "block", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 53 <0.000040>
19:55:06 newfstatat(53, "", {st_mode=S_IFDIR|0755, st_size=0, ...}, AT_EMPTY_PATH) = 0 <0.000104>
19:55:06 close(52) = 0 <0.000034>
19:55:06 openat(53, "sda", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 52 <0.000042>
19:55:06 newfstatat(52, "", {st_mode=S_IFDIR|0755, st_size=0, ...}, AT_EMPTY_PATH) = 0 <0.000033>
19:55:06 close(53) = 0 <0.000033>
19:55:06 openat(52, "sda1", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 53 <0.000043>
19:55:06 newfstatat(53, "", {st_mode=S_IFDIR|0755, st_size=0, ...}, AT_EMPTY_PATH) = 0 <0.000062>
19:55:06 close(52) = 0 <0.000051>
19:55:06 close(53) = 0 <0.000038>
19:55:06 access("/sys/devices/pci0000:00/0000:00:03.0/virtio0/host0/target0:0:1/0:0:1:0/block/sda/sda1/uevent", F_OK) = 0 <0.000107>
19:55:06 openat(AT_FDCWD, "/sys/devices/pci0000:00/0000:00:03.0/virtio0/host0/target0:0:1/0:0:1:0/block/sda/sda1/uevent", O_RDONLY|O_NOCTTY|O_CLOEXEC) = 52 <0.000064>
19:55:06 newfstatat(52, "", {st_mode=S_IFREG|0644, st_size=4096, ...}, AT_EMPTY_PATH) = 0 <0.000038>
19:55:06 read(52, "MAJOR=8\nMINOR=1\nDEVNAME=sda1\nDEV"..., 4104) = 65 <0.000055>
19:55:06 close(52) = 0 <0.000068>
19:55:06 readlinkat(AT_FDCWD, "/sys/devices/pci0000:00/0000:00:03.0/virtio0/host0/target0:0:1/0:0:1:0/block/sda/sda1/subsystem", "../../../../../../../../../../cl"..., 4096) = 41 <0.000051>
19:55:06 openat(AT_FDCWD, "/run/udev/data/b8:1", O_RDONLY|O_CLOEXEC) = 52 <0.000063>
19:55:06 newfstatat(52, "", {st_mode=S_IFREG|0644, st_size=1509, ...}, AT_EMPTY_PATH) = 0 <0.000073>
19:55:06 newfstatat(52, "", {st_mode=S_IFREG|0644, st_size=1509, ...}, AT_EMPTY_PATH) = 0 <0.000043>
19:55:06 read(52, "S:disk/by-partuuid/04a52087-8ae8"..., 4096) = 1509 <0.000038>
19:55:06 read(52, "", 4096) = 0 <0.000056>
19:55:06 close(52) = 0 <0.000036>
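A saved strace log can be post-processed to quantify how much of the mount path goes into udev database parsing. The sketch below is illustrative only: the log file name and sample lines are hypothetical stand-ins for a real capture.

```shell
# Write a two-line sample log, then count how many times the udev
# database under /run/udev/data is opened in it.
cat > /tmp/sample.strace <<'EOF'
19:55:06 openat(AT_FDCWD, "/run/udev/data/b8:1", O_RDONLY|O_CLOEXEC) = 52 <0.000063>
19:55:07 openat(AT_FDCWD, "/run/udev/data/b8:1", O_RDONLY|O_CLOEXEC) = 52 <0.000061>
EOF
grep -c '/run/udev/data/' /tmp/sample.strace
```

Against a real log from the repro, the count shows how many device entries are consulted per tmpfs mount.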
htop shows systemd at 100% CPU:
PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command
1 root 20 0 40004 30796 9548 R 32.5 0.8 1:18.35 /sbin/init
792 ubuntu 20 0 36580 26880 9472 R 31.8 0.7 0:57.32 /usr/lib/systemd/systemd --user
1351 root 20 0 23112 12544 8064 R 20.1 0.3 0:15.57 (networkd)
637 root 20 0 460M 16132 11392 R 13.6 0.4 0:53.56 /usr/libexec/udisks2/udisksd