Description
systemd version the issue has been seen with
255
Used distribution
Archlinux 2024-03-30
Linux kernel version used
6.8.2-arch2-1
CPU architectures issue was seen on
x86_64
Component
systemd
Expected behaviour you didn't see
Timers running normally after 2024-03-31 00:00:00 UTC.
Unexpected behaviour you saw
At 2024-03-31 00:00:00 UTC, some (at least three) timers started going into a loop of restarting their services. Two physical hosts and a couple of virtual hosts in the GMT time zone were affected. One physical host in the GMT+1 time zone was unaffected despite practically identical setup.
I noticed the issue from a Zabbix warning notifying me that shadow.service was failing. I noticed that it was failing because of start-limit-hit, and that shadow.timer was trying to start it continuously. shadow.service is a very short-lived service, so the restart loop meant many starts per second.
I tried stopping the timer, starting the service manually (that worked), then restarting the timer, but the symptoms recurred. I tried rebooting one of the hosts completely, but the issue persisted.
As I was going to sleep, I just shut down one physical host and disabled the affected timers so I could check them the next day. The next day, when I checked my hosts, everything was working normally again and I was able to restart the stopped timers without any issues.
I suspect that this has to do with the DST change that happened on that day.
I also posted a thread about this on the Arch subreddit here. Up to this point, two other people had symptoms that seem compatible with mine (link, link). Note that all of these people are in the GMT time zone, which is in line with my observation: the only host not affected on my end was in GMT+1 to start with.
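To make the time zone observation concrete, here is a small sketch that prints the local time around the suspected transition. It assumes GNU date and the tz database are installed, and it uses Europe/London and Europe/Paris as stand-ins for the "GMT" and "GMT+1" hosts. Those zone names and the local_time helper are assumptions for illustration, not details taken from the affected machines.

```shell
#!/bin/sh
# Hypothetical helper: print the local time in zone $1 for the UTC timestamp $2.
# Relies on GNU date (-d) and the system tz database.
local_time() {
    TZ="$1" date -d "$2 UTC" '+%F %T %Z'
}

# Assumed zones: Europe/London for the affected hosts, Europe/Paris for the unaffected one.
for tz in Europe/London Europe/Paris; do
    for t in '2024-03-31 00:00:00' '2024-03-31 00:59:59' '2024-03-31 01:00:00'; do
        printf '%-14s %s UTC -> %s\n' "$tz" "$t" "$(local_time "$tz" "$t")"
    done
done
```

With these zones, 00:00:00 UTC is local midnight in Europe/London on the day of the change, and the clock jumps from 00:59:59 GMT to 02:00:00 BST one hour later. In Europe/Paris the same instant is already 01:00:00 local time. Whether that difference explains why one host was spared is only a guess at this point.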
Anyone can easily check whether their hosts were impacted by running:
journalctl --since "2024-03-31 00:00:00" --until "2024-03-31 00:05:00"
Steps to reproduce the problem
Apologies, I haven't had time to reproduce this issue yet. Reproducing it seems a bit complex, as I'll need to set up a VM and do the following:
- Set up a VM with the physical clock set to 2024-03-29 (well before the issue) in the GMT time zone
- Shut down the VM, fast forward to 2024-03-30 23:55
- Observe as the VM runs over to 2024-03-31 00:00:00
- Observe whether shadow.service goes into the aforementioned restart loop
If I get around to that in the near future, I'll report here.
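The steps above could be sketched roughly as follows. This is untested against the actual bug and takes a shortcut: it changes the clock from inside the guest with timedatectl instead of shutting the VM down and adjusting its hardware clock, which may not be equivalent. Europe/London as the "GMT" zone, the APPLY switch, and the run and reproduce helpers are assumptions for illustration. By default the script only prints the commands; set APPLY=1 (as root, in a throwaway VM only) to execute them.

```shell
#!/bin/sh
# Sketch only: intended for a disposable VM, never for a production host.
# Hypothetical dry-run wrapper: print the command unless APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = 1 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

reproduce() {
    # Assumed zone for "the GMT time zone".
    run timedatectl set-timezone Europe/London
    # timedatectl refuses to set the time while NTP synchronization is active.
    run timedatectl set-ntp false
    # Start well before the issue, then jump to five minutes before the rollover.
    run timedatectl set-time '2024-03-29 12:00:00'
    run timedatectl set-time '2024-03-30 23:55:00'
    # Re-arm the timer and watch both units as the clock crosses 2024-03-31 00:00:00.
    run systemctl restart shadow.timer
    run systemctl list-timers shadow.timer
    run journalctl --follow --unit shadow.timer --unit shadow.service
}

reproduce
```

In dry-run mode the printed lines lose their quoting, so the set-time arguments need to be quoted again if the commands are copied by hand.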