Timers started looping service at 2024-03-31 00:00:00 UTC #32039

@michael-borkowski

Description

systemd version the issue has been seen with

255

Used distribution

Archlinux 2024-03-30

Linux kernel version used

6.8.2-arch2-1

CPU architectures issue was seen on

x86_64

Component

systemd

Expected behaviour you didn't see

Timers running normally after 2024-03-31 00:00:00 UTC.

Unexpected behaviour you saw

At 2024-03-31 00:00:00 UTC, some (at least three) timers started going into a loop of restarting their services. Two physical hosts and a couple of virtual hosts in the GMT time zone were affected. One physical host in the GMT+1 time zone was unaffected despite a practically identical setup.

I noticed the issue from a Zabbix warning notifying me that shadow.service was failing. It was failing with start-limit-hit because shadow.timer was trying to start it continuously. shadow.service is a very short-lived service, so the restart loop meant many starts per second.

I tried stopping the timer, starting the service manually (that worked), then restarting the timer, but the symptoms recurred. I tried rebooting one of the hosts completely, but the issue persisted.

As I was going to sleep, I just shut down one physical host and disabled the affected timers to check them the next day. The next day, when I checked my hosts, everything was working normally again and I was able to restart the stopped timers without any issues.

I suspect that this has to do with the DST change that happened on that day.

I also posted a thread about this on the Arch subreddit here. Up to this point, two other people had symptoms that seem compatible with mine (link, link). Note that all of these people are in the GMT time zone, which is in line with my observation: the only host not affected on my end was in GMT+1 to start with.

One can easily check whether a host was impacted by running:

journalctl --since "2024-03-31 00:00:00" --until "2024-03-31 00:05:00"
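To put a number on the restart loop, the journal output for that window can be piped through a small counter. This is only a sketch: it assumes the failures show up as journal lines containing the string start-limit-hit (the result named above), and the helper name count_start_limit_hits is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count journal lines that mention start-limit-hit.
# Reads journal text on stdin and prints the number of matching lines.
count_start_limit_hits() {
    # grep -c prints 0 but exits non-zero when nothing matches;
    # "|| true" keeps the function's exit status at 0 in that case.
    grep -c 'start-limit-hit' || true
}

# Example invocation (not executed here). The explicit "UTC" suffix keeps
# the window fixed regardless of the host's local time zone.
# journalctl --utc --no-pager -u shadow.service \
#     --since "2024-03-31 00:00:00 UTC" --until "2024-03-31 00:05:00 UTC" \
#     | count_start_limit_hits
```

A count of 0 on a host suggests its shadow.service never hit the start limit in that window; anything above 0 is worth comparing against the timer's expected schedule.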

Steps to reproduce the problem

Apologies, I haven't had time yet to reproduce this issue. Reproducing it seems a bit complex as I'll need to set up a VM and do the following:

  • Set up a VM with the physical clock set to 2024-03-29 (well before the issue) in the GMT time zone
  • Shut down the VM, fast forward to 2024-03-30 23:55
  • Observe as the VM runs over to 2024-03-31 00:00:00
  • Observe whether shadow.service goes into the aforementioned restart loop
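A rough sketch of how these steps might look from inside a throwaway VM follows. It is a variation, not the exact procedure above: instead of changing the VM's hardware clock while it is shut down, it sets the clock from within the running guest using timedatectl. Europe/London is my assumption for "the GMT time zone", and the function names run and rollover_test are made up. By default it only prints the commands (dry run); setting DRY_RUN=0 executes them and requires root.

```shell
#!/usr/bin/env bash
# Sketch only: for a disposable VM, since it changes time zone and system clock.

# Print the command in dry-run mode (the default), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

rollover_test() {
    local tz="${1:-Europe/London}"            # assumed stand-in for "GMT time zone"
    local start="${2:-2024-03-30 23:55:00}"   # five minutes before 2024-03-31 00:00:00

    run timedatectl set-timezone "$tz"
    run timedatectl set-ntp false             # otherwise set-time is refused
    run timedatectl set-time "$start"
    run systemctl list-timers --all           # check when shadow.timer is due next
    run journalctl --follow --unit shadow.service   # watch for the restart loop
}
```

Calling rollover_test with no arguments prints the five commands; DRY_RUN=0 rollover_test runs them and then follows the journal until interrupted.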

If I get around to that in the near future, I'll report here.

Labels: bug (programming errors that need preferential fixing), pid1