Description
After upgrading from Docker Engine 29.1.4 to 29.2.1, short-lived containers can get permanently stuck in Running state even though their process has exited. docker wait hangs forever, docker inspect shows Running: true with a dead PID, and no die event is emitted. docker rm -f is the only way to unblock. This happens when the system clock steps backward (e.g., NTP correction after VM snapshot restore) during a container's lifetime.
Root cause
PR #51925 added shouldIgnoreExitEventWithLock in daemon/monitor.go to filter duplicate TaskExit events. The StateRunning case compares e.ExitedAt (recorded by the containerd shim's time.Now()) against c.State.StartedAt (recorded by dockerd's time.Now()):
case containertypes.StateRunning:
return !e.ExitedAt.IsZero() && e.ExitedAt.Before(c.State.StartedAt)
If CLOCK_REALTIME steps backward between dockerd capturing startupTime (line 225 of start.go) and the shim capturing exitedAt, a legitimate first-and-only exit event is silently dropped.
(SetRunning stores StartedAt via .UTC() which strips the monotonic clock reading, forcing Before() to use wall-clock comparison.)
Logs from production
# Only one create + start, no restart:
docker events:
04:02:04 container create c47e906d...
04:02:04 container start c47e906d...
# The exit event was dropped:
dockerd log:
"ignoring duplicate container exit event"
container=c47e906d... state=running exitCode=0
exitedAt="2026-03-06 04:02:04.096"
docker inspect:
StartedAt = 2026-03-06T04:02:15.267 (pre-step, ahead clock)
# CLOCK_REALTIME stepped backward during the container's lifetime:
journald:
"Time jumped backwards, rotating" at 04:01:58
exitedAt (04:02:04, post-step corrected clock) < StartedAt (04:02:15, pre-step ahead clock). There was no previous run, and the filter incorrectly classified a legitimate exit as a stale duplicate.
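Replaying the two logged timestamps through the quoted comparison shows the drop. This is a sketch under assumptions: `shouldIgnoreRunning` is a hypothetical stand-in that mirrors only the StateRunning expression quoted above, not the full daemon function, and both timestamps are assumed to be UTC because the logs carry no zone.

```go
package main

import (
	"fmt"
	"time"
)

// shouldIgnoreRunning mirrors only the StateRunning comparison quoted in
// the root-cause section; it is not the daemon's actual function.
func shouldIgnoreRunning(exitedAt, startedAt time.Time) bool {
	return !exitedAt.IsZero() && exitedAt.Before(startedAt)
}

// mustParse parses a timestamp as UTC (an assumption, see the lead-in).
func mustParse(layout, value string) time.Time {
	t, err := time.ParseInLocation(layout, value, time.UTC)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	// Values copied from the dockerd log and docker inspect output above.
	exitedAt := mustParse("2006-01-02 15:04:05.000", "2026-03-06 04:02:04.096")
	startedAt := mustParse("2006-01-02T15:04:05.000", "2026-03-06T04:02:15.267")

	fmt.Println("StartedAt - exitedAt:", startedAt.Sub(exitedAt)) // 11.171s
	fmt.Println("exit event dropped:  ", shouldIgnoreRunning(exitedAt, startedAt)) // true
}
```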
Reproduce
Any backward step of CLOCK_REALTIME during a container's lifetime triggers this. Our case:
- Boot a Firecracker VM, let NTP sync, take a snapshot
- Restore the snapshot a day later (guest clock is ahead by several seconds due to imprecise TSC offset)
- systemd-timesyncd detects the offset and steps the clock backward via clock_adjtime(ADJ_SETOFFSET)
- Any container that started before the step and whose process exits after it has its exit event dropped (exitedAt < startupTime)
Expected behavior
No response
docker version
Client: Docker Engine - Community
 Version: 29.2.1
 API version: 1.53
 Go version: go1.25.6
 Git commit: a5c7197
 Built: Mon Feb 2 17:17:24 2026
 OS/Arch: linux/amd64
 Context: default
Server: Docker Engine - Community
 Engine:
  Version: 29.2.1
  API version: 1.53 (minimum version 1.44)
  Go version: go1.25.6
  Git commit: 6bc6209
  Built: Mon Feb 2 17:17:24 2026
  OS/Arch: linux/amd64
  Experimental: false
 containerd:
  Version: v2.2.1
  GitCommit: dea7da592f5d1d2b7755e3a161be07f43fad8f75
 runc:
  Version: 1.3.4
  GitCommit: v1.3.4-0-gd6d73eb8
 docker-init:
  Version: 0.19.0
  GitCommit: de40ad0
docker info
Additional Info
No response