daemon: restore: register containers without rwlayer#51724

Merged
vvoland merged 1 commit into moby:master from akerouanton:fix-zombie-without-rwlayer
Mar 26, 2026

Conversation

@akerouanton
Member

@akerouanton commented Dec 15, 2025

- What I did

Under some undetermined circumstances, a container might end up with no rwlayer on daemon restart. The restore logic would add such containers to the containersReplica list, but not to the 'containers' list. Thus, users would see them in `docker ps -a`, but would be unable to remove them.

To fix that, add these containers to the 'containers' list as well.

- Human readable description for the release notes

Fix a rare bug that could cause containers to become unremovable.
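
For context, a minimal sketch of the shape of the fix (simplified; the GetLayerByID call and the register step are taken from the discussion below, not the exact diff):

rwLayer, err := daemon.imageService.GetLayerByID(c.ID)
if err != nil {
    // A container without a rwlayer is in a bad state, but we must still
    // register it so users can remove it: log the error and fall through.
    logger.WithError(err).Error("failed to load container mount")
} else {
    c.RWLayer = rwLayer
}
// Register unconditionally, so the container ends up in daemon.containers
// (consulted by docker rm) as well as daemon.containersReplica
// (consulted by docker ps -a), keeping the two views consistent.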

Comment thread: daemon/daemon.go
// A container without a rwlayer is in a bad state, but we must register that container to let users
// remove it. So, log the error but do not early-return.
logger.WithError(err).Error("failed to load container mount")
return
Contributor


Looking at the history, it looks like this logic has been there for quite a long time...

We should investigate why this is happening

Member Author

@akerouanton commented Dec 15, 2025


I didn't put as much detail in my commit message as I did in #51692.

Before 9f5f4f5, restore would scan the state directory to build the list of containers to restore (i.e. put them in the restore-local containers slice). If the rwlayer was missing, it'd log the error message, and the early return I'm removing in this PR would prevent the container from being added to daemon.containers and daemon.containersReplica.

Starting with 9f5f4f5, loadContainers just scans the state directory to build the list of containers, without checking rwlayers. That list is then passed to restore, which does the check. If the check fails, the container isn't register()'d (i.e. not added to daemon.containers). However, the whole containers list is used to rebuild daemon.containersReplica.

Hence, I believe that this is an old bug (probably going back to the introduction of containerd snapshotter support) that no one spotted beforehand because the only and very subtle symptom was a state dir with more containers than returned by docker ps -a.
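
To illustrate, a rough sketch of the post-9f5f4f5 flow (names simplified; rebuildReplica is a stand-in for however daemon.containersReplica is actually repopulated):

containers := loadContainers(stateDir) // no rwlayer check here anymore
for _, c := range containers {
    rwLayer, err := imageService.GetLayerByID(c.ID)
    if err != nil {
        logger.WithError(err).Error("failed to load container mount")
        continue // the bug: register() is skipped, so c never lands in daemon.containers
    }
    c.RWLayer = rwLayer
    register(c) // adds c to daemon.containers
}
// containersReplica is rebuilt from the full list, including the containers
// skipped above, hence `docker ps -a` shows entries that `docker rm` cannot
// find in daemon.containers.
rebuildReplica(containers)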

Member


It looks like we have a sentinel error for missing layers (which probably should be changed to implement errdefs.NotFound); should we handle "not found" errors differently from other errors here?

func (ls *layerStore) GetRWLayer(id string) (RWLayer, error) {
    ls.locker.Lock(id)
    defer ls.locker.Unlock(id)
    ls.mountL.Lock()
    mount := ls.mounts[id]
    ls.mountL.Unlock()
    if mount == nil {
        return nil, ErrMountDoesNotExist

Actually, it looks like the containerd snapshotter does have some special handling for not-found errors in some parts:

func (i *ImageService) GetLayerByID(cid string) (container.RWLayer, error) {
    ctx := context.TODO()
    sn := i.client.SnapshotService(i.StorageDriver())
    if _, err := sn.Stat(ctx, cid); err != nil {
        if !cerrdefs.IsNotFound(err) {
            return nil, fmt.Errorf("failed to stat snapshot %s: %w", cid, err)
        }
        return nil, errdefs.NotFound(fmt.Errorf("RW layer for container %s not found", cid))
    }
    ls := i.client.LeasesService()
    lss, err := ls.List(ctx, "id=="+cid)
    if err != nil {
        return nil, err
    }
    switch len(lss) {
    case 0:
        return nil, errdefs.NotFound(errors.New("rw layer lease not found for container " + cid))
    case 1:
        // found
    default:
        log.G(ctx).WithFields(log.Fields{"container": cid, "leases": lss}).Warn("multiple leases with the same id found, this should not happen")
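
For reference, one way the graphdriver sentinel could be made to satisfy the same check (a sketch using moby's errdefs helpers, not part of this PR):

// In layerStore.GetRWLayer, wrap the sentinel so that errdefs.IsNotFound
// recognizes it:
if mount == nil {
    return nil, errdefs.NotFound(ErrMountDoesNotExist)
}

// Callers can then branch on the error class instead of comparing against
// the sentinel value directly:
if errdefs.IsNotFound(err) {
    // the rwlayer is gone for good; retrying won't help
}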

Member Author


It looks like we have a sentinel error for missing layers (which probably should be changed to implement errdefs.NotFound); should we handle "not found" errors differently from other errors here?

No, we should handle all errors the same way to ensure that the container is added to both daemon.containers and daemon.containersReplica.

Comment thread: daemon/daemon.go (outdated)
    logger.WithError(err).Error("failed to load container mount")
    return
}
c.RWLayer = rwlayer
Member


Can we put this in an else or something similar? Currently we're using a value that was produced by a function that returned an error, which is relying on undocumented behavior; we should probably make that explicit, and only use the value for non-error returns.

It's risky to keep this as-is: either a linter will start complaining, or someone will stumble on this code and change it (e.g. re-introduce an early return).

Member Author


Can we put this in an else or something similar?

Done

someone will stumble on this code and change it (e.g. re-introduce an early return).

I wrote this comment specifically to ensure that no one is going to re-add an early-return here:

// A container without a rwlayer is in a bad state, but we must register that container to let users
// remove it. So, log the error but do not early-return.
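
The restructured shape looks roughly like this (a sketch, not the exact diff):

rwlayer, err := daemon.imageService.GetLayerByID(c.ID)
if err != nil {
    // A container without a rwlayer is in a bad state, but we must register
    // that container to let users remove it. So, log the error but do not
    // early-return.
    logger.WithError(err).Error("failed to load container mount")
} else {
    // Only consume the returned value on the non-error path.
    c.RWLayer = rwlayer
}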

Comment thread: daemon/daemon.go
Comment on lines +296 to 298
// A container without a rwlayer is in a bad state, but we must register that container to let users
// remove it. So, log the error but do not early-return.
logger.WithError(err).Error("failed to load container mount")
Member


Do we need to change the state to dead for these?

Member Author


I don't think so.

dead is set in a specific code path (i.e. daemon.cleanupContainer) and shows that the Engine stopped while progressing through it. This means either the container had AutoRemove set, or a ContainerRm API call was in progress.

This PR is addressing another edge case. From the Desktop report:

CONTAINER ID   IMAGE                      COMMAND                  CREATED        STATUS                      PORTS     NAMES
44b9cb711bf2   postgres:17.6-alpine3.22   "docker-entrypoint.s…"   3 weeks ago    Exited (0) 3 weeks ago                
ea85756a438f   postgres:17.6-alpine3.22   "docker-entrypoint.s…"   5 weeks ago    Exited (0) 5 weeks ago                
cbfecd6aedec   88e904502786               "docker-entrypoint.s…"   2 months ago   Exited (128) 2 months ago

I have a good explanation of how a container can end up in the Dead state (cf. my previous PR), but I have no explanation for these Exited containers without a rwlayer. Transitioning containers to Dead would remove an important clue that could be useful in future bug reports and investigations.

Moreover, the current expectation is for dead containers to be reclaimed automatically by daemon.restore(). However, that method only reclaims containers that have AutoRemove set:

moby/daemon/daemon.go

Lines 524 to 535 in ec136d7

if cfg.AutoRestart && c.ShouldRestart() && !c.NetworkSettings.HasSwarmEndpoint && c.HasBeenStartedBefore {
    mapLock.Lock()
    restartContainers[c] = make(chan struct{})
    mapLock.Unlock()
} else if c.HostConfig != nil && c.HostConfig.AutoRemove {
    // Remove the container if live-restore is disabled or if the container has already exited.
    if !cfg.LiveRestoreEnabled || !alive {
        mapLock.Lock()
        removeContainers[c.ID] = c
        mapLock.Unlock()
    }
}

You pointed out on Slack:

we’d be restoring a container without a writable layer, so we wouldn’t be able to start it (I think?)

Yes, and the restore logic might even try to restart a container without a rwlayer.

daemon.start() calls daemon.conditionalMountOnStart() which calls daemon.Mount(). That method returns an error if the container has no rwlayer:

moby/daemon/daemon.go

Lines 1510 to 1512 in ec136d7

if container.RWLayer == nil {
    return errors.New("RWLayer of container " + container.ID + " is unexpectedly nil")
}

So the (re)start flow will fail correctly.

I think we should rely on that instead of making too many changes in daemon.restore(). Users will be able to docker ps / docker inspect / docker rm their broken containers, but won't be able to start them.
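
In other words, the expected user-visible behavior after this PR is roughly (illustrative; exact error text aside):

$ docker start cbfecd6aedec    # fails: daemon.Mount() rejects the nil RWLayer
$ docker inspect cbfecd6aedec  # works: the container is registered
$ docker rm cbfecd6aedec       # works: the broken container can be reclaimed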

Member


but I have no explanation for these Exited containers without a rwlayer.

Curious; could these be containers from before/after switching to the containerd image store? Do we correctly detect the case where a container was created with graphdrivers, and we now try to find its writable layer in containerd (e.g.)?

However, that method only reclaims containers which have autoremove set:

That looks like a potential bug, or at least, to my knowledge we should always attempt to remove a "DEAD" container: it's no longer functional, so the only thing that can be done is to garbage-collect it 🤔

Users will be able to docker ps / docker inspect / docker rm their broken containers, but won't be able to start it.

Hm, right; but do we correctly handle things like docker commit etc?

@vvoland
Contributor

vvoland commented Jan 15, 2026

Moving to 29.3 to unblock 29.2. We can still backport it to 29.2.X later as it's a bugfix.

Under some undetermined circumstances, a container might end up with no
rwlayer on daemon restart. The restore logic would add them to the
containersReplica list but not to the 'containers' list. Thus, users
would see them on `docker ps -a`, but would be unable to remove them.

To fix that, add these containers to the 'containers' list as well.

Signed-off-by: Albin Kerouanton <[email protected]>
@thaJeztah force-pushed the fix-zombie-without-rwlayer branch from 283c686 to 968fb57 on March 24, 2026 22:50
Member

@thaJeztah left a comment


LGTM

@thaJeztah moved this from New to Complete in 🔦 Maintainer spotlight on Mar 26, 2026
@vvoland merged commit b6265d0 into moby:master on Mar 26, 2026
183 of 184 checks passed
@akerouanton deleted the fix-zombie-without-rwlayer branch on March 27, 2026 06:34
@thaJeztah modified the milestones: 29.4.0, 29.3.2 on Apr 1, 2026

Labels

area/daemon (Core Engine), impact/changelog, kind/bugfix (PR's that fix bugs), process/cherry-pick, release-blocker (PRs we want to block a release on)
