Fix file exists error when restoring remote snapshot after unexpected… by wswsmao · Pull Request #2091 · containerd/stargz-snapshotter

wswsmao · 2025-07-23T12:58:56Z

After PR #2076, in scenarios where the process restarts unexpectedly (such as due to OOM), restoring a remote snapshot may fail if the target directory already exists. For example:

{"error":"failed to create new snapshotter: failed to restore remote snapshot: failed to create remote snapshot directory: sha256:52fa3204fe00dd4d492873408e2ef89c13e142748931086998cc2eca69549b48: mkdir /var/lib/containerd-stargz-grpc/snapshotter/snapshots/1: file exists","level":"fatal","msg":"failed to configure snapshotter","time":"2025-07-23T12:48:12.070612675Z"}

This PR adjusts the logic in restoreRemoteSnapshot so that if mkdir fails because the directory already exists, it is treated as a result of an ungraceful shutdown and the process continues.

… restart Signed-off-by: abushwang <[email protected]>

wswsmao · 2025-07-23T12:59:08Z

However, this situation may lead to fscache duplicating cached data. There are two possible solutions:

1. Since this scenario is caused by an abnormal exit, users are expected to manually clean up the cache, and we can update the documentation to remind users of this requirement.

2. Clean up the cache on every startup. Currently, the logic ensures that the cache is cleaned up on graceful exit, so in theory, the cache directory should be empty on each restart.

ktock

LGTM. The CI flakiness doesn't seem to be related to this PR. I'll work on fixing that in a separate patch.

ktock · 2025-07-24T06:29:52Z

However, this situation may lead to fscache duplicating cached data. There are two possible solutions:

1. Since this scenario is caused by an abnormal exit, users are expected to manually clean up the cache, and we can update the documentation to remind users of this requirement.

2. Clean up the cache on every startup. Currently, the logic ensures that the cache is cleaned up on graceful exit, so in theory, the cache directory should be empty on each restart.

Let's take approach 1 for now.

Fix file exists error when restoring remote snapshot after unexpected…

b3743e7


wswsmao closed this Jul 24, 2025

wswsmao reopened this Jul 24, 2025

wswsmao closed this Jul 24, 2025

wswsmao reopened this Jul 24, 2025

wswsmao closed this Jul 24, 2025

wswsmao reopened this Jul 24, 2025

ktock approved these changes Jul 24, 2025


ktock merged commit 3aa69ea into containerd:main Jul 24, 2025
211 of 217 checks passed

wswsmao mentioned this pull request Jul 24, 2025

docs: Add Unexpected restart handling #2092

Merged
