fusemanager: fix container fail after ttl timeout in detach mode#1905
Merged
ktock merged 1 commit intocontainerd:mainfrom Jan 7, 2025
Merged
fusemanager: fix container fail after ttl timeout in detach mode#1905ktock merged 1 commit intocontainerd:mainfrom
ktock merged 1 commit intocontainerd:mainfrom
Conversation
ktock
reviewed
Dec 23, 2024
snapshot/snapshot.go
Outdated
Comment on lines
+747
to
+748
| // In detach mode, rs is taken over by fusemanager, | ||
| // and there may be running containers, so we skip clean |
Member
There was a problem hiding this comment.
No need to mention about fusemanager as snapshot/snapshot.go is agnostic about fusemanager.
snapshot/snapshot.go
Outdated
| return fmt.Errorf("failed to unmount %s: %w", m.Mountpoint, err) | ||
| // In detach mode, rs is taken over by fusemanager, | ||
| // and there may be running containers, so we skip clean | ||
| if !o.detach { |
Member
There was a problem hiding this comment.
Why not just retruning from this function?
Contributor
Author
There was a problem hiding this comment.
done,if o.detach will return directly
docs/overview.md
Outdated
|
|
||
| ### Important Considerations | ||
|
|
||
| Before restarting the `containerd-stargz-grpc` process, if there are running containers, it is crucial to use `SIGKILL` to terminate the `containerd-stargz-grpc` process. This approach prevents the normal shutdown sequence from attempting to clean up the mountpoints of the running containers, which could lead to disruptions in their availability. By using `SIGKILL`, you ensure that the process is forcefully terminated without affecting the ongoing operations of the containers. |
Member
There was a problem hiding this comment.
Could you add a doc about when the user should use SIGTERM?
Contributor
Author
|
all done |
Contributor
Author
Signed-off-by: abushwang <[email protected]>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
In detach mode, when
containerd-stargz-grpcexits normally, it sends an Unmount request to the fuse manager. Additionally, during the startup ofcontainerd-stargz-grpc, therestoreRemoteSnapshotfunction cleans up previous mountpoints. If there are still running containers at this time, it can lead to issues when the TTL cache expires, resulting in abnormal behavior of the containers.I considered several solutions:
containerd-stargz-grpcusingSIGKILL, and then skip the cleanup step inrestoreRemoteSnapshot.ResolveResultEntryTTLSecto an infinitely large value to leverage the TTL cache for ensuring the normal operation of containers. However, this would still lead to failures if the containers attempt to access uncached content.After careful consideration, I have decided to proceed with the first approach.