Background
Reported in #7247
The CRI plugin always creates a checkpoint for each container and sandbox status. Checkpoints are committed using atomic writes, ensuring the JSON file remains intact. However, during a containerd restart, if the filesystem runs out of space, the CRI plugin skips existing containers. Even if the task or shim plugin manages these running containers, the CRI plugin renders them invisible to the kubelet.
Once the disk is healthy and space becomes available, the kubelet will send a request to containerd, prompting it to create a new sandbox or container with the existing attempt or restartCount. Although the patch in PR k8s#99748 ensures that restartCount is incremented, running two duplicate containers within a single pod is not expected.
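For readers unfamiliar with the atomic-write checkpoint mentioned above, here is a minimal sketch of the usual pattern: write to a temporary file in the same directory, then rename it over the target. The function atomicWriteFile and the helper roundTrip are hypothetical names for illustration, not the continuity implementation. Note that the temporary file itself needs free space, which is why a full disk makes the checkpoint fail even though the old file stays intact.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// atomicWriteFile is a hypothetical stand-in for an atomic write helper.
// It writes data to a temp file next to path and renames it into place,
// so readers see either the old content or the new content, never a mix.
func atomicWriteFile(path string, data []byte, perm os.FileMode) (err error) {
	tmp, err := os.CreateTemp(filepath.Dir(path), ".tmp-status-*")
	if err != nil {
		return fmt.Errorf("create temp file: %w", err)
	}
	// On any failure, clean up the temp file; the target is left untouched.
	defer func() {
		if err != nil {
			tmp.Close()
			os.Remove(tmp.Name())
		}
	}()
	if _, err = tmp.Write(data); err != nil {
		return fmt.Errorf("write temp file: %w", err)
	}
	if err = tmp.Chmod(perm); err != nil {
		return fmt.Errorf("chmod temp file: %w", err)
	}
	if err = tmp.Sync(); err != nil {
		return fmt.Errorf("sync temp file: %w", err)
	}
	if err = tmp.Close(); err != nil {
		return fmt.Errorf("close temp file: %w", err)
	}
	if err = os.Rename(tmp.Name(), path); err != nil {
		return fmt.Errorf("rename temp file: %w", err)
	}
	return nil
}

// roundTrip writes oldData and then newData to the same checkpoint path
// in a scratch directory and returns what is on disk afterwards.
func roundTrip(oldData, newData string) (string, error) {
	dir, err := os.MkdirTemp("", "checkpoint-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "status")
	for _, d := range []string{oldData, newData} {
		if err := atomicWriteFile(path, []byte(d), 0o600); err != nil {
			return "", err
		}
	}
	got, err := os.ReadFile(path)
	return string(got), err
}

func main() {
	got, err := roundTrip(`{"Pid":1}`, `{"Pid":2}`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(got)
}
```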
Fix plan
Short-Term
Introduce a new helper, such as WithStatusWhenRestart, that tolerates the no-space error.
This helper should be used in restart.go only.
+func WithStatusWhenRestart(status Status, root string) Opts {
+	return func(c *Container) error {
+		s, err := StoreStatusWhenRestart(root, status)
+		if err != nil {
+			return err
+		}
+		c.Status = s
+		if s.Get().State() == runtime.ContainerState_CONTAINER_EXITED {
+			c.Stop()
+		}
+		return nil
+	}
+}
+
+func StoreStatusWhenRestart(root string, status Status) (StatusStorage, error) {
+	data, err := status.encode()
+	if err != nil {
+		return nil, fmt.Errorf("failed to encode status: %w", err)
+	}
+	path := filepath.Join(root, "status")
+	if err := continuity.AtomicWriteFile(path, data, 0600); err != nil {
+		// TODO: handle this for different platforms
+		if !errors.Is(err, syscall.ENOSPC) {
+			return nil, fmt.Errorf("failed to checkpoint status to %q: %w", path, err)
+		}
+		// out of space: log the error and continue with the in-memory status
+	}
+	return &statusStorage{
+		path:   path,
+		status: status,
+	}, nil
+}
+
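The helper above depends on errors.Is finding ENOSPC inside a wrapped error chain. The sketch below illustrates that check in isolation; isNoSpace, wrapWriteError, and the path in it are hypothetical names for illustration. It assumes a Unix-like platform. Other platforms may report a full disk with a different error value, which is what the platform TODO in the patch refers to.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"syscall"
)

// isNoSpace reports whether err, or any error it wraps, is ENOSPC.
func isNoSpace(err error) bool {
	return errors.Is(err, syscall.ENOSPC)
}

// wrapWriteError mimics how a failed write usually surfaces: the errno is
// wrapped in an *os.PathError, which is wrapped again with context.
func wrapWriteError(errno syscall.Errno) error {
	pathErr := &os.PathError{Op: "write", Path: "/hypothetical/status", Err: errno}
	return fmt.Errorf("failed to checkpoint status: %w", pathErr)
}

// Sample errors used to exercise the check.
var (
	errDiskFull = wrapWriteError(syscall.ENOSPC)
	errDenied   = wrapWriteError(syscall.EACCES)
)

func main() {
	for _, err := range []error{errDiskFull, errDenied} {
		if isNoSpace(err) {
			fmt.Println("tolerated during restart:", err)
			continue
		}
		fmt.Println("fatal:", err)
	}
}
```

Only the no-space case is tolerated; any other failure, such as a permission error, still aborts the recovery of that container.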
Long-Term
The checkpoint file tracks whether a container has been started or stopped.
For instance, when container A exits, the CRI plugin deletes the task via the shim API and stores the exit code in the file.
If there is no checkpoint file and no other record of the exit code, then after containerd restarts, container A could be regarded as created rather than exited. The kubelet would then try to restart it instead of deleting it.
We don't actually need a separate checkpoint, because we have the metadata plugin.
We can consider storing the status as an annotation in the container plugin.
That way, all the required information is stored with the container, and the status is easy to reconstruct.
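As a rough illustration of the long-term idea, the sketch below serializes a simplified status into a string map and derives the state from it after loading. Everything here is an assumption made for illustration: the Status fields are a reduced version of the real type, the key statusKey is made up, and a plain map[string]string stands in for whatever annotation or label storage the container plugin would actually use.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// statusKey is a hypothetical annotation key, not an existing containerd key.
const statusKey = "example.io/cri-container-status"

// Status is a simplified, hypothetical version of the CRI container status.
type Status struct {
	CreatedAt  int64 `json:"createdAt"`
	StartedAt  int64 `json:"startedAt"`
	FinishedAt int64 `json:"finishedAt"`
	ExitCode   int32 `json:"exitCode"`
}

// State derives the container state from the recorded timestamps.
func (s Status) State() string {
	switch {
	case s.FinishedAt != 0:
		return "EXITED"
	case s.StartedAt != 0:
		return "RUNNING"
	case s.CreatedAt != 0:
		return "CREATED"
	default:
		return "UNKNOWN"
	}
}

// storeStatus encodes the status into the annotation map.
func storeStatus(annotations map[string]string, s Status) error {
	data, err := json.Marshal(s)
	if err != nil {
		return fmt.Errorf("failed to encode status: %w", err)
	}
	annotations[statusKey] = string(data)
	return nil
}

// loadStatus decodes the status from the annotation map.
func loadStatus(annotations map[string]string) (Status, error) {
	var s Status
	raw, ok := annotations[statusKey]
	if !ok {
		return s, fmt.Errorf("annotation %q not found", statusKey)
	}
	if err := json.Unmarshal([]byte(raw), &s); err != nil {
		return s, fmt.Errorf("failed to decode status: %w", err)
	}
	return s, nil
}

// stateAfterRoundTrip stores s, loads it back, and returns the derived state.
func stateAfterRoundTrip(s Status) (string, error) {
	annotations := map[string]string{}
	if err := storeStatus(annotations, s); err != nil {
		return "", err
	}
	loaded, err := loadStatus(annotations)
	if err != nil {
		return "", err
	}
	return loaded.State(), nil
}

func main() {
	exited := Status{CreatedAt: 1, StartedAt: 2, FinishedAt: 3, ExitCode: 137}
	state, err := stateAfterRoundTrip(exited)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(state)
}
```

With the exit information kept alongside the container record, an exited container would still be recognized as exited after a restart, even if no separate checkpoint file could be written.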
Test Plan - E2E Test
- Set up and run the containerd process on a 100 MiB filesystem (provided by a loopback device)
- Create three containers in one pod
  - A: Created state
  - B: Running state
  - C: Exited state
- Stop containerd
- Inject a failpoint to run out of space
- Restart containerd
- Check the status of containers A, B, and C
We should commit the E2E test case first and then submit the fix patch.
NOTE
cc @mikebrow @dmcgowan
ping @yylt