Support container restore through CRI/Kubernetes #10365
Force-pushed from 6d9aaed to f602c12.
Hello, I tested in K8s with the following:

Setup
Env
K8s: 1.30.2 (RKE2)

Results
Checkpoint
creates a checkpoint, however,
Restore
Force-pushed from f602c12 to 25da2f2.
@xhejtman Thanks for testing. I rebased the PR so that it applies cleanly on the latest git checkout of containerd. The CRIU log file error you mentioned should be gone now. I do not think you need to explicitly set the feature gate for container checkpointing, because it has defaulted to on since going Beta in 1.30.
Yes, I tested again and the CRIU log issue is gone. I just noticed one more thing: if the checkpointed container did not specify a command/entrypoint in the Dockerfile, the Pod manifest for the restore needs to specify at least a dummy command, or an error is raised: no command specified in restore process.
Force-pushed from 25da2f2 to 29ec7b2.
@xhejtman I added my Kubernetes test script in |
Force-pushed from 6fef265 to 1d3793e.
CI finally looks happy. One more feature is missing before this is ready. The rootfs changes are not yet applied to the restored container, so a container that modifies files in its rootfs will probably fail to restore. This should be an easy change, as it is not much more than applying the existing tar file to the container rootfs.
Force-pushed from 1d3793e to 23e2f4b.
That was all; thanks!
Hello @adrianreber, what's the status of this PR? The feedback seems to have been addressed.
I don't know what the status is. I am happy to apply any code review suggestions.
Force-pushed from fc3a663 to e7af745.
Rebased
Thank you.
Force-pushed from bc7f710 to 171f413.
Force-pushed from 171f413 to f5eee87.
LGTM but a few nits.
Sorry for taking too long to review this
Force-pushed from f5eee87 to 16e8428.
Thanks. I tried to address all your review comments.
Force-pushed from 16e8428 to 8ac7dd4.
This implements container restore as described in:

https://kubernetes.io/blog/2022/12/05/forensic-container-checkpointing-alpha/#restore-checkpointed-container-standalone

For detailed step-by-step instructions also see contrib/checkpoint/checkpoint-restore-cri-test.sh

The code changes are based on changes I made in Podman around 2018 and CRI-O around 2020.

The history behind restoring containers via CRI/Kubernetes probably requires some explanation. The initial proposal to bring checkpoint/restore to Kubernetes looked at pod checkpoint and restore and the corresponding CRI changes.

kubernetes-sigs/cri-tools#662
kubernetes/kubernetes#97194

After discussing this topic for about two years, another approach was implemented as described in KEP-2008:

kubernetes/enhancements#2008

"Forensic Container Checkpointing" allowed us to separate checkpointing from restoring. For "Forensic Container Checkpointing" it is enough to create a checkpoint of the container. Restoring is not necessary, as the analysis of the checkpoint archive can happen without restoring the container.

While thinking about a way to restore a container, it was by coincidence that we started to look into restoring containers in Kubernetes via Create and Start. The way it was done in CRI-O is to figure out during Create whether the container image is a checkpoint image and, if it is, to use another code path. The same is now implemented in containerd with this change.

With this change it is possible to restore a container from a checkpoint tar archive that is created during checkpointing via CRI.

To restore a container via Kubernetes we convert the tar archive to an OCI image as described in the kubernetes.io blog post from above. Using this OCI image it is possible to restore a container in Kubernetes.

At this point I think it should be doable to restore containers in CRI-O and containerd no matter whether they were created by containerd or CRI-O. The biggest difference is the container metadata, and that can be adapted during restore.

Open items:

 * It is not clear to me why restoring a container in containerd goes through task/Create(). But as the restore code already exists, this change extends the existing code path that restores a container in task/Create() so that it also restores a container through the CRI via Create and Start.

 * Automatic image pulling. containerd does not pull images automatically if a container is created via the CRI. There is an option in crictl to pull images before starting, but that uses the CRI image pull interface. It is still a separate pull and create operation. Restoring containers from an OCI image is a bit different. The checkpoint OCI image does not include the base image, but just a reference to the image (NAME@DIGEST). Using crictl with pulling will enable the pulling of the checkpoint image, but not of the base image the checkpoint is based on. So during preparation of the checkpoint containerd will automatically pull the base image, but I was not able to figure out how to pull an image in a blocking way in containerd. So there is a for loop waiting for the container image to appear in the internal store. I think this can probably be implemented better.

Anyway, this is a first step towards container restore in Kubernetes when using containerd.

Signed-off-by: Adrian Reber <[email protected]>
Force-pushed from 8ac7dd4 to 9e6beaf.