I am currently looking at a problem concerning CRIU and OCI containers. My understanding so far is the following:
I am creating a checkpoint without setting manage_cgroups. This means opts.manage_cgroups should be CG_MODE_DEFAULT, which is defined as #define CG_MODE_DEFAULT (CG_MODE_SOFT).
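To check which mode a restore actually ran with, the "cgroups restore mode 0x..." value printed in the restore log can be mapped back to a symbolic name. The following is only a sketch: the numeric values are assumptions taken from my recollection of criu/include/cr_options.h and should be verified against the CRIU tree in use, and restore_mode is a hypothetical helper name.

```python
import re

# Assumed bit values, as recalled from criu/include/cr_options.h.
# Verify them against your CRIU source tree before relying on them.
CG_MODES = {
    0x0: "CG_MODE_IGNORE",
    0x1: "CG_MODE_NONE",
    0x2: "CG_MODE_PROPS",
    0x4: "CG_MODE_SOFT",
    0x8: "CG_MODE_FULL",
    0x10: "CG_MODE_STRICT",
}

MODE_RE = re.compile(r"cgroups restore mode (0x[0-9a-fA-F]+)")


def restore_mode(log_line):
    """Return the symbolic mode named in a CRIU restore log line, or None."""
    match = MODE_RE.search(log_line)
    if match is None:
        return None
    value = int(match.group(1), 16)
    return CG_MODES.get(value, "unknown (%#x)" % value)
```

Under these assumptions, a log line reporting mode 0x4 maps to CG_MODE_SOFT, which is what CG_MODE_DEFAULT expands to.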
When creating a checkpoint, CRIU still records the cgroup information of the process in the container.
My understanding is that this should not be necessary, as the runtime (crun at least) will move the process into the new cgroup it created after the restore. I think this is the only right approach: in the case of OCI containers, CRIU should not touch the cgroup setting. When the container is restored, it will be placed in a cgroup newly created by the container runtime (crun/runc).
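For illustration, the runtime-side move described above boils down to writing the PID into the cgroup.procs file of the target cgroup, which is how cgroup v2 migrates a process. This is a minimal sketch and not crun's or runc's actual code; move_to_cgroup is a hypothetical helper name.

```python
import os


def move_to_cgroup(cgroup_dir, pid):
    """Write pid to <cgroup_dir>/cgroup.procs to migrate that process."""
    procs_path = os.path.join(cgroup_dir, "cgroup.procs")
    with open(procs_path, "w") as procs_file:
        procs_file.write(str(pid))
```

Because the runtime performs this step itself with the new container's cgroup path, the paths stored in the checkpoint are not needed for it.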
Even after setting #define CG_MODE_DEFAULT (CG_MODE_IGNORE), I still get a cgroup.img, and core-1.img still references cgroups via "cg_set": 2.
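One way to confirm where an image still references a cgroup set is to decode it to JSON (for example with crit decode -i core-1.img) and search the result for every cg_set key. The sketch below makes no assumption about the JSON layout and simply walks the whole structure; find_key and cg_set_refs are hypothetical helper names.

```python
import json


def find_key(node, key, path=""):
    """Yield (path, value) for every occurrence of key in nested dicts/lists."""
    if isinstance(node, dict):
        for name, value in node.items():
            child = "%s/%s" % (path, name)
            if name == key:
                yield child, value
            yield from find_key(value, key, child)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_key(value, key, "%s[%d]" % (path, index))


def cg_set_refs(decoded_json_text):
    """Return all (path, value) pairs for cg_set in a decoded image."""
    return list(find_key(json.loads(decoded_json_text), "cg_set"))
```

An empty result would mean the dump really ignored cgroups; any hit shows which entry still carries the reference.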
The restore fails with:
(00.003375) 1: cg: Move into 2
(00.003391) 1: cg: setting cgns prefix to /machine.slice/libpod-dd47c09e12569883f67d88a5da89cbd2e1c450b2f3803087ee72e3a062a05186.scope/container
(00.003415) 1: Error (criu/cgroup.c:1092): cg: Can't move 1 into unifie//machine.slice/libpod-dd47c09e12569883f67d88a5da89cbd2e1c450b2f3803087ee72e3a062a05186.scope/container/cgroup.procs (-1/-1): Bad file descriptor
(00.003427) 1: Error (criu/cgroup.c:1148): cg: couldn't set cgns prefix unifie//machine.slice/libpod-dd47c09e12569883f67d88a5da89cbd2e1c450b2f3803087ee72e3a062a05186.scope/container/cgroup.procs: Bad file descriptor
(00.003431) 1: Error (criu/cgroup.c:1171): cg: failed preparing cgns
So there is still a bug somewhere in the code because unifie//machine.slice does not look correct.
Using CRIU's manage_cgroups mode results in CG_MODE_SOFT and the restore works, but it does strange things. First of all, I see this in the logs:
(00.001357) cg: Preparing cgroups yard (cgroups restore mode 0x4)
(00.001593) cg: Opening .criu.cgyard.cifCa8 as cg yard
(00.001613) cg: Making controller dir .criu.cgyard.cifCa8/unifie ()
(00.001707) cg: Determined cgroup dir unifie/machine.slice/libpod-30325b748276c463e9f5e8db0f98662915f7372f7585287dcae81c8cd4d75636.scope/container already exist
(00.001713) cg: Skip restoring properties on cgroup dir unifie/machine.slice/libpod-30325b748276c463e9f5e8db0f98662915f7372f7585287dcae81c8cd4d75636.scope/container
The paths again look wrong, and the log still references the old cgroup path, although the restored container has a different ID and the container runtime created a new cgroup for it.
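The stale references can be spotted mechanically by extracting every libpod-<ID>.scope from the restore log and comparing it with the ID of the restored container. This is a sketch that assumes the scope naming seen in the logs above; stale_cgroup_ids is a hypothetical helper name.

```python
import re

# Matches the container ID inside a "libpod-<ID>.scope" cgroup path component.
SCOPE_RE = re.compile(r"libpod-([0-9a-f]+)\.scope")


def stale_cgroup_ids(log_text, new_container_id):
    """Return the libpod IDs in the log that differ from new_container_id."""
    return {cid for cid in SCOPE_RE.findall(log_text) if cid != new_container_id}
```

A non-empty result for the restore log of the new container means CRIU is still working with cgroup paths from the checkpointed container.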
To reproduce:
podman run -d quay.io/adrianreber/counter
podman container checkpoint --latest --export /tmp/dump.tar -R -k
podman container restore -i /tmp/dump.tar -n new -k
The restore log of the container new shows the messages above. The log can be found with podman inspect -l --format "{{.State.RestoreLog}}".
So this is both a bug report that CRIU's cgroup handling is not correct and a question: should CRIU completely ignore the cgroup settings when used in combination with crun/runc? crun/runc will create a new cgroup for the new container and move the processes into it. Currently it does not seem possible to tell CRIU to completely ignore cgroups, even with CG_MODE_IGNORE.
@mihalicyn @avagin any ideas, suggestions or comments?