Skip to content

Re-enable CRIU tests by not using overlayfs snapshotter#4708

Merged
crosbymichael merged 2 commits intocontainerd:masterfrom
kzys:enable-criu
Mar 19, 2021
Merged

Re-enable CRIU tests by not using overlayfs snapshotter#4708
crosbymichael merged 2 commits intocontainerd:masterfrom
kzys:enable-criu

Conversation

@kzys
Copy link
Copy Markdown
Member

@kzys kzys commented Nov 9, 2020

While the issue hasn't been fixed in the kernel yet, we can workaround
the issue by not using overlayfs snapshotter.

Fixes #3930.

Signed-off-by: Kazuyoshi Kato [email protected]

@k8s-ci-robot
Copy link
Copy Markdown

Hi @kzys. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@kzys
Copy link
Copy Markdown
Member Author

kzys commented Nov 9, 2020

Hmm, let me check the kernel version on the GitHub Actions' Ubuntu.

@kzys
Copy link
Copy Markdown
Member Author

kzys commented Nov 9, 2020

Ubuntu 14.04 uses Linux fv-az54-676 5.4.0-1031-azure #32~18.04.1-Ubuntu SMP Tue Oct 6 10:03:22 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux.

@estesp
Copy link
Copy Markdown
Member

estesp commented Nov 9, 2020

I don't think this will work based on ongoing commentary in checkpoint-restore/criu#860 (see this comment). The upstream bug is confusing as it has been marked fixed at least once or twice, but is clearly still an issue.

The recommendation in checkpoint-restore/criu#1239 (comment) might be worth considering--should we run just the checkpoint/restore tests to make sure our code is still working by running a quick test with a different graphdriver?

@kzys
Copy link
Copy Markdown
Member Author

kzys commented Nov 10, 2020

Thanks @estesp. How about running the tests with multiple different snapshotters? criu may need devmappper, but I'm bit concerned to move containerd's tests from overlay to let's say devmapper entirely because of criu.

@estesp
Copy link
Copy Markdown
Member

estesp commented Nov 10, 2020

Yes, I think we can build a small action that only runs the criu tests with a native/devicemapper snapshotter and let the rest of the testsuite continue using overlay.

@kzys kzys force-pushed the enable-criu branch 6 times, most recently from 6bace7f to 4213735 Compare November 11, 2020 23:00
@kzys kzys changed the title Re-enable CRIU tests Re-enable CRIU tests by not using overlayfs snapshotter Nov 11, 2020
@kzys kzys force-pushed the enable-criu branch 2 times, most recently from a8a3388 to fd8bc31 Compare November 12, 2020 00:31
@kzys
Copy link
Copy Markdown
Member Author

kzys commented Nov 12, 2020

container_checkpoint_test.go:142: type with url : not found: unknown is coming from

v, err := typeurl.UnmarshalAny(r.Options)
.

Comment thread runtime/v2/runc/container.go Outdated
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure this is the right place to fix this issue. Shouldn't it be handled by typeurl.UnmarshalAny()?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@crosbymichael can you provide any insight here?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya, i think this is fine. It should be handled by the caller because of the nil check. I'll have to think a bit on a better way to fix this in the lib but it shouldn't be common.

@containerd containerd deleted a comment from theopenlab-ci Bot Nov 13, 2020
@containerd containerd deleted a comment from theopenlab-ci Bot Nov 13, 2020
@containerd containerd deleted a comment from theopenlab-ci Bot Nov 13, 2020
@containerd containerd deleted a comment from theopenlab-ci Bot Nov 13, 2020
@containerd containerd deleted a comment from theopenlab-ci Bot Nov 13, 2020
@containerd containerd deleted a comment from theopenlab-ci Bot Nov 13, 2020
@containerd containerd deleted a comment from theopenlab-ci Bot Nov 13, 2020
@containerd containerd deleted a comment from theopenlab-ci Bot Nov 13, 2020
@containerd containerd deleted a comment from theopenlab-ci Bot Nov 13, 2020
@containerd containerd deleted a comment from theopenlab-ci Bot Nov 13, 2020
@containerd containerd deleted a comment from theopenlab-ci Bot Nov 13, 2020
@containerd containerd deleted a comment from theopenlab-ci Bot Nov 13, 2020
@containerd containerd deleted a comment from theopenlab-ci Bot Nov 13, 2020
@containerd containerd deleted a comment from theopenlab-ci Bot Nov 13, 2020
@kzys
Copy link
Copy Markdown
Member Author

kzys commented Nov 16, 2020

@estesp Could you take a look? Seems like CRIU has been broken.

@theopenlab-ci
Copy link
Copy Markdown

theopenlab-ci Bot commented Mar 12, 2021

Build succeeded.

@estesp
Copy link
Copy Markdown
Member

estesp commented Mar 12, 2021

Sorry @kzys this needs a rebase due to the PR which moved *_test.go from the root being merged.

Also, thinking about the time cost of having yet another pass through make integration just to pick up the checkpoint tests:
What about renaming the last test in the checkpoint test file here to start with TestCheckpoint and then you could pass EXTRA_TESTFLAGS="-run TestCheckpoint" to your make integration and only have the handful of checkpoint tests run rather than adding 2-3 minutes to all of the runc+Linux integration combinations in CI?

@kzys
Copy link
Copy Markdown
Member Author

kzys commented Mar 12, 2021

@estesp Sounds good to me. Let me update the PR.

@kzys kzys force-pushed the enable-criu branch 2 times, most recently from 706a99c to 36afcb7 Compare March 12, 2021 18:09
@theopenlab-ci
Copy link
Copy Markdown

theopenlab-ci Bot commented Mar 12, 2021

Build succeeded.

@kzys
Copy link
Copy Markdown
Member Author

kzys commented Mar 12, 2021

Rebased! Not so sure why "CGroupsV2 and SELinux Integration" is failing here.

@mikebrow
Copy link
Copy Markdown
Member

Rebased! Not so sure why "CGroupsV2 and SELinux Integration" is failing here.

429 Too Many Requests. ... we're trying to work on this.. it's the new docker.io pull rate limits

@kzys
Copy link
Copy Markdown
Member Author

kzys commented Mar 16, 2021

@estesp Can you take a look?

Comment thread .github/workflows/ci.yml Outdated
Kazuyoshi Kato added 2 commits March 16, 2021 16:46
- process.Init#io could be nil
- Make sure CreateTaskRequest#Options is not empty before unmarshaling

Signed-off-by: Kazuyoshi Kato <[email protected]>
While the issue hasn't been fixed in the kernel yet, we can workaround
the issue by not using overlayfs snapshotter.

The newly added step runs all tests that match /TestCheckpoint/.
So, TestCRWithImagePath has been renamed to match the regexp.

Fixes containerd#3930.

Signed-off-by: Kazuyoshi Kato <[email protected]>
@theopenlab-ci
Copy link
Copy Markdown

theopenlab-ci Bot commented Mar 16, 2021

Build succeeded.

Copy link
Copy Markdown
Member

@estesp estesp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM; would love @crosbymichael to confirm the options typeurl change

@kzys
Copy link
Copy Markdown
Member Author

kzys commented Mar 17, 2021

/ok-to-test

@crosbymichael crosbymichael merged commit e0c94bb into containerd:master Mar 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Reenable CRIU tests when kernel issue is resolved

5 participants