Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update runc binary to 1.2.1 #10877

Merged
merged 1 commit into from
Nov 3, 2024
Merged

Conversation

@AkihiroSuda
Copy link
Member Author

AkihiroSuda commented Oct 22, 2024

TestPodUserNS is failing on Fedora 40 and Rocky 9, due to:

  • setxattr /run/containerd-test/io.containerd.runtime.v2.task/k8s.io/[...]/dev/mqueue: operation not permitted
    default: === RUN   TestPodUserNS
    default: === RUN   TestPodUserNS/userns_uid_mapping
    default:     pod_userns_linux_test.go:246: Create a sandbox with userns
    default: E1022 10:38:44.240499   45870 remote_runtime.go:132] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to start sandbox "ed8348b9215a10dba3ef48191f37dfa41c7a4648bbdf7fba9365fdf8a4c1ed4e": failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "mqueue" to rootfs at "/dev/mqueue": setxattr /run/containerd-test/io.containerd.runtime.v2.task/k8s.io/ed8348b9215a10dba3ef48191f37dfa41c7a4648bbdf7fba9365fdf8a4c1ed4e/rootfs/dev/mqueue: operation not permitted
    default:     pod_userns_linux_test.go:251: Unexpected RunPodSandbox error: rpc error: code = Unknown desc = failed to start sandbox "ed8348b9215a10dba3ef48191f37dfa41c7a4648bbdf7fba9365fdf8a4c1ed4e": failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "mqueue" to rootfs at "/dev/mqueue": setxattr /run/containerd-test/io.containerd.runtime.v2.task/k8s.io/ed8348b9215a10dba3ef48191f37dfa41c7a4648bbdf7fba9365fdf8a4c1ed4e/rootfs/dev/mqueue: operation not permitted
    default: === RUN   TestPodUserNS/userns_gid_mapping
    default:     pod_userns_linux_test.go:246: Create a sandbox with userns
    default:     pod_userns_linux_test.go:251: Unexpected RunPodSandbox error: rpc error: code = Unknown desc = failed to start sandbox "d89053afdbbc20f3b11b2eae107e3d70213b21f473707dd5baa762d1a317aa3c": failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "mqueue" to rootfs at "/dev/mqueue": setxattr /run/containerd-test/io.containerd.runtime.v2.task/k8s.io/d89053afdbbc20f3b11b2eae107e3d70213b21f473707dd5baa762d1a317aa3c/rootfs/dev/mqueue: operation not permitted
    default: === RUN   TestPodUserNS/rootfs_permissions
    default:     pod_userns_linux_test.go:246: Create a sandbox with userns
    default: E1022 10:38:44.623562   45870 remote_runtime.go:132] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to start sandbox "d89053afdbbc20f3b11b2eae107e3d70213b21f473707dd5baa762d1a317aa3c": failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "mqueue" to rootfs at "/dev/mqueue": setxattr /run/containerd-test/io.containerd.runtime.v2.task/k8s.io/d89053afdbbc20f3b11b2eae107e3d70213b21f473707dd5baa762d1a317aa3c/rootfs/dev/mqueue: operation not permitted
    default:     pod_userns_linux_test.go:251: Unexpected RunPodSandbox error: rpc error: code = Unknown desc = failed to start sandbox "83bda990b49619f5e98b41dd6fa5c6178264677bd3a2735debb66fb114ce0859": failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "mqueue" to rootfs at "/dev/mqueue": setxattr /run/containerd-test/io.containerd.runtime.v2.task/k8s.io/83bda990b49619f5e98b41dd6fa5c6178264677bd3a2735debb66fb114ce0859/rootfs/dev/mqueue: operation not permitted
    default: === RUN   TestPodUserNS/volumes_permissions
    default: E1022 10:38:44.971328   45870 remote_runtime.go:132] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to start sandbox "83bda990b49619f5e98b41dd6fa5c6178264677bd3a2735debb66fb114ce0859": failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "mqueue" to rootfs at "/dev/mqueue": setxattr /run/containerd-test/io.containerd.runtime.v2.task/k8s.io/83bda990b49619f5e98b41dd6fa5c6178264677bd3a2735debb66fb114ce0859/rootfs/dev/mqueue: operation not permitted
    default:     pod_userns_linux_test.go:246: Create a sandbox with userns
    default:     pod_userns_linux_test.go:251: Unexpected RunPodSandbox error: rpc error: code = Unknown desc = failed to start sandbox "e579723f7f6ece7cc7e6c5294fa73308777f15baacd3f0f317225c0911b9c01b": failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "mqueue" to rootfs at "/dev/mqueue": setxattr /run/containerd-test/io.containerd.runtime.v2.task/k8s.io/e579723f7f6ece7cc7e6c5294fa73308777f15baacd3f0f317225c0911b9c01b/rootfs/dev/mqueue: operation not permitted
    default: === RUN   TestPodUserNS/fails_with_several_mappings
    default:     pod_userns_linux_test.go:246: Create a sandbox with userns
    default: E1022 10:38:45.379638   45870 remote_runtime.go:132] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to start sandbox "e579723f7f6ece7cc7e6c5294fa73308777f15baacd3f0f317225c0911b9c01b": failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "mqueue" to rootfs at "/dev/mqueue": setxattr /run/containerd-test/io.containerd.runtime.v2.task/k8s.io/e579723f7f6ece7cc7e6c5294fa73308777f15baacd3f0f317225c0911b9c01b/rootfs/dev/mqueue: operation not permitted
    default: E1022 10:38:45.401499   45870 remote_runtime.go:132] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to create network namespace for sandbox "461d0d64ea29a2c2b36262ad005d0ebaed8f1ea1d969f6944b575165caebc8a2": required only one uid mapping, but got 2 uid mapping(s)
    default: --- FAIL: TestPodUserNS (1.51s)
    default:     --- FAIL: TestPodUserNS/userns_uid_mapping (0.35s)
    default:     --- FAIL: TestPodUserNS/userns_gid_mapping (0.38s)
    default:     --- FAIL: TestPodUserNS/rootfs_permissions (0.35s)
    default:     --- FAIL: TestPodUserNS/volumes_permissions (0.41s)
    default:     --- PASS: TestPodUserNS/fails_with_several_mappings (0.02s)

https://github.com/containerd/containerd/actions/runs/11457221604/job/31880030218?pr=10877

@rata
Copy link
Contributor

rata commented Oct 22, 2024

@AkihiroSuda It might be a regression from this PR: #10607. See this issue, which is already fixed, that was a regression too: #10847.

Maybe you can try to revert #10741 and #10607 in this PR, just to see if that fixes the issue on CI?

@AkihiroSuda
Copy link
Member Author

/retest

@rata
Copy link
Contributor

rata commented Oct 22, 2024

@AkihiroSuda so it seems that is the culprit! :)

There might be another field in the struct that #10607 forgot to set (I mean, like #10741 that added one more field to the struct used to create the userns). Can you try to debug this further? I will join forces as soon as I get some time (but maybe not this week :-/).

The other option might be a revert. I don't know how close the final containerd 2.0.0 release is.

@AkihiroSuda
Copy link
Member Author

AkihiroSuda commented Oct 22, 2024

Given that the CI is passing for crun (EDIT: Wrong. It wasn't tested with crun + SELinux), it could be also argued that this is an issue on runc's side?

@AkihiroSuda AkihiroSuda requested a review from fuweid October 22, 2024 17:58
@AkihiroSuda AkihiroSuda changed the title update runc binary to 1.2.0 update runc binary to 1.2.0; Revert "internal/cri: simplify netns setup with pinned userns" Oct 22, 2024
@AkihiroSuda AkihiroSuda changed the title update runc binary to 1.2.0; Revert "internal/cri: simplify netns setup with pinned userns" update runc binary to 1.2.0; Revert internal/cri: simplify netns setup with pinned userns Oct 22, 2024
@fuweid
Copy link
Member

fuweid commented Oct 22, 2024

It looks like there is bug in runc side...

@AkihiroSuda
Copy link
Member Author

@rata
Copy link
Contributor

rata commented Oct 23, 2024

runc creates the userns and then the other namespaces to avoid a bug with selinux and mqueue in the kernel, it seems to be what we hit here. See here: https://github.com/opencontainers/runc/blob/e37371ebd791114e4f23444ab1fbcae87ac6b928/libcontainer/nsenter/nsexec.c#L866-L871:

		 * A specific case of this is that the SELinux label of the
		 * internal kern-mount that mqueue uses will be incorrect if the
		 * UTS namespace is cloned before the USER namespace is mapped.
		 * I've also heard of similar problems with the network namespace
		 * in some scenarios. This also mirrors how LXC deals with this
		 * problem.

It seems creating the namespaces in two steps, as runc is doing it, might be a way forward.

@AkihiroSuda are you sure that crun is tested on vagrant here in containerd? Because only vagrant is failing in this PR, as far as I can see

@fuweid any thoughts? :)

@AkihiroSuda
Copy link
Member Author

AkihiroSuda commented Oct 23, 2024

are you sure that crun is tested on vagrant here in containerd? B

Sorry, I was wrong 😅

@rata
Copy link
Contributor

rata commented Oct 23, 2024

assumming the issue is what I mentioned in the last comment, it doesn't seem simple to fix. It is quite limited what we can do in go. Currently we create a process with the userns and the netns here: https://github.com/containerd/containerd/blob/main/pkg/sys/unshare_linux.go#L48-L63

We can't easily run something inside that process, because we are using /proc/self/exe as the command to exec (but with ptrace so we don't really exec it). So doing first the unshare for userns and then for the netns doesn't seem like a 1 line fix, at least.

What I can think is to do something like this instead: https://github.com/containers/storage/blob/main/pkg/idmap/idmapped_utils.go#L53. But the child does the creation of the other namespaces. There might be a perf hit bigger than before, though, as we are indeed cloning the process, starting the go-runtime, etc.

I'm unsure if revert is simpler for 2.0 or if we should try that.

EDIT: Ooor, what happens if instead of using cloneFlags and unshareFlags, we pass just one of those? (like the userns and the netns both in cloneFlags or both in unshareFlags here: https://github.com/containerd/containerd/pull/10607/files#diff-106945d93d68e955471ccab149a1302ebb7214c1832b7df0bbd8855992ddf397R52-R53). Maybe the kernel does the right thing in that case? It seems we need to map the userns before creating the other ns. We need to check the code to see what is happening here, it might be fun to write a patch to improve it (so the other namespaces are created after the userns is mapped).

@fuweid any thoughts or other ideas? It would be great to first validate my theory of what is the bug, though.

@fuweid
Copy link
Member

fuweid commented Oct 23, 2024

runc creates the userns and then the other namespaces to avoid a bug with selinux and mqueue in the kernel, it seems to be what we hit here. See here: https://github.com/opencontainers/runc/blob/e37371ebd791114e4f23444ab1fbcae87ac6b928/libcontainer/nsenter/nsexec.c#L866-L871:

		 * A specific case of this is that the SELinux label of the
		 * internal kern-mount that mqueue uses will be incorrect if the
		 * UTS namespace is cloned before the USER namespace is mapped.
		 * I've also heard of similar problems with the network namespace
		 * in some scenarios. This also mirrors how LXC deals with this
		 * problem.

It seems creating the namespaces in two steps, as runc is doing it, might be a way forward.

@AkihiroSuda are you sure that crun is tested on vagrant here in containerd? Because only vagrant is failing in this PR, as far as I can see

@fuweid any thoughts? :)

Thanks for the information. I will take a look in this week. We can rolllback first and create a tracking issue for this.
I want to keep ptrace function because of https://github.com/golang/go/blob/69234ded30614a471c35cef5d87b0e0d3c136cd9/src/runtime/proc.go#L4837

@AkihiroSuda
Copy link
Member Author

AkihiroSuda commented Oct 23, 2024

I actually ran the test with crun + SELinux, and it seems green.

@AkihiroSuda AkihiroSuda changed the title update runc binary to 1.2.0; Revert internal/cri: simplify netns setup with pinned userns update runc binary to 1.2.0; apply runc PR 4473 Oct 24, 2024
samuelkarp

This comment was marked as duplicate.

Copy link
Member

@samuelkarp samuelkarp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM on green

@dmcgowan dmcgowan added this pull request to the merge queue Nov 2, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 2, 2024
@samuelkarp samuelkarp added this pull request to the merge queue Nov 2, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 2, 2024
@samuelkarp samuelkarp added this pull request to the merge queue Nov 2, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 2, 2024
@AkihiroSuda AkihiroSuda added this pull request to the merge queue Nov 3, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 3, 2024
@dmcgowan
Copy link
Member

dmcgowan commented Nov 3, 2024

The TestContainerPids race is happening pretty frequently, are the changes in runc making the issue show up more?

@AkihiroSuda AkihiroSuda added this pull request to the merge queue Nov 3, 2024
Merged via the queue into containerd:main with commit c017828 Nov 3, 2024
58 checks passed
@samuelkarp samuelkarp added cherry-pick/1.6.x Change to be cherry picked to release/1.6 branch cherry-pick/1.7.x Change to be cherry picked to release/1.7 branch labels Nov 4, 2024
@samuelkarp
Copy link
Member

/cherrypick release/1.7

@samuelkarp
Copy link
Member

/cherrypick release/1.6

@k8s-infra-cherrypick-robot

@samuelkarp: new pull request created: #10940

In response to this:

/cherrypick release/1.7

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-infra-cherrypick-robot

@samuelkarp: new pull request created: #10941

In response to this:

/cherrypick release/1.6

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@austinvazquez austinvazquez added cherry-picked/1.6.x PR commits are cherry-picked into release/1.6 branch cherry-picked/1.7.x PR commits are cherry-picked into release/1.7 branch and removed cherry-pick/1.6.x Change to be cherry picked to release/1.6 branch cherry-pick/1.7.x Change to be cherry picked to release/1.7 branch labels Nov 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/cri Container Runtime Interface (CRI) area/runtime Runtime cherry-picked/1.6.x PR commits are cherry-picked into release/1.6 branch cherry-picked/1.7.x PR commits are cherry-picked into release/1.7 branch ok-to-test size/XS
Projects
None yet
Development

Successfully merging this pull request may close these issues.