Skip to content

Conversation

@everpeace
Copy link
Contributor

@everpeace everpeace commented Feb 19, 2023

What is the issue

I found a bug in setting runtime.Spec.Process.User.additionalGids in CRI server/sbserver.

If LinuxContainerSecurityContext.{RunAsUser,RunAsUsername} are empty, contained should fallback to the container image's Config.User for resolving the primary user and use it to inspect /etc/group in the container image for resolving additionalGids. But containerd always fallback to 0.

Fortunately, in the use of Kubernetes+containerd, the bug does NOT happen because kubelet always fills out LinuxContainerSecurityContext.RunAsUser from the image status (code) when image's Config.User is not empty. So ontainer_create_linux.go#L369-L370 can grab the correct user from the SecurityContext.

However, because the cri-api spec does not assume the behavior, which fills out LinuxContainerSecurityContext.RunAsUser from the image's Config.User, I think the code should be fixed.

CRI-O seems to handle this correctly. See here.

Reproduction

Version

root@primary:~# crictl version
Version:  0.1.0
RuntimeName:  containerd
RuntimeVersion:  1.6.16
RuntimeApiVersion:  v1

Image and Container Spec

The image everpeace/test:containerd-image-config is used.

# the image has alice(uid=1000)
root@primary:~# docker run -it --rm everpeace/test:containerd-image-config cat /etc/passwd | grep alice
alice:x:1000:1000::/home/alice:/bin/sh

# In the image
# - root belongs to an additional-group-for-root
# - alice belongs to an additiona-group-for-alice
root@primary:~# docker run -it --rm everpeace/test:containerd-image-config cat /etc/group | grep -e root -e alice
root:x:0:
Alice:x:1000:
additional-group-for-alice:x:10000:alice
additional-group-for-root:x:20000:root

# image specifies "config.user: 1000" (alice)
root@primary:~# crictl inspecti everpeace/test:containerd-image-config
...
  "info": {
    "chainID": "sha256:140469df148ed27972d663207bf92e11d6d11b046647532811ab0f7a297c0cb0",
    "imageSpec": {
      "config": {
        "User": "1000",
...

root@primary:~# cat pod-config.json
{
    "metadata": {
        "name": "test-sandbox",
        "namespace": "default",
        "attempt": 1,
        "uid": "hdishd83djaidwnduwk28bcsx"
    },
    "log_directory": "/tmp/tmp.AqRbZ4eV2s",
    "linux": {
    }
}

root@primary:~# cat container-config.json
{
    "metadata": {
        "name": "test"
    },
    "image": {
        "image": "everpeace/test:containerd-image-config"
    },
    "command": [
        "id"
    ],
    "log_path": "test.0.log",
    "linux": {}
}

The output

# Run the container
root@primary:~# crictl runp pod-config.json 
3323c5e01e65ae0a4663b6e7391eb1e81a6fccb9a6cda07b639e792798cdb11e
root@primary:~# crictl create 3323c5e01e65ae0a4663b6e7391eb1e81a6fccb9a6cda07b639e792798cdb11e container-config.json pod-config.json 
6fa373a9292d86f3cdc43e63feeb298c622a0f092ba3204ac04a541a9ca56cc4
root@primary:~# crictl start 6fa373a9292d86f3cdc43e63feeb298c622a0f092ba3204ac04a541a9ca56cc4
6fa373a9292d86f3cdc43e63feeb298c622a0f092ba3204ac04a541a9ca56cc4
# 'additional-group-for-alice' expects to be attached because imageConfig.User=1000(alice).
# However, 'additional-group-gor-root' is attached instead.
root@primary:~# crictl logs 6fa373a9292d86f3cdc43e63feeb298c622a0f092ba3204ac04a541a9ca56cc4
uid=1000(alice) gid=1000(alice) groups=1000(alice),20000(additional-group-for-root)

Notes for reviewers

The current containerd's CRI implementation doesn't follow the conversion spec config.User to spec.Process.User. The containerd's CRI impl

  • doesn't support to specify group in config.User in the image, and
  • ALWAYS attach groups from the image(/etc/group) when setting RunAsGroup in SecurityContext

The issue will be handled in kubernetes/enhancements#3619 separately to avoid breaking changes to the users.

@k8s-ci-robot
Copy link

Hi @everpeace. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@everpeace everpeace force-pushed the fix-additiona-gids-to-read-image-user branch from 8aa35ad to 470b6ba Compare February 19, 2023 09:21
@everpeace everpeace marked this pull request as draft February 19, 2023 09:41
@everpeace
Copy link
Contributor Author

everpeace commented Feb 19, 2023

I set it back to draft because lint fails. I will fix it later.

@everpeace everpeace force-pushed the fix-additiona-gids-to-read-image-user branch 5 times, most recently from a74a123 to 375f750 Compare February 19, 2023 13:07
It should fallback to imageConfig.User when no securityContext.RunAsUser/RunAsUsername

Signed-off-by: Shingo Omura <[email protected]>
@everpeace everpeace force-pushed the fix-additiona-gids-to-read-image-user branch from 375f750 to 727b254 Compare February 19, 2023 13:10
@everpeace
Copy link
Contributor Author

everpeace commented Feb 19, 2023

I'm making this PR "Ready for Review" because failed tests' reason may be not related to the PR.
To reviewers: Would you mind retrying failed jobs?? Thanks.

All tests are passed. This PR is ready to review now.

@everpeace everpeace marked this pull request as ready for review February 19, 2023 14:02
@everpeace
Copy link
Contributor Author

@mikebrow Hi, would you mind reviewing this PR?? Thanks.

@mikebrow
Copy link
Member

/ok-to-test

Copy link
Member

@mikebrow mikebrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.. nicely done!

@mikebrow mikebrow added cherry-pick/1.6.x cherry-pick/1.7.x Change to be cherry picked to release/1.7 branch kind/bug labels Mar 13, 2023
userstr = imageConfig.User
}
if userstr != "" {
specOpts = append(specOpts, oci.WithUser(userstr))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering why we're first going through the process above to convert username and group to a single string (user:group if both are present), only to then to parse everything again in oci.WithUser() to split user and group again and see if they're numeric values (should the "happy path" use WithUIDGID() or variants thereof?)

Copy link
Contributor Author

@everpeace everpeace Mar 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why we did it like this. However, It looks the commit introduced it. It should be refactored in another PR. cri-o's corresponding code is more readable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree it makes sense to refactor in a follow up

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Admitted, I haven't dug fully into all of the logic, but it looks like it's because there's a combination of a "strong typed" (securityContext) and fallback to the "weakly typed" (string) from the imageConfig.User.

While trying to read the code, I found the logic to be complex / hard to read, which scared me a bit, as such cases can easily lead to bugs; too many conditions in a row make it hard to spot corner-cases, as well as the conversion from strong-typed to weak typed (followed by "trying to discover" if a uid or username was passed, or other combinations).

So, yes, perhaps out of scope for this PR (TBD?) but it would be good to clean this code up to make it more maintainable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nod.. I had to read it a few times take notes check test cases

@everpeace everpeace force-pushed the fix-additiona-gids-to-read-image-user branch from 570ac18 to 50740a1 Compare March 14, 2023 04:52
Copy link
Member

@mikebrow mikebrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

additional commit LGTM

@everpeace
Copy link
Contributor Author

@mikebrow What are the remaining tasks for merging this PR? I believe all the comments are addressed and it is ready to merge now.

@everpeace
Copy link
Contributor Author

@mikebrow would you mind merging this PR if you don't see any concerns??

Copy link
Member

@fuweid fuweid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

The comments could be addressed in the followup.

additional-group-for-alice:x:11111:alice
additional-group-for-root:x:22222:root
`
tempRootDir, err := os.MkdirTemp("", "TestContainerUser-")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can use t.TempDir() here and testing framework will cleanup it.

expected: runtimespec.User{UID: 0, GID: 0, AdditionalGids: []uint32{0, 22222}},
},
} {
t.Run(desc, func(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add desc := desc before t.Run to capture range variable just in case that it run in parallel mode.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I always get confused over when a local copy is needed inside a for range block, some examples show it's not needed if param of the func but here it's a t.Run vs a go Run so.. confusion :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops! It should be test := test... my mistake...

t.Run can go in parallel mode so it is like go func(). We need to capture it before spawning.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, it makes sense. I will open a fix soon 🙇

@fuweid fuweid merged commit 5885db6 into containerd:main Apr 9, 2023
@everpeace everpeace mentioned this pull request Apr 10, 2023
dmcgowan added a commit that referenced this pull request Apr 10, 2023
@thaJeztah thaJeztah added cherry-picked/1.6.x PR commits are cherry-picked into release/1.6 branch and removed cherry-pick/1.6.x labels Jul 14, 2023
@thaJeztah thaJeztah added cherry-picked/1.7.x PR commits are cherry-picked into release/1.7 branch and removed cherry-pick/1.7.x Change to be cherry picked to release/1.7 branch labels Jul 14, 2023
@everpeace everpeace deleted the fix-additiona-gids-to-read-image-user branch May 30, 2024 15:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cherry-picked/1.6.x PR commits are cherry-picked into release/1.6 branch cherry-picked/1.7.x PR commits are cherry-picked into release/1.7 branch kind/bug ok-to-test

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

5 participants