Enhance CRI workflow to support pod level user namespace #6908

@jiangliu

What is the problem you're trying to solve

There is ongoing work to enable pod-level user namespaces:
kubernetes/enhancements#127
kubernetes/enhancements#3275

User namespaces (UID/GID remapping) are already supported by ctr, containerd core, and runc, but the CRI subsystem still has some issues. The first issue we encountered while building a PoC for pod-level user namespaces is:

RunPodSandbox for &PodSandboxMetadata{Name:nginx-sandbox,Uid:018b4704-222a-4657-990e-bb568b870f4b,Namespace:default,Attempt:1,} failed, error  error="failed to create containerd task: failed to create shim task: OCI runtime create failed: container_linux.go:346: starting container process caused \"process_linux.go:449: container init caused \\\"rootfs_linux.go:58: mounting \\\\\\\"sysfs\\\\\\\" to rootfs \\\\\\\"/home/wanglei01/opt/open/go_project/containerd_upstream/bin/run/containerd/io.containerd.runtime.v2.task/k8s.io/7388141ccd209d9df243a1b7df52c9510436e5d89b072904bd6778400d82301c/rootfs\\\\\\\" at \\\\\\\"/sys\\\\\\\" caused \\\\\\\"operation not permitted\\\\\\\"\\\"\": unknown"

After some investigation, we found that the failure is caused by the order in which the namespaces are created.
When mounting sysfs, the Linux kernel checks that the current user has CAP_SYS_ADMIN in the user namespace associated with the net namespace. The current flow to create namespaces for the sandbox/infra container is:

  1. Call netns to create the net namespace for the pod if the net namespace mode is not NODE.
  2. Call CNI to initialize the pod network.
  3. Configure the other namespaces for the infrastructure/app containers.
  4. Call the container runtime to create the other namespaces for the container, including the user namespace.

With this flow, the net namespace is associated with the init user namespace because it is created before the pod user namespace. As a result, mounting sysfs from inside the pod user namespace fails.
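The ownership check described above can be illustrated with a small model. This is only a sketch: `UserNS`, `NetNS`, `Proc`, `capableIn`, and `canMountSysfs` are hypothetical names made up for illustration, and the capability walk is a simplification of what the kernel does (it ignores, for example, the rule about the owner of a child user namespace). It does not call into the kernel or containerd.

```go
package main

import "fmt"

// UserNS is a simplified model of a user namespace (hypothetical type).
type UserNS struct {
	Name   string
	Parent *UserNS // nil for the init user namespace
}

// NetNS models a net namespace. It remembers the user namespace that was
// current when the net namespace was created.
type NetNS struct {
	Owner *UserNS
}

// Proc models a process: the user namespace it runs in and whether it holds
// CAP_SYS_ADMIN in that namespace.
type Proc struct {
	UserNS      *UserNS
	HasSysAdmin bool
}

// capableIn reports whether p has CAP_SYS_ADMIN over target. In this
// simplified model that is true only if p holds the capability in target
// itself or in one of target's ancestors.
func capableIn(p Proc, target *UserNS) bool {
	if !p.HasSysAdmin {
		return false
	}
	for ns := target; ns != nil; ns = ns.Parent {
		if ns == p.UserNS {
			return true
		}
	}
	return false
}

// canMountSysfs models the check from the text: the mounter needs
// CAP_SYS_ADMIN in the user namespace that owns the net namespace.
func canMountSysfs(p Proc, n NetNS) bool {
	return capableIn(p, n.Owner)
}

func main() {
	initNS := &UserNS{Name: "init"}
	podNS := &UserNS{Name: "pod", Parent: initNS}

	// Container init process: root inside the pod user namespace.
	containerInit := Proc{UserNS: podNS, HasSysAdmin: true}

	// Current flow: the net namespace is created before the pod user namespace.
	netnsCurrent := NetNS{Owner: initNS}
	// Reordered flow: the net namespace is created inside the pod user namespace.
	netnsReordered := NetNS{Owner: podNS}

	fmt.Println("current flow:", canMountSysfs(containerInit, netnsCurrent))     // false
	fmt.Println("reordered flow:", canMountSysfs(containerInit, netnsReordered)) // true
}
```

In the model, the container's init process is privileged only inside the pod user namespace, so a net namespace owned by the init user namespace is out of its reach. That matches the "operation not permitted" error in the log.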

Describe the solution you'd like

We could change the flow that creates namespaces and initializes the pod network for the sandbox as follows:

  1. Configure all needed namespaces for the sandbox/infra container, including the user namespace.
  2. Call the runtime to start the infra container, which creates all needed namespaces.
  3. Call CNI to initialize the pod network with the net namespace created by the runtime for the infra container.
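The proposed ordering could look roughly like the sketch below. Everything here is hypothetical and for illustration only: `SandboxSpec`, `Runtime`, `CNI`, `RunPodSandbox`, and the fake implementations are not containerd or CNI APIs. The sketch also assumes that the net namespace is handed to CNI as the `/proc/<pid>/ns/net` path of the infra container's init process, which is one possible way to reference it.

```go
package main

import "fmt"

// SandboxSpec is a hypothetical, trimmed-down sandbox configuration.
type SandboxSpec struct {
	Namespaces []string // namespaces the runtime is asked to create
}

// Runtime is a hypothetical stand-in for the container runtime.
type Runtime interface {
	// StartSandbox starts the infra container and returns its init PID.
	StartSandbox(spec SandboxSpec) (pid int, err error)
}

// CNI is a hypothetical stand-in for the network plugin.
type CNI interface {
	Setup(netnsPath string) error
}

// RunPodSandbox follows the proposed order: configure namespaces, let the
// runtime create them, then set up the network in the runtime-created netns.
func RunPodSandbox(rt Runtime, cni CNI) (string, error) {
	// Step 1: configure all namespaces up front, including the user namespace.
	spec := SandboxSpec{Namespaces: []string{"user", "network", "ipc", "uts", "mount", "pid"}}

	// Step 2: the runtime starts the infra container and creates the namespaces.
	pid, err := rt.StartSandbox(spec)
	if err != nil {
		return "", fmt.Errorf("start sandbox: %w", err)
	}

	// Step 3: CNI configures the net namespace that the runtime created.
	netnsPath := fmt.Sprintf("/proc/%d/ns/net", pid)
	if err := cni.Setup(netnsPath); err != nil {
		return "", fmt.Errorf("setup pod network: %w", err)
	}
	return netnsPath, nil
}

// fakeRuntime and fakeCNI record the call order so the flow can be observed
// without root privileges or a real runtime.
type fakeRuntime struct{ calls *[]string }

func (f fakeRuntime) StartSandbox(spec SandboxSpec) (int, error) {
	*f.calls = append(*f.calls, "runtime:start")
	return 4242, nil // made-up PID
}

type fakeCNI struct{ calls *[]string }

func (f fakeCNI) Setup(netnsPath string) error {
	*f.calls = append(*f.calls, "cni:setup "+netnsPath)
	return nil
}

func main() {
	var calls []string
	netns, err := RunPodSandbox(fakeRuntime{&calls}, fakeCNI{&calls})
	if err != nil {
		panic(err)
	}
	fmt.Println(netns)
	fmt.Println(calls)
}
```

The point of the sketch is the ordering: CNI runs only after the runtime has created the net namespace, so that namespace is created together with the pod user namespace rather than before it.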

Additional context

No response
