
image volumes have incorrect selinux labels #5090

@dweomer

Description

On SELinux systems, image volume content is copied in full, extended attributes included, which prevents containers from writing to the implicit volume(s) set up by CRI.

We've run into an issue that prevents us from running Rancher 2.x on RKE2 v1.19 on systems with SELinux in Enforcing mode (EL 7/8). See rancher/rke2#690. I've traced it to the content of image volumes being copied over from layer storage together with its extended attributes, so the copy keeps the container_share_t label (or container_ro_file_t on newer systems). The issue doesn't occur with RKE2 v1.18 because it uses containerd v1.3.x, which isn't built with containerd/continuity@b05c0fd3fcbe, whereas RKE2 v1.19 ships with containerd v1.4.x.

Steps to reproduce the issue:

  1. git clone https://github.com/dweomer/vagrantfiles
  2. cd vagrantfiles/centos-7-rke2-rancher
  3. INSTALL_RKE2_CHANNEL=v1.19 vagrant up
  4. Wait a few minutes, keeping an eye on the console.
  5. vagrant ssh node-1 -- kubectl -n cattle-system get pod
  6. Notice that provisioning of node-1 fails.
  7. vagrant destroy --force
  8. SELINUX=Permissive INSTALL_RKE2_CHANNEL=v1.19 vagrant up
  9. vagrant ssh node-1
  10. sudo ausearch -ts recent -m avc
time->Thu Feb 25 22:35:03 2021
type=PROCTITLE msg=audit(1614292503.789:1217): proctitle=72616E63686572002D2D687474702D6C697374656E2D706F72743D3830002D2D68747470732D6C697374656E2D706F72743D343433002D2D61756469742D6C6F672D706174683D2F7661722F6C6F672F61756469746C6F672F72616E636865722D6170692D61756469742E6C6F67002D2D61756469742D6C6576656C3D30002D
type=SYSCALL msg=audit(1614292503.789:1217): arch=c000003e syscall=258 success=yes exit=0 a0=ffffffffffffff9c a1=c0012a8e20 a2=1c0 a3=c0012a8e20 items=0 ppid=6765 pid=6786 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="rancher" exe="/usr/bin/rancher" subj=system_u:system_r:container_t:s0:c350,c498 key=(null)
type=AVC msg=audit(1614292503.789:1217): avc:  denied  { create } for  pid=6786 comm="rancher" name="management-state" scontext=system_u:system_r:container_t:s0:c350,c498 tcontext=system_u:object_r:container_share_t:s0 tclass=dir permissive=1
type=AVC msg=audit(1614292503.789:1217): avc:  denied  { add_name } for  pid=6786 comm="rancher" name="management-state" scontext=system_u:system_r:container_t:s0:c350,c498 tcontext=system_u:object_r:container_share_t:s0 tclass=dir permissive=1
type=AVC msg=audit(1614292503.789:1217): avc:  denied  { write } for  pid=6786 comm="rancher" name="2a5daf38b382020f331bddaa7000ef6c096f5d90ffe54a6aadbb63534a03d648" dev="vda1" ino=3567177 scontext=system_u:system_r:container_t:s0:c350,c498 tcontext=system_u:object_r:container_share_t:s0 tclass=dir permissive=1
----
time->Thu Feb 25 22:35:03 2021
type=PROCTITLE msg=audit(1614292503.790:1218): proctitle=72616E63686572002D2D687474702D6C697374656E2D706F72743D3830002D2D68747470732D6C697374656E2D706F72743D343433002D2D61756469742D6C6F672D706174683D2F7661722F6C6F672F61756469746C6F672F72616E636865722D6170692D61756469742E6C6F67002D2D61756469742D6C6576656C3D30002D
type=SYSCALL msg=audit(1614292503.790:1218): arch=c000003e syscall=266 success=yes exit=0 a0=c000bd28d8 a1=ffffffffffffff9c a2=c0012a8e40 a3=c0012a8e40 items=0 ppid=6765 pid=6786 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="rancher" exe="/usr/bin/rancher" subj=system_u:system_r:container_t:s0:c350,c498 key=(null)
type=AVC msg=audit(1614292503.790:1218): avc:  denied  { create } for  pid=6786 comm="rancher" name="etcd" scontext=system_u:system_r:container_t:s0:c350,c498 tcontext=system_u:object_r:container_share_t:s0 tclass=lnk_file permissive=1
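
The proctitle field in the records above is hex-encoded, with NUL bytes separating the arguments. As a reading aid, here is a minimal Go sketch that decodes it. The helper name decodeProctitle is made up for illustration, and the input in main is only the leading portion of the value shown above.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// decodeProctitle (hypothetical helper) turns the hex-encoded proctitle
// field of an audit record into a readable command line. The raw value
// separates arguments with NUL bytes, which are replaced by spaces here.
func decodeProctitle(hexValue string) (string, error) {
	raw, err := hex.DecodeString(hexValue)
	if err != nil {
		return "", err
	}
	return strings.ReplaceAll(string(raw), "\x00", " "), nil
}

func main() {
	// Leading portion of the proctitle value from the audit records.
	cmd, err := decodeProctitle("72616E63686572002D2D687474702D6C697374656E2D706F72743D3830")
	if err != nil {
		panic(err)
	}
	fmt.Println(cmd) // rancher --http-listen-port=80
}
```

On a live system, ausearch -i should give a similar interpreted view without any extra tooling.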

Describe the results you received:

The rancher pods failed to start because SELinux prevented writes to various existing paths under /var/lib/rancher (in-container). See the audit log entries above.
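
To make the mismatch in those entries easier to see, the following Go sketch pulls the denied permissions, source and target contexts, and object class out of a single AVC line. The type and function names (avcDenial, parseAVC, selinuxType) are hypothetical, and the parsing is a rough approximation that only targets the line shape shown in this report, not the full audit format.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// avcDenial holds the fields of interest from one AVC record.
type avcDenial struct {
	Perms  []string // permissions listed inside "denied { ... }"
	Source string   // scontext: the process context
	Target string   // tcontext: the label on the object
	Class  string   // tclass: the object class (dir, lnk_file, ...)
}

var (
	permsRe = regexp.MustCompile(`denied\s+\{([^}]*)\}`)
	fieldRe = regexp.MustCompile(`(\w+)=(\S+)`)
)

// parseAVC extracts the denial fields from one line; ok is false when the
// line does not look like an AVC denial.
func parseAVC(line string) (d avcDenial, ok bool) {
	m := permsRe.FindStringSubmatch(line)
	if m == nil {
		return d, false
	}
	d.Perms = strings.Fields(m[1])
	for _, kv := range fieldRe.FindAllStringSubmatch(line, -1) {
		switch kv[1] {
		case "scontext":
			d.Source = kv[2]
		case "tcontext":
			d.Target = kv[2]
		case "tclass":
			d.Class = kv[2]
		}
	}
	return d, true
}

// selinuxType returns the type field of a user:role:type:level context.
func selinuxType(ctx string) string {
	parts := strings.Split(ctx, ":")
	if len(parts) < 3 {
		return ""
	}
	return parts[2]
}

func main() {
	line := `type=AVC msg=audit(1614292503.790:1218): avc:  denied  { create } for  pid=6786 comm="rancher" name="etcd" scontext=system_u:system_r:container_t:s0:c350,c498 tcontext=system_u:object_r:container_share_t:s0 tclass=lnk_file permissive=1`
	if d, ok := parseAVC(line); ok {
		fmt.Println(selinuxType(d.Source), d.Perms, selinuxType(d.Target), d.Class)
	}
}
```

For the record used in main, the source type is container_t and the target type is container_share_t, which is the label carried over from layer storage.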

Describe the results you expected:

The rancher pods should start up successfully.

Output of containerd --version:

# This is our fork, but I have reproduced the issue with stock containerd v1.4.3.
containerd github.com/rancher/containerd v1.4.3-k3s2 441c8be2f38dbb1b0676818391aa93d3ad46ecb6

Any other relevant information:
I have corroborated this behavior with @samuelkarp on the AWS Bottlerocket team, who are currently carrying a patch to address it: https://github.com/bottlerocket-os/bottlerocket/blob/develop/packages/containerd/1003-cri-relabel-volumes-after-copying-source-files.patch

After some discussion in the CNCF #containerd Slack channel, we have decided to instead attempt to filter out the security.selinux xattr during the directory copy from layer storage to the volume location. I am working on a PR to continuity that ought to make this possible, and I will follow up with a patch to containerd that leverages the change.
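
A rough sketch of the filtering idea, for illustration only: this is not the continuity API and not the eventual patch. It models a file's extended attributes as an in-memory map (an assumption made to keep the example self-contained) and skips any attribute whose name starts with an excluded prefix. The names copyXattrs and excludePrefixes are hypothetical; a real implementation would read and write the attributes with the listxattr, getxattr, and setxattr system calls during the directory copy.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// copyXattrs (hypothetical) returns a copy of src that omits every
// attribute whose name starts with one of excludePrefixes. The map is an
// in-memory stand-in for the extended attributes of a single file.
func copyXattrs(src map[string][]byte, excludePrefixes ...string) map[string][]byte {
	dst := make(map[string][]byte, len(src))
	for name, value := range src {
		if hasAnyPrefix(name, excludePrefixes) {
			continue // e.g. drop security.selinux so the source label is not carried over
		}
		dst[name] = append([]byte(nil), value...) // copy the value bytes
	}
	return dst
}

// hasAnyPrefix reports whether name starts with any of the given prefixes.
func hasAnyPrefix(name string, prefixes []string) bool {
	for _, p := range prefixes {
		if strings.HasPrefix(name, p) {
			return true
		}
	}
	return false
}

func main() {
	// Attributes as they might appear on a file in layer storage.
	layerFile := map[string][]byte{
		"security.selinux": []byte("system_u:object_r:container_share_t:s0"),
		"user.note":        []byte("kept"),
	}
	copied := copyXattrs(layerFile, "security.selinux")

	names := make([]string, 0, len(copied))
	for name := range copied {
		names = append(names, name)
	}
	sort.Strings(names)
	fmt.Println(names) // [user.note]
}
```

With the security.selinux attribute left out of the copy, the files in the volume location would no longer carry the layer storage label and would instead get whatever label the destination receives by default.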

Labels: area/cri (Container Runtime Interface), kind/bug
