seccomp filter breaks latest glibc (in fedora rawhide) by blocking clone3 with EPERM #42680

@berrange

Description

I have a Docker build with seccomp support running on a Fedora 34 host. Attempting to run commands inside a container based on the registry.fedoraproject.org/fedora:rawhide image results in programs failing to fork processes.

For example:

$ docker run -it registry.fedoraproject.org/fedora:rawhide  curl google.com
curl: (6) getaddrinfo() thread failed to start

Tracing the "curl" process in the container, I can see:

clone3({flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, child_tid=0x7f000ec6d910, parent_tid=0x7f000ec6d910, exit_signal=0, stack=0x7f000e46d000, stack_size=0x7ffe00, tls=0x7f000ec6d640}, 88) = -1 EPERM (Operation not permitted)

The latest glibc now attempts to use 'clone3' by default. For backwards compatibility it checks for the ENOSYS errno and falls back to "clone". An EPERM errno, by contrast, is treated as a fatal error.

The default seccomp filter installed by Docker returns EPERM for clone3, which breaks the glibc fallback.
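
The fallback behaviour described above can be modelled roughly as follows. This is a simplified Python sketch, not glibc's actual code: `create_thread`, `blocked_with` and `legacy_clone` are hypothetical names, and the callables stand in for the real syscalls.

```python
import errno


def create_thread(clone3, clone):
    """Model of the fallback described above (not glibc's real code).

    `clone3` and `clone` are hypothetical callables standing in for the
    syscalls: each returns a thread id on success or raises OSError.
    """
    try:
        return clone3()
    except OSError as exc:
        if exc.errno == errno.ENOSYS:
            # clone3 is reported as not implemented: use legacy clone.
            return clone()
        # Any other errno, including EPERM, is propagated as a failure.
        raise


def blocked_with(err):
    """Return a fake clone3 that always fails with the given errno."""
    def fake_clone3():
        raise OSError(err, "clone3 blocked")
    return fake_clone3


def legacy_clone():
    """Fake legacy clone that always succeeds."""
    return 1234  # arbitrary fake thread id
```

With a filter that answers ENOSYS, `create_thread(blocked_with(errno.ENOSYS), legacy_clone)` reaches the legacy path; with EPERM the error propagates, which matches the curl failure shown earlier.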

Explicitly passing the default seccomp profile makes it work, even though that profile does not allow clone3 either:

$ wget https://raw.githubusercontent.com/docker/labs/master/security/seccomp/seccomp-profiles/default.json -O profile2.json
$ docker run --security-opt seccomp=profile2.json -it registry.fedoraproject.org/fedora:rawhide  curl google.com
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
..snip...

Tracing again shows clone3 now returns ENOSYS

clone3({flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, child_tid=0x7f098bf8a910, parent_tid=0x7f098bf8a910, exit_signal=0, stack=0x7f098b78a000, stack_size=0x7ffe00, tls=0x7f098bf8a640}, 88) = -1 ENOSYS (Function not implemented)

I suspect this difference in behaviour is a result of the heuristics for choosing EPERM vs ENOSYS that were implemented in runc with opencontainers/runc@7a8d716
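
A rough model of such a heuristic is sketched below. It assumes that runc answers ENOSYS for any syscall number above the highest one named in the profile and otherwise applies the profile's default errno (EPERM here). That reading of the runc commit is an assumption, the function name is hypothetical, and the profile syscall numbers are made up for illustration; only the clone3 number on x86_64 (435) is real.

```python
import errno

CLONE3_NR = 435  # clone3 syscall number on x86_64


def errno_for_unlisted_syscall(nr, profile_syscall_numbers,
                               default_errno=errno.EPERM):
    """Errno a filter would return for a syscall the profile does not list.

    Assumed heuristic: numbers above the highest syscall the profile knows
    about are reported as ENOSYS; everything else gets the default errno.
    """
    if nr > max(profile_syscall_numbers):
        return errno.ENOSYS
    return default_errno


# Hypothetical profiles: an older one that predates clone3, and a newer one
# that names syscalls with higher numbers than clone3 without listing clone3.
older_profile = {0, 1, 2, 56, 332}
newer_profile = {0, 1, 2, 56, 332, 439, 441}

older_result = errno_for_unlisted_syscall(CLONE3_NR, older_profile)
newer_result = errno_for_unlisted_syscall(CLONE3_NR, newer_profile)
```

Under this assumption an older profile would yield ENOSYS for clone3 while a newer one would yield EPERM, which is consistent with the two traces above.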

It is also impossible to run docker build:

$ cat test.dkr 
FROM registry.fedoraproject.org/fedora:rawhide

RUN curl google.com

$ docker build -f test.dkr  .
Sending build context to Docker daemon  2.048kB
Step 1/2 : FROM registry.fedoraproject.org/fedora:rawhide
 ---> 887689ee223e
Step 2/2 : RUN curl google.com
 ---> Running in a370ae01f27e
curl: (6) getaddrinfo() thread failed to start
The command '/bin/sh -c curl google.com' returned a non-zero code: 6

and seccomp can't be overridden to make it work:

$ docker build --security-opt seccomp=~/profile2.json -f test.dkr  .
Sending build context to Docker daemon  2.048kB
Error response from daemon: The daemon on this platform does not support setting security options on build

Steps to reproduce the issue:

  1. Install docker 20.10.7, with seccomp enabled in the build
  2. docker run -it registry.fedoraproject.org/fedora:rawhide curl google.com

Describe the results you received:
curl: (6) getaddrinfo() thread failed to start

Describe the results you expected:
Dump of google.com

Output of docker version:

Client:
 Version:           20.10.7
 API version:       1.41
 Go version:        go1.16.6
 Git commit:        f0df350
 Built:             Mon Jul 26 16:34:29 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.7
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.6
  Git commit:       b0f5bc3
  Built:            Thu Jul 22 00:00:00 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.5.3
  GitCommit:        
 runc:
  Version:          1.0.1
  GitCommit:        4fc6f22
 docker-init:
  Version:          0.19.0
  GitCommit:        

Output of docker info:

Client:
 Context:    default
 Debug Mode: false

Server:
 Containers: 78
  Running: 1
  Paused: 0
  Stopped: 77
 Images: 3
 Server Version: 20.10.7
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: journald
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: /usr/libexec/docker/docker-init
 containerd version: 
 runc version: 4fc6f22
 init version: 
 Security Options:
  seccomp
   Profile: default
  selinux
  cgroupns
 Kernel Version: 5.14.0-0.rc2.20210721git8cae8cd89f05.24.fc35.x86_64
 Operating System: Fedora Linux 35 (Server Edition Prerelease)
 OSType: linux
 Architecture: x86_64
 CPUs: 12
 Total Memory: 7.438GiB
 Name: fedora
 ID: GQBM:HCKW:MKVM:Y5RK:HXPA:ZCCY:EXPA:FQBS:S4ZN:HRL5:5PSZ:KK7B
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: true

Additional environment details (AWS, VirtualBox, physical, etc.):
Virtual machine running Fedora 35. Also seen in GitLab CI when using 'docker:dind' for builds.
