Can we support PKU in docker by default? (pkey_mprotect(2), pkey_alloc(2), pkey_free(2)) #43481

@Bonjourz

Description


PKU (Protection Keys for Userspace), also known as MPK (Memory Protection Keys), has been supported in Linux since 2015 (details can be found on the LWN page). It provides lightweight, flexible, thread-level permission control over memory. Applications can leverage this hardware feature to improve their robustness.

The related syscalls in Linux are shown below (more details can be found in the man page):

  • pkey_alloc()
  • pkey_mprotect()
  • pkey_free()
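As a rough illustration of how these three syscalls fit together, here is a minimal sketch of a key's lifecycle: allocate a key, tag a page with it, then release both. The helper name pku_lifecycle_demo is hypothetical (it is not from any library), and the sketch calls the raw syscall(2) interface only so that it does not depend on the glibc wrappers being available; the wrappers from <sys/mman.h> should work the same way.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper (not from the issue): walks through the full key
 * lifecycle and returns 0 on success, or the errno of the first failing step. */
int pku_lifecycle_demo(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return EINVAL;
    size_t len = (size_t)page;

    /* An ordinary anonymous page that will be tagged with the key. */
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return errno;

    int rc = 0;

    /* pkey_alloc(flags = 0, access_rights = 0): a key with no restrictions. */
    long pkey = syscall(SYS_pkey_alloc, 0L, 0L);
    if (pkey < 0) {
        rc = errno;
        munmap(buf, len);
        return rc;
    }

    /* pkey_mprotect(): like mprotect(), but also tags the range with the key. */
    if (syscall(SYS_pkey_mprotect, buf, len,
                (long)(PROT_READ | PROT_WRITE), pkey) != 0)
        rc = errno;

    /* Unmap before freeing so that no mapping still refers to the freed key. */
    munmap(buf, len);

    /* pkey_free(): return the key to the per-process pool. */
    if (syscall(SYS_pkey_free, pkey) != 0 && rc == 0)
        rc = errno;
    return rc;
}
```

Calling pku_lifecycle_demo() from main() and printing strerror() of a non-zero result should be enough to see which step fails in a given environment.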

These syscalls are invoked by applications to configure their address space. However, we found that the default seccomp profile (can be found here) does not include these syscalls in its whitelist, so invoking them fails.
To use PKU in Docker, we can pass a customized profile with the --security-opt seccomp=<profile> option, which is cumbersome for developers. Another option is to add the --privileged flag to the docker run command, which may introduce other security issues.

Our question is:
Can we add the PKU-related syscalls to the whitelist in the default seccomp profile?
Similar to mprotect(), these syscalls can only configure the calling process's own virtual memory, and as far as I know they introduce no additional security issues.
If they cannot be added to the whitelist, why not?

Steps to reproduce the issue:

  1. Here is the sample code in test.c:
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

int main() {
    int pkey = pkey_alloc(0, 0);
    if (pkey < 0) {
        printf("pkey_alloc() return error: %d\n", pkey);
        perror("Cannot invoke pkey_alloc()");
    } else {
        printf("Successfully alloc pkey, pkey=%d\n", pkey);
        pkey_free(pkey);
    }
    return 0;
}
  2. Compile it, mount it into a Docker container, and run it inside the container:
$ gcc -o test test.c
$ docker run -it -v ${PWD}:/root/test_pku ubuntu:18.04
$ /root/test_pku/test
  3. Observe the output.

Describe the results you received:

The results show that pkey_alloc() failed:

pkey_alloc() return error: -1
Cannot invoke pkey_alloc(): Operation not permitted
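"Operation not permitted" corresponds to EPERM, which is not among the errors the pkey_alloc(2) man page documents; it is consistent with the call being rejected by the seccomp filter rather than by the kernel's PKU code. The sketch below shows one way a program could tell these cases apart. The enum and function names are hypothetical, and the mapping of errno values follows my reading of the man page (ENOSPC is also returned when the processor or OS lacks protection-key support), so treat it as an assumption rather than a definitive rule.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical classification of pkey_alloc() outcomes. */
enum pku_probe {
    PKU_AVAILABLE,                /* a key was allocated                     */
    PKU_BLOCKED_BY_POLICY,        /* EPERM: e.g. rejected by a seccomp filter */
    PKU_NO_SYSCALL,               /* ENOSYS: syscall not implemented          */
    PKU_UNSUPPORTED_OR_EXHAUSTED, /* ENOSPC: no PKU support, or no keys left  */
    PKU_OTHER_ERROR               /* anything else, e.g. EINVAL               */
};

/* Map an errno value from a failed pkey_alloc() to a probable cause. */
enum pku_probe classify_pkey_alloc_errno(int err)
{
    switch (err) {
    case 0:      return PKU_AVAILABLE;
    case EPERM:  return PKU_BLOCKED_BY_POLICY;
    case ENOSYS: return PKU_NO_SYSCALL;
    case ENOSPC: return PKU_UNSUPPORTED_OR_EXHAUSTED;
    default:     return PKU_OTHER_ERROR;
    }
}

/* Try to allocate a key, free it again on success, and report the outcome. */
enum pku_probe probe_pkey_alloc(void)
{
    long pkey = syscall(SYS_pkey_alloc, 0L, 0L);
    if (pkey < 0)
        return classify_pkey_alloc_errno(errno);
    syscall(SYS_pkey_free, pkey);
    return PKU_AVAILABLE;
}
```

Under these assumptions, the container run above would be expected to yield PKU_BLOCKED_BY_POLICY, while a machine without PKU hardware would more likely yield PKU_UNSUPPORTED_OR_EXHAUSTED.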

Describe the results you expected:

The output should be the same as on native Linux:

Successfully alloc pkey, pkey=1

Additional information you deem important:

These tests must be performed on hardware and a kernel that support PKU.
We can use lscpu | grep pku to check whether the machine supports PKU.
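The same check can be done from a program by looking for the pku flag in /proc/cpuinfo, which is where the flag list shown by lscpu comes from on Linux. The sketch below is a minimal version of that idea; the function names line_has_flag and cpu_has_flag are hypothetical, and it assumes the x86 layout in which a line starting with "flags" lists space-separated feature names.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if `flag` appears as a whole whitespace-delimited word in `line`. */
int line_has_flag(const char *line, const char *flag)
{
    size_t n = strlen(flag);
    if (n == 0)
        return 0;
    for (const char *p = strstr(line, flag); p != NULL;
         p = strstr(p + 1, flag)) {
        int starts_word = (p == line) || p[-1] == ' ' || p[-1] == '\t';
        int ends_word = p[n] == '\0' || p[n] == ' ' || p[n] == '\n';
        if (starts_word && ends_word)
            return 1;
    }
    return 0;
}

/* Returns 1 if the first "flags" line of /proc/cpuinfo lists `flag`,
 * 0 if it does not, and -1 if the file cannot be opened. */
int cpu_has_flag(const char *flag)
{
    FILE *f = fopen("/proc/cpuinfo", "r");
    if (f == NULL)
        return -1;

    char line[8192]; /* assumed large enough for a full flags line */
    int found = 0;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "flags", 5) == 0) {
            found = line_has_flag(line, flag);
            break; /* all CPUs normally report the same flags */
        }
    }
    fclose(f);
    return found;
}
```

A caller could use cpu_has_flag("pku") to decide whether to run the test at all. As far as I know, a separate ospke flag indicates that the kernel has actually enabled the feature, so checking both may be more reliable.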

Output of docker version:

Client: Docker Engine - Community
 Version:           20.10.9
 API version:       1.41
 Go version:        go1.16.8
 Git commit:        c2ea9bc
 Built:             Mon Oct  4 16:08:29 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.9
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.8
  Git commit:       79ea9d3
  Built:            Mon Oct  4 16:06:37 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.11
  GitCommit:        5b46e404f6b9f661a205e28d59c982d3634148f8
 runc:
  Version:          1.0.2
  GitCommit:        v1.0.2-0-g52b36a2
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Output of docker info:

Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.6.3-docker)
  scan: Docker Scan (Docker Inc., v0.8.0)

Server:
 Containers: 74
  Running: 21
  Paused: 0
  Stopped: 53
 Images: 468
 Server Version: 20.10.9
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 5b46e404f6b9f661a205e28d59c982d3634148f8
 runc version: v1.0.2-0-g52b36a2
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.13.0-27-generic
 Operating System: Ubuntu 20.04.3 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 64
 Total Memory: 188.4GiB
 Name: xxx
 ID: xxx
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: xxx
 HTTPS Proxy: xxx
 No Proxy: localhost,127.0.0.1,xxx
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Note: I changed some sensitive information to xxx.

Additional environment details (AWS, VirtualBox, physical, etc.):

None

Metadata

    Labels

    area/security/seccomp, kind/enhancement
