Description
PKU (Protection Keys for Userspace), also known as MPK (Memory Protection Keys), was introduced to Linux in 2015 (details can be found in the LWN page). It provides lightweight and flexible thread-level permission control over memory. Applications can leverage this hardware feature to improve their robustness.
The related syscalls in Linux are listed below (more details can be found in the man pages):
pkey_alloc()
pkey_mprotect()
pkey_free()
Applications invoke these syscalls to configure their own address space. However, we found that the default seccomp profile (can be found here) does not include these syscalls in its whitelist, so invoking them inside a container fails.
To use PKU in Docker, we can pass a customized profile with the --security-opt seccomp=/path/to/profile.json option, which is cumbersome for developers. Another choice is to add the --privileged flag to the docker run command, which may introduce other security issues.
Our question is:
Can we add the PKU-related syscalls to the whitelist in the default seccomp profile?
Similar to mprotect(), these syscalls can only configure the calling process's own virtual memory, and as far as I know they introduce no extra security issues.
If they cannot be added to the whitelist, why not?
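To illustrate the mprotect() comparison, here is a minimal sketch of how the three syscalls are typically combined. It assumes the glibc wrappers (glibc 2.27 or later); the helper name tag_page_with_pkey is hypothetical and not part of any API. Every call operates only on a page that the calling process mapped itself.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: maps one anonymous page, tags it with a freshly
 * allocated protection key, then releases everything again.
 * Returns 0 on success, or the errno of the first failing call. */
static int tag_page_with_pkey(void) {
    size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
    void *page = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return errno;

    /* Allocate a key whose initial rights deny writes for this thread. */
    int pkey = pkey_alloc(0, PKEY_DISABLE_WRITE);
    if (pkey < 0) {
        int err = errno;
        munmap(page, page_size);
        return err;
    }

    /* Like mprotect(), but additionally associates the key with the page. */
    int err = 0;
    if (pkey_mprotect(page, page_size, PROT_READ | PROT_WRITE, pkey) != 0)
        err = errno;

    /* Unmap the page before releasing the key it was tagged with. */
    munmap(page, page_size);
    pkey_free(pkey);
    return err;
}
```

Under the default seccomp profile described above, this helper would be expected to return EPERM from the pkey_alloc() step.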
Steps to reproduce the issue:
- Here is the sample code, test.c:
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

int main() {
    /* Try to allocate a protection key with default flags and rights. */
    int pkey = pkey_alloc(0, 0);
    if (pkey < 0) {
        printf("pkey_alloc() return error: %d\n", pkey);
        perror("Cannot invoke pkey_alloc()");
    } else {
        printf("Successfully alloc pkey, pkey=%d\n", pkey);
        pkey_free(pkey);
    }
    return 0;
}
- Compile it, mount it into a Docker container, and run it there:
$ gcc -o test test.c
$ docker run -it -v ${PWD}:/root/test_pku ubuntu:18.04
$ /root/test_pku/test
- Then check the output.
Describe the results you received:
The results show that pkey_alloc() failed:
pkey_alloc() return error: -1
Cannot invoke pkey_alloc(): Operation not permitted
Describe the results you expected:
The output should be the same as on native Linux:
Successfully alloc pkey, pkey=1
Additional information you deem important:
These tests must be performed on hardware and a kernel that both support PKU.
We can use lscpu | grep pku to check whether the machine supports PKU.
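Because a failure can come from either the seccomp filter or missing PKU support, a small probe that inspects errno may help tell the two apart. The following is only a sketch: both helper names are hypothetical, and the mapping follows the errors documented in the pkey_alloc(2) man page, plus EPERM, which is what the run above reports under the default seccomp profile.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/mman.h>

/* Hypothetical helper: try to allocate (and immediately free) one key.
 * Returns 0 on success, otherwise the errno set by pkey_alloc(). */
static int probe_pkey_alloc(void) {
    int pkey = pkey_alloc(0, 0);
    if (pkey < 0)
        return errno;
    pkey_free(pkey);
    return 0;
}

/* Hypothetical helper: map that errno to a likely cause. */
static const char *explain_pkey_errno(int err) {
    switch (err) {
    case 0:      return "ok: protection keys are usable";
    case EPERM:  return "blocked: most likely denied by a seccomp filter";
    case ENOSPC: return "unavailable: no free key, or no CPU/OS support for PKU";
    case ENOSYS: return "unavailable: kernel does not implement pkey_alloc()";
    default:     return "failed: unexpected error, inspect errno";
    }
}
```

Calling explain_pkey_errno(probe_pkey_alloc()) from main() and printing the result, both on the host and inside the container, should show whether the container runtime or the machine itself is the limiting factor.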
Output of docker version:
Client: Docker Engine - Community
Version: 20.10.9
API version: 1.41
Go version: go1.16.8
Git commit: c2ea9bc
Built: Mon Oct 4 16:08:29 2021
OS/Arch: linux/amd64
Context: default
Experimental: true
Server: Docker Engine - Community
Engine:
Version: 20.10.9
API version: 1.41 (minimum version 1.12)
Go version: go1.16.8
Git commit: 79ea9d3
Built: Mon Oct 4 16:06:37 2021
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.11
GitCommit: 5b46e404f6b9f661a205e28d59c982d3634148f8
runc:
Version: 1.0.2
GitCommit: v1.0.2-0-g52b36a2
docker-init:
Version: 0.19.0
GitCommit: de40ad0
Output of docker info:
Client:
Context: default
Debug Mode: false
Plugins:
app: Docker App (Docker Inc., v0.9.1-beta3)
buildx: Build with BuildKit (Docker Inc., v0.6.3-docker)
scan: Docker Scan (Docker Inc., v0.8.0)
Server:
Containers: 74
Running: 21
Paused: 0
Stopped: 53
Images: 468
Server Version: 20.10.9
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 5b46e404f6b9f661a205e28d59c982d3634148f8
runc version: v1.0.2-0-g52b36a2
init version: de40ad0
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 5.13.0-27-generic
Operating System: Ubuntu 20.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 64
Total Memory: 188.4GiB
Name: xxx
ID: xxx
Docker Root Dir: /var/lib/docker
Debug Mode: false
HTTP Proxy: xxx
HTTPS Proxy: xxx
No Proxy: localhost,127.0.0.1,xxx
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Note: I have replaced some sensitive information with xxx.
Additional environment details (AWS, VirtualBox, physical, etc.):
None