Entitlements in Moby
This issue captures a draft design proposal for an entitlement mechanism that can be leveraged by Moby and other container management platforms to describe what additional permissions a specific service should be allowed to have when executing.
An entitlement is a single right granted to a particular service/container that gives it additional permissions above and beyond what it would ordinarily have. An entitlement is a piece of configuration information included in the service spec, telling the container engine that executes a service to allow access to certain resources or to perform certain operations. In effect, an entitlement extends the sandbox and capabilities of your service to allow a particular operation to occur.
The Docker CLI currently supports over 100 command-line flags. By implementing an entitlement mechanism we plan to allow downstream consumers of Moby, such as Docker, to unify all of the security-related flags into a single mechanism that is granular enough to be useful, platform independent, and understandable by a non-expert user. This mechanism can be seen as Moby's equivalent of Apple’s App Store permission model, where apps are granted capabilities to operate beyond the normal privileges of an application, such as access to the keychain or the ability to enable push notifications.
Goal
The goal is to simplify the way downstream users ask for permissions for their containers/services.
In order for a service to use a specific entitlement, access to that entitlement has to be granted. The objective is to have a grant mechanism that would look like this:
Specifying one or more entitlements on the command line:
docker run --entitlements=[entitlement1] --entitlements=[entitlement2] alpine
docker service create --entitlements=[entitlement] alpine
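As a rough illustration of how a repeated `--entitlements` flag could be collected, here is a minimal Go sketch using the standard `flag` package. The `entitlementList` type and the `parseEntitlements` helper are hypothetical names invented for this example; nothing here reflects how the Docker CLI would actually implement the flag.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// entitlementList collects every value passed through a repeated
// --entitlements flag. It implements flag.Value. (Hypothetical type.)
type entitlementList []string

func (e *entitlementList) String() string { return strings.Join(*e, ",") }

func (e *entitlementList) Set(v string) error {
	v = strings.TrimSpace(v)
	if v == "" {
		return fmt.Errorf("empty entitlement")
	}
	*e = append(*e, v)
	return nil
}

// parseEntitlements is a hypothetical helper: it returns the entitlements
// requested in args, in the order they were given. Parsing stops at the
// first non-flag argument (the image name).
func parseEntitlements(args []string) ([]string, error) {
	var ents entitlementList
	fs := flag.NewFlagSet("run", flag.ContinueOnError)
	fs.Var(&ents, "entitlements", "entitlement to grant (repeatable)")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return ents, nil
}

func main() {
	ents, err := parseEntitlements([]string{
		"--entitlements=network.proxy",
		"--entitlements=security.confined",
		"alpine",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(ents)
}
```

A real implementation would presumably also validate each name against the set of known entitlements before starting the container.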
Current proposal for entitlements
| Entitlement | Privileges | Capabilities | Blocked syscalls | On Windows |
|---|---|---|---|---|
| api.access | API access management, defaults provided for the Engine and Swarm APIs | | | |
| host.processes.none | Do not share host's PID namespace | N/A | | |
| host.processes.admin | Shares host's PID namespace | N/A | | |
| host.devices.none | RO for sysfs; no additional non-default mounts; no RW on /proc/kcore | N/A | | |
| host.devices.view | RO on non-default mounts | N/A | | |
| host.devices.mount | Add SYS_ADMIN and allow a device to be mounted in. | N/A | | |
| network.none | No access to /proc/pid/net, /proc/sys/net; no access to /sys/class/net | No NET_ADMIN, NET_BIND_SERVICE, NET_RAW, NET_BROADCAST | socket, socketpair, setsockopt, getsockopt, getsockname, getpeername, bind, listen, accept, accept4, connect, shutdown, recvfrom, recvmsg, sendto, sendmsg, sendmmsg, sethostname, setdomainname, bpf | |
| network.user | | CAP_NET_RAW, CAP_NET_BIND_SERVICE?, CAP_NET_BROADCAST? | sethostname, setdomainname, bpf, setsockopt(SO_DEBUG) | |
| network.proxy | | Add: CAP_NET_RAW, CAP_NET_BROADCAST, CAP_NET_BIND_SERVICE | | |
| network.admin | | CAP_NET_ADMIN, CAP_NET_BROADCAST, CAP_NET_BIND_SERVICE, CAP_NET_RAW | | |
| security.confined | Block access to sensitive paths: /sys/kernel/security, /sys/kernel/debug (ftrace), /sys/kernel/livepatch, /sys/fs/selinux, /sys/fs/cgroup, debugfs, securityfs, selinuxfs, /proc/sys/kernel/, /proc/config.gz, /boot, /proc/{mem,cpu,kcore,kmem,sysrq-trigger,bus}. No MAC/DAC policy read/write or configuration/state change. NoNewPrivileges activated. | Drop: CAP_MAC_*, CAP_DAC_*, CAP_SETPCAP, SYS_PTRACE, CAP_SET_*, CAP_FSETID, CAP_SYS_ADMIN | bpf, ptrace, seccomp, arch_prctl, personality, setuid/setgid?, madvise, prctl(PR_CAPBSET_DROP, PR_SET_*, ..) | |
| security.view | Read-only rights on sensitive filesystems / fs directories and MAC/DAC policies | Add: CAP_MAC_*, CAP_DAC_*, CAP_SETPCAP. Drop: CAP_LINUX_IMMUTABLE?, CAP_SET_*, CAP_FSETID, SYS_PTRACE, CAP_SYS_ADMIN | | |
| security.admin | | Add: CAP_MAC_*, CAP_DAC_*, CAP_LINUX_IMMUTABLE, CAP_SYS_MODULE, CAP_SYS_PTRACE, CAP_SYSLOG, CAP_FSETID, CAP_SYS_BOOT | | |
| security.read-only | Mounts the container's filesystem as read-only | | | |
| resources.limit=value | value is a percentage of available resources in the container at launch context for: pids, perf_event, blkio, hugetlb, freezer, net_cls, net_prio, cpuset, memory, systemd. Set ulimits properly. | | | |
| debug | security.unconfined | Add: CAP_SYS_ADMIN, CAP_SYS_PTRACE, CAP_SYSLOG | | |
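To make the table more concrete, the following Go sketch shows one way an engine might resolve a list of requested entitlements into the set of capabilities to add. It is only an illustration: the `capsToAdd` map and the `resolveCaps` function are hypothetical, only two rows of the draft table are encoded, and taking the union of the sets is an assumption about how multiple entitlements would combine.

```go
package main

import (
	"fmt"
	"sort"
)

// capsToAdd encodes two rows of the draft table. The table is still a
// proposal, so these sets are illustrative rather than final.
var capsToAdd = map[string][]string{
	"network.proxy": {"CAP_NET_RAW", "CAP_NET_BROADCAST", "CAP_NET_BIND_SERVICE"},
	"network.admin": {"CAP_NET_ADMIN", "CAP_NET_BROADCAST", "CAP_NET_BIND_SERVICE", "CAP_NET_RAW"},
}

// resolveCaps returns the sorted union of capabilities granted by the
// requested entitlements, or an error for an unknown entitlement name.
// (Hypothetical helper; the union rule is an assumption.)
func resolveCaps(entitlements []string) ([]string, error) {
	seen := map[string]bool{}
	for _, e := range entitlements {
		caps, ok := capsToAdd[e]
		if !ok {
			return nil, fmt.Errorf("unknown entitlement %q", e)
		}
		for _, c := range caps {
			seen[c] = true
		}
	}
	out := make([]string, 0, len(seen))
	for c := range seen {
		out = append(out, c)
	}
	sort.Strings(out)
	return out, nil
}

func main() {
	caps, err := resolveCaps([]string{"network.proxy", "network.admin"})
	if err != nil {
		panic(err)
	}
	fmt.Println(caps)
}
```

Rejecting unknown names up front keeps a typo from silently running a container with fewer or more rights than intended.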
Example use cases
The following examples show how entitlements could be used with the most downloaded images on Docker Hub. They will likely change as entitlements are added or adjusted.
Today, people run the following command for many different use cases:
docker run --privileged imagename:label
Entitlements reduce the rights granted compared with --privileged. Some examples:
People who need to use raw sockets to control link discovery and aggregation:
Before:
docker run --privileged image:label
After:
docker run --entitlements=network.proxy image:label
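For comparison with today's CLI, the sketch below expands the capabilities listed for network.proxy in the draft table into the existing `docker run --cap-add` flag. The `proxyCaps` slice and `capAddArgs` helper are hypothetical, and the result is only an approximation: an entitlement as proposed could also cover blocked syscalls and paths, which `--cap-add` alone does not express.

```go
package main

import (
	"fmt"
	"strings"
)

// proxyCaps copies the network.proxy row of the draft table, with the
// duplicated entries removed.
var proxyCaps = []string{"CAP_NET_RAW", "CAP_NET_BROADCAST", "CAP_NET_BIND_SERVICE"}

// capAddArgs builds the argument list for a docker run invocation that
// adds each capability with --cap-add. (Hypothetical helper.)
func capAddArgs(caps []string, image string) []string {
	args := []string{"run"}
	for _, c := range caps {
		// Use the short capability name without the CAP_ prefix.
		args = append(args, "--cap-add="+strings.TrimPrefix(c, "CAP_"))
	}
	return append(args, image)
}

func main() {
	fmt.Println("docker " + strings.Join(capAddArgs(proxyCaps, "image:label"), " "))
}
```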
Docker in Docker would probably look like:
Before:
docker run --privileged docker:dind
After:
docker run --entitlements=network.admin --entitlements=host.devices.admin --entitlements=security.admin docker:dind
Long term, publishers should be able to tie a set of entitlements to an image if they want to, so the more examples, the better.
Open Questions
Can the same privilege name mean different things from one Docker version to another? We would be treating entitlements the way we treat default profiles.
Do all these entitlements also make sense on Windows?
Should we provide a way to create custom entitlements?
What we need from the community
- Validation that these entitlements are: high-level enough, non-overlapping and correctly implemented using the lower level primitives available from Linux
- Examples of use cases where the entitlements fit or don't fit, and whether they add too much privilege over what users are usually doing