We recently added support for no_new_privs to Docker.
It was added earlier to runc and the Open Container Initiative (OCI)
runtime spec.
This security feature was added to the Linux kernel back in 2012 (Linux 3.5). A process can set the no_new_privs bit
in the kernel, and the bit persists across fork, clone and execve. The no_new_privs bit ensures that the process and its
child processes do not gain any additional privileges. Once the bit is set, the process cannot unset it.
While no_new_privs is set, execve will not change the process's uid/gid or grant it any additional capabilities, even
if the process executes setuid binaries or executables with file capability bits set. no_new_privs also prevents LSMs like SELinux from transitioning to process labels that have access not allowed to the current process. This means an SELinux process is only allowed to transition to a process type with fewer privileges.
For more details, see the kernel documentation.
Here is an example showing how it helps in Docker.
First, create a small program that displays the effective UID (it is made setuid later, inside the image):
[$ dockerfiles]# cat testnnp.c
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
int main(int argc, char *argv[])
{
printf("Effective uid: %d\n", geteuid());
return 0;
}
[$ dockerfiles]# make testnnp
cc testnnp.c -o testnnp
Now add the binary to a Docker image and set its setuid bit:
[$ dockerfiles]# cat Dockerfile
FROM fedora:latest
ADD testnnp /root/testnnp
RUN chmod +s /root/testnnp
ENTRYPOINT /root/testnnp
[$ dockerfiles]# docker build -t testnnp .
Sending build context to Docker daemon 12.29 kB
Step 1 : FROM fedora:latest
---> 760a896a323f
Step 2 : ADD testnnp /root/testnnp
---> 6c700f277948
Removing intermediate container 0981144fe404
Step 3 : RUN chmod +s /root/testnnp
---> Running in c1215bfbe825
---> f1f07d05a691
Removing intermediate container c1215bfbe825
Step 4 : ENTRYPOINT /root/testnnp
---> Running in 5a4d324d54fa
---> 44f767c67e30
Removing intermediate container 5a4d324d54fa
Successfully built 44f767c67e30
Now we will create and run a container without no-new-privileges.
[$ dockerfiles]# docker run -it --rm --user=1000 testnnp
Effective uid: 0
This shows that even though you requested a non-privileged user (UID=1000) to run your container,
that user is able to become root by executing the setuid binary in the container image.
Running with no-new-privileges prevents the UID transition when the setuid binary is executed:
[$ dockerfiles]# docker run -it --rm --user=1000 --security-opt=no-new-privileges testnnp
Effective uid: 1000
As you can see above, the container process is still running as UID=1000, meaning that even if the
image has dangerous setuid binaries in it, we can still prevent the user from escalating privileges through them.
If you want to allow users to run images as a non-privileged UID, in most cases you will want to
prevent them from becoming root. The no-new-privileges option is a great tool for enforcing this.