Conversation

@Bonjourz
Contributor

@Bonjourz Bonjourz commented Apr 15, 2022

Closes #43481

Add pkey_alloc(2), pkey_free(2) and pkey_mprotect(2) to the default seccomp profile.
Like mprotect(), pkey_alloc(2), pkey_free(2) and pkey_mprotect(2) can only configure the calling process's own memory, so they belong with the existing "safe for everyone" syscalls.

These syscalls were added to Linux in kernel 4.9.
More details can be found in the man page.

  • What I did
    I added pkey_alloc(), pkey_free() and pkey_mprotect() to the default syscall whitelist.

  • How I did it
    Modified profiles/seccomp/default.json and profiles/seccomp/default_linux.go, appending pkey_alloc(), pkey_free() and pkey_mprotect() to the default list of allowed syscalls.

  • How to verify it

    Here is the sample code (test.c):

    #define _GNU_SOURCE
    #include <stdlib.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #define BUF_SIZE    (0x1000)
    int main() {
        int ret = -1, pkey = -1;
        ret = pkey_alloc(0, 0);
        if (ret < 0) {
            perror("Cannot invoke pkey_alloc() successfully");
            exit(-1);
        } else {
            pkey = ret;
            char *buf = (char *)mmap(NULL, BUF_SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0);
            if (buf == MAP_FAILED) {
                perror("Cannot invoke mmap() successfully");
                exit(-1);
            }
            ret = pkey_mprotect(buf, BUF_SIZE, PROT_READ | PROT_WRITE, pkey);
            if (ret < 0) {
                perror("Cannot invoke pkey_mprotect() successfully");
                exit(-1);
            }
            ret = munmap(buf, BUF_SIZE);
            if (ret < 0) {
                perror("Cannot invoke munmap() successfully");
                exit(-1);
            }
            ret = pkey_free(pkey);
            if (ret < 0) {
                perror("Cannot invoke pkey_free() successfully");
                exit(-1);
            }
        }
        printf("PKU related syscalls are allowed in container environment!\n");
        return 0;
    }
    

    Compile it first:

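    gcc -o test test.c

    Then start a container that uses the updated profile (the paths assume the command is run from the repository root, where test was compiled):

    docker run -it -v ${PWD}:/root/test --security-opt seccomp=${PWD}/profiles/seccomp/default.json ubuntu:20.04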
    

    Run the test binary inside the container:

    /root/test/test
    

    It should print:

    PKU related syscalls are allowed in container environment!
    
  • Description for the changelog
    profiles/seccomp/default.json and profiles/seccomp/default_linux.go: Add pkey_alloc(), pkey_free() and pkey_mprotect() to the default white list.

  • A picture of a cute animal (not mandatory but encouraged)

@Bonjourz Bonjourz marked this pull request as ready for review April 15, 2022 03:43
@Bonjourz Bonjourz force-pushed the 43481_support_pku branch 2 times, most recently from bd93981 to 96afe3e Compare April 15, 2022 03:53
@tianon
Member

tianon commented Apr 19, 2022

cc @justincormack

@cpuguy83
Member

This seems like it would be ok.
That said, the following man page paragraph seems a bit ambiguous to me about whether the limit is per process.

   To use the pkeys feature, the processor must support it, and the
   kernel must contain support for the feature on a given processor.
   As of early 2016 only future Intel x86 processors are supported,
   and this hardware supports 16 protection keys in each process.
   However, pkey 0 is used as the default key, so a maximum of 15
   are available for actual application use.

If it is indeed per process this seems OK.
That said, you can change the default seccomp profile of a daemon by setting the --seccomp-profile flag on dockerd.

@Bonjourz
Contributor Author

Bonjourz commented Apr 21, 2022

Hi @cpuguy83, could you point out which statement you find ambiguous?
As mentioned in #43490 (comment), pkey_alloc(), pkey_free() and pkey_mprotect() only take effect on the calling process's own memory; they have no effect on other processes.

You also mentioned:

That said, you can change the default seccomp profile of a daemon by setting the --seccomp-profile flag on dockerd.

Adding --seccomp-profile works for us, but it does not seem like an elegant solution, since developers may need to update existing commands in some production environments. My proposal is to allow the pkey_alloc(), pkey_free() and pkey_mprotect() syscalls in the default seccomp profile, just as mprotect() is allowed.

@thaJeztah thaJeztah changed the title Support PKU in docker by default seccomp: Support PKU in docker by default May 28, 2022
Add pkey_alloc(2), pkey_free(2) and pkey_mprotect(2) in seccomp default profile.
pkey_alloc(2), pkey_free(2) and pkey_mprotect(2) can only configure
the calling process's own memory, so they are existing "safe for everyone" syscalls.

close issue: moby#43481

Signed-off-by: zhubojun <[email protected]>
@Bonjourz Bonjourz force-pushed the 43481_support_pku branch from 96afe3e to e258d66 Compare July 11, 2022 01:51
@Bonjourz
Contributor Author

Hi, could anyone help review this PR? Thanks for your time!

Development

Successfully merging this pull request may close these issues.

Can we support PKU in docker by default? (pkey_mprotect(2), pkey_alloc(2), pkey_free(2))

6 participants