Skip to content

Conversation

@robertgzr
Copy link

The --syscall-whitelist option allows to exclude user-defined
system calls from the seccomp filter.

We reuse systemd's filter sets to match capabilities and introduce
a new function to seccomp-util to allow adding a filter set but
retaining some of the syscalls that belong to it.

In order to run Docker >1.10.3 inside of system-nspawn containers
the syscall keyctl is needed. By default it is included in nspawns
seccomp blacklist.

Fixes #5163

Needed for kinvolk/kube-spawn#21

The --syscall-whitelist option allows to exclude user-defined
system calls from the seccomp filter.

We reuse systemd's filter sets to match capabilities and introduce
a new function to seccomp-util to allow adding a filter set but
retaining some of the syscalls that belong to it.

In order to run Docker >1.10.3 inside of system-nspawn containers
the syscall `keyctl` is needed. By default it is included in nspawns
seccomp blacklist.

Fixes systemd#5163

Needed for kinvolk/kube-spawn#21
Copy link
Member

@poettering poettering left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks pretty ok, but a few minor issues

above).</para></listitem>
</varlistentry>

<varlistentry>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we tend to suffix options that accept arguments with "=" everywhere, i.e. please write this as <option>--syscall-whitelist=</option>


<varlistentry>
<term><option>--syscall-whitelist</option></term>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In prose we should probably use the wording "system call" rather than syscall

<varlistentry>
<term><option>--syscall-whitelist</option></term>

<listitem><para><command>systemd-nspawn</command> sets up a syscall filter
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"seccomp" should either be explained, or at least placed in ", to indicate that is not a proper word, but some specific technology...

#include <sys/types.h>

int setup_seccomp(uint64_t cap_list_retain);
#include "seccomp-util.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why include that here? you don't need any types of it here? usually we try to keep header inclusions as local as possible

int id;

/* We can also whitelist entire filter sets */
if (sys[0] == '@' && !strv_contains(syscall_whitelist, sys)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both if branches check !strv_contains(…). Hence you can pull it out of the if check and instead continue early, no? That would remove redundancy and also be more readable... i.e. something like this at the beginning fo the loop:

if (strv_containers(syscall_whitelist, sys))
         continue;


return 0;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this function separate of seccomp_add_syscall_filter_set()? It appears you only need one, and the other can be replaced by a tiny stub in the header that calls the other with a NULL whitelist, no?

@poettering poettering added the reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks label May 11, 2017
@poettering
Copy link
Member

also, the CI doesn't like this:

/nspawn/nspawn-register.c
src/nspawn/nspawn-seccomp.c:179:5: error: conflicting types for ‘setup_seccomp’
 int setup_seccomp(uint64_t cap_list_retain) {
     ^
In file included from src/nspawn/nspawn-seccomp.c:31:0:
src/nspawn/nspawn-seccomp.h:26:5: note: previous declaration of ‘setup_seccomp’ was here
 int setup_seccomp(uint64_t cap_list_retain, char **system_call_filter);
     ^

Please always test you stuff locally, and make sure to enable all options while building it.

@poettering poettering added the ci-fails/needs-rework 🔥 Please rework this, the CI noticed an issue with the PR label May 11, 2017
@tixxdz
Copy link
Member

tixxdz commented May 12, 2017

@poettering adding "--syscall-whitelist" is not consistent with other parts of the code and it may affect nspawn in longterm plan. It suggests that there is some other blacklist mechanism, where that actual mechanism does block only some syscalls not all syscalls, and it is not available to users as a parameter. I suggests to keep it simple with systemd.exec SystemCallFilter= and do the whitelist there...
If the empty string is assigned, the filter is reset, all prior assignments will have no effect apply the same rule on default nspawn filter and add what you want to filter...

@poettering
Copy link
Member

@tixxdz you are right, we probably should make this work more like SystemCallFilter in unit files, i.e. permit to use it for both whitelisting and blacklisting, depending on the "~" option. But I am pretty sure this should be independent of the unit file setting, after all nspawn can be called from the command line, and often is, and the container is starts should have different restrictions than nspawn itself...

@robertgzr any chance you can extend this slightly, i.e. that we start with the current filter of nspawn, but can then both whitelist and blacklist additional syscalls using --sycall-filter= and --syscall-filter=~... Such an extension should make things a bit more uniform and powerful, and not be that much more complex.

@robertgzr
Copy link
Author

robertgzr commented May 13, 2017

@poettering Sure. Do we care about which syscalls are black-/whitelisted? Couldn't the user break nspawn with this?

That would mean we whitelist syscalls with ~ and blacklist them otherwise?
Edit: never mind...

{ CAP_SYS_TIME, { .value = "@clock\0" }},
{ CAP_SYSLOG, { .value = "syslog\0" }},
{ CAP_SYS_PACCT, { .value = "acct\0" }},
{ CAP_SYS_PTRACE, { .value =
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@poettering What is your opinion on CAP_SYS_PTRACE can we just use @debug even though it contains some additional calls?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't really mind either way

@poettering
Copy link
Member

@poettering Sure. Do we care about which syscalls are black-/whitelisted? Couldn't the user break nspawn with this?

Well, exposing low-level syscall white/blacklisting does indeed expose a certain risk to bfreak things, but I think it's OK. For example, if people block execve(), then we can't exec PID 1 anymore, and hence will break. But I think it's fine, and we documented the limitation that you can't block execve() in the SystemCallFilter= setting documentation for services, and we should do the same here.

My suggestion would be to implement --system-call-filter= very similar to how SystemCallFilter= is already implemented for services -- except that in nspawn we don't start from an empty filter list, and the caps management influences the default filter, too.

@poettering poettering modified the milestone: v234 Jun 26, 2017
robertgzr added a commit to kinvolk/systemd that referenced this pull request Sep 7, 2017
Adds a --syscall-filter option to modify nspawns default seccomp filter.
After the comments in systemd#5944 this now works similar to the `SystemCallFilter=`
option in unit files, but only accepts system calls.

This option takes a comma-separated list of system calls to add to the default filter.
If the first character of the list is "~", the listed system calls will be whitelisted
instead. I modify the blacklist before handling the `--[drop-]capability` option,
so anything there will overwrite this option.

Fixes systemd#5163

Signed-off-by: Robert Günzler <[email protected]>
robertgzr added a commit to kinvolk/systemd that referenced this pull request Sep 8, 2017
Adds a --syscall-filter option to modify nspawns default seccomp filter.
After the comments in systemd#5944 this now works similar to the `SystemCallFilter=`
option in unit files, but only accepts system calls.

This option takes a comma-separated list of system calls to add to the default filter.
If the first character of the list is "~", the listed system calls will be whitelisted
instead. I modify the blacklist before handling the `--[drop-]capability` option,
so anything there will overwrite this option.

Fixes systemd#5163
robertgzr added a commit to kinvolk/systemd that referenced this pull request Sep 8, 2017
Adds a --syscall-filter option to modify nspawns default seccomp filter.
After the comments in systemd#5944 this now works similar to the `SystemCallFilter=`
option in unit files, but only accepts system calls.

This option takes a comma-separated list of system calls to add to the default filter.
If the first character of the list is "~", the listed system calls will be whitelisted
instead. I modify the blacklist before handling the `--[drop-]capability` option,
so anything there will overwrite this option.

Fixes systemd#5163
robertgzr added a commit to kinvolk/systemd that referenced this pull request Sep 8, 2017
Adds a --syscall-filter option to set custom system call filters.
After the comments in systemd#5944 this now works similar to the `SystemCallFilter=`
option in unit files.

This option takes a comma-separated list of system calls to allow.
If the first character of the list is "~", the listed system calls
will be blocked instead. Filter sets beginning with '@' are also
supported. This overwrites nspawn default blacklist and the `--[drop-]capability`
options

Fixes systemd#5163
robertgzr added a commit to kinvolk/systemd that referenced this pull request Sep 11, 2017
Adds a --syscall-filter option to set custom system call filters.
After the comments in systemd#5944 this now works similar to the `SystemCallFilter=`
option in unit files.

This option takes a comma-separated list of system calls to allow.
If the first character of the list is "~", the listed system calls
will be blocked instead. Filter sets beginning with '@' are also
supported. This overwrites nspawn default blacklist and the `--[drop-]capability`
options

Fixes systemd#5163
poettering added a commit to poettering/systemd that referenced this pull request Sep 11, 2017
Now that we have ported nspawn's seccomp code to the generic code in
seccomp-util, let's extend it to support whitelisting and blacklisting
of specific additional syscalls.

This uses similar syntax as PID1's support for system call filtering,
but in contrast to that always implements a blacklist (and not a
whitelist), as we prepopulate the filter with a blacklist, and the
unit's system call filter logic does not come with anything
prepopulated.

(Later on we might actually want to invert the logic here, and
whitelist rather than blacklist things, but at this point let's not do
that. In case we switch this over later, the syscall add/remove logic of
this commit should be compatible conceptually.)

Fixes: systemd#5163

Replaces: systemd#5944
@robertgzr
Copy link
Author

Close in favor of #6797

@robertgzr robertgzr closed this Sep 11, 2017
poettering added a commit to poettering/systemd that referenced this pull request Sep 11, 2017
Now that we have ported nspawn's seccomp code to the generic code in
seccomp-util, let's extend it to support whitelisting and blacklisting
of specific additional syscalls.

This uses similar syntax as PID1's support for system call filtering,
but in contrast to that always implements a blacklist (and not a
whitelist), as we prepopulate the filter with a blacklist, and the
unit's system call filter logic does not come with anything
prepopulated.

(Later on we might actually want to invert the logic here, and
whitelist rather than blacklist things, but at this point let's not do
that. In case we switch this over later, the syscall add/remove logic of
this commit should be compatible conceptually.)

Fixes: systemd#5163

Replaces: systemd#5944
poettering added a commit to poettering/systemd that referenced this pull request Sep 12, 2017
Now that we have ported nspawn's seccomp code to the generic code in
seccomp-util, let's extend it to support whitelisting and blacklisting
of specific additional syscalls.

This uses similar syntax as PID1's support for system call filtering,
but in contrast to that always implements a blacklist (and not a
whitelist), as we prepopulate the filter with a blacklist, and the
unit's system call filter logic does not come with anything
prepopulated.

(Later on we might actually want to invert the logic here, and
whitelist rather than blacklist things, but at this point let's not do
that. In case we switch this over later, the syscall add/remove logic of
this commit should be compatible conceptually.)

Fixes: systemd#5163

Replaces: systemd#5944
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci-fails/needs-rework 🔥 Please rework this, the CI noticed an issue with the PR nspawn reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks

Development

Successfully merging this pull request may close these issues.

3 participants