-
-
Notifications
You must be signed in to change notification settings - Fork 4.2k
nspawn: Add --syscall-whitelist option #5944
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
nspawn: Add --syscall-whitelist option #5944
Conversation
The --syscall-whitelist option allows to exclude user-defined system calls from the seccomp filter. We reuse systemd's filter sets to match capabilities and introduce a new function to seccomp-util to allow adding a filter set but retaining some of the syscalls that belong to it. In order to run Docker >1.10.3 inside of system-nspawn containers the syscall `keyctl` is needed. By default it is included in nspawns seccomp blacklist. Fixes systemd#5163 Needed for kinvolk/kube-spawn#21
poettering
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks pretty ok, but a few minor issues
| above).</para></listitem> | ||
| </varlistentry> | ||
|
|
||
| <varlistentry> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we tend to suffix options that accept arguments with "=" everywhere, i.e. please write this as <option>--syscall-whitelist=</option>
|
|
||
| <varlistentry> | ||
| <term><option>--syscall-whitelist</option></term> | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In prose we should probably use the wording "system call" rather than syscall
| <varlistentry> | ||
| <term><option>--syscall-whitelist</option></term> | ||
|
|
||
| <listitem><para><command>systemd-nspawn</command> sets up a syscall filter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"seccomp" should either be explained, or at least placed in ", to indicate that is not a proper word, but some specific technology...
| #include <sys/types.h> | ||
|
|
||
| int setup_seccomp(uint64_t cap_list_retain); | ||
| #include "seccomp-util.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why include that here? you don't need any types of it here? usually we try to keep header inclusions as local as possible
| int id; | ||
|
|
||
| /* We can also whitelist entire filter sets */ | ||
| if (sys[0] == '@' && !strv_contains(syscall_whitelist, sys)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
both if branches check !strv_contains(…). Hence you can pull it out of the if check and instead continue early, no? That would remove redundancy and also be more readable... i.e. something like this at the beginning fo the loop:
if (strv_containers(syscall_whitelist, sys))
continue;|
|
||
| return 0; | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this function separate of seccomp_add_syscall_filter_set()? It appears you only need one, and the other can be replaced by a tiny stub in the header that calls the other with a NULL whitelist, no?
|
also, the CI doesn't like this: Please always test you stuff locally, and make sure to enable all options while building it. |
|
@poettering adding "--syscall-whitelist" is not consistent with other parts of the code and it may affect nspawn in longterm plan. It suggests that there is some other blacklist mechanism, where that actual mechanism does block only some syscalls not all syscalls, and it is not available to users as a parameter. I suggests to keep it simple with systemd.exec |
|
@tixxdz you are right, we probably should make this work more like SystemCallFilter in unit files, i.e. permit to use it for both whitelisting and blacklisting, depending on the "~" option. But I am pretty sure this should be independent of the unit file setting, after all nspawn can be called from the command line, and often is, and the container is starts should have different restrictions than nspawn itself... @robertgzr any chance you can extend this slightly, i.e. that we start with the current filter of nspawn, but can then both whitelist and blacklist additional syscalls using --sycall-filter= and --syscall-filter=~... Such an extension should make things a bit more uniform and powerful, and not be that much more complex. |
|
@poettering Sure. Do we care about which syscalls are black-/whitelisted? Couldn't the user break nspawn with this? That would mean we whitelist syscalls with |
| { CAP_SYS_TIME, { .value = "@clock\0" }}, | ||
| { CAP_SYSLOG, { .value = "syslog\0" }}, | ||
| { CAP_SYS_PACCT, { .value = "acct\0" }}, | ||
| { CAP_SYS_PTRACE, { .value = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@poettering What is your opinion on CAP_SYS_PTRACE can we just use @debug even though it contains some additional calls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't really mind either way
Well, exposing low-level syscall white/blacklisting does indeed expose a certain risk to bfreak things, but I think it's OK. For example, if people block execve(), then we can't exec PID 1 anymore, and hence will break. But I think it's fine, and we documented the limitation that you can't block execve() in the SystemCallFilter= setting documentation for services, and we should do the same here. My suggestion would be to implement --system-call-filter= very similar to how SystemCallFilter= is already implemented for services -- except that in nspawn we don't start from an empty filter list, and the caps management influences the default filter, too. |
Adds a --syscall-filter option to modify nspawns default seccomp filter. After the comments in systemd#5944 this now works similar to the `SystemCallFilter=` option in unit files, but only accepts system calls. This option takes a comma-separated list of system calls to add to the default filter. If the first character of the list is "~", the listed system calls will be whitelisted instead. I modify the blacklist before handling the `--[drop-]capability` option, so anything there will overwrite this option. Fixes systemd#5163 Signed-off-by: Robert Günzler <[email protected]>
Adds a --syscall-filter option to modify nspawns default seccomp filter. After the comments in systemd#5944 this now works similar to the `SystemCallFilter=` option in unit files, but only accepts system calls. This option takes a comma-separated list of system calls to add to the default filter. If the first character of the list is "~", the listed system calls will be whitelisted instead. I modify the blacklist before handling the `--[drop-]capability` option, so anything there will overwrite this option. Fixes systemd#5163
Adds a --syscall-filter option to modify nspawns default seccomp filter. After the comments in systemd#5944 this now works similar to the `SystemCallFilter=` option in unit files, but only accepts system calls. This option takes a comma-separated list of system calls to add to the default filter. If the first character of the list is "~", the listed system calls will be whitelisted instead. I modify the blacklist before handling the `--[drop-]capability` option, so anything there will overwrite this option. Fixes systemd#5163
Adds a --syscall-filter option to set custom system call filters. After the comments in systemd#5944 this now works similar to the `SystemCallFilter=` option in unit files. This option takes a comma-separated list of system calls to allow. If the first character of the list is "~", the listed system calls will be blocked instead. Filter sets beginning with '@' are also supported. This overwrites nspawn default blacklist and the `--[drop-]capability` options Fixes systemd#5163
Adds a --syscall-filter option to set custom system call filters. After the comments in systemd#5944 this now works similar to the `SystemCallFilter=` option in unit files. This option takes a comma-separated list of system calls to allow. If the first character of the list is "~", the listed system calls will be blocked instead. Filter sets beginning with '@' are also supported. This overwrites nspawn default blacklist and the `--[drop-]capability` options Fixes systemd#5163
Now that we have ported nspawn's seccomp code to the generic code in seccomp-util, let's extend it to support whitelisting and blacklisting of specific additional syscalls. This uses similar syntax as PID1's support for system call filtering, but in contrast to that always implements a blacklist (and not a whitelist), as we prepopulate the filter with a blacklist, and the unit's system call filter logic does not come with anything prepopulated. (Later on we might actually want to invert the logic here, and whitelist rather than blacklist things, but at this point let's not do that. In case we switch this over later, the syscall add/remove logic of this commit should be compatible conceptually.) Fixes: systemd#5163 Replaces: systemd#5944
|
Close in favor of #6797 |
Now that we have ported nspawn's seccomp code to the generic code in seccomp-util, let's extend it to support whitelisting and blacklisting of specific additional syscalls. This uses similar syntax as PID1's support for system call filtering, but in contrast to that always implements a blacklist (and not a whitelist), as we prepopulate the filter with a blacklist, and the unit's system call filter logic does not come with anything prepopulated. (Later on we might actually want to invert the logic here, and whitelist rather than blacklist things, but at this point let's not do that. In case we switch this over later, the syscall add/remove logic of this commit should be compatible conceptually.) Fixes: systemd#5163 Replaces: systemd#5944
Now that we have ported nspawn's seccomp code to the generic code in seccomp-util, let's extend it to support whitelisting and blacklisting of specific additional syscalls. This uses similar syntax as PID1's support for system call filtering, but in contrast to that always implements a blacklist (and not a whitelist), as we prepopulate the filter with a blacklist, and the unit's system call filter logic does not come with anything prepopulated. (Later on we might actually want to invert the logic here, and whitelist rather than blacklist things, but at this point let's not do that. In case we switch this over later, the syscall add/remove logic of this commit should be compatible conceptually.) Fixes: systemd#5163 Replaces: systemd#5944
The --syscall-whitelist option allows to exclude user-defined
system calls from the seccomp filter.
We reuse systemd's filter sets to match capabilities and introduce
a new function to seccomp-util to allow adding a filter set but
retaining some of the syscalls that belong to it.
In order to run Docker >1.10.3 inside of system-nspawn containers
the syscall
keyctlis needed. By default it is included in nspawnsseccomp blacklist.
Fixes #5163
Needed for kinvolk/kube-spawn#21