Align default seccomp profile with selected capabilities#22554
Align default seccomp profile with selected capabilities#22554thaJeztah merged 1 commit intomoby:masterfrom
Conversation
|
There is I think an issue with checking the generated json profile on the tests on non amd64 architectures that I need to work out a way to resolve or tests on arm will probably fail. |
c15efb9 to
14379c0
Compare
Failure on simple janky |
|
@LK4D4 actually it might be, will investigate further |
|
@justincormack no, I saw failures in two other PRs :/ But if you'll find why it fails it would be perfect :) |
|
@LK4D4 yes, that test will be broken with this patch as |
There was a problem hiding this comment.
just glancing over this, so this may be a silly note; are these the values as passed through the command-line (I.e, --cap-add), because I think we pass those without the CAP_ prefix; see https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities
There was a problem hiding this comment.
No, at this point they have been normalised to the OCI spec, so they are the full version, in caps, and with all the actual caps that will be applied, conveniently.
|
👍 I like this |
|
This is cool! Looks good to me, maybe just needs some tests for adding the specific caps :) since this kinda complicates the behavior maybe some unit tests too but I'm wondering how difficult those would be |
|
Added docs changes and test fixes, maybe might pass CI now... |
|
I am fine with this, I like the idea, I think there is a chance of this being buggy where a seccomp filter continues to block access that would be allowed with the modified caps. But this at least gives users the ability to adjust the permissions based on --cap-add/--cap-drop without needing to drop seccomp filters all together or switching to --privileged. |
|
I think we can move it to code-review. |
|
@rhatdan yes there are probably some cases, but I have been pretty broad with the added syscalls, eg |
|
Yes, but to give you an example of a potential bug from a user point of view. reboot inside of a container depending on the namespaces will only effect the PID1 of the container not necessarily reboot the host system. |
|
|
|
1 test still failing, will examine. |
Currently the default seccomp profile is fixed. This changes it so that it varies depending on the Linux capabilities selected with the --cap-add and --cap-drop options. Without this, if a user adds privileges, eg to allow ptrace with --cap-add sys_ptrace then still cannot actually use ptrace as it is still blocked by seccomp, so they will probably disable seccomp or use --privileged. With this change the syscalls that are needed for the capability are also allowed by the seccomp profile based on the selected capabilities. While this patch makes it easier to do things with for example cap_sys_admin enabled, as it will now allow creating new namespaces and use of mount, it still allows less than --cap-add cap_sys_admin --security-opt seccomp:unconfined would have previously. It is not recommended that users run containers with cap_sys_admin as this does give full access to the host machine. It also cleans up some architecture specific system calls to be only selected when needed. Signed-off-by: Justin Cormack <[email protected]>
|
Rebase/rebuild seems to have fixed the test failure, which I could not replicate, maybe that is the occasional failure @LK4D4 was seeing before? |
|
sigh.. |
|
retest this please |
|
It's green now ping @jfrazelle @LK4D4 PTAL 👍 |
|
looks good to me! I like the tests :) |
|
LGTM |
|
LGTM |
|
docs LGTM - @thaJeztah |
|
oh! forgot to add my LGTM. Here it is \o/ |
In moby#22554 I aligned seccomp and capabilities, however the case of the chown calls and CAP_CHOWN was less clearcut, as these are simple calls that the capabilities will block if they are not allowed. They are needed when no new privileges is not set in order to allow docker to call chown before the container is started, so there was a workaround but this did not include all the chown syscalls, and Arm was failing on some seccomp tests because it was using a different syscall from just the fchown that was allowed in this case. It is simpler to just allow all the chown calls in the default seccomp profile and let the capabilities subsystem block them. Signed-off-by: Justin Cormack <[email protected]>
Currently the default seccomp profile is fixed. This changes it so that it varies depending on the Linux capabilities selected with the --cap-add and --cap-drop options. Without this, if a user adds privileges, eg to allow ptrace with --cap-add sys_ptrace then still cannot actually use ptrace as it is still blocked by seccomp, so they will probably disable seccomp or use --privileged. With this change the syscalls that are needed for the capability are also allowed by the seccomp profile based on the selected capabilities. While this patch makes it easier to do things with for example cap_sys_admin enabled, as it will now allow creating new namespaces and use of mount, it still allows less than --cap-add cap_sys_admin --security-opt seccomp:unconfined would have previously. It is not recommended that users run containers with cap_sys_admin as this does give full access to the host machine. It also cleans up some architecture specific system calls to be only selected when needed. cherry-pick from moby#22554 Signed-off-by: Justin Cormack <[email protected]> Signed-off-by: yangshukui <[email protected]>
Currently the default seccomp profile is fixed. This changes it
so that it varies depending on the Linux capabilities selected with
the --cap-add and --cap-drop options. Without this, if a user adds
privileges, eg to allow ptrace with --cap-add sys_ptrace then still
cannot actually use ptrace as it is still blocked by seccomp, so
they will probably disable seccomp or use --privileged. With this
change the syscalls that are needed for the capability are also
allowed by the seccomp profile based on the selected capabilities.
While this patch makes it easier to do things with for example
cap_sys_admin enabled, as it will now allow creating new namespaces
and use of mount, it still allows less than --cap-add cap_sys_admin
--security-opt seccomp:unconfined would have previously. It is not
recommended that users run containers with cap_sys_admin as this does
give full access to the host machine.
It also cleans up some architecture specific system calls to be
only selected when needed.
Signed-off-by: Justin Cormack [email protected]
Note this patch is currently missing the needed documentation changes,
but I thought I would put it out for review first.
cc @jfrazelle