Skip to content

Align default seccomp profile with selected capabilities#22554

Merged
thaJeztah merged 1 commit intomoby:masterfrom
justincormack:seccap
May 20, 2016
Merged

Align default seccomp profile with selected capabilities#22554
thaJeztah merged 1 commit intomoby:masterfrom
justincormack:seccap

Conversation

@justincormack
Copy link
Copy Markdown
Contributor

Currently the default seccomp profile is fixed. This changes it
so that it varies depending on the Linux capabilities selected with
the --cap-add and --cap-drop options. Without this, if a user adds
privileges, eg to allow ptrace with --cap-add sys_ptrace then still
cannot actually use ptrace as it is still blocked by seccomp, so
they will probably disable seccomp or use --privileged. With this
change the syscalls that are needed for the capability are also
allowed by the seccomp profile based on the selected capabilities.

While this patch makes it easier to do things with for example
cap_sys_admin enabled, as it will now allow creating new namespaces
and use of mount, it still allows less than --cap-add cap_sys_admin
--security-opt seccomp:unconfined would have previously. It is not
recommended that users run containers with cap_sys_admin as this does
give full access to the host machine.

It also cleans up some architecture specific system calls to be
only selected when needed.

Signed-off-by: Justin Cormack [email protected]

Note this patch is currently missing the needed documentation changes,
but I thought I would put it out for review first.

cc @jfrazelle

goatymcgoatface

@justincormack
Copy link
Copy Markdown
Contributor Author

There is I think an issue with checking the generated json profile on the tests on non amd64 architectures that I need to work out a way to resolve or tests on arm will probably fail.

@justincormack
Copy link
Copy Markdown
Contributor Author

cc @rhatdan @cpuguy83

@justincormack justincormack force-pushed the seccap branch 2 times, most recently from c15efb9 to 14379c0 Compare May 6, 2016 15:15
@LK4D4
Copy link
Copy Markdown
Contributor

LK4D4 commented May 6, 2016

15:38:53 ----------------------------------------------------------------------
15:38:53 FAIL: docker_cli_run_unix_test.go:947: DockerSuite.TestRunSeccompDefaultProfile
15:38:53 
15:38:53 docker_cli_run_unix_test.go:989:
15:38:53     c.Assert(err, checker.IsNil)
15:38:53 ... value *errors.errorString = &errors.errorString{s:"expected Operation not permitted, got: acct failed: No such file or directory\n"} ("expected Operation not permitted, got: acct failed: No such file or directory\n")
15:38:53 

Failure on simple janky
Looks like it's not related to your PR.

@justincormack
Copy link
Copy Markdown
Contributor Author

@LK4D4 actually it might be, will investigate further

@LK4D4
Copy link
Copy Markdown
Contributor

LK4D4 commented May 6, 2016

@justincormack no, I saw failures in two other PRs :/ But if you'll find why it fails it would be perfect :)

@justincormack
Copy link
Copy Markdown
Contributor Author

@LK4D4 yes, that test will be broken with this patch as acct is allowed now with --cap-add all, will fix the test. So where else did you see it fail?

Comment thread profiles/seccomp/seccomp_default.go Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just glancing over this, so this may be a silly note; are these the values as passed through the command-line (I.e, --cap-add), because I think we pass those without the CAP_ prefix; see https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, at this point they have been normalised to the OCI spec, so they are the full version, in caps, and with all the actual caps that will be applied, conveniently.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good!

@coolljt0725
Copy link
Copy Markdown
Contributor

👍 I like this

@jessfraz
Copy link
Copy Markdown
Contributor

jessfraz commented May 8, 2016

This is cool! Looks good to me, maybe just needs some tests for adding the specific caps :) since this kinda complicates the behavior maybe some unit tests too but I'm wondering how difficult those would be

@thaJeztah thaJeztah added the status/failing-ci Indicates that the PR in its current state fails the test suite label May 9, 2016
@justincormack
Copy link
Copy Markdown
Contributor Author

Added docs changes and test fixes, maybe might pass CI now...

@rhatdan
Copy link
Copy Markdown
Contributor

rhatdan commented May 9, 2016

I am fine with this, I like the idea, I think there is a chance of this being buggy where a seccomp filter continues to block access that would be allowed with the modified caps. But this at least gives users the ability to adjust the permissions based on --cap-add/--cap-drop without needing to drop seccomp filters all together or switching to --privileged.

@LK4D4
Copy link
Copy Markdown
Contributor

LK4D4 commented May 9, 2016

I think we can move it to code-review.

@justincormack
Copy link
Copy Markdown
Contributor Author

justincormack commented May 9, 2016

@rhatdan yes there are probably some cases, but I have been pretty broad with the added syscalls, eg SYS_ADMIN will allow mount and namespace creation, SYS_BOOT will allow reboot etc, as the aim is to avoid people having to disable seccomp. I can re-review eg SYS_ADMIN as there are a lot of calls that it allows.

@rhatdan
Copy link
Copy Markdown
Contributor

rhatdan commented May 9, 2016

Yes, but to give you an example of a potential bug from a user point of view. reboot inside of a container depending on the namespaces will only effect the PID1 of the container not necessarily reboot the host system.

@justincormack
Copy link
Copy Markdown
Contributor Author

justincormack commented May 10, 2016

There is an issue in the default seccomp profile tests with "panic: sync: negative WaitGroup counter" which I am looking into. @LK4D4 is that what you saw elsewhere? that looks like me being dumb, will push a fix.

@justincormack
Copy link
Copy Markdown
Contributor Author

1 test still failing, will examine.

Currently the default seccomp profile is fixed. This changes it
so that it varies depending on the Linux capabilities selected with
the --cap-add and --cap-drop options. Without this, if a user adds
privileges, eg to allow ptrace with --cap-add sys_ptrace then still
cannot actually use ptrace as it is still blocked by seccomp, so
they will probably disable seccomp or use --privileged. With this
change the syscalls that are needed for the capability are also
allowed by the seccomp profile based on the selected capabilities.

While this patch makes it easier to do things with for example
cap_sys_admin enabled, as it will now allow creating new namespaces
and use of mount, it still allows less than --cap-add cap_sys_admin
--security-opt seccomp:unconfined would have previously. It is not
recommended that users run containers with cap_sys_admin as this does
give full access to the host machine.

It also cleans up some architecture specific system calls to be
only selected when needed.

Signed-off-by: Justin Cormack <[email protected]>
@justincormack
Copy link
Copy Markdown
Contributor Author

Rebase/rebuild seems to have fixed the test failure, which I could not replicate, maybe that is the occasional failure @LK4D4 was seeing before?

@thaJeztah
Copy link
Copy Markdown
Member

sigh..

E: Failed to fetch http://cdn-fastly.deb.debian.org/debian/pool/main/g/gcc-4.9/gcc-4.9_4.9.2-10_amd64.deb  503  backend read error

@thaJeztah thaJeztah removed the status/failing-ci Indicates that the PR in its current state fails the test suite label May 12, 2016
@justincormack
Copy link
Copy Markdown
Contributor Author

retest this please

@thaJeztah
Copy link
Copy Markdown
Member

It's green now

ping @jfrazelle @LK4D4 PTAL 👍

@jessfraz
Copy link
Copy Markdown
Contributor

looks good to me! I like the tests :)

@LK4D4
Copy link
Copy Markdown
Contributor

LK4D4 commented May 12, 2016

LGTM

@thaJeztah thaJeztah added this to the 1.12.0 milestone May 13, 2016
@crosbymichael
Copy link
Copy Markdown
Contributor

LGTM

@SvenDowideit
Copy link
Copy Markdown
Contributor

docs LGTM - @thaJeztah

@thaJeztah
Copy link
Copy Markdown
Member

oh! forgot to add my LGTM. Here it is \o/

@thaJeztah thaJeztah merged commit 0e9009b into moby:master May 20, 2016
justincormack added a commit to justincormack/docker that referenced this pull request May 25, 2016
In moby#22554 I aligned seccomp and capabilities, however the case of
the chown calls and CAP_CHOWN was less clearcut, as these are
simple calls that the capabilities will block if they are not
allowed. They are needed when no new privileges is not set in
order to allow docker to call chown before the container is
started, so there was a workaround but this did not include
all the chown syscalls, and Arm was failing on some seccomp
tests because it was using a different syscall from just the
fchown that was allowed in this case. It is simpler to just
allow all the chown calls in the default seccomp profile and
let the capabilities subsystem block them.

Signed-off-by: Justin Cormack <[email protected]>
liusdu pushed a commit to liusdu/moby that referenced this pull request Oct 30, 2017
Currently the default seccomp profile is fixed. This changes it
so that it varies depending on the Linux capabilities selected with
the --cap-add and --cap-drop options. Without this, if a user adds
privileges, eg to allow ptrace with --cap-add sys_ptrace then still
cannot actually use ptrace as it is still blocked by seccomp, so
they will probably disable seccomp or use --privileged. With this
change the syscalls that are needed for the capability are also
allowed by the seccomp profile based on the selected capabilities.

While this patch makes it easier to do things with for example
cap_sys_admin enabled, as it will now allow creating new namespaces
and use of mount, it still allows less than --cap-add cap_sys_admin
--security-opt seccomp:unconfined would have previously. It is not
recommended that users run containers with cap_sys_admin as this does
give full access to the host machine.

It also cleans up some architecture specific system calls to be
only selected when needed.

cherry-pick from moby#22554

Signed-off-by: Justin Cormack <[email protected]>
Signed-off-by: yangshukui <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants