Skip to content

Conversation

@thaJeztah
Copy link
Member

@thaJeztah thaJeztah commented Aug 24, 2020

  • seccomp: allow add preadv2 and pwritev2 syscalls (equivalent of Seccomp Update moby/moby#31620)

  • seccomp: allow adjtimex get time operation (equivalent of Whitelisting adjtimex get time operation and requiring CAP_SYS_TIME only in case of adjustment moby/moby#33403)
    Enabled adjtimex in the default profile without requiring CAP_SYS_TIME privilege. The kernel will check CAP_SYS_TIME and won't allow setting the time.

    • Fixes: Getting the system time with ntptime returns an error in an unprivileged container
  • seccomp: allow syscall membarrier (equivalent of seccomp: allow syscall membarrier moby/moby#40731)
    Add the membarrier syscall to the default seccomp profile.
    It is for example used in the implementation of dlopen() in
    the musl libc of Alpine images.

  • seccomp: allow personality with UNAME26 bit set (equivalent of seccomp: Allow personality with UNAME26 bit set. moby/moby#32965)
    From personality(2):

      Have uname(2) report a 2.6.40+ version number rather than a 3.x version
      number.  Added as a stopgap measure to support broken applications that
      could not handle the  kernel  version-numbering  switch  from 2.6.x to 3.x.
    

    This allows both "UNAME26|PER_LINUX" and "UNAME26|PER_LINUX32".

    • Fixes: "setarch broken in docker packages from Debian stretch"
  • seccomp: allow sync_file_range2 on supported architectures (equivalent of Allow sync_file_range2 on supported architectures. moby/moby#30971)
    On a ppc64le host, running postgres (tried with 9.4 to 9.6) gives the following
    warning when trying to flush data to disks (which happens very frequently):

       WARNING: could not flush dirty data: Operation not permitted.
    

    A quick dig in postgres source code indicate it uses sync_file_range(2) to
    flush data; which on ppe64le and arm64 is translated to sync_file_range2(2)
    for alignements reasons.

    The profile did not allow sync_file_range2(2), making postgres sad because
    it can not flush its buffers. arm_sync_file_range(2) is an ancient alias to
    sync_file_range2(2), the syscall was renamed in Linux 2.6.22 when the same
    syscall was added for PowerPC.

  • seccomp: allow quotactl with CAP_SYS_ADMIN (equivalent of seccomp: whitelist quotactl with CAP_SYS_ADMIN moby/moby#34445)
    This allows the quotactl syscall in the default seccomp profile, gated by
    CAP_SYS_ADMIN.

  • seccomp: allow clock_settime when CAP_SYS_TIME is added (equivalent of profiles: seccomp: allow clock_settime when CAP_SYS_TIME is added moby/moby#31966)

Enabled adjtimex in the default profile without requiring CAP_SYS_TIME privilege.
The kernel will check CAP_SYS_TIME and won't allow setting the time.

Fixes: Getting the system time with ntptime returns an error in an unprivileged
container

To verify, inside a CentOS 7 container:

    yum install -y ntp
    ntptime
    # ntp_gettime() returns code 0 (OK)

    ntpdate -v time.nist.gov
    # ntpdate[84]: Can't adjust the time of day: Operation not permitted

Signed-off-by: Sebastiaan van Stijn <[email protected]>
Add the membarrier syscall to the default seccomp profile.
It is for example used in the implementation of dlopen() in
the musl libc of Alpine images.

Signed-off-by: Sebastiaan van Stijn <[email protected]>
@thaJeztah
Copy link
Member Author

@AkihiroSuda @justincormack ptal

@thaJeztah
Copy link
Member Author

Seeing some more diffs; let me have a quick look at those

From personality(2):

    Have uname(2) report a 2.6.40+ version number rather than a 3.x version
    number.  Added as a stopgap measure to support broken applications that
    could not handle the  kernel  version-numbering  switch  from 2.6.x to 3.x.

This allows both "UNAME26|PER_LINUX" and "UNAME26|PER_LINUX32".

Fixes: "setarch broken in docker packages from Debian stretch"

Signed-off-by: Sebastiaan van Stijn <[email protected]>
@theopenlab-ci
Copy link

theopenlab-ci bot commented Aug 24, 2020

Build succeeded.

On a ppc64le host, running postgres (tried with 9.4 to 9.6) gives the following
warning when trying to flush data to disks (which happens very frequently):

     WARNING: could not flush dirty data: Operation not permitted.

A quick dig in postgres source code indicate it uses sync_file_range(2) to
flush data; which on ppe64le and arm64 is translated to sync_file_range2(2)
for alignements reasons.

The profile did not allow sync_file_range2(2), making postgres sad because
it can not flush its buffers. arm_sync_file_range(2) is an ancient alias to
sync_file_range2(2), the syscall was renamed in Linux 2.6.22 when the same
syscall was added for PowerPC.

Signed-off-by: Sebastiaan van Stijn <[email protected]>
This allows the quotactl syscall in the default seccomp profile, gated by
CAP_SYS_ADMIN.

Signed-off-by: Sebastiaan van Stijn <[email protected]>
@thaJeztah
Copy link
Member Author

@AkihiroSuda added more commits (and updated description) PTAL

@theopenlab-ci
Copy link

theopenlab-ci bot commented Aug 24, 2020

Build succeeded.

Copy link
Member

@estesp estesp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@crosbymichael
Copy link
Member

LGTM

@crosbymichael crosbymichael merged commit 40ce36f into containerd:master Aug 25, 2020
@thaJeztah thaJeztah deleted the seccomp_updates branch August 26, 2020 08:37
@AkihiroSuda AkihiroSuda added cherry-picked/1.4.x PR commits are cherry picked into the release/1.4 branch and removed cherry-pick/1.4.x labels Aug 26, 2020
kevpar added a commit to kevpar/containerd that referenced this pull request Oct 26, 2020
containerd 1.4.1

Welcome to the v1.4.1 release of containerd!

The first patch release for `containerd` 1.4 includes a fix for v1 shims hanging
on exit and exec when the log pipe fills up along with other minor changes.

* Always consume shim logs to prevent logs in the shim from blocking [containerd#4546](containerd#4546)
* Fix error deleting v2 bundle directory when removing rootfs returns `ErrNotExist` [containerd#4472](containerd#4472)
* Fix metrics monitoring of v2 runtime tasks [containerd#4486](containerd#4486)
* Fix incorrect stat for Windows containers [containerd#4468](containerd#4468)
* Fix devmapper device deletion on rollback [containerd#4437](containerd#4437)
* Update seccomp default profile [containerd#4481](containerd#4481) [containerd#4491](containerd#4491) [containerd#4492](containerd#4492) [containerd#4493](containerd#4493)

Please try out the release binaries and report any issues at
https://github.com/containerd/containerd/issues.

* Sebastiaan van Stijn
* Derek McGowan
* Wei Fu
* Brian Goff
* Akihiro Suda
* Antonio Ojea
* Jintao Zhang
* Phil Estes
* Kazuyoshi Kato
* Li Yuxuan
* Mike Brown
* Prashant Bhutani
<details><summary>36 commits</summary>
<p>

* [`c623d1b3`](containerd@c623d1b) Merge pull request  [containerd#4564](containerd#4564) from dmcgowan/prepare-1.4.1
* [`97d690d2`](containerd@97d690d) Prepare v1.4.1 release
* [`910da2fb`](containerd@910da2f) Merge pull request  [containerd#4555](containerd#4555) from thaJeztah/1.4_backport_bumpcni
* [`ca3b91d8`](containerd@ca3b91d) Merge pull request  [containerd#4560](containerd#4560) from dmcgowan/backport-4546
* [`42f38718`](containerd@42f3871) Always consume shim logs
* [`ea29a60a`](containerd@ea29a60) Merge pull request  [containerd#4558](containerd#4558) from thaJeztah/1.4_backport_winstats
* [`db931948`](containerd@db93194) Merge pull request  [containerd#4557](containerd#4557) from thaJeztah/1.4_backport_makefile_test_tags
* [`9b5066aa`](containerd@9b5066a) Merge pull request  [containerd#4556](containerd#4556) from thaJeztah/1.4_backport_fix_static_plugin
* [`3bcce819`](containerd@3bcce81) Merge pull request  [containerd#4554](containerd#4554) from thaJeztah/1.4_backport_add_openat2_syscall
* [`98a733e0`](containerd@98a733e) Merge pull request  [containerd#4552](containerd#4552) from thaJeztah/1.4_backport_shim_exec_p_debug
* [`f247618a`](containerd@f247618) Report correct stats for windows containers
* [`cc5d1518`](containerd@cc5d151) Update go list to respect build tags
* [`086e859d`](containerd@086e859) BUILDING.md: fix description about static builds
* [`16712ae4`](containerd@16712ae) bump cni version to v0.8.0
* [`1575c88c`](containerd@1575c88) seccomp: add `faccessat2` syscall.
* [`8bd2bece`](containerd@8bd2bec) seccomp: add `openat2` syscall.
* [`4e3397e0`](containerd@4e3397e) shimv1: downgrade poroccess missing log to debug
* [`6b5fc7f2`](containerd@6b5fc7f) Merge pull request  [containerd#4542](containerd#4542) from thaJeztah/1.4_backport_forward_signal_not_found
* [`d118c90d`](containerd@d118c90) Ignore SIGURG signals in signal forwarder
* [`3ee6189f`](containerd@3ee6189) Exit signal forward if process not found
* [`1a367762`](containerd@1a36776) Merge pull request  [containerd#4512](containerd#4512) from fuweid/14-cherry-pick-4486
* [`a1289d6b`](containerd@a1289d6) tasks: Monitor v2 tasks in initFunc as well
* [`12f20c99`](containerd@12f20c9) Merge pull request  [containerd#4503](containerd#4503) from thaJeztah/1.4_backport_seccomp_updates
* [`1f823f76`](containerd@1f823f7) seccomp: allow io-uring related system calls
* [`3d28944b`](containerd@3d28944) seccomp: allow clock_settime when CAP_SYS_TIME is added
* [`e5cc7d52`](containerd@e5cc7d5) seccomp: allow quotactl with CAP_SYS_ADMIN
* [`20273a80`](containerd@20273a8) seccomp: allow sync_file_range2 on supported architectures.
* [`357d1002`](containerd@357d100) seccomp: allow personality with UNAME26 bit set
* [`0c9de662`](containerd@0c9de66) seccomp: allow syscall membarrier
* [`caa46116`](containerd@caa4611) seccomp: allow adjtimex get time operation
* [`2b80b7dc`](containerd@2b80b7d) seccomp: allow add preadv2 and pwritev2 syscalls
* [`e71eccbc`](containerd@e71eccb) seccomp: move the syslog syscall to be gated by CAP_SYS_ADMIN or CAP_SYSLOG
* [`881db9b5`](containerd@881db9b) Merge pull request  [containerd#4499](containerd#4499) from fuweid/cherry-pick-4472
* [`feff914a`](containerd@feff914) runtime: ignore ErrNotExist when remove rootfs
* [`94c8bd94`](containerd@94c8bd9) Merge pull request  [containerd#4496](containerd#4496) from kzys/backport-1.4-4437
* [`23e0ea27`](containerd@23e0ea2) snapshots/devmapper: fix rollback
</p>
</details>
<details><summary>4 commits</summary>
<p>

* [`8fbf363`](containerd/go-cni@8fbf363) Merge pull request  [containerd#56](containerd/go-cni#56) from aojea/bumpcni
* [`49657db`](containerd/go-cni@49657db) bump containernetworking/cni dependency to 0.8.0
* [`1582593`](containerd/go-cni@1582593) Merge pull request  [containerd#58](containerd/go-cni#58) from fuweid/update-readme-usage
* [`8ffba88`](containerd/go-cni@8ffba88) README.md: update Usage case
</p>
</details>

* **github.com/containerd/go-cni**            v1.0.0 -> v1.0.1
* **github.com/containernetworking/cni**      v0.7.1 -> v0.8.0
* **github.com/containernetworking/plugins**  v0.7.6 -> v0.8.6

Previous release can be found at [v1.4.0](https://github.com/containerd/containerd/releases/tag/v1.4.0)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cherry-picked/1.4.x PR commits are cherry picked into the release/1.4 branch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants