Revert "seccomp: whitelist io-uring related system calls" #41223

Closed
AkihiroSuda wants to merge 1 commit into master from revert-39415-master

@AkihiroSuda
Member

Reverts #39415

See #39415 (comment)

io_uring allows programs to execute certain syscalls without being limited by seccomp.

@cyphar
Contributor

cyphar commented Jul 17, 2020

I don't think it's necessary to revert. The default profile allows everything that you can do with io_uring -- and custom profiles should be modified accordingly. I was mainly surprised that this wasn't mentioned at the time. io_uring will grow support for more syscalls in the future, so we should keep an eye on this.

@omegacoleman
Contributor

I agree with cyphar. The topics were more about the ability to configure the filter for uring syscalls, not that it really created any security issue yet, especially not for the default config.

@thaJeztah deleted the revert-39415-master branch on August 17, 2021 15:16

@giuseppe
Contributor

We are in the process of discussing whether to enable io_uring in the Podman seccomp profile, and I was somewhat surprised that it is enabled in the default configuration for Moby.

From the discussion we had (containers/common#1264), we believe that enabling it is not future-proof, as we have no control over which syscalls might be enabled in io_uring in the future. On the other hand, it would be nice if there were no such differences between Podman and Moby.

It is just a theoretical question at this point, but how would this be future-proof? We have no control over the syscalls that will be added to io_uring in the future, which could become a problem when the current configuration is used on a future kernel. This differs from the model we use for seccomp right now, where new syscalls default to ENOSYS and we cherry-pick the safe ones to enable.
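
The allow-list model mentioned here can be sketched roughly as follows. This is a minimal Python illustration, not Moby or Podman code: `PROFILE` is a heavily trimmed, hypothetical profile in the OCI-style JSON shape (`defaultAction`, `defaultErrnoRet`, `syscalls`), and `decide` is a helper name invented for this example. The rule matching is simplified to "first match wins".

```python
ENOSYS = 38  # Linux errno value for "function not implemented"

# Hypothetical, heavily trimmed profile, for illustration only.
PROFILE = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "defaultErrnoRet": ENOSYS,
    "syscalls": [
        {"names": ["read", "write", "openat"], "action": "SCMP_ACT_ALLOW"},
        {
            "names": ["io_uring_setup", "io_uring_enter", "io_uring_register"],
            "action": "SCMP_ACT_ALLOW",
        },
    ],
}


def decide(profile, syscall_name):
    """Return (action, errno_or_None) for a syscall name.

    Simplification: the first rule that lists the name wins; anything
    not listed falls through to the profile's default action.
    """
    for rule in profile["syscalls"]:
        if syscall_name in rule["names"]:
            return rule["action"], rule.get("errnoRet")
    return profile["defaultAction"], profile.get("defaultErrnoRet")


print(decide(PROFILE, "openat"))               # explicitly allowed
print(decide(PROFILE, "some_future_syscall"))  # unknown, so default applies
```

A syscall that did not exist when the profile was written falls through to the default. The concern in this thread is that an operation submitted through an already-allowed `io_uring_enter` is not a new syscall name, so a lookup of this kind is never consulted for it.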

@cyphar
Contributor

cyphar commented Dec 18, 2022

On the other hand, if any syscall adds a feature that opens a potential security issue (openat(O_GIVEMEROOT) for instance 😉), we would need to react to it in the same way -- releasing a new version of Docker which disallows that flag. Though in the case of io_uring, we cannot use seccomp to restrict the bits of kernel VFS it hits during execution.

I spoke to Jens Axboe at LPC and he said that in general they would be open to having a more capable restriction mechanism for io_uring (but for obvious reasons we cannot use seccomp for this) so this might be improved in the future.

(As an aside, if we do have to block io_uring-related syscalls, we need to make them return -ENOSYS.)
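
A minimal sketch of what that aside could look like when applied to an OCI-style profile, written in Python for illustration. The function name `deny_io_uring` and the sample profile are hypothetical; the field names `names`, `action`, and `errnoRet` follow the JSON shape used by runtime seccomp profiles, and 38 is the Linux value of ENOSYS. Presumably the point of ENOSYS (rather than EPERM) is that programs can treat io_uring as unavailable and fall back to another I/O path.

```python
import copy

ENOSYS = 38  # Linux errno value for "function not implemented"
IO_URING_SYSCALLS = ("io_uring_setup", "io_uring_enter", "io_uring_register")


def deny_io_uring(profile):
    """Return a copy of `profile` in which io_uring syscalls fail with ENOSYS.

    The names are stripped from every existing rule, and a single explicit
    deny rule is appended. The input profile is left unmodified.
    """
    patched = copy.deepcopy(profile)
    kept = []
    for rule in patched.get("syscalls", []):
        rule["names"] = [n for n in rule["names"] if n not in IO_URING_SYSCALLS]
        if rule["names"]:  # drop rules that became empty
            kept.append(rule)
    kept.append(
        {
            "names": list(IO_URING_SYSCALLS),
            "action": "SCMP_ACT_ERRNO",
            "errnoRet": ENOSYS,
        }
    )
    patched["syscalls"] = kept
    return patched


# Hypothetical input profile, for illustration only.
sample = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [
        {"names": ["read", "io_uring_enter"], "action": "SCMP_ACT_ALLOW"},
        {"names": ["io_uring_setup"], "action": "SCMP_ACT_ALLOW"},
    ],
}
patched_sample = deny_io_uring(sample)
```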

@giuseppe
Contributor

io_uring is a special case though: potentially every other syscall could be multiplexed through io_uring at some point, while a flag that completely changes the meaning of a syscall seems less likely; or at least I hope the security implications of something like openat(O_GIVEMEROOT) would be considered during review :-)

I wonder why the syscalls called by io_uring couldn't go through the same checks as the syscalls that are called directly so that the same seccomp profile could still work with io_uring without any change.

@thaJeztah
Member
