feat(jailer): Linux defense-in-depth: user namespaces, AppArmor, bwrap mounts, seccomp #256
Merged
DorianZheng merged 11 commits into main on Feb 16, 2026
008c45e to 110ab3b
…tics

Port Chrome's CanCreateProcessInNewUserNS() and CheckCloneNewUserErrno() from sandbox/linux/services/credentials.cc.

Dual-probe approach:
1. Raw clone(CLONE_NEWUSER) for kernel-level errno diagnosis
2. bwrap --unshare-user for actual bwrap capability (handles AppArmor per-binary profiles where bwrap may work even if our clone fails)

When bwrap fails, build_diagnostic() combines Chrome errno + sysctl detection to provide targeted fix commands for each scenario:
- AppArmor restrict_unprivileged_userns (Ubuntu 23.10+)
- kernel.unprivileged_userns_clone (Debian/older distros)
- user.max_user_namespaces (RHEL/CentOS)
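The sysctl half of that diagnosis can be sketched roughly as follows. This is not the PR's build_diagnostic(): the names `UsernsSysctls`, `UsernsBlocker`, `classify` and `suggested_fix` are hypothetical, the errno half is omitted, and the value 15000 in the last fix command is only an illustrative non-zero limit.

```rust
use std::fs;

/// Values of the three sysctls named in the commit message.
/// `None` means the key does not exist on this kernel. (Hypothetical type.)
#[derive(Debug, Default, Clone, Copy)]
pub struct UsernsSysctls {
    pub apparmor_restrict_unprivileged_userns: Option<u32>,
    pub unprivileged_userns_clone: Option<u32>,
    pub max_user_namespaces: Option<u32>,
}

/// Which knob appears to be blocking unprivileged user namespaces.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UsernsBlocker {
    AppArmorRestriction,
    UsernsCloneDisabled,
    MaxUserNamespacesZero,
    Unknown,
}

/// Read a numeric sysctl such as "user.max_user_namespaces" via /proc/sys.
pub fn read_sysctl(name: &str) -> Option<u32> {
    let path = format!("/proc/sys/{}", name.replace('.', "/"));
    fs::read_to_string(path).ok()?.trim().parse().ok()
}

/// Collect the current values from the running kernel.
pub fn snapshot() -> UsernsSysctls {
    UsernsSysctls {
        apparmor_restrict_unprivileged_userns: read_sysctl(
            "kernel.apparmor_restrict_unprivileged_userns",
        ),
        unprivileged_userns_clone: read_sysctl("kernel.unprivileged_userns_clone"),
        max_user_namespaces: read_sysctl("user.max_user_namespaces"),
    }
}

/// Pick the most likely blocker; hard kernel limits are checked first.
pub fn classify(s: &UsernsSysctls) -> UsernsBlocker {
    if s.max_user_namespaces == Some(0) {
        UsernsBlocker::MaxUserNamespacesZero
    } else if s.unprivileged_userns_clone == Some(0) {
        UsernsBlocker::UsernsCloneDisabled
    } else if s.apparmor_restrict_unprivileged_userns == Some(1) {
        UsernsBlocker::AppArmorRestriction
    } else {
        UsernsBlocker::Unknown
    }
}

/// One possible fix command per blocker (illustrative, not the PR's text).
pub fn suggested_fix(blocker: UsernsBlocker) -> Option<&'static str> {
    match blocker {
        UsernsBlocker::AppArmorRestriction => {
            Some("sudo sysctl -w kernel.apparmor_restrict_unprivileged_userns=0")
        }
        UsernsBlocker::UsernsCloneDisabled => {
            Some("sudo sysctl -w kernel.unprivileged_userns_clone=1")
        }
        UsernsBlocker::MaxUserNamespacesZero => {
            // 15000 is an arbitrary example value, not taken from the PR.
            Some("sudo sysctl -w user.max_user_namespaces=15000")
        }
        UsernsBlocker::Unknown => None,
    }
}
```

Keeping `classify` a pure function over a snapshot makes each scenario testable without touching the host's real sysctls.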
When bundled bwrap fails on Ubuntu 23.10+ with kernel.apparmor_restrict_unprivileged_userns=1, generate an AppArmor profile at ~/.boxlite/apparmor/boxlite-bwrap and include the `sudo apparmor_parser -r` command in the diagnostic.

- Add apparmor.rs with generate_bwrap_profile() and write_bwrap_profile()
- Profile mirrors Ubuntu's bwrap-userns-restrict with unique names (boxlite_bwrap/boxlite_unpriv_bwrap) to avoid collision
- Caller in bwrap.rs computes apparmor_dir (Minimal Knowledge)
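As a rough illustration of generating a profile from Rust, the sketch below renders the simpler "unconfined plus `userns`" pattern that Ubuntu ships for some applications. It is deliberately not the profile this PR generates, which mirrors bwrap-userns-restrict and is more restrictive. The function name `render_userns_profile`, the binary path in the usage note, and the choice of `abi <abi/4.0>` are assumptions.

```rust
/// Render a minimal AppArmor profile that lets one binary create user
/// namespaces. Hypothetical helper; a swapped-in, more permissive technique
/// than the bwrap-userns-restrict style profile described above.
pub fn render_userns_profile(profile_name: &str, binary_path: &str) -> String {
    // Note: a binary path containing spaces would need quoting in the
    // profile header; this sketch does not handle that case.
    format!(
        "abi <abi/4.0>,\n\
         include <tunables/global>\n\
         \n\
         profile {name} {path} flags=(unconfined) {{\n\
         \x20 userns,\n\
         }}\n",
        name = profile_name,
        path = binary_path,
    )
}
```

For example, `render_userns_profile("boxlite_bwrap", "/opt/example/bwrap")` produces text that could be written to a file and loaded with `sudo apparmor_parser -r <file>`; the path here is a placeholder, not the location of the bundled bwrap.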
The bwrap sandbox was missing two critical mount categories:
1. ~/.boxlite/rootfs (ro) - VM init rootfs (Alpine bootstrap)
2. User volume host_paths - from BoxOptions.volumes

Without the rootfs mount, libkrun couldn't boot the VM inside bwrap, causing the shim to exit immediately.
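A minimal sketch of how those two mount categories might translate into bwrap arguments, using bwrap's `--ro-bind SRC DEST` and `--bind SRC DEST` options. The `VolumeMount` type, its `read_only` field, and `mount_args` are hypothetical and do not reflect the actual BoxOptions.volumes definition; each path is bound to the same location inside the sandbox.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for one entry of BoxOptions.volumes.
pub struct VolumeMount {
    pub host_path: PathBuf,
    pub read_only: bool,
}

/// Build bwrap arguments for the rootfs (always read-only) and user volumes.
pub fn mount_args(rootfs: &Path, volumes: &[VolumeMount]) -> Vec<String> {
    let mut args = Vec::new();
    push_bind(&mut args, rootfs, true);
    for volume in volumes {
        push_bind(&mut args, &volume.host_path, volume.read_only);
    }
    args
}

/// Append `--ro-bind PATH PATH` or `--bind PATH PATH` for a single path.
fn push_bind(args: &mut Vec<String>, path: &Path, read_only: bool) {
    let flag = if read_only { "--ro-bind" } else { "--bind" };
    let path = path.to_string_lossy().into_owned();
    args.push(flag.to_string());
    args.push(path.clone()); // source on the host
    args.push(path); // destination inside the sandbox
}
```

The resulting vector can be passed to `std::process::Command::args` ahead of the command that bwrap should run.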
9dd5432 to 9699056
Add explicit unsafe block in credentials.rs for unsafe-op-in-unsafe-fn compliance and fix cargo fmt formatting in bwrap.rs.
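For readers unfamiliar with the lint, the pattern looks roughly like this. The function below is a generic illustration with a hypothetical name, not code from credentials.rs.

```rust
/// Under `unsafe_op_in_unsafe_fn`, the body of an `unsafe fn` is no longer an
/// implicit unsafe context, so each unsafe operation needs its own block.
///
/// # Safety
/// `ptr` must be non-null, aligned, and point to an initialized `i32`.
#[deny(unsafe_op_in_unsafe_fn)]
pub unsafe fn read_i32(ptr: *const i32) -> i32 {
    // SAFETY: the caller upholds the contract documented above.
    unsafe { ptr.read() }
}
```

Without the inner `unsafe { ... }` block, the `#[deny]` attribute turns the raw-pointer read into a compile error.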
Separate Go runtime syscalls from VMM seccomp filter using Linux's seccomp filter stacking semantics.

Previously, the VMM filter contained ~100 syscalls (VMM + Go runtime combined). Now:
- VMM filter: ~66 unique syscalls (original Firecracker + libkrun)
- Gvproxy filter: ~106 unique syscalls (strict superset, includes Go runtime)

Two-phase application in shim:
1. Apply gvproxy filter with TSYNC (before gvproxy creation)
2. Create gvproxy → Go threads inherit permissive filter
3. Stack VMM filter on main thread only (no TSYNC)
4. krun_start_enter → vCPU threads inherit both from main

Stacked filters evaluate as intersection (most restrictive wins). Since VMM ⊂ gvproxy, effective filter on main/vCPU = VMM. Go threads keep only the gvproxy filter (more permissive).

Without gvproxy, VMM filter applied with TSYNC (original behavior).
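The intersection argument can be modelled with plain sets. This is a toy model of allowlist-style filters only, not real seccomp code; `effective_allowlist` is a hypothetical name and the syscall lists in the usage note are shortened examples.

```rust
use std::collections::BTreeSet;

/// Model stacked allowlist filters: a syscall is permitted only if every
/// filter installed on the thread permits it. Returns `None` when no filter
/// is installed, meaning nothing is restricted.
pub fn effective_allowlist<'a>(stack: &[BTreeSet<&'a str>]) -> Option<BTreeSet<&'a str>> {
    let mut filters = stack.iter();
    let first = filters.next()?.clone();
    Some(filters.fold(first, |acc, filter| {
        acc.intersection(filter).copied().collect()
    }))
}
```

With `vmm = {ioctl, read, write}` and `gvproxy = vmm ∪ {epoll_pwait, futex}`, a thread carrying both filters ends up with exactly the `vmm` set, while a Go thread carrying only `gvproxy` keeps the larger set.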
The Firecracker-derived VMM filter (45 syscalls) was fundamentally inadequate for libkrun on modern glibc (2.38+):
1. Missing modern glibc equivalents: glibc rewrites open→openat, stat→newfstatat, etc. The filter had legacy names but not modern ones.
2. Missing libkrun runtime syscalls: libkrun needs mprotect, bind, listen, clone3, pread64, etc. for VM setup and operation.
3. Missing thread init syscalls: vCPU threads created after seccomp need set_tid_address, rseq, arch_prctl for pthread initialization.
4. Insufficient ioctl coverage: Firecracker used 8 specific KVM ioctls; libkrun requires 28+ for VM creation and vCPU management.

Adds 47 entries to VMM filter (86 unique syscalls, up from 45). Verified: vmm ⊂ gvproxy (all VMM syscalls are in gvproxy superset).

This fixes a pre-existing SIGSYS crash (exit code 159) on Linux when seccomp is enabled: the shim was killed on the first openat() call.
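The exit code 159 follows the shell convention of reporting a signal death as 128 plus the signal number, and SIGSYS is signal 31 on x86_64 and aarch64 Linux. A small sketch of that decoding, with a hypothetical helper name:

```rust
/// SIGSYS on x86_64 and aarch64 Linux (the value differs on some other
/// architectures, so treat this constant as an assumption about the target).
pub const SIGSYS: i32 = 31;

/// Decode a shell-style exit code (128 + signal number) into the signal that
/// terminated the process, if the code is in that range.
pub fn signal_from_shell_exit_code(code: i32) -> Option<i32> {
    if code > 128 && code < 256 {
        Some(code - 128)
    } else {
        None
    }
}
```

When the child is spawned directly from Rust rather than through a shell, `std::os::unix::process::ExitStatusExt::signal()` reports the terminating signal without this arithmetic.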
Remove unused imports (SecurityOptions, FilesystemLayout) from linux/mod.rs and add explicit unsafe block in credentials.rs for Rust 2024 edition compliance (unsafe-op-in-unsafe-fn).
…SYNC

The VMM filter was expanded to 106 syscalls covering both libkrun and Go runtime needs. The gvproxy filter (107 syscalls) was a strict superset differing only by the `seccomp` syscall needed for two-phase stacking. With a single TSYNC application after gvproxy creation, the `seccomp` syscall is no longer needed and the gvproxy filter becomes redundant.

- Remove gvproxy section from seccomp JSON (~650 lines)
- Remove SeccompRole::Gvproxy variant and apply_gvproxy_filter()
- Simplify apply_vmm_filter() to always use TSYNC (remove tsync param)
- Remove two-phase stacking logic from shim main.rs
…helper

The previous commit removed `linux::apply_isolation()` but missed updating the `PlatformIsolation` trait impl that called it. Inline the logic directly: call `apply_vmm_filter()` when seccomp is enabled.
The `layout: &FilesystemLayout` parameter was unused in all three platform implementations (Linux, macOS, Unsupported). No external code called this trait method. Clean up the dead parameter.
9699056 to 60fe6f8
Save pre-modification Firecracker-derived filters as *.original.json for reference. Add TODO noting the current VMM filter is intentionally broad (all arg restrictions removed) and should be tightened once libkrun's actual syscall arg patterns are profiled.
Summary
Test plan
- cargo clippy -p boxlite -p boxlite-python -p boxlite-node -- -D warnings
- cargo test -p boxlite-node (5/5)
- cargo fmt --check