Memory unsafety at libc/kernel boundary via argv overread in unix impl of std::process::Command #155748

@lopopolo

There is a soundness and memory safety issue in the Unix implementation of std::process::Command caused by panic-unsafety in CStringArray's argv maintenance.

This finding was discovered by OpenAI Codex and validated by @lopopolo. We have been scanning the Rust toolchain, using RUSTSEC-2026-0078 as a seed, for state corruption left behind by partially committed mutations after an unwind.

The issue requires a caught unwind during Command::arg at an allocation edge, followed by continued use of the recovered Command. On the validated toolchain, the recovered object survives with a malformed unterminated argv, and later safe process creation fails with os error 14 / EFAULT.

Root cause is panic-unsafety in the Unix CStringArray::push helper. It overwrites the trailing null sentinel first and only afterwards appends a replacement null pointer. If allocation failure reaches handle_alloc_error in that second step and the caller catches the unwind, the recovered Command keeps an unterminated argv. Later Command::output() / spawn() passes that malformed pointer array into posix_spawn / execvp, causing the libc/kernel process-spawn path to walk past the end of the allocation looking for a null sentinel.
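The two-step shape described above can be modeled in isolation. The following is a minimal sketch, not the actual std source: `ArgvModel` and its methods are hypothetical names, and the push is split into two methods so the intermediate state (the one a caller would recover if the second step unwound) can be observed without forcing a real allocation failure.

```rust
use std::ffi::{c_char, CString};
use std::ptr;

/// Hypothetical stand-in for the internal argv holder (not the real std type).
/// Invariant: the last element of `ptrs` is always a null pointer.
struct ArgvModel {
    ptrs: Vec<*const c_char>,
}

impl ArgvModel {
    fn new() -> Self {
        Self { ptrs: vec![ptr::null()] }
    }

    /// Step 1 of the two-step push: overwrite the trailing null sentinel.
    /// After this call the invariant is temporarily broken.
    fn overwrite_sentinel(&mut self, item: CString) {
        let last = self.ptrs.len() - 1;
        self.ptrs[last] = item.into_raw();
    }

    /// Step 2: append a replacement null. `Vec::push` may need to grow the
    /// buffer, so this is the step where an allocation failure can occur.
    fn append_sentinel(&mut self) {
        self.ptrs.push(ptr::null());
    }

    /// True when the pointer array ends in a null sentinel.
    fn is_terminated(&self) -> bool {
        self.ptrs.last().is_some_and(|p| p.is_null())
    }
}

impl Drop for ArgvModel {
    fn drop(&mut self) {
        for &p in &self.ptrs {
            if !p.is_null() {
                // SAFETY: every non-null entry came from `CString::into_raw`
                // in `overwrite_sentinel` and is reclaimed exactly once here.
                unsafe { drop(CString::from_raw(p.cast_mut())) };
            }
        }
    }
}
```

Calling `overwrite_sentinel` without the matching `append_sentinel` leaves `is_terminated()` false. That is the state the recovered `Command` is left in when the unwind is caught between the two steps.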

The reproducer below also contains a guard-page proof of the boundary behavior. It allocates a two-page mapping, places an unterminated one-pointer argv array at the end of the first page, marks the second page PROT_NONE, and calls posix_spawn. The test asserts that posix_spawn returns EFAULT when it reads past the end of the pointer array.

#![cfg_attr(all(test, unix), feature(alloc_error_hook))]
#![cfg_attr(not(test), allow(dead_code, unused_imports))]

//! Repro for a Unix `std::process::Command` unwind-safety bug.
//!
//! The vulnerable shape is a two-step mutation of the internal `argv` buffer:
//! `CStringArray::push` overwrites the trailing null sentinel first and only
//! afterwards appends a replacement null pointer. If allocation failure reaches
//! `handle_alloc_error` in that second step and the caller catches the unwind,
//! the recovered `Command` survives with a malformed argument vector. Later
//! safe process creation then hands that unterminated `argv` array to libc,
//! which can read past the end of the allocation looking for a null sentinel
//! and fail with `EFAULT`.

#[cfg(all(test, unix))]
use std::alloc::{set_alloc_error_hook, take_alloc_error_hook};
#[cfg(unix)]
use std::alloc::{GlobalAlloc, Layout, System};
#[cfg(all(test, unix))]
use std::mem::{align_of, size_of};
#[cfg(unix)]
use std::panic::{self, AssertUnwindSafe};
#[cfg(unix)]
use std::process::{Command, Stdio};
#[cfg(all(test, unix))]
use std::ptr;
#[cfg(unix)]
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

#[cfg(unix)]
const VALIDATED_TARGET: usize = 2;
#[cfg(unix)]
const CHILD_ENV: &str = "RUST_STD_COMMAND_UNWIND_REPRO_CHILD";

#[cfg(unix)]
struct FailingAlloc;

#[cfg(unix)]
static ALLOCATOR_ARMED: AtomicBool = AtomicBool::new(false);
#[cfg(unix)]
static ALLOCATION_COUNT: AtomicUsize = AtomicUsize::new(0);
#[cfg(unix)]
static FAIL_ON_ALLOCATION: AtomicUsize = AtomicUsize::new(usize::MAX);

#[cfg(unix)]
#[global_allocator]
static GLOBAL_ALLOCATOR: FailingAlloc = FailingAlloc;

#[cfg(unix)]
unsafe impl GlobalAlloc for FailingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        if should_fail_allocation() {
            return std::ptr::null_mut();
        }
        unsafe { System.alloc(layout) }
    }

    unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut u8 {
        if should_fail_allocation() {
            return std::ptr::null_mut();
        }
        unsafe { System.alloc_zeroed(layout) }
    }

    unsafe fn realloc(&self, ptr: *mut u8, layout: Layout, new_size: usize) -> *mut u8 {
        let _ = new_size.max(layout.size());
        if should_fail_allocation() {
            return std::ptr::null_mut();
        }
        unsafe { System.realloc(ptr, layout, new_size) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[cfg(unix)]
#[inline(never)]
fn should_fail_allocation() -> bool {
    if !ALLOCATOR_ARMED.load(Ordering::Relaxed) {
        return false;
    }

    let allocation = ALLOCATION_COUNT.fetch_add(1, Ordering::Relaxed) + 1;
    if allocation == FAIL_ON_ALLOCATION.load(Ordering::Relaxed) {
        ALLOCATOR_ARMED.store(false, Ordering::Relaxed);
        return true;
    }
    false
}

#[cfg(all(test, unix))]
#[derive(Debug)]
struct ProbeReport {
    target: usize,
    arg_panic_caught: bool,
    baseline_args: Vec<String>,
    args_visible_after_catch: Vec<String>,
    spawn_succeeded: bool,
    raw_os_error: Option<i32>,
}

#[cfg(all(test, unix))]
fn probe(target: usize) -> ProbeReport {
    let current_exe = std::env::current_exe().expect("current test executable path");
    let mut cmd = Command::new(current_exe);
    cmd.env(CHILD_ENV, "1");
    cmd.stdout(Stdio::piped());
    cmd.stderr(Stdio::inherit());
    let baseline_args = cmd
        .get_args()
        .map(|arg| arg.to_string_lossy().into_owned())
        .collect();

    let panic_hook_guard = PanicHookGuard::install();
    let alloc_error_hook_guard = AllocErrorHookGuard::install();

    ALLOCATION_COUNT.store(0, Ordering::Relaxed);
    FAIL_ON_ALLOCATION.store(target, Ordering::Relaxed);
    ALLOCATOR_ARMED.store(true, Ordering::Relaxed);
    let arg_panic_caught = panic::catch_unwind(AssertUnwindSafe(|| {
        cmd.arg("A");
    }))
    .is_err();
    ALLOCATOR_ARMED.store(false, Ordering::Relaxed);

    drop(alloc_error_hook_guard);
    drop(panic_hook_guard);

    let args_visible_after_catch = cmd
        .get_args()
        .map(|arg| arg.to_string_lossy().into_owned())
        .collect();
    let (spawn_succeeded, raw_os_error) = match cmd.output() {
        Ok(_) => (true, None),
        Err(err) => (false, err.raw_os_error()),
    };

    ProbeReport {
        target,
        arg_panic_caught,
        baseline_args,
        args_visible_after_catch,
        spawn_succeeded,
        raw_os_error,
    }
}

#[cfg(all(test, unix))]
struct PanicHookGuard(Option<Box<dyn Fn(&std::panic::PanicHookInfo<'_>) + Sync + Send + 'static>>);

#[cfg(all(test, unix))]
impl PanicHookGuard {
    fn install() -> Self {
        let previous_hook = std::panic::take_hook();
        std::panic::set_hook(Box::new(|_| {}));
        Self(Some(previous_hook))
    }
}

#[cfg(all(test, unix))]
impl Drop for PanicHookGuard {
    fn drop(&mut self) {
        if let Some(previous_hook) = self.0.take() {
            std::panic::set_hook(previous_hook);
        }
    }
}

#[cfg(all(test, unix))]
struct AllocErrorHookGuard(fn(Layout));

#[cfg(all(test, unix))]
impl AllocErrorHookGuard {
    fn install() -> Self {
        let previous_hook = take_alloc_error_hook();
        set_alloc_error_hook(|_| panic!("reproducer alloc error"));
        Self(previous_hook)
    }
}

#[cfg(all(test, unix))]
impl Drop for AllocErrorHookGuard {
    fn drop(&mut self) {
        set_alloc_error_hook(self.0);
    }
}

#[cfg(all(test, unix))]
struct MappingGuard {
    base: *mut libc::c_void,
    len: usize,
}

#[cfg(all(test, unix))]
impl Drop for MappingGuard {
    fn drop(&mut self) {
        unsafe {
            assert_eq!(libc::munmap(self.base, self.len), 0);
        }
    }
}

#[cfg(all(test, unix))]
fn run_child() {}

#[cfg(all(test, unix))]
#[test]
fn command_arg_keeps_argv_well_formed_after_alloc_error_unwind() {
    if std::env::var_os(CHILD_ENV).is_some() {
        run_child();
        return;
    }

    // Phase 1: hit the validated alloc-error unwind site inside `Command::arg`.
    let report = probe(VALIDATED_TARGET);

    // Phase 2: a correct implementation should catch the panic and roll the
    // failed argument append back out of the recovered `Command`.
    assert!(
        report.arg_panic_caught,
        "Command::arg should unwind through handle_alloc_error on the reproducing target"
    );
    assert!(
        report.args_visible_after_catch == report.baseline_args,
        "the recovered Command should keep its original argument list after the panic"
    );

    // Phase 3: a correct `Command` should still spawn successfully after the
    // caught panic because its `argv` terminator was restored.
    assert!(
        report.spawn_succeeded,
        "later safe process creation should still succeed after the caught panic; raw_os_error={:?}",
        report.raw_os_error
    );
    assert_eq!(
        report.target, VALIDATED_TARGET,
        "the regression should keep probing the validated allocator target"
    );
}

#[cfg(all(test, unix))]
#[test]
fn posix_spawn_faults_when_unterminated_argv_crosses_guard_page() {
    use std::ffi::CString;

    let program = CString::new("/usr/bin/true").expect("static executable path");
    let mut envp = [ptr::null_mut::<libc::c_char>()];

    let mut ok_argv = [program.as_ptr().cast_mut(), ptr::null_mut()];
    let mut pid = 0;
    let rc = unsafe {
        libc::posix_spawn(
            &mut pid,
            program.as_ptr(),
            ptr::null(),
            ptr::null(),
            ok_argv.as_mut_ptr(),
            envp.as_mut_ptr(),
        )
    };
    assert_eq!(rc, 0, "control spawn with a terminated argv should succeed");
    let mut status = 0;
    let waited = unsafe { libc::waitpid(pid, &mut status, 0) };
    assert_eq!(waited, pid, "control child should be waitable");

    let page_size = unsafe { libc::sysconf(libc::_SC_PAGESIZE) as usize };
    assert!(page_size >= size_of::<*mut libc::c_char>());

    let mapping_len = page_size * 2;
    let base = unsafe {
        libc::mmap(
            ptr::null_mut(),
            mapping_len,
            libc::PROT_READ | libc::PROT_WRITE,
            libc::MAP_PRIVATE | libc::MAP_ANON,
            -1,
            0,
        )
    };
    assert_ne!(base, libc::MAP_FAILED, "guard-page mapping should succeed");
    let mapping = MappingGuard { base, len: mapping_len };

    let second_page = unsafe { (mapping.base as *mut u8).add(page_size) } as *mut libc::c_void;
    let rc = unsafe { libc::mprotect(second_page, page_size, libc::PROT_NONE) };
    assert_eq!(rc, 0, "guard page should become inaccessible");

    let argv = unsafe {
        (mapping.base as *mut u8).add(page_size - size_of::<*mut libc::c_char>())
            as *mut *mut libc::c_char
    };
    assert_eq!((argv as usize) % align_of::<*mut libc::c_char>(), 0);

    unsafe {
        argv.write(program.as_ptr().cast_mut());
    }

    let mut pid = 0;
    let rc = unsafe {
        libc::posix_spawn(
            &mut pid,
            program.as_ptr(),
            ptr::null(),
            ptr::null(),
            argv,
            envp.as_mut_ptr(),
        )
    };
    assert_eq!(
        rc,
        libc::EFAULT,
        "unterminated argv at a guard page boundary should fault when libc/kernel reads past the array"
    );
}

The deterministic witness in the reproducer is nightly-only because it uses alloc_error_hook to turn handle_alloc_error into a catchable unwind after a custom global allocator temporarily returns null. It does not rely on unwinding from GlobalAlloc.

The straightforward mitigation appears to be reserving capacity fallibly before overwriting the trailing null sentinel, for example via Vec::try_reserve(1), so allocation failure preserves the old argv contents and its null terminator.
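A minimal sketch of that ordering, assuming the argv is held in a plain `Vec` of pointers: `push_arg` and `free_args` are hypothetical helpers for illustration, not the std API. Returning a `Result` is also an assumption made to keep the sketch testable; a real fix might report the failure differently.

```rust
use std::collections::TryReserveError;
use std::ffi::{c_char, CString};
use std::ptr;

/// Hypothetical helper: append `item` to a null-terminated pointer array.
/// `ptrs` must be non-empty and end in a null pointer on entry.
fn push_arg(ptrs: &mut Vec<*const c_char>, item: CString) -> Result<(), TryReserveError> {
    // Reserve room for the replacement sentinel before mutating anything.
    // On failure `ptrs` is untouched and `item` is dropped normally.
    ptrs.try_reserve(1)?;

    // Capacity is now guaranteed, so the two steps below cannot allocate.
    let last = ptrs.len() - 1;
    ptrs[last] = item.into_raw();
    ptrs.push(ptr::null());
    Ok(())
}

/// Hypothetical cleanup for the sketch: reclaim the strings handed out above.
fn free_args(ptrs: Vec<*const c_char>) {
    for p in ptrs {
        if !p.is_null() {
            // SAFETY: every non-null entry came from `CString::into_raw`
            // in `push_arg` and is reclaimed exactly once here.
            unsafe { drop(CString::from_raw(p.cast_mut())) };
        }
    }
}
```

Because `try_reserve` reports failure as an ordinary `Err` value before anything is written, there is no point at which the array exists without its terminator.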

Below is the failing nightly repro showing the recovered Command later hitting os error 14:

$ cargo +nightly test -p rust_std_command_unwind_repro command_arg_keeps_argv_well_formed_after_alloc_error_unwind -- --exact --nocapture
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.04s
     Running unittests src/lib.rs (target/debug/deps/rust_std_command_unwind_repro-93af15781c654a54)

running 1 test

thread 'command_arg_keeps_argv_well_formed_after_alloc_error_unwind' (1770157) panicked at rust-std-command-unwind-repro/src/lib.rs:232:5:
later safe process creation should still succeed after the caught panic; raw_os_error=Some(14)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
test command_arg_keeps_argv_well_formed_after_alloc_error_unwind ... FAILED

failures:

failures:
    command_arg_keeps_argv_well_formed_after_alloc_error_unwind

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 1 filtered out; finished in 0.00s

Below is the companion guard-page proof command. The test passes because it asserts that posix_spawn returns EFAULT when given an unterminated argv whose next pointer slot lies in an inaccessible page:

$ cargo +nightly test -p rust_std_command_unwind_repro posix_spawn_faults_when_unterminated_argv_crosses_guard_page -- --exact --nocapture
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.04s
     Running unittests src/lib.rs (target/debug/deps/rust_std_command_unwind_repro-93af15781c654a54)

running 1 test
test posix_spawn_faults_when_unterminated_argv_crosses_guard_page ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 1 filtered out; finished in 0.00s

I previously discussed this with @cuviper and @emilyalbini in the Security WG. We agreed that this is a soundness issue but that, because it requires nightly APIs, it can be triaged and fixed in public.

Metadata

Labels

C-bug (Category: This is a bug.)
I-unsound (Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness)
requires-nightly (This issue requires a nightly compiler in some way. When possible, use a F-* label instead.)