Skip to content

Pool implementation is not thread safe #891

@bgmerrell

Description

@bgmerrell

PoolInner's unsafe implementation of Sync is unsound and can cause runtime borrow check panic.

unsafe impl Sync for PoolInner {}

The PoolInner struct has a vec of PoolPlugins:

/// `PoolPlugin` is used by the pool to track the number of live instances of a particular plugin
#[derive(Clone, Debug)]
pub struct PoolPlugin(std::rc::Rc<std::cell::RefCell<Plugin>>);

struct PoolInner {
plugin_source: Box<PluginSource>,
instances: Vec<PoolPlugin>,
}

As commented in the code, "PoolPlugin is used by the pool to track the number of live instances of a particular plugin". This tracking is done by checking the Rc refcount. This means that the code has to carefully lock around every read or modification of the Rc to soundly implement Sync.

One problem, however, is that PoolPlugin does not implement its own Drop. This means that when a PoolPlugin is dropped, the refcount decrement is not synchronized, which can result in a runtime borrow check panic. (In fact, PoolPlugin has no internal synchronization; it relies entirely on external locking. This means there's no trivial fix to synchronize just the Drop - the synchronization strategy needs to change fundamentally (e.g., replacing Rc<RefCell<Plugin>> with Arc<Mutex<Plugin>>)).

I found this issue when writing benchmarks for my project that uses Extism pools. The panic is intermittent, but it manifests like this:

thread 'tokio-runtime-worker' (51141255) panicked at crates/extism/runtime/src/pool.rs:53:16:
RefCell already borrowed
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

thread 'main' (51141232) panicked at crates/executor/benches/plugin_execution.rs:157:55:
Task panicked: JoinError::Panic(Id(339173), "RefCell already borrowed", ...)

I wasn't able to create a simple Extism unit test or a simple independent test that reproduces the issue. (I could put more effort into this if requested), but I implemented a fix using Arc<Mutex<Plugin>> instead of Rc<RefCell<Plugin>> in pool.rs and verified that I can no longer reproduce the panic, and that the change doesn't cause any measurable perf penalty in my own benchmarks. I can submit my fix as a PR if that sounds like an acceptable fix.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions