-
Notifications
You must be signed in to change notification settings - Fork 151
Pool implementation is not thread safe #891
Description
PoolInner's unsafe implementation of Sync is unsound and can cause runtime borrow check panic.
Line 79 in 7230657
| unsafe impl Sync for PoolInner {} |
The PoolInner struct has a vec of PoolPlugins:
Lines 42 to 44 in 7230657
| /// `PoolPlugin` is used by the pool to track the number of live instances of a particular plugin | |
| #[derive(Clone, Debug)] | |
| pub struct PoolPlugin(std::rc::Rc<std::cell::RefCell<Plugin>>); |
Lines 73 to 76 in 7230657
| struct PoolInner { | |
| plugin_source: Box<PluginSource>, | |
| instances: Vec<PoolPlugin>, | |
| } |
As commented in the code, "PoolPlugin is used by the pool to track the number of live instances of a particular plugin". This tracking is done by checking the Rc refcount. This means that the code has to carefully lock around every read or modification of the Rc to soundly implement Sync.
One problem, however, is that PoolPlugin does not implement its own Drop. This means that when a PoolPlugin is dropped, the refcount decrement is not synchronized, which can result in a runtime borrow check panic. (In fact, PoolPlugin has no internal synchronization; it relies entirely on external locking. This means there's no trivial fix to synchronize just the Drop - the synchronization strategy needs to change fundamentally (e.g., replacing Rc<RefCell<Plugin>> with Arc<Mutex<Plugin>>)).
I found this issue when writing benchmarks for my project that uses Extism pools. The panic is intermittent, but it manifests like this:
thread 'tokio-runtime-worker' (51141255) panicked at crates/extism/runtime/src/pool.rs:53:16:
RefCell already borrowed
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' (51141232) panicked at crates/executor/benches/plugin_execution.rs:157:55:
Task panicked: JoinError::Panic(Id(339173), "RefCell already borrowed", ...)
I wasn't able to create a simple Extism unit test or a simple independent test that reproduces the issue. (I could put more effort into this if requested), but I implemented a fix using Arc<Mutex<Plugin>> instead of Rc<RefCell<Plugin>> in pool.rs and verified that I can no longer reproduce the panic, and that the change doesn't cause any measurable perf penalty in my own benchmarks. I can submit my fix as a PR if that sounds like an acceptable fix.