Skip to content

Fix pool thread safety issues#892

Closed
bgmerrell wants to merge 3 commits intoextism:mainfrom
bgmerrell:bgmerrell/fix_891
Closed

Fix pool thread safety issues#892
bgmerrell wants to merge 3 commits intoextism:mainfrom
bgmerrell:bgmerrell/fix_891

Conversation

@bgmerrell
Copy link
Copy Markdown
Contributor

@bgmerrell bgmerrell commented Feb 11, 2026

This is an alternative to #893

The pool implementation had an unsound unsafe impl Sync on PoolInner containing Vec<PoolPlugin> where PoolPlugin was Rc<RefCell<Plugin>>. This caused intermittent "RefCell already borrowed" panics in multi-threaded usage (issue #891).

The problem with the original design was that it seemed to assumed external locking (the pool's Mutex) would be sufficient to make Rc<RefCell<>> safe to share across threads. However, there were several issues that made the design unsound:

  1. PoolPlugin had no Drop implementation to synchronize refcount decrements. When dropped on different threads, unsynchronized Rc refcount modifications could lead to data races.
  2. The pool used Rc::strong_count == 1 to track availability, but had no control over when clones were created or destroyed. Callers could clone and keep PoolPlugin instances, breaking the availability tracking.
  3. The PoolPlugin type didn't leverage the type system to enforce safe usage patterns; it relied entirely on external locking that the type system couldn't verify.

This replaces the unsafe implementation with proper thread-safe primitives and move semantics inspired by the deadpool crate.

Changes:

  • Replace shared Rc<RefCell<Plugin>> with move semantics: plugins are moved out of a VecDeque when checked out and moved back on drop
  • PoolPlugin now owns the plugin and implements Drop to automatically return it, making checkout/return atomic and type-safe.
  • Add Send bound to UserDataHandle::Rust and add T: Send requirements to
    UserData::new() and Function::new(). Previously, UserData<T> was marked
    Send via unsafe impl<T> Send regardless of whether T was Send. When
    converting UserData<T> to UserDataHandle::Rust, the type is erased to
    Arc<Mutex<dyn Any>>. Adding + Send to the trait object preserves the Send
    constraint through type erasure, but the compiler enforces that the cast is only valid
    when T: Send. The explicit bounds ensure this is checked at the API boundaries.
    This makes PluginBuilder with custom functions properly Send, enabling
    pools with custom host functions. I don't know if this is OK, I think there is likely
    something in I am not fully grokking in the current design with these unsafe Send+Sync impls
  • Replace busy-wait polling with Condvar for efficient waiting
  • PoolPlugin implements Deref/DerefMut for ergonomic access
  • Use Weak reference to pool so checked-out plugins don't keep pool alive
  • Remove all unsafe impl Send/Sync. Soundness is now enforced by the type system.
  • Fix function_exists to handle timeout gracefully instead of panicking.
  • Add test for pools with captured PluginBuilder containing custom functions.

Breaking API changes:

  • PoolPlugin no longer has a .plugin() method - it now implements
    Deref/DerefMut, so you can use it directly where you previously
    called .plugin(). Example: pool.get()?.call(...) instead of
    pool.get()?.plugin().call(...)
  • PoolPlugin::call() now takes &mut self instead of &self (direct
    ownership instead of interior mutability via RefCell)
  • UserData types must be Send (previously claimed via unsafe impl
    but not enforced through type erasure to dyn Any)

Fixes #891

The pool implementation had an unsound `unsafe impl Sync` on `PoolInner`
containing `Vec<PoolPlugin>` where `PoolPlugin` was `Rc<RefCell<Plugin>>`.
This caused intermittent "RefCell already borrowed" panics in multi-threaded
usage (issue extism#891).

The problem with the original design was that it seemed to assumed external
locking (the pool's `Mutex`) would be sufficient to make `Rc<RefCell<>>` safe
to share across threads. However, there were several issues that made the
design unsound:

1. `PoolPlugin` had no `Drop` implementation to synchronize refcount
   decrements. When dropped on different threads, unsynchronized `Rc`
   refcount modifications could lead to data races.
2. The pool used `Rc::strong_count == 1` to track availability, but had no
   control over when clones were created or destroyed. Callers could clone
   and keep `PoolPlugin` instances, breaking the availability tracking.
3. The `PoolPlugin` type didn't leverage the type system to enforce safe
   usage patterns; it relied entirely on external locking that the type
   system couldn't verify.

This replaces the unsafe implementation with proper thread-safe primitives
and move semantics inspired by the deadpool crate.

Changes:

- Replace shared `Rc<RefCell<Plugin>>` with move semantics: plugins are
  moved out of a `VecDeque` when checked out and moved back on drop
- `PoolPlugin` now owns the plugin and implements `Drop` to automatically
  return it, making checkout/return atomic and type-safe.
- Add `Send` bound to `UserDataHandle::Rust` (`dyn Any` → `dyn Any + Send`).
  `UserData<T>` already had the `Send` bound, but when converted to
  `UserDataHandle::Rust` it was being removed during type erasure. This
  was necessary to make `PluginBuilder` with custom functions `Send`,
  enabling pools with custom host functions
- Replace busy-wait polling with `Condvar` for efficient waiting
- `PoolPlugin` implements `Deref`/`DerefMut` for ergonomic access
- Use `Weak` reference to pool so checked-out plugins don't keep pool alive
- Remove all `unsafe impl Send/Sync`. Soundness is now enforced by the
  type system.
- Fix `function_exists` to handle timeout gracefully instead of panicking.
- Add test for pools with captured `PluginBuilder` containing custom functions.

Breaking API changes:

- `PoolPlugin` no longer has a `.plugin()` method - it now implements
  `Deref`/`DerefMut`, so you can use it directly where you previously
  called `.plugin()`. Example: `pool.get()?.call(...)` instead of
  `pool.get()?.plugin().call(...)`
- `PoolPlugin::call()` now takes `&mut self` instead of `&self` (direct
  ownership instead of interior mutability via `RefCell`)
- `UserData` types must be `Send` (previously claimed via `unsafe impl`
  but not enforced through type erasure to `dyn Any`)

Fixes extism#891
nilslice pushed a commit that referenced this pull request Mar 19, 2026
This is an alternative to #892 

The pool implementation had an unsound `unsafe impl Sync` on `PoolInner`
containing `Vec<PoolPlugin>` where `PoolPlugin` was
`Rc<RefCell<Plugin>>`. This caused intermittent "RefCell already
borrowed" panics in multi-threaded usage (issue #891).

The problem with the original design was that it assumed external
locking (the pool's `Mutex`) would be sufficient to make `Rc<RefCell<>>`
safe to share across threads. However, there were several issues that
made the design unsound:

1. `PoolPlugin` had no `Drop` implementation to synchronize refcount
decrements. When dropped on different threads, unsynchronized `Rc`
refcount modifications could lead to data races.
2. The pool used `Rc::strong_count == 1` to track availability, but had
no control over when clones were created or destroyed. Callers could
clone and keep `PoolPlugin` instances, breaking the availability
tracking.
3. The `PoolPlugin` type didn't leverage the type system to enforce safe
usage patterns; it relied entirely on external locking that the type
system couldn't verify.

This replaces the unsafe implementation with proper thread-safe
primitives and move semantics inspired by the deadpool crate.

Changes:

- Replace shared `Rc<RefCell<Plugin>>` with move semantics: plugins are
moved out of a `VecDeque` when checked out and moved back on drop
- `PoolPlugin` now owns the plugin and implements `Drop` to
automatically return it, making checkout/return atomic and type-safe
- Add `unsafe impl Send/Sync` for `UserDataHandle` to complete the chain
of unsafe assertions. This trusts that the existing `unsafe impl<T> Send
for UserData<T>` is sound (verified: `UserData::get()` only returns
`Arc<Mutex<T>>`, never extracts `T`, so it cannot be moved across
threads unsafely)
- Replace busy-wait polling with `Condvar` for efficient waiting
- `PoolPlugin` implements `Deref`/`DerefMut` for ergonomic access
- Use `Weak` reference to pool so checked-out plugins don't keep pool
alive
- Fix `function_exists` to handle timeout gracefully instead of
panicking
- Add test for pools with captured `PluginBuilder` containing custom
functions

Breaking API changes:

- `PoolPlugin` no longer has a `.plugin()` method - it now implements
  `Deref`/`DerefMut`, so you can use it directly where you previously
  called `.plugin()`. Example: `pool.get()?.call(...)` instead of
  `pool.get()?.plugin().call(...)`
- `PoolPlugin::call()` now takes `&mut self` instead of `&self` (direct
  ownership instead of interior mutability via `RefCell`)

Fixes #891
@bgmerrell
Copy link
Copy Markdown
Contributor Author

#893 was merged instead

@bgmerrell bgmerrell closed this Mar 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Pool implementation is not thread safe

1 participant