Skip to content

Fix pool thread safety issues (option 2)#893

Merged
nilslice merged 4 commits intoextism:mainfrom
bgmerrell:bgmerrell/fix_891_option2
Mar 19, 2026
Merged

Fix pool thread safety issues (option 2)#893
nilslice merged 4 commits intoextism:mainfrom
bgmerrell:bgmerrell/fix_891_option2

Conversation

@bgmerrell
Copy link
Copy Markdown
Contributor

@bgmerrell bgmerrell commented Feb 11, 2026

This is an alternative to #892

The pool implementation had an unsound unsafe impl Sync on PoolInner containing Vec<PoolPlugin> where PoolPlugin was Rc<RefCell<Plugin>>. This caused intermittent "RefCell already borrowed" panics in multi-threaded usage (issue #891).

The problem with the original design was that it assumed external locking (the pool's Mutex) would be sufficient to make Rc<RefCell<>> safe to share across threads. However, there were several issues that made the design unsound:

  1. PoolPlugin had no Drop implementation to synchronize refcount decrements. When dropped on different threads, unsynchronized Rc refcount modifications could lead to data races.
  2. The pool used Rc::strong_count == 1 to track availability, but had no control over when clones were created or destroyed. Callers could clone and keep PoolPlugin instances, breaking the availability tracking.
  3. The PoolPlugin type didn't leverage the type system to enforce safe usage patterns; it relied entirely on external locking that the type system couldn't verify.

This replaces the unsafe implementation with proper thread-safe primitives and move semantics inspired by the deadpool crate.

Changes:

  • Replace shared Rc<RefCell<Plugin>> with move semantics: plugins are moved out of a VecDeque when checked out and moved back on drop
  • PoolPlugin now owns the plugin and implements Drop to automatically return it, making checkout/return atomic and type-safe
  • Add unsafe impl Send/Sync for UserDataHandle to complete the chain of unsafe assertions. This trusts that the existing unsafe impl<T> Send for UserData<T> is sound (verified: UserData::get() only returns Arc<Mutex<T>>, never extracts T, so it cannot be moved across threads unsafely)
  • Replace busy-wait polling with Condvar for efficient waiting
  • PoolPlugin implements Deref/DerefMut for ergonomic access
  • Use Weak reference to pool so checked-out plugins don't keep pool alive
  • Fix function_exists to handle timeout gracefully instead of panicking
  • Add test for pools with captured PluginBuilder containing custom functions

Breaking API changes:

  • PoolPlugin no longer has a .plugin() method - it now implements
    Deref/DerefMut, so you can use it directly where you previously
    called .plugin(). Example: pool.get()?.call(...) instead of
    pool.get()?.plugin().call(...)
  • PoolPlugin::call() now takes &mut self instead of &self (direct
    ownership instead of interior mutability via RefCell)

Fixes #891

The pool implementation had an unsound `unsafe impl Sync` on `PoolInner`
containing `Vec<PoolPlugin>` where `PoolPlugin` was `Rc<RefCell<Plugin>>`.
This caused intermittent "RefCell already borrowed" panics in multi-threaded
usage (issue extism#891).

The problem with the original design was that it seemed to assumed external
locking (the pool's `Mutex`) would be sufficient to make `Rc<RefCell<>>` safe
to share across threads. However, there were several issues that made the
design unsound:

1. `PoolPlugin` had no `Drop` implementation to synchronize refcount
   decrements. When dropped on different threads, unsynchronized `Rc`
   refcount modifications could lead to data races.
2. The pool used `Rc::strong_count == 1` to track availability, but had no
   control over when clones were created or destroyed. Callers could clone
   and keep `PoolPlugin` instances, breaking the availability tracking.
3. The `PoolPlugin` type didn't leverage the type system to enforce safe
   usage patterns; it relied entirely on external locking that the type
   system couldn't verify.

This replaces the unsafe implementation with proper thread-safe primitives
and move semantics inspired by the deadpool crate.

Changes:

- Replace shared `Rc<RefCell<Plugin>>` with move semantics: plugins are
  moved out of a `VecDeque` when checked out and moved back on drop
- `PoolPlugin` now owns the plugin and implements `Drop` to automatically
  return it, making checkout/return atomic and type-safe.
- Add `Send` bound to `UserDataHandle::Rust` (`dyn Any` → `dyn Any + Send`).
  `UserData<T>` already had the `Send` bound, but when converted to
  `UserDataHandle::Rust` it was being removed during type erasure. This
  was necessary to make `PluginBuilder` with custom functions `Send`,
  enabling pools with custom host functions
- Replace busy-wait polling with `Condvar` for efficient waiting
- `PoolPlugin` implements `Deref`/`DerefMut` for ergonomic access
- Use `Weak` reference to pool so checked-out plugins don't keep pool alive
- Remove all `unsafe impl Send/Sync`. Soundness is now enforced by the
  type system.
- Fix `function_exists` to handle timeout gracefully instead of panicking.
- Add test for pools with captured `PluginBuilder` containing custom functions.

Breaking API changes:

- `PoolPlugin` no longer has a `.plugin()` method - it now implements
  `Deref`/`DerefMut`, so you can use it directly where you previously
  called `.plugin()`. Example: `pool.get()?.call(...)` instead of
  `pool.get()?.plugin().call(...)`
- `PoolPlugin::call()` now takes `&mut self` instead of `&self` (direct
  ownership instead of interior mutability via `RefCell`)
- `UserData` types must be `Send` (previously claimed via `unsafe impl`
  but not enforced through type erasure to `dyn Any`)

Fixes extism#891
…e and Function

This alternative approach trusts that the existing 'unsafe impl Send/Sync
for UserData<T>' is sound (T is never extracted from the Arc<Mutex<>>),
and extends the unsafe chain to UserDataHandle and Function rather than
adding explicit T: Send bounds.

Changes:
- Add 'unsafe impl Send/Sync' for UserDataHandle. This trusts that the
  `UserData<T>`s unsafe Send/Sync impls being sound.
- Keep 'unsafe impl Send/Sync for Plugin'
- No T: Send bounds on UserData::new() or with_function()
- Pool works correctly with move semantics and Arc<Mutex<Plugin>>

Trade-offs vs the other approach:
+ No breaking API changes. Users can continue use non-Send types in
  single-threaded contexts.
- Less safe

Fixes extism#891
@bgmerrell bgmerrell marked this pull request as ready for review February 11, 2026 20:54
@nilslice
Copy link
Copy Markdown
Member

Thanks @bgmerrell! I think both this and #892 are solid. It does seem like #892 may be a slightly safer implementation, and if we're rolling out a fix for that reason maybe we go with that option.

Does that work for you? Let me know if you've leaned more in one direction or the other since opening these PRs.

@bgmerrell
Copy link
Copy Markdown
Contributor Author

bgmerrell commented Feb 26, 2026

@nilslice #892 is safer from a code safety perspective for sure because it doesn't use unsafe and it enforces that Send bound on UserData. I just wasn't sure if that new bound was OK. It works for my use case, but there seems to be a pattern in Extism of doing unsafe impls for Send and Sync for various types, and I'm afraid I don't know why that is (I can only assume it's for some added type flexibility):

bgmerrell@oberon: ~/code/github.com/extism/extism$ rg "unsafe" |rg 'Send|Sync'
runtime/src/pool.rs:unsafe impl Send for PoolInner {}
runtime/src/pool.rs:unsafe impl Sync for PoolInner {}
runtime/src/pool.rs:unsafe impl Send for Pool {}
runtime/src/pool.rs:unsafe impl Sync for Pool {}
runtime/src/lib.rs:unsafe impl<F: Clone + Fn(&str)> Send for LogFunction<F> {}
runtime/src/lib.rs:unsafe impl<F: Clone + Fn(&str)> Sync for LogFunction<F> {}
[... several more ...]

So I created this PR which follows this pattern. Either PR works for me, and I've been running my unit and integration tests for my project with a patched Extism with this PR here for a couple of weeks.

@nilslice
Copy link
Copy Markdown
Member

[...] and it enforces that Send bound on UserData. I just wasn't sure if that new bound was OK.

Yes, thanks for calling this out. I think originally, we planned for mostly non-Rust embedders via FFI, in which case there was no safety enforcement possible. Rust has gained a lot of usage since then though so it's worth tightening up places that we didn't give as much attention to.

I think #892 is the way to go then. I'll let that thought simmer for another day though, and then close this PR and merge that one. Will ping you with any more questions :)

@bgmerrell
Copy link
Copy Markdown
Contributor Author

@nilslice just a polite little bump on this ;) I'd like to ditch my extism fork as soon as possible

@nilslice
Copy link
Copy Markdown
Member

Thank you @bgmerrell!

@nilslice nilslice merged commit 85fce56 into extism:main Mar 19, 2026
5 checks passed
@nilslice
Copy link
Copy Markdown
Member

Release v1.20.0 is on its way out

@bgmerrell
Copy link
Copy Markdown
Contributor Author

@nilslice Thanks! You decided on this one over #892? Just wanted to be sure you merged what you intended because last discussion it seemed like you were leaning toward #892.

@nilslice
Copy link
Copy Markdown
Member

ah.. my bad. I took your comment here as indicating this was your expected version to merge.

I'll probably revert and release #892

Doing too much at once today 😢

@bgmerrell
Copy link
Copy Markdown
Contributor Author

@nilslice ah, yeah, sorry it was kind of awkward having two PRs, so I should have put some more context in my bump comment.

If I can make things easier on you by just rebasing #892, then I'm happy to do that.

@bgmerrell
Copy link
Copy Markdown
Contributor Author

@nilslice was thinking about this more this evening.. it's still a net positive that this one landed. It fixes a real bug and it's consistent with the other Extism code. It also let's me get rid of my fork (thanks for the release!).

It's also what I ended up using that last several weeks in my local testing/development because I thought it would be the more likely to be merged... turns out I was right! Just for an unexpected reason! ;)

@nilslice
Copy link
Copy Markdown
Member

@bgmerrell - well that's good to hear! since you've exercised the implementation all this time, I am confident in it and if it's all the same to you, will just keep it as-is.

we have the alternative in the event things should change and can always revisit the decision then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Pool implementation is not thread safe

2 participants