Fix the spinlock from #4377 and probably also fix #4448 by durban · Pull Request #4449 · typelevel/cats-effect

durban · 2025-07-21T20:24:05Z

So, #4377 added a spinlock for interrupting a sleeping/polling WorkerThread. That spinlock is incorrect, i.e., a worker thread doesn't necessarily waits (spins) until the spinlock is released by WSTP#notifyParked. The incorrect scenario is when the worker thread observes parked being "parked", tries to CAS it to "unparked", and the CAS fails. The current implementation in this case simply goes on, as if the lock is free. While in fact it may not be free; it's likely that the CAS failed exactly because the spinlock was acquired.

This PR fixes the spinlock, so in this case the worker thread re-reads parked, and correctly waits for the lock to be free.

Now, I think this is the reason for the bug #4448. I think what happens there is that the worker thread goes on while the spinlock is locked by WSTP#notifyParked; the worker thread goes to sleep (parked becomes "parked"); WSTP#notifyParked "releases" the lock it thinks it still holds, thus parked changes state directly from "parked" -> "unparked", without a corresponding doneSleeping (in normal (correct) operation, a doneSleeping call is always made when parked moves out from "parked"). Thus, a doneSleeping call is, in effect "missing". The state 0xfffeffff¹ observed in #4448 is exactly the missing doneSleeping call (i.e., -DeltaSearching).

That's why I believe this should fix #4448. I don't know how to write a test for that (the only way I could observe the issue locally is by putting Thread.yields in various places in the WSTP). However, I believe the occasional MutexSpec failures were exactly because of this problem.

0xfffeffff is the (65534, 65535) = (ActiveThreadCount, SearchingThreadCount) reported in Poisoned pod, extremely slow, weird thread counts #4448. ↩

durban · 2025-07-21T20:33:45Z

Btw, in #4377 I've asked why the spinlock is necessary (I dislike spinlocks ;-). @djspiewak said (more or less), that due to doneSleeping. I think this issue proves, that @djspiewak was right, since the spinlock being incorrect caused a bug with doneSleeping. (Although, I still think that it would be possible to avoid the spinlock by always carefully CASing parked, let's fix this first, then possibly later we might figure out how exactly to remove the lock :-)

djspiewak · 2025-07-21T20:39:27Z

I've asked why the spinlock is necessary (I dislike spinlocks ;-)

What did spinlocks ever do to you?! :P

More seriously, the async bounded and unbounded queue implementations are a pretty fun one to look at from this standpoint. They're lock free MPMC structures, and what's interesting is they require a spinlock in a very very specific circumstance. It's somewhat similar to the parked case, where you have two bits of atomic state and sometimes you can detect them in a mutually inconsistent state, but you know that when that happens, you're just seeing the state a few cycles before the other CPU publishes its results, so you just have to chill out for a second and wait for the memory bus to do its thing. The only alternative would be to have a hard thread lock, which effectively does something very similar but indirectly in the kernel, so nothing is really gained.

djspiewak

Oh this is a really nice catch. I'm not sure why I didn't think concurrent interruption was possible but obviously that's an issue.

durban · 2025-07-21T20:42:26Z

They're lock free MPMC structures, and what's interesting is they require a spinlock

If they require a spinlock, they're not lock free :P

djspiewak · 2025-07-21T21:35:57Z

If they require a spinlock, they're not lock free :P

Mostly lock-free! :D But in fairness, I think that's pretty much everything. There are no truly lock-free concurrent systems, only systems which marginalize locks to extremely rare cases involving composite atomics.

durban · 2025-07-21T22:32:12Z

I mean... a MPMC queue can definitely be lock free. The one using the spinlock is possibly faster, but there is no theoretical or even practical obstacle to making it lock free (i.e., using a spinlock in rare cases will probably make it faster than using an algorithm that never uses a lock... but that algorithm still exists, and could be used...).

core/jvm/src/main/scala/cats/effect/unsafe/WorkerThread.scala

Co-authored-by: Mareks Rampāns <[email protected]>

Fix the spinlock from typelevel#4377

a6816f0

durban mentioned this pull request Jul 21, 2025

Poisoned pod, extremely slow, weird thread counts #4448

Closed

durban requested a review from djspiewak July 21, 2025 20:25

durban added the 🪲 bug label Jul 21, 2025

djspiewak previously approved these changes Jul 21, 2025

View reviewed changes

durban marked this pull request as ready for review July 21, 2025 20:52

mr-git reviewed Jul 22, 2025

View reviewed changes

core/jvm/src/main/scala/cats/effect/unsafe/WorkerThread.scala Outdated Show resolved Hide resolved

Update core/jvm/src/main/scala/cats/effect/unsafe/WorkerThread.scala

b4798eb

Co-authored-by: Mareks Rampāns <[email protected]>

djspiewak dismissed their stale review via b4798eb July 22, 2025 14:13

djspiewak approved these changes Jul 22, 2025

View reviewed changes

djspiewak merged commit 624af32 into typelevel:series/3.6.x Jul 22, 2025
34 of 36 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix the spinlock from #4377 and probably also fix #4448#4449

Fix the spinlock from #4377 and probably also fix #4448#4449
djspiewak merged 2 commits intotypelevel:series/3.6.xfrom
durban:fixSpinlock

durban commented Jul 21, 2025 •

edited

Loading

Uh oh!

durban commented Jul 21, 2025

Uh oh!

djspiewak commented Jul 21, 2025

Uh oh!

djspiewak left a comment

Uh oh!

durban commented Jul 21, 2025

Uh oh!

djspiewak commented Jul 21, 2025

Uh oh!

durban commented Jul 21, 2025

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

durban commented Jul 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Footnotes

Uh oh!

durban commented Jul 21, 2025

Uh oh!

djspiewak commented Jul 21, 2025

Uh oh!

djspiewak left a comment

Choose a reason for hiding this comment

Uh oh!

durban commented Jul 21, 2025

Uh oh!

djspiewak commented Jul 21, 2025

Uh oh!

durban commented Jul 21, 2025

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

durban commented Jul 21, 2025 •

edited

Loading