High(er) performance async Queue #2885
Conversation
durban left a comment:
A general comment: none of these queues seem lock-free (which is not necessarily a problem, it's just an observation.)
```scala
// manually complete our own callback
// note that we could have a race condition here where we're already completed
// async will deduplicate these calls for us
// additionally, the continuation (below) is held until the registration completes
```
This ("the continuation is held") seems important here. Is this guaranteed by any Async?
It is! And I agree it would be worth spelling it out more. We use this trick a lot though.
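For illustration, here is a minimal sketch of the trick being discussed. `selfComplete` is a hypothetical helper, not code from this PR; the point is that `Async` holds the continuation until the registration effect completes, so completing our own callback inside the registration is safe, and duplicate completions are deduplicated:

```scala
import cats.effect.Async

// Sketch only: complete our own callback from inside the registration.
// Async guarantees the continuation is not resumed until the registration
// effect itself finishes, and double completions are deduplicated.
def selfComplete[F[_], A](a: A)(implicit F: Async[F]): F[A] =
  F.async[A] { cb =>
    F.delay {
      cb(Right(a))            // complete before the registration returns
      None: Option[F[Unit]]   // no finalizer needed in this sketch
    }
  }
```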
```diff
-def bounded[F[_], A](capacity: Int)(implicit F: GenConcurrent[F, _]): F[Queue[F, A]] = {
+def bounded[F[_], A](capacity: Int)(implicit F: GenConcurrent[F, _]): F[Queue[F, A]] =
+  F match {
+    case f0: Async[F] =>
```
This is not strictly related to this PR, but what's stopping someone implementing Cont from doing the same, and recovering an Async[G] from the MonadCancel[G, Throwable]?
Doing so and then setting up weird forking scenarios would result in runtime errors. So, caveat emptor.
```scala
def offer(a: A): F[Unit] = F defer {
  try {
    // attempt to put into the buffer; if the buffer is full, it will raise an exception
    buffer.put(a)
```
Since `offer` and `take` both first try to use the buffer, I think it can happen that a waiter is bypassed. For example: initially the queue is empty and there is one taker waiting. Then `offer` and `take` race, and `take` immediately removes the item inserted by `offer`, thus bypassing the waiter, which arrived earlier than `take`. (Although this whole thing might be unobservable. It seems a "fairness" issue, not necessarily a "semantics" issue.)
You're correct! There are actually a couple cases like this. Strict fairness is not guaranteed in highly contended races.
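An illustrative model of the bypass being described, using `ArrayBlockingQueue` as a stand-in for the real buffer (`TinyQueue` and its methods are invented for this sketch, not the PR's code): both `offer` and `take` hit the buffer first and only fall back to the waiter lists on failure, so a racing take can consume a fresh offer before a previously suspended taker is woken.

```scala
import java.util.concurrent.ArrayBlockingQueue

// A racing take can steal a fresh offer before a suspended taker
// is woken: a fairness gap, not a correctness bug.
final class TinyQueue[A](capacity: Int) {
  private val buffer = new ArrayBlockingQueue[A](capacity)

  def tryOffer(a: A): Boolean =
    buffer.offer(a)   // fast path; on failure the real impl suspends the offerer

  def tryTake(): Option[A] =
    Option(buffer.poll())   // fast path; may bypass takers suspended earlier
}

val q = new TinyQueue[String](1)
q.tryOffer("a")        // succeeds via the fast path
val got = q.tryTake()  // consumes "a", even if a taker suspended before the offer
```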
They're definitely all lock free. They aren't contention free though.

Regarding lock freedom: Let's look at […]. There is another similar case (spinwaiting on another thread) in […]. In […]
|
The spinlock in `UnsafeUnbounded` is not needed for correctness, only performance.
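For context, a self-contained sketch of the kind of spin under discussion (this `Cell`/`spinRead` structure is an assumption for illustration, not the PR's `UnsafeUnbounded`): the node is already linked into the list, but the producer may not have published its value yet, so the consumer busy-waits for that single write to become visible.

```scala
import java.util.concurrent.atomic.AtomicReference

// The wait is bounded by the couple of instructions the producer has left
// between linking the node and publishing its value, which is why it is
// argued above to be a performance concern rather than a correctness one.
final class Cell[A] {
  val ref = new AtomicReference[A]()
}

@annotation.tailrec
def spinRead[A](cell: Cell[A]): A = {
  val a = cell.ref.get()
  if (a != null) a
  else spinRead(cell)   // spin until the producer's write lands
}
```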
Depends on #3000

Plenty more room to improve, but I'll take a 2-4x as a starting point.
First off, one thing I discovered as part of this is that the existing `Queue` is really a lot faster than you probably think. But we can do better.

This relies on the runtime typecase trick on the typeclass to see if the `GenConcurrent` is secretly actually an `Async`. When it is, and when the bound is > 1, we transparently swap to this more efficient implementation. Benchmarks to follow, but it's somewhere between 2x and 10x faster, depending on the scenario and the number of physical threads. Implementation is based on jctools for the bounded part, and all credit is due to @viktorklang for the unbounded part. Bugs are probably my fault.

Lots of room to improve here! A non-exhaustive list:

- `unbounded` can be done without too much trouble, just by building on a pair of `UnboundedUnsafe`s
- `circular` is also pretty easy with what we have
- `synchronous` is going to require a brand new structure and a lot of thought. That's a very tricky case
- There's a `takeAll`-like operation, meaning we can override `tryTakeN` and it will be much better than the naive version, which should in turn make Fs2's `Channel` vastly better
- Right now, it starts failing if you enqueue more than `Long.MaxValue` elements. This is easily fixable, I was just lazy
- An even more substantial optimization will be striping consumers. While this is an mpmc queue, the far-and-away most common scenario is mpsc, which would allow us to optimize it a lot. We can take advantage of that by striping sc queues, so long as we don't care as much about fairness in that case; and since we know our consuming threads are bounded by the physical threads, we also know our stripes will be strictly bounded
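To illustrate the striping idea at the end of that list, here is an entirely hypothetical sketch using `ConcurrentLinkedQueue` as a stand-in for the real stripes (the `StripedQueue` class, its stripe-selection rule, and its method names are all assumptions, not the PR's code):

```scala
import java.util.concurrent.ConcurrentLinkedQueue

// Hypothetical striping: spread consumers across several sc-style queues so
// takers mostly contend on their own stripe. Fairness across stripes is
// deliberately relaxed; the stripe count stays bounded because consuming
// threads are bounded by the physical threads.
final class StripedQueue[A](numStripes: Int) {
  private val stripes =
    Array.fill(numStripes)(new ConcurrentLinkedQueue[A])

  private def stripeFor(): Int =
    (Thread.currentThread().getId % numStripes).toInt

  def offer(a: A): Unit =
    stripes(stripeFor()).offer(a)   // producer publishes to "its" stripe

  def poll(): Option[A] = {
    val start = stripeFor()
    var i = 0
    while (i < numStripes) {
      // consumer drains its own stripe first, then scans the others
      val a = stripes((start + i) % numStripes).poll()
      if (a != null) return Some(a)
      i += 1
    }
    None
  }
}
```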
Anyway, lots of room for future excitement.
Closes #2771