Use a ConcurrentHashMap for Supervisor state if F is Async #2876
djspiewak merged 1 commit into typelevel:series/3.3.x
Conversation
Using a `ConcurrentHashMap` with one `Supervisor` per worker … but with one `Supervisor` only (old …) … So we get most of the benefits from just changing to a `ConcurrentHashMap`.
The problem is that … Also …
@djspiewak Yeah, sure, I was actually inspired to try … I did notice it's not used anywhere in cats or cats-effect today, and it's interesting to see that using …
Yeah, I'm working on the test failure right now. My thought on the …
So the test failure is legitimate. The test in question is checking to make sure that finalizers in separate actions passed to the same …
We'd need something like a more general version of …
Not sure why some of those CI builds failed, but the last commit passes the tests on my machine (for both JVM and JS). Thanks for figuring out why it failed; after that the fix was easy. Let me know if you think it's the right one. ;)
The thing with this solution is that it basically replicates what …
Yeah, I'm not convinced either. The real issue is just the lock contention for that one shared `Ref`. I see there's a … Tangentially, even if it doesn't end up getting used here, something like …
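The contention being discussed can be illustrated with plain JVM primitives (a hedged sketch with made-up names, not the actual cats-effect internals): a `Ref`-style state is a single `AtomicReference` holding an immutable map, updated with a CAS retry loop, so every registration races on one cell, while a `ConcurrentHashMap` spreads updates across hash bins.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicReference

object ContentionSketch {
  // Ref-like state: one AtomicReference over an immutable Map.
  // Every add contends on this single cell via a CAS retry loop
  // (roughly what the "spin" time in the profile corresponds to).
  final class RefLikeState[K, V] {
    private val cell = new AtomicReference(Map.empty[K, V])
    @annotation.tailrec
    def add(k: K, v: V): Unit = {
      val old = cell.get()
      if (!cell.compareAndSet(old, old + (k -> v))) add(k, v) // retry on contention
    }
    def size: Int = cell.get().size
  }

  // CHM state: each update hashes to a bin, so unrelated
  // registrations rarely contend with each other.
  final class ChmState[K, V] {
    private val map = new ConcurrentHashMap[K, V]()
    def add(k: K, v: V): Unit = map.put(k, v)
    def size: Int = map.size()
  }

  def main(args: Array[String]): Unit = {
    val ref = new RefLikeState[Int, String]
    val chm = new ChmState[Int, String]
    (1 to 1000).foreach(i => { ref.add(i, "x"); chm.add(i, "x") })
    println(ref.size) // 1000
    println(chm.size) // 1000
  }
}
```

Under a single thread both behave the same; the difference only shows up as retry loops versus bin-level updates when many threads register concurrently.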
There's no particular reason that …
 */
  def apply[F[_]](implicit F: Concurrent[F]): Resource[F, Supervisor[F]] = {
    F match {
      // case asyncF: Async[F] => applyForAsync(asyncF)
@djspiewak leaving this commented out if you want to wait with jumping the shark, but I see you're about to do so in #2885 anyway ;)
Jump the shark! Go for it! :-D
This latest push no longer reflects the title, and I would of course update it if this approach is accepted; instead it allows … Performance is still excellent, even better than the original PR: … The implementation of … Yes, it does use an impure … In my opinion, it is preferable to the "one `Supervisor` per worker" approach. PS. I was able to get to around 725 ops/min by essentially duplicating the …
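Stripped of the effect types, the `ConcurrentHashMap`-backed state boils down to a registry of cancelation actions keyed by token. This is a simplified, impure stand-in (hypothetical names, plain thunks instead of `F[Unit]`), not the actual cats-effect code:

```scala
import java.util.concurrent.ConcurrentHashMap

object RegistrySketch {
  // Simplified stand-in for the Supervisor state: tokens map to
  // cancelation thunks; cancelAll runs whatever is still registered.
  final class Registry {
    private val state = new ConcurrentHashMap[Long, () => Unit]()
    def add(token: Long, cancel: () => Unit): Unit = state.put(token, cancel)
    def remove(token: Long): Unit = state.remove(token)
    def cancelAll(): Unit = {
      val it = state.values().iterator()
      while (it.hasNext) it.next().apply()
    }
  }

  def main(args: Array[String]): Unit = {
    val r = new Registry
    var canceled = 0
    r.add(1L, () => canceled += 1)
    r.add(2L, () => canceled += 1)
    r.remove(2L) // a fiber that completed deregisters itself
    r.cancelAll()
    println(canceled) // 1
  }
}
```

The impurity is contained: every mutation happens inside what would be an `F.delay` in the real code, which is why it can hide behind the pure `State[F]` interface.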
Ah dang, this is cool. I'll look into this more later today! Quick note on the compilation errors, though: they're coming from the Scala 2.12 cross-build. I think it would be worth not using the collection shims and instead working against the Java API directly, if at all possible.
new State[F] {
  override def remove(token: Unique.Token): F[Unit] = F.delay(state.remove(token)).void
  override def add(token: Unique.Token, cancel: F[Unit]): F[Unit] = F.delay(state.put(token, cancel)).void
  override def cancelAll(): F[Unit] = F.defer(state.values().asScala.toList.parUnorderedSequence.void)
I think this is the only `asScala` here, so if we swap this for a manual iteration into a `ListBuffer`, it should resolve the compatibility issue!
Done, I hope it will pass all builds and tests now.
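The suggested replacement can be sketched in plain Scala (illustrative names): iterate the `ConcurrentHashMap`'s values with the Java iterator and collect into a `ListBuffer`, avoiding the 2.12/2.13 collection-converter shims (`JavaConverters` vs `scala.jdk.CollectionConverters`) entirely.

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.collection.mutable.ListBuffer

object ManualIteration {
  // Collect the map's values into a List without asScala, so the
  // same code compiles against both Scala 2.12 and 2.13+.
  def valuesToList[K, V](map: ConcurrentHashMap[K, V]): List[V] = {
    val buf = ListBuffer.empty[V]
    val it = map.values().iterator()
    while (it.hasNext) buf += it.next()
    buf.toList
  }

  def main(args: Array[String]): Unit = {
    val m = new ConcurrentHashMap[Int, String]()
    m.put(1, "a")
    m.put(2, "b")
    println(valuesToList(m).sorted) // List(a, b)
  }
}
```

In the real method, the resulting list would then feed into `parUnorderedSequence` exactly as the `asScala` version did.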
Thanks for tackling this!!
Replaces #2875 (targeted for 3.3.x)
In programs that use the same dispatcher a lot, the `stateRef` in `Supervisor` ends up being a hotspot, and a significant amount of time is spent in `spin` attempting to update it.

While the documentation says that dispatchers are cheap and should be constructed often, web servers like blaze-server, jetty-server, http4s-netty, and fs2-netty use one `Dispatcher` per server. I don't think that's wrong, and I've also experienced "dispatcher already shutdown" problems when trying to use shorter-lived ones, as reported in #2727.
Given that, we can fix the hotspot in `Dispatcher` by creating one `Supervisor` for each CPU/worker, which is what this PR does. Results (from a Ryzen 5900X; I expect the improvement to scale with the number of cores) look good:

Baseline on `series/3.x`:

This PR:
I'm also doing some work with the TechEmpower HTTP benchmarks, and this change results in a 5% increase in plaintext performance using blaze (interestingly, it gets its best result at 16384 concurrent requests, whereas it otherwise peaks at 4096), and an even greater increase when using a modified http4s-netty backend.
It's possible that there's a better fix for the underlying issue, but this one is simple, and a 3x improvement in (benchmark) performance makes it worth considering, in my opinion, even if it does make constructing a `Dispatcher` a tiny bit less cheap.
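The "one `Supervisor` per CPU/worker" idea from this description amounts to striping the state: instead of one shared cell, keep several and route each submission to a stripe. A minimal sketch with hypothetical names (the real PR works with actual `Supervisor` instances, not bare maps, and was later superseded by the `ConcurrentHashMap` approach):

```scala
import java.util.concurrent.ConcurrentHashMap

object StripedStateSketch {
  // Hypothetical sketch: one state per stripe, picked by thread id,
  // so concurrent submissions mostly touch different maps and the
  // single-cell contention disappears.
  final class StripedState[K, V](stripes: Int) {
    private val maps = Array.fill(stripes)(new ConcurrentHashMap[K, V]())
    private def stripe: ConcurrentHashMap[K, V] =
      maps((Thread.currentThread().getId % stripes).toInt)
    def add(k: K, v: V): Unit = stripe.put(k, v)
    // Operations like cancelAll must visit every stripe.
    def totalSize: Int = maps.map(_.size()).sum
  }

  def main(args: Array[String]): Unit = {
    val s = new StripedState[Int, String](stripes = 4)
    (1 to 10).foreach(i => s.add(i, "task"))
    println(s.totalSize) // 10
  }
}
```

The trade-off the thread converges on: striping multiplies the per-`Dispatcher` setup cost and still duplicates coordination logic, whereas a single `ConcurrentHashMap` gets most of the same win with one structure.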