Conversation

@viktorklang
Contributor

@viktorklang viktorklang commented Nov 27, 2018

This PR makes the global EC implement BatchingExecutor, which on benches for Future increases performance significantly for most combinators.

Numbers from my box using scala.concurrent.global:

- 2.13.x:
+ Proposed:
- [info] AndThenFutureBenchmark.post                    1.130 ±    0.018  ops/ms
+ [info] AndThenFutureBenchmark.post                    3.129 ±    0.072  ops/ms
- [info] AndThenFutureBenchmark.pre                     1.277 ±    0.007  ops/ms
+ [info] AndThenFutureBenchmark.pre                     5.189 ±    0.273  ops/ms
- [info] CallbackFutureBenchmark.post                   0.861 ±    0.006  ops/ms
+ [info] CallbackFutureBenchmark.post                   0.848 ±    0.020  ops/ms
- [info] CallbackFutureBenchmark.pre                    1.021 ±    0.005  ops/ms
+ [info] CallbackFutureBenchmark.pre                    0.865 ±    0.017  ops/ms
- [info] CompleteFutureBenchmark.failure              146.932 ±    3.913  ops/ms
+ [info] CompleteFutureBenchmark.failure              147.145 ±    2.424  ops/ms
- [info] CompleteFutureBenchmark.success              146.881 ±    6.264  ops/ms
+ [info] CompleteFutureBenchmark.success              146.884 ±    5.060  ops/ms
- [info] CompleteWithFutureBenchmark.failure           91.133 ±    2.887  ops/ms
+ [info] CompleteWithFutureBenchmark.failure          217.430 ±   13.574  ops/ms
- [info] CompleteWithFutureBenchmark.success           91.102 ±    1.259  ops/ms
+ [info] CompleteWithFutureBenchmark.success          218.909 ±    2.613  ops/ms
- [info] FilterFutureBenchmark.post                     1.264 ±    0.024  ops/ms
+ [info] FilterFutureBenchmark.post                     3.386 ±    0.117  ops/ms
- [info] FilterFutureBenchmark.pre                      1.409 ±    0.013  ops/ms
+ [info] FilterFutureBenchmark.pre                      6.096 ±    0.549  ops/ms
- [info] FirstCompletedOfFutureBenchmark.post           1.071 ±    0.025  ops/ms
+ [info] FirstCompletedOfFutureBenchmark.post           0.981 ±    0.016  ops/ms
- [info] FirstCompletedOfFutureBenchmark.pre            0.983 ±    0.026  ops/ms
+ [info] FirstCompletedOfFutureBenchmark.pre            0.980 ±    0.024  ops/ms
- [info] FlatMapFutureBenchmark.post                    1.052 ±    0.006  ops/ms
+ [info] FlatMapFutureBenchmark.post                    2.867 ±    0.088  ops/ms
- [info] FlatMapFutureBenchmark.pre                     1.072 ±    0.006  ops/ms
+ [info] FlatMapFutureBenchmark.pre                     3.223 ±    0.031  ops/ms
- [info] LoopFutureBenchmark.post                      93.598 ±    0.400  ops/ms
+ [info] LoopFutureBenchmark.post                      94.910 ±    1.875  ops/ms
- [info] LoopFutureBenchmark.pre                       94.261 ±    1.553  ops/ms
+ [info] LoopFutureBenchmark.pre                       95.783 ±    1.059  ops/ms
- [info] MapFutureBenchmark.post                        1.125 ±    0.041  ops/ms
+ [info] MapFutureBenchmark.post                        3.152 ±    0.033  ops/ms
- [info] MapFutureBenchmark.pre                         1.304 ±    0.011  ops/ms
+ [info] MapFutureBenchmark.pre                         5.204 ±    0.272  ops/ms
- [info] NoopFutureBenchmark.post                   42388.978 ±  648.515  ops/ms
+ [info] NoopFutureBenchmark.post                   41983.270 ±  797.985  ops/ms
- [info] NoopFutureBenchmark.pre                   224481.015 ± 3176.761  ops/ms
+ [info] NoopFutureBenchmark.pre                   224429.800 ± 5779.305  ops/ms
- [info] RecoverFutureBenchmark.post                    0.876 ±    0.033  ops/ms
+ [info] RecoverFutureBenchmark.post                    2.940 ±    0.045  ops/ms
- [info] RecoverFutureBenchmark.pre                     0.984 ±    0.002  ops/ms
+ [info] RecoverFutureBenchmark.pre                     5.099 ±    0.394  ops/ms
- [info] RecoverWithFutureBenchmark.post                1.083 ±    0.012  ops/ms
+ [info] RecoverWithFutureBenchmark.post                2.899 ±    0.050  ops/ms
- [info] RecoverWithFutureBenchmark.pre                 1.073 ±    0.011  ops/ms
+ [info] RecoverWithFutureBenchmark.pre                 5.895 ±    0.216  ops/ms
- [info] SequenceFutureBenchmark.post                   0.509 ±    0.006  ops/ms
+ [info] SequenceFutureBenchmark.post                   0.911 ±    0.006  ops/ms
- [info] SequenceFutureBenchmark.pre                    0.560 ±    0.009  ops/ms
+ [info] SequenceFutureBenchmark.pre                    0.941 ±    0.014  ops/ms
- [info] TransformFutureBenchmark.post                  1.165 ±    0.016  ops/ms
+ [info] TransformFutureBenchmark.post                  3.439 ±    0.050  ops/ms
- [info] TransformFutureBenchmark.pre                   1.385 ±    0.028  ops/ms
+ [info] TransformFutureBenchmark.pre                   3.993 ±    0.160  ops/ms
- [info] TransformWithFutureBenchmark.post              1.013 ±    0.014  ops/ms
+ [info] TransformWithFutureBenchmark.post              2.739 ±    0.047  ops/ms
- [info] TransformWithFutureBenchmark.pre               1.196 ±    0.010  ops/ms
+ [info] TransformWithFutureBenchmark.pre               3.865 ±    0.319  ops/ms
- [info] VariousFutureBenchmark.post                    0.168 ±    0.006  ops/ms
+ [info] VariousFutureBenchmark.post                    0.266 ±    0.009  ops/ms
- [info] VariousFutureBenchmark.pre                     0.192 ±    0.003  ops/ms
+ [info] VariousFutureBenchmark.pre                     0.518 ±    0.005  ops/ms
- [info] ZipWithFutureBenchmark.post                    0.589 ±    0.031  ops/ms
+ [info] ZipWithFutureBenchmark.post                    1.015 ±    0.015  ops/ms
- [info] ZipWithFutureBenchmark.pre                     0.571 ±    0.002  ops/ms
+ [info] ZipWithFutureBenchmark.pre                     1.186 ±    0.016  ops/ms

Note that not all operations have improved in performance; in practice, though, some of these benches are not typical end-user code.
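
For readers new to the concept, the batching idea is roughly the following (an illustrative sketch with made-up names, not the actual BatchingExecutor code):

import java.util.ArrayDeque
import java.util.concurrent.{Executor, ForkJoinPool}

// Illustrative sketch only: tasks submitted while a batch is already running on
// the current thread are appended to that batch and run inline, instead of being
// resubmitted to the underlying pool.
object BatchingSketch extends Executor {
  private[this] val underlying: Executor = new ForkJoinPool()
  private[this] val currentBatch = new ThreadLocal[ArrayDeque[Runnable]]

  def execute(task: Runnable): Unit = {
    val batch = currentBatch.get()
    if (batch ne null) batch.addLast(task) // already inside a batch: enqueue locally
    else underlying.execute(new Runnable { // otherwise start a new batch on the pool
      def run(): Unit = {
        val b = new ArrayDeque[Runnable]
        b.addLast(task)
        currentBatch.set(b)
        try while (!b.isEmpty) b.pollFirst().run() // also drains tasks enqueued during execution
        finally currentBatch.remove()
      }
    })
  }
}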

@scala-jenkins scala-jenkins added this to the 2.13.0-RC1 milestone Nov 27, 2018
@viktorklang viktorklang added the performance the need for speed. usually compiler performance, sometimes runtime performance. label Nov 27, 2018
@viktorklang viktorklang self-assigned this Nov 27, 2018
@He-Pin
Contributor

He-Pin commented Nov 27, 2018

Seems like this should be backported to 2.12.x

@viktorklang
Contributor Author

@hepin1989 The 2.12.x tree does not use the same implementation as 2.13.x. I'd have to run benches for 2.12.x separately.

@viktorklang
Contributor Author

I should note that this also applies to any ECs created using the createDefaultExecutorService machinery.

@viktorklang viktorklang force-pushed the wip-make-global-ec-batched-√ branch from 8339e6a to 2754cc4 Compare November 28, 2018 10:05
@viktorklang
Contributor Author

@SethTisue Let me know what is needed to get this merged. Would love to see this in before rc1

@plokhotnyuk
Contributor

There are some ~15% regressions for CallbackFutureBenchmark and FirstCompletedOfFutureBenchmark... Will they be mitigated in future?

@viktorklang
Contributor Author

@plokhotnyuk I hope to be able to bring them back to around where they were for 2.13.0-M5. The lower performance for CallbackBenchmark is due to how the bench is written: the bench prefers as much parallelism as possible (one can argue that this is a faulty test, but given that it is a callback it can only do side effects). As for the regression of FirstCompletedOfBenchmark, it is due to the fact that FirstCompletedOf is an inherent race and does not benefit from trampolining. I'll see if I can do something about that.

Given the impact for the typical use case of stacking transformations, I think the improvements for global are significant. (Remember that the change does not affect all ECs.)
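
As an illustration of the kind of code this targets (a made-up example, not one of the benches):

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object StackedTransformations {
  // Each step can now run in the same batch on the global EC rather than as a
  // separate pool submission.
  val f: Future[Int] =
    Future(1)
      .map(_ + 1)
      .filter(_ > 0)
      .flatMap(x => Future(x * 2))
}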

@SethTisue
Member

@SethTisue Let me know what is needed to get this merged

well, in scala.concurrent we consider you the primary arbiter of that, actually :-)

I can run the community build on it if you like (though the 2.13 build has ~70 projects currently, compared to ~180 on 2.12, so it's somewhat less assurance)

Would love to see this in before rc1

(RC1 won't be out before January, so there's time)

@SethTisue SethTisue added the release-notes worth highlighting in next release notes label Nov 29, 2018
@viktorklang
Contributor Author

@SethTisue :) I'm always uneasy about approving my own work!
I'm investigating how to regain some performance for some of the operators, so let's just keep this open a week or so.

@viktorklang
Contributor Author

viktorklang commented Nov 30, 2018

@plokhotnyuk I've made some extra optimizations and we're almost back to HEAD in performance of FirstCompletedOf:

[info] FirstCompletedOfFutureBenchmark.post     gbl         8192          1  thrpt   10  0.975 ± 0.009  ops/ms
[info] FirstCompletedOfFutureBenchmark.pre      gbl         8192          1  thrpt   10  0.934 ± 0.010  ops/ms

(We've got about 10% to go; I'll see if I can improve that further, but it is doing more work in that case, so it is hard to regain everything. I'll try, though.)

For CallbackFutureBenchmark I'm at:

[info] CallbackFutureBenchmark.post     gbl         8192          1  thrpt   10  0.845 ± 0.007  ops/ms
[info] CallbackFutureBenchmark.pre      gbl         8192          1  thrpt   10  0.935 ± 0.007  ops/ms

That means the post bench is back to baseline and the pre bench has about 10% to go. I will look into this some more.

@viktorklang
Contributor Author

@plokhotnyuk In the pre-case it is really difficult to claw back extra performance because the 10% hit is the extra allocation for the batch, which is always discarded after only having had a single item in it. I will see what the effect is when the benchmark is executed from within a Future.

@viktorklang
Contributor Author

viktorklang commented Nov 30, 2018

@plokhotnyuk When the CallbackFutureBenchmark is executed from within foreach of an already completed Future on the same EC, the bench is 4-5x faster, which I think makes a 10% reduction in performance when invoked from the "outside" a pretty reasonable price to pay:

[info] CallbackFutureBenchmark.post gbl 8192 1 thrpt 10 4.014 ± 0.042 ops/ms
[info] CallbackFutureBenchmark.pre gbl 8192 1 thrpt 10 4.015 ± 0.022 ops/ms

I'll see what I can do to make that 10% perf gap even smaller, but it does seem like a trade-off situation.

@NthPortal
Contributor

@viktorklang .pre is the performance of the operation on a Future from a Promise which is already completed, and .post is the performance of the operation on a Future from a Promise which is not yet completed -- is that correct?

@viktorklang
Contributor Author

@plokhotnyuk

If I add a method to Batchable and let each batchable decide whether it is ready to be batched or not (see the sketch at the end of this comment), then I can mark foreach and onComplete as non-batchable. We then regain the baseline performance (it even adds a bit), but the downside is that the 4-5x speedup when executed in a nested fashion is almost completely lost (only about 30% of it remains):

// Running the benchmark as usual
[info] CallbackFutureBenchmark.post     gbl         8192          1  thrpt   10  0.937 ± 0.009  ops/ms
[info] CallbackFutureBenchmark.pre      gbl         8192          1  thrpt   10  1.003 ± 0.007  ops/ms
// Running the benchmark within a precompleted future on the same EC
[info] CallbackFutureBenchmark.post     gbl         8192          1  thrpt   10  1.345 ± 0.004  ops/ms
[info] CallbackFutureBenchmark.pre      gbl         8192          1  thrpt   10  1.300 ± 0.032  ops/ms

I added a bit of optimization to FirstCompletedOf which gives a pretty significant boost in the precompleted section:

[info] FirstCompletedOfFutureBenchmark.post     gbl         8192          1  thrpt   10   1.223 ± 0.007  ops/ms
[info] FirstCompletedOfFutureBenchmark.pre      gbl         8192          1  thrpt   10  22.503 ± 1.127  ops/ms
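
For reference, the shape such an opt-out could take (a rough sketch; the name isBatchable is made up, not an actual API):

// Roughly the shape the idea above would take; `isBatchable` is a made-up name,
// not an actual member of scala.concurrent's (private) Batchable:
trait BatchableSketch extends Runnable {
  def isBatchable: Boolean = true // each task decides whether it may join a batch
}

// A side-effecting callback (foreach/onComplete-style) could then opt out and be
// submitted directly to the underlying pool:
final class CallbackTask(f: () => Unit) extends BatchableSketch {
  override def isBatchable: Boolean = false
  def run(): Unit = f()
}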

@viktorklang
Contributor Author

@NthPortal Exactly. pre = precompleted (completed before the operations are added), post = postcompleted (completed after the operations are added)
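
Illustrated with a made-up example:

import scala.concurrent.Promise
import scala.concurrent.ExecutionContext.Implicits.global

object PrePostIllustration {
  // "pre": the promise is completed before the operation is attached
  val pre = Promise[Int]()
  pre.success(1)
  val preMapped = pre.future.map(_ + 1)

  // "post": the operation is attached first; completion happens afterwards
  val post = Promise[Int]()
  val postMapped = post.future.map(_ + 1)
  post.success(1)
}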

@NthPortal
Contributor

@viktorklang do we care about one of those more than the other (perhaps because it is more common)?

@viktorklang
Contributor Author

@NthPortal It's use-case specific and timing-related (as the futures could be completed either before, during, or after)

@viktorklang
Contributor Author

@NthPortal @plokhotnyuk On second thought, I'd prefer not to introduce the new method to Batchable, since it is likely to create difficult-to-reason-about performance implications for end users.
So I'll include the rest of the optimizations as part of this PR; if there are any problems for end users down the line, I can always special-case Future in the global EC's batching decisions, since that change would be completely binary compatible and hidden from user code.

@viktorklang
Contributor Author

I just added some improvements. Will update the text of the PR when I have cleaned up the bench results.

@viktorklang viktorklang force-pushed the wip-make-global-ec-batched-√ branch 2 times, most recently from 246c292 to 4ccc9e6 Compare December 4, 2018 15:43
@viktorklang
Contributor Author

@NthPortal @SethTisue Alright, I believe this one is ready for community build and then RC1 :)

@SethTisue
Member

@viktorklang
Contributor Author

Thanks @SethTisue !

  val state = get()
  if (state.isInstanceOf[Try[T]]) "Future("+state+")"
- else if (state.isInstanceOf[Link[T]]) state.asInstanceOf[Link[T]].promise(this).toString
+ else if (state.getClass eq classOf[Link[T]]) state.asInstanceOf[Link[T]].promise(this).toString
Contributor

should this be classOf[Link[_]]?

Contributor Author

Good question! I tend to add the explicit type even if it is erased, mainly as documentation.

  val state = get()
  if (state.isInstanceOf[Try[T]]) "Future("+state+")"
- else if (state.isInstanceOf[Link[T]]) state.asInstanceOf[Link[T]].promise(this).toString
+ else if (state.getClass eq classOf[Link[T]]) state.asInstanceOf[Link[T]].promise(this).toString
Contributor

I assume you benchmarked this and it was faster? I'm surprised the JIT can't tell that the type has no children and not walk the object's class hierarchy.

Contributor Author

Yeah, it was faster, and I too was surprised :(

Contributor

I do remember getClass being faster a long time ago (Java 1.1/1.2), but that was fixed.

I did a quick JMH synthetic benchmark to compare, and I can see no real performance advantage of getClass, as the differences are within the noise.

Can someone review the benchmark, or re-run it on another system to compare?
@viktorklang can you share the benchmark that you used and the results?

[info] Benchmark                                      Mode  Cnt   Score   Error  Units
[info] InstanceOfBenchmark.innerByClassAllInner       avgt  400  38.009 ± 0.106  ns/op
[info] InstanceOfBenchmark.innerByClassAllInnerT      avgt  400  29.694 ± 0.083  ns/op
[info] InstanceOfBenchmark.innerByClassMixed          avgt  400  37.584 ± 0.049  ns/op
[info] InstanceOfBenchmark.innerByInstanceAllInner    avgt  400  37.709 ± 0.087  ns/op
[info] InstanceOfBenchmark.innerByInstanceAllInnerT   avgt  400  29.733 ± 0.088  ns/op
[info] InstanceOfBenchmark.innerByInstanceMixed       avgt  400  35.275 ± 0.122  ns/op
[info] InstanceOfBenchmark.innerTByClassAllInner      avgt  400  29.586 ± 0.076  ns/op
[info] InstanceOfBenchmark.innerTByClassAllInnerT     avgt  400  37.945 ± 0.076  ns/op
[info] InstanceOfBenchmark.innerTByClassMixed         avgt  400  34.131 ± 0.227  ns/op
[info] InstanceOfBenchmark.innerTByInstanceAllInner   avgt  400  29.834 ± 0.126  ns/op
[info] InstanceOfBenchmark.innerTByInstanceAllInnerT  avgt  400  37.654 ± 0.046  ns/op
[info] InstanceOfBenchmark.innerTByInstanceMixed      avgt  400  33.725 ± 0.046  ns/op

full code and run details in https://gist.github.com/mkeskells/c88dd1201d750d4edd2f306f8ad60f41

Contributor Author

@mkeskells The difference was small, yes. (I run these benches: https://github.com/scala/scala/blob/76b34c455e54adea171bd618e7ea5d37cc2c4af1/test/benchmarks/src/main/scala/scala/concurrent/FutureBenchmark.scala) I found another, much bigger, source of performance gain so I am happy to switch back to using isInstanceOf (I just need to do some extra benching and cleanup first).

Contributor

It would be good to understand if and when a different approach is faster; after all, if it is an optimisation that the compiler can apply for us, we can keep the code idiomatic and fast.
I would ask @retronym to have a look at when, if ever, such an optimisation is generalisable in the compiler codegen.

Or whether it is just an optimisation to pass on to a JIT/Graal team, if it is a runtime rather than a codegen optimisation.

Member

@retronym retronym Dec 13, 2018

AFAIK, HotSpot and Graal generate equivalent code for instanceof Foo and .getClass == Foo.class testing against an effectively final class. I'd suggest switching back to instanceof.

Contributor Author

@retronym Agreed—already done :)

Contributor Author

@mkeskells I'm going to compile a list of proposed changes to the compiler based on my observations, for sure!

throw e
} else throw inner
e
} else inner
Contributor

should inner suppress cause?

Contributor Author

IMO yes—inner suppresses cause when cause is non-fatal and inner is fatal—this is to make sure that we propagate serious problems.
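
A minimal sketch of that rule (the method name here is made up, not the actual code in Promise):

import scala.util.control.NonFatal

object SuppressionSketch {
  // If the original cause is non-fatal but the new throwable is fatal, attach the
  // cause as a suppressed exception so the serious problem propagates with context.
  def resolve(cause: Throwable, inner: Throwable): Throwable =
    if (NonFatal(cause) && !NonFatal(inner)) { inner.addSuppressed(cause); inner }
    else inner
}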

@viktorklang viktorklang force-pushed the wip-make-global-ec-batched-√ branch from 76b34c4 to 4ccc9e6 Compare December 12, 2018 08:59
@viktorklang
Contributor Author

I just pushed a new commit which should improve performance a bit further.
There's a new implementation of BatchingExecutor.Batch which only allocates an array for its elements if it gets more than one element; this reduces the cost for operations which only submit a single Runnable per batch.
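
Conceptually something along these lines (a simplified, made-up sketch rather than the actual Batch implementation):

// The first task lives in a plain field; the backing array is allocated only once
// a second task arrives.
final class BatchSketch {
  private[this] var first: Runnable = null
  private[this] var rest: Array[Runnable] = null
  private[this] var size: Int = 0

  def push(r: Runnable): Unit =
    if (first eq null) first = r
    else {
      if (rest eq null) rest = new Array[Runnable](4)
      else if (size == rest.length) rest = java.util.Arrays.copyOf(rest, size * 2)
      rest(size) = r
      size += 1
    }

  def run(): Unit = {
    if (first ne null) first.run()
    var i = 0
    while (i < size) { rest(i).run(); i += 1 }
  }
}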

@viktorklang
Contributor Author

@SethTisue Could you run the community build against this latest commit? I believe it should be as good as it gets right now.

@viktorklang viktorklang force-pushed the wip-make-global-ec-batched-√ branch from 8772676 to e63b1b0 Compare December 13, 2018 19:39
@viktorklang
Contributor Author

@SethTisue Found even more optimizations! Looking really good now!

@viktorklang viktorklang force-pushed the wip-make-global-ec-batched-√ branch from e63b1b0 to c2fd06b Compare December 18, 2018 13:31
@viktorklang
Contributor Author

Just updated the PR with the latest perf numbers compared to the 2.13.x mainline—some operators have dramatically better performance on the Global EC with these changes.

@viktorklang viktorklang force-pushed the wip-make-global-ec-batched-√ branch 2 times, most recently from 9b0362d to 4981c02 Compare December 19, 2018 09:31
    * Making the global EC a BatchedExecutor (performance)
    * Implements a new BatchedExecutor.Batch with less overhead
    * Switches to tryComplete0 instead of tryComplete to reduce calls
    * Simplifies the relinking logic
    * Moves unlinking to DefaultPromise
    * Inlines foreach and onComplete
    * Runs both the map and the flatMap of zipWith on the same supplied EC
    * Switches to use tryComplete0 for more transformations, for fewer calls
@viktorklang viktorklang force-pushed the wip-make-global-ec-batched-√ branch from 4981c02 to fbada39 Compare December 19, 2018 12:27
@viktorklang
Contributor Author

@SethTisue I'm merging this now. I've squashed the commits. Let me know if/when you run the next community-build on the 2.13.x mainline.

@viktorklang viktorklang merged commit 0fa4c64 into scala:2.13.x Dec 19, 2018
@SethTisue
Member

Let me know if/when you run the next community-build on the 2.13.x mainline.

I haven't seen any regressions since scala/community-build@ee79f0f

@fgoepel

fgoepel commented Jan 12, 2019

Is there a reason why the BatchingExecutor trait is private[concurrent]? Are users supposed to copy and paste that code or roll their own if they need a separate ExecutionContext, e.g. to avoid blocking the global pool, but don't want it to have worse performance?

@viktorklang
Contributor Author

@shado23 I wasn't sure it should be a supported end-user API. If you want to avoid blocking the global pool, you can use blocking{ … }—this will use ManagedBlocker to add threads up to the configured maximum of extra threads.
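
For example (an illustrative snippet, not taken from the benches):

import scala.concurrent.{blocking, Future}
import scala.concurrent.ExecutionContext.Implicits.global

object BlockingExample {
  // Wrapping the blocking section signals the pool (via a ManagedBlocker) that it
  // may temporarily add a thread, up to the configured maximum of extra threads.
  val result: Future[Int] = Future {
    blocking {
      Thread.sleep(100) // stand-in for a blocking I/O call
      42
    }
  }
}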

@fgoepel

fgoepel commented Jan 15, 2019

@viktorklang Thanks for the reply. I hear what you're saying, but I think there are cases where having a truly separate ExecutionContext is advantageous. Slick, for instance, does exactly that for its JDBC IO threads. In my experience the blocking-context behavior can be a cure worse than the disease. Yes, you may avoid thread starvation, but can in return end up with thousands(!) of threads and an OOM exception if you're not very careful about how much work you schedule (which Future doesn't make particularly easy).

In the cases I've run into it was always easier and safer to provide a right-sized thread pool from the get-go, because even if you happen to avoid the OOM issue your system won't be able to deal with an infinite amount of concurrent IO ops anyway, and it could even be that you're dealing with background jobs that you don't want to take up a significant amount of resources. Even limiting the extra threads of the global pool isn't the best solution, as you still end up mixing IO and computation work, which have quite different needs, and without control of the scheduling you can still end up blocking all of them. Ultimately, even pure computational tasks can look like blocking to the rest of the system if they're heavy enough, so even there you have to put thought into how and where they are executed.

I do think that having the executor be batching by default is a good change. If I'm not mistaken, it essentially gives you the behavior that Twitter Future and Monix Task (and probably others) already have, where a simple map doesn't, in most cases, immediately incur a trip through the ExecutionContext and the associated context switch. I think that's great, because currently you either have to contort yourself trying to avoid this when your transformations are small, or accept a significant performance loss.

Ultimately, I think it would be useful to allow users to create their own BatchedExecutor. It doesn't seem like it would expose any delicate internals or be very risky. I understand that binary compatibility concerns etc. create an incentive to be careful about the public API you expose, but in some cases it can also be quite annoying as a user. In this case copy and pasting the implementation wouldn't be the worst thing in the world, but it feels petty to me to disallow external usage if there aren't good reasons to do it. (Maybe there are?)

@viktorklang
Contributor Author

@shado23

Yes, you may avoid thread starvation, but can in return end up with thousands(!) of threads and an OOM exception if you're not very careful about how much work you schedule (which Future doesn't make particularly easy).

The global default EC allows up to 256 extra threads by default (configurable via system properties)—this means that if you use blocking{} you are able to limit the impact. Speaking of that, it would also be possible to add a configuration parameter that, instead of executing the blocking logic, throws an exception; this would preserve liveness. It is also possible to install your own BlockContext which does that.
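
For example, such a BlockContext could look roughly like this (the object name is made up):

import scala.concurrent.{BlockContext, CanAwait}

// Fails fast instead of blocking, preserving the liveness of the pool's threads.
object FailFastBlockContext extends BlockContext {
  def blockOn[T](thunk: => T)(implicit permission: CanAwait): T =
    throw new IllegalStateException("blocking is not allowed on this thread")
}

// Installed for a given piece of code via:
//   BlockContext.withBlockContext(FailFastBlockContext) { codeThatMustNotBlock() }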

Ultimately, I think it would be useful to allow users to create their own BatchedExecutor. It doesn't seem like it would expose any delicate internals or be very risky

The biggest issue I see is that users/extenders of BatchingExecutor will not realize the implications for the code it executes—which is why the synchronous EC used by Future is not exposed directly to end users.

Also note that you are able to create a "clone" of the global EC by calling this: https://github.com/scala/scala/blob/2.13.x/src/library/scala/concurrent/ExecutionContext.scala#L153

I have some additional optimizations to BatchingExecutor that I hope I'll be able to ship with 2.13.0-RC1.

@fgoepel

fgoepel commented Jan 16, 2019

The global default EC allows up to 256 extra threads by default (configurable via system properties)—this means that if you use blocking{} you are able to limit the impact. Speaking of that, it would also be possible to add a configuration parameter that, instead of executing the blocking logic, throws an exception; this would preserve liveness. It is also possible to install your own BlockContext which does that.

Sure, in many cases that will probably work well enough. To be really seamless and safe most likely requires some cooperation from the VM to allow avoiding blocking IO (such as what Go does). Maybe project Loom will provide something here in the future.

Also note that you are able to create a "clone" of the global EC by calling this: https://github.com/scala/scala/blob/2.13.x/src/library/scala/concurrent/ExecutionContext.scala#L153

That's good enough for me then.

I have some additional optimizations to BatchingExecutor that I hope I'll be able to ship with 2.13.0-RC1.

Sounds good. I'm looking forward to being able to take advantage of all the great optimization work you're doing.

@viktorklang
Contributor Author

@shado23 Thanks!

@diesalbla diesalbla added the library:concurrent Changes to the concurrency support in stdlib label Mar 15, 2019
@SethTisue SethTisue changed the title Making the global EC a BatchedExecutor (performance) Make the global EC a BatchedExecutor (performance) Apr 4, 2019