Fix overly eager drainBatch#2270

Merged
djspiewak merged 4 commits intotypelevel:series/3.xfrom
vasilmkd:drain-batch-over-eager
Aug 28, 2021
Conversation

@vasilmkd
Member

@vasilmkd vasilmkd commented Aug 28, 2021

Resolves #2269.

I still need to add a repro as a unit test.

@vasilmkd
Member Author

Testing the fix on a high core count machine, it seems much easier to reproduce the original bug report on 16 cores than on 4 (my machine).

@vasilmkd
Member Author

vasilmkd commented Aug 28, 2021

The changes seem to have the intended effect.

An explanation of the bug: when I originally hacked the solution together, I checked whether there are enough fibers in the local queue to form a batch, because if there are, we cannot add another one (a limitation of the current tuning parameters, which we'll be changing very soon). This is a "sensible" check when you're hacking quickly, but it creates an opportunity for a race condition. After the check returns true, meaning that there are enough fibers to form a batch, the code enters a region which tries to extract a batch of fibers from the local queue at any cost, giving up only if there are exactly zero fibers in the local queue.

So, what happened in the bug report is the following.

  1. The check for enough fibers to form a batch passed.
  2. The problematic code region was entered.
  3. In the meantime, due to thread interleaving, another worker thread was scheduled and stole fibers from the local queue we're examining. The steal succeeded and half of the fibers are gone.
  4. The original worker thread is rescheduled. The second "safety" check does not trigger, because there are still more than 0 fibers in the local queue, just fewer than a full batch now.
  5. The original worker thread creates a malformed batch in which some slots are null references and ships it off to the batched queue.
  6. A third worker thread takes that batch from the batched queue and, through no fault of its own, starts executing the null fibers it contains. And you know what happens next.

The fix just changes the safety check to bail out if there are fewer fibers in the local queue than one batch's worth, instead of only when there are exactly 0. In that case, a new batch can safely be added to the local queue.
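To make the race concrete, here is a minimal, hypothetical Scala sketch (not the actual Cats Effect `LocalQueue` code; all names and the `interleave` hook are illustrative). It models the buggy drain, which only bails at exactly zero fibers and so can produce a batch with null slots, next to the fixed drain, which re-checks for a full batch's worth of fibers and bails otherwise:

```scala
// Hypothetical model of the drainBatch race; not the real implementation.
object DrainBatchSketch {
  type Fiber = String
  val BatchSize = 4

  // A toy local queue from which another worker may concurrently steal.
  final class LocalQueue(initial: List[Fiber]) {
    private var fibers: List[Fiber] = initial
    def size: Int = fibers.size
    // Simulates another worker thread stealing half of the fibers.
    def stealHalf(): Unit = fibers = fibers.drop(fibers.size / 2)
    def dequeue(): Fiber = fibers match {
      case head :: tail => fibers = tail; head
      case Nil          => null
    }
  }

  // Buggy drain: after the initial check, extract a batch at any cost,
  // giving up only when the queue is completely empty. The `interleave`
  // hook stands in for the thread interleaving between check and drain.
  def drainBuggy(q: LocalQueue, interleave: () => Unit): Array[Fiber] = {
    if (q.size < BatchSize) return null // initial check passes
    interleave()                        // ...another worker steals here
    val batch = new Array[Fiber](BatchSize)
    var i = 0
    while (i < BatchSize && q.size > 0) { // only bails at exactly 0
      batch(i) = q.dequeue()
      i += 1
    }
    batch // trailing slots stay null if a steal happened in between
  }

  // Fixed drain: bail if there are fewer fibers than one batch's worth.
  def drainFixed(q: LocalQueue, interleave: () => Unit): Array[Fiber] = {
    if (q.size < BatchSize) return null
    interleave()
    if (q.size < BatchSize) return null // the fix: re-check before draining
    val batch = new Array[Fiber](BatchSize)
    var i = 0
    while (i < BatchSize) {
      batch(i) = q.dequeue()
      i += 1
    }
    batch
  }
}
```

With a steal interleaved between the check and the drain, `drainBuggy` hands back a batch whose tail is null references (the NPE from the bug report), while `drainFixed` simply gives up and leaves the remaining fibers in the local queue.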

@vasilmkd
Member Author

I was also able to elide a branch, because the same check is now done as part of the batch draining process.

@vasilmkd
Member Author

Running the full suite of comparative benchmarks, and maybe our own suite if I find the time.

@vasilmkd
Member Author

vasilmkd commented Aug 28, 2021

Benchmark                                  Mode  Cnt     Score     Error    Units
Benchmarks.catsEffect3Alloc               thrpt   20     4.907 ±   0.073  ops/min
Benchmarks.catsEffect3DeepBind            thrpt   20  5524.454 ±  75.473    ops/s
Benchmarks.catsEffect3EnqueueDequeue      thrpt   20   198.327 ±   1.998    ops/s
Benchmarks.catsEffect3RuntimeChainedFork  thrpt   20  1292.265 ±  26.662    ops/s
Benchmarks.catsEffect3RuntimeForkMany     thrpt   20   164.087 ±   3.436    ops/s
Benchmarks.catsEffect3RuntimePingPong     thrpt   20   445.916 ±  49.447    ops/s
Benchmarks.catsEffect3RuntimeYieldMany    thrpt   20   313.239 ±  43.439    ops/s
Benchmarks.catsEffect3Scheduling          thrpt   20    14.549 ±   1.713  ops/min

Passed without issues.

@vasilmkd
Member Author

vasilmkd commented Aug 28, 2021

Our full suite passed as well, running over 3 hours:

Benchmark                                             (cpuTokens)   (size)   Mode  Cnt      Score      Error    Units
AsyncBenchmark.async                                          N/A      100  thrpt   10  17211.105 ±  192.715    ops/s
AsyncBenchmark.bracket                                        N/A      100  thrpt   10  11415.176 ±  329.277    ops/s
AsyncBenchmark.cancelable                                     N/A      100  thrpt   10  15244.468 ±  439.658    ops/s
AsyncBenchmark.race                                           N/A      100  thrpt   10   5657.483 ±  277.077    ops/s
AsyncBenchmark.racePair                                       N/A      100  thrpt   10  19938.168 ± 1164.753    ops/s
AsyncBenchmark.start                                          N/A      100  thrpt   10   4475.212 ±  560.931    ops/s
AsyncBenchmark.uncancelable                                   N/A      100  thrpt   10  25088.404 ±  474.587    ops/s
AttemptBenchmark.errorRaised                                  N/A    10000  thrpt   10   1306.509 ±   64.681    ops/s
AttemptBenchmark.happyPath                                    N/A    10000  thrpt   10   1845.562 ±  110.183    ops/s
BothBenchmark.happyPath                                       N/A    10000  thrpt   10     20.471 ±    0.995    ops/s
DeepBindBenchmark.async                                       N/A    10000  thrpt   10   1110.006 ±   64.165    ops/s
DeepBindBenchmark.delay                                       N/A    10000  thrpt   10   3066.490 ±  131.700    ops/s
DeepBindBenchmark.pure                                        N/A    10000  thrpt   10   4028.447 ±  150.454    ops/s
DispatcherBenchmark.scheduling                                N/A     1000  thrpt   10     82.803 ±    4.072  ops/min
HandleErrorBenchmark.errorRaised                              N/A    10000  thrpt   10   1311.564 ±   41.595    ops/s
HandleErrorBenchmark.happyPath                                N/A    10000  thrpt   10   1432.996 ±   30.169    ops/s
MapCallsBenchmark.batch120                                    N/A      N/A  thrpt   10    299.053 ±   12.942    ops/s
MapCallsBenchmark.batch30                                     N/A      N/A  thrpt   10     85.078 ±    9.396    ops/s
MapCallsBenchmark.one                                         N/A      N/A  thrpt   10      2.921 ±    0.193    ops/s
MapStreamBenchmark.batch120                                   N/A      N/A  thrpt   10   1672.476 ±   89.939    ops/s
MapStreamBenchmark.batch30                                    N/A      N/A  thrpt   10    652.051 ±   12.298    ops/s
MapStreamBenchmark.one                                        N/A      N/A  thrpt   10   1081.783 ±   40.177    ops/s
ParallelBenchmark.parTraverse                               10000     1000  thrpt   10    288.269 ±    4.979    ops/s
ParallelBenchmark.traverse                                  10000     1000  thrpt   10     33.278 ±    0.119    ops/s
RaceBenchmark.happyPath                                       N/A    10000  thrpt   10     18.736 ±    2.175    ops/s
RedeemBenchmark.errorRaised                                   N/A    10000  thrpt   10    787.082 ±   60.492    ops/s
RedeemBenchmark.happyPath                                     N/A    10000  thrpt   10    991.918 ±   41.842    ops/s
RedeemWithBenchmark.errorRaised                               N/A    10000  thrpt   10   1131.144 ±  115.574    ops/s
RedeemWithBenchmark.happyPath                                 N/A    10000  thrpt   10   1651.219 ±   79.691    ops/s
RefBenchmark.getAndUpdate                                     N/A      N/A  thrpt   10   2349.809 ±   97.313    ops/s
RefBenchmark.modify                                           N/A      N/A  thrpt   10   2117.175 ±  168.849    ops/s

Unfortunately, I messed up the screen session (I was running this over ssh) and couldn't salvage the rest of the output at the end.

@vasilmkd vasilmkd marked this pull request as ready for review August 28, 2021 13:43
@djspiewak djspiewak merged commit 62e10fe into typelevel:series/3.x Aug 28, 2021
@vasilmkd vasilmkd deleted the drain-batch-over-eager branch August 30, 2021 08:04


Development

Successfully merging this pull request may close these issues.

NPE in catsEffect3RuntimePingPong