Fix overly eager drainBatch#2270
Fix overly eager drainBatch#2270djspiewak merged 4 commits intotypelevel:series/3.xfrom vasilmkd:drain-batch-over-eager
drainBatch#2270Conversation
|
Testing the fix on a high core count machine, it seems much easier to reproduce the original bug report on 16 cores than on 4 (my machine). |
|
The changes seem to have the intended effect. Explanation about the bug, when I originally hacked the solution together, I checked whether there are enough fibers in the local queue to form a batch, because if there are, then we cannot add another one (limitation of the current tuning parameters, which we'll be changing very soon). This is a "sensible" check when you're hacking quickly, but it creates an opportunity for a race condition. After the check is done and it indeed returns a So, what happened in the bug report is the following.
The fix just changes the safety check to bail if there are fewer fibers than a one batch worth of fibers, instead of just when there are exactly 0. In this case, a new batch can safely be added to the local queue. |
|
I was also able to elide a branch, because the same check is now done as part of the batch draining process. |
|
Running the full suite of comparative benchmarks and maybe our own suite if I find the time. |
Passed without issues. |
|
Our full suite passed as well, running over 3 hours: Unfortunately I messed up the |
Resolves #2269.
I still need to add a repro as a unit test.