perf(mpsc): rewrite and optimize wait queue by hawkw · Pull Request #22 · hawkw/thingbuf

hawkw · 2021-12-24T19:50:11Z

This branch rewrites the MPSC channel wait queue implementation (again),
in order to improve performance. This undoes a decently large amount of
the perf regression from PR #20.

In particular, I've made the following changes:

* Simplified the design a bit, and reduced the number of CAS loops in
  both the notify and wait paths.
* Factored out fast paths (which touch the state variable without
  locking) from the notify and wait operations into separate functions,
  and marked them as #[inline(always)]. If we aren't able to perform
  the operation without actually touching the linked list, we call into
  a separate #[inline(never)] function that actually locks the list
  and performs the slow path. This means that code that uses these
  functions still has a function call in it, but a few instructions for
  performing a CAS can be inlined and the function call avoided in some
  cases. This significantly improves performance!
* Separated the wait function into start_wait (called the first time
  a node waits) and continue_wait (called if the node is woken, to
  handle spurious wakeups). This allows simplifying the code for
  modifying the waker so that we don't have to pass big closures around.
* Other miscellaneous optimizations, such as cache padding some
  variables that should have been cache padded.
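
To illustrate the fast-path/slow-path split described above, here is a minimal sketch. It is not the code from this branch. The state values (EMPTY, WAITING, WAKING as queue states), the CachePadded wrapper, the 64-byte line size, the *_slow function names, and the VecDeque of node ids standing in for the intrusive linked list are all assumptions made for the example. continue_wait and the waker handling are left out.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering::{AcqRel, Acquire, Release}};
use std::sync::Mutex;

// Hypothetical queue states; the real implementation may encode these differently.
const EMPTY: usize = 0; // no waiters and no stored wakeup
const WAITING: usize = 1; // at least one waiter is in the list
const WAKING: usize = 2; // a wakeup was stored while nobody was waiting

/// Keeps the wrapped value on its own cache line (64 bytes is an assumption).
#[repr(align(64))]
struct CachePadded<T>(T);

struct WaitQueue {
    state: CachePadded<AtomicUsize>,
    // Stand-in for the intrusive linked list: ids of waiting nodes.
    list: Mutex<VecDeque<usize>>,
}

impl WaitQueue {
    fn new() -> Self {
        WaitQueue {
            state: CachePadded(AtomicUsize::new(EMPTY)),
            list: Mutex::new(VecDeque::new()),
        }
    }

    /// Fast path: one CAS on the state word, inlined into the caller.
    /// Returns the id of the woken waiter, or None if the wakeup was stored.
    #[inline(always)]
    fn notify(&self) -> Option<usize> {
        if self.state.0.compare_exchange(EMPTY, WAKING, AcqRel, Acquire).is_ok() {
            return None;
        }
        self.notify_slow()
    }

    /// Slow path: kept out of line, takes the lock and touches the list.
    #[inline(never)]
    fn notify_slow(&self) -> Option<usize> {
        let mut list = self.list.lock().unwrap();
        loop {
            match self.state.0.load(Acquire) {
                // WAITING only changes while the lock is held, so it is stable here.
                WAITING => {
                    let woken = list.pop_front();
                    if list.is_empty() {
                        self.state.0.store(EMPTY, Release);
                    }
                    return woken;
                }
                // A wakeup is already stored; this one coalesces with it.
                WAKING => return None,
                // EMPTY: retry storing the wakeup; a racing waiter may change the state.
                _ => {
                    if self.state.0.compare_exchange(EMPTY, WAKING, AcqRel, Acquire).is_ok() {
                        return None;
                    }
                }
            }
        }
    }

    /// Fast path: consume a stored wakeup without locking.
    /// Returns true if a wakeup was consumed, false if the node was enqueued.
    #[inline(always)]
    fn start_wait(&self, id: usize) -> bool {
        if self.state.0.compare_exchange(WAKING, EMPTY, AcqRel, Acquire).is_ok() {
            return true;
        }
        self.start_wait_slow(id)
    }

    #[inline(never)]
    fn start_wait_slow(&self, id: usize) -> bool {
        let mut list = self.list.lock().unwrap();
        loop {
            match self.state.0.load(Acquire) {
                WAKING => {
                    if self.state.0.compare_exchange(WAKING, EMPTY, AcqRel, Acquire).is_ok() {
                        return true;
                    }
                }
                WAITING => break,
                // EMPTY: move to WAITING before enqueueing.
                _ => {
                    if self.state.0.compare_exchange(EMPTY, WAITING, AcqRel, Acquire).is_ok() {
                        break;
                    }
                }
            }
        }
        list.push_back(id);
        false
    }
}
```

In this sketch, a notify on an empty queue and a start_wait that finds a stored wakeup each complete with a single inlined CAS. Only the contended cases pay for the function call and the lock.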

Performance Comparison

These benchmarks were run against the current main branch
(f77d534).

async/mpsc_reusable

async/mpsc_reusable/ThingBuf/10
                        time:   [43.953 us 44.522 us 45.057 us]
                        change: [+0.0419% +1.7594% +3.5099%] (p = 0.05 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe
async/mpsc_reusable/ThingBuf/50
                        time:   [140.91 us 142.24 us 143.53 us]
                        change: [-31.201% -29.539% -27.824%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
async/mpsc_reusable/ThingBuf/100
                        time:   [250.31 us 255.03 us 259.68 us]
                        change: [-18.966% -17.190% -15.202%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

async/mpsc_integer

async/mpsc_integer/ThingBuf/10
                        time:   [208.99 us 215.30 us 221.32 us]
                        change: [+0.6957% +3.8603% +6.9400%] (p = 0.02 < 0.05)
                        Change within noise threshold.
async/mpsc_integer/ThingBuf/50
                        time:   [407.46 us 412.74 us 418.31 us]
                        change: [-39.128% -36.567% -33.267%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  2 (2.00%) low mild
  4 (4.00%) high mild
  7 (7.00%) high severe
async/mpsc_integer/ThingBuf/100
                        time:   [534.35 us 541.42 us 548.91 us]
                        change: [-44.820% -41.502% -37.120%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  7 (7.00%) high severe

async/spsc/try_send_reusable

async/spsc/try_send_reusable/ThingBuf/100
                        time:   [12.310 us 12.353 us 12.398 us]
                        thrpt:  [8.0656 Melem/s 8.0952 Melem/s 8.1236 Melem/s]
                 change:
                        time:   [-7.5146% -7.1996% -6.8566%] (p = 0.00 < 0.05)
                        thrpt:  [+7.3613% +7.7582% +8.1252%]
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
async/spsc/try_send_reusable/ThingBuf/500
                        time:   [46.691 us 46.778 us 46.871 us]
                        thrpt:  [10.668 Melem/s 10.689 Melem/s 10.709 Melem/s]
                 change:
                        time:   [-9.4767% -9.2760% -9.0811%] (p = 0.00 < 0.05)
                        thrpt:  [+9.9881% +10.224% +10.469%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild
async/spsc/try_send_reusable/ThingBuf/1000
                        time:   [89.763 us 90.757 us 91.843 us]
                        thrpt:  [10.888 Melem/s 11.018 Melem/s 11.140 Melem/s]
                 change:
                        time:   [-9.4302% -8.8637% -8.2018%] (p = 0.00 < 0.05)
                        thrpt:  [+8.9346% +9.7257% +10.412%]
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  8 (8.00%) high severe
async/spsc/try_send_reusable/ThingBuf/5000
                        time:   [415.34 us 417.89 us 420.42 us]
                        thrpt:  [11.893 Melem/s 11.965 Melem/s 12.038 Melem/s]
                 change:
                        time:   [-13.113% -12.774% -12.411%] (p = 0.00 < 0.05)
                        thrpt:  [+14.170% +14.644% +15.093%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  7 (7.00%) high mild
async/spsc/try_send_reusable/ThingBuf/10000
                        time:   [847.35 us 848.63 us 849.98 us]
                        thrpt:  [11.765 Melem/s 11.784 Melem/s 11.802 Melem/s]
                 change:
                        time:   [-11.345% -10.820% -10.388%] (p = 0.00 < 0.05)
                        thrpt:  [+11.592% +12.133% +12.796%]
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

async/spsc/try_send_integer

async/spsc/try_send_integer/ThingBuf/100
                        time:   [7.2254 us 7.2467 us 7.2690 us]
                        thrpt:  [13.757 Melem/s 13.799 Melem/s 13.840 Melem/s]
                 change:
                        time:   [-13.292% -12.912% -12.520%] (p = 0.00 < 0.05)
                        thrpt:  [+14.312% +14.826% +15.330%]
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
async/spsc/try_send_integer/ThingBuf/500
                        time:   [34.358 us 34.477 us 34.582 us]
                        thrpt:  [14.458 Melem/s 14.503 Melem/s 14.553 Melem/s]
                 change:
                        time:   [-18.539% -18.312% -18.072%] (p = 0.00 < 0.05)
                        thrpt:  [+22.058% +22.417% +22.758%]
                        Performance has improved.
async/spsc/try_send_integer/ThingBuf/1000
                        time:   [69.107 us 69.273 us 69.434 us]
                        thrpt:  [14.402 Melem/s 14.436 Melem/s 14.470 Melem/s]
                 change:
                        time:   [-17.759% -17.604% -17.444%] (p = 0.00 < 0.05)
                        thrpt:  [+21.130% +21.365% +21.594%]
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
async/spsc/try_send_integer/ThingBuf/5000
                        time:   [349.44 us 353.41 us 357.81 us]
                        thrpt:  [13.974 Melem/s 14.148 Melem/s 14.309 Melem/s]
                 change:
                        time:   [-14.832% -14.252% -13.447%] (p = 0.00 < 0.05)
                        thrpt:  [+15.537% +16.621% +17.415%]
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  5 (5.00%) high mild
  8 (8.00%) high severe
async/spsc/try_send_integer/ThingBuf/10000
                        time:   [712.89 us 732.58 us 754.24 us]
                        thrpt:  [13.258 Melem/s 13.650 Melem/s 14.027 Melem/s]
                 change:
                        time:   [-16.082% -15.161% -14.129%] (p = 0.00 < 0.05)
                        thrpt:  [+16.454% +17.870% +19.164%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe

I'm actually not really sure why this also improved the try_send
benchmarks, which don't touch the wait queue...but I'll take it!
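
A note on reading the try_send output: the time and thrpt lines describe the same measurement, so one follows from the other. The sketch below shows the arithmetic. The function names are made up for this example, and it assumes that the benchmark parameter (100, 500, ...) is the number of elements sent per iteration, which is consistent with the reported figures.

```rust
/// Throughput in Melem/s, given elements per iteration and time per
/// iteration in microseconds (elements per microsecond equals Melem/s).
fn melem_per_sec(elements: f64, micros: f64) -> f64 {
    elements / micros
}

/// Converts a relative change in time per iteration (in percent) into the
/// implied relative change in throughput (in percent).
fn thrpt_change_pct(time_change_pct: f64) -> f64 {
    let time_ratio = 1.0 + time_change_pct / 100.0;
    (1.0 / time_ratio - 1.0) * 100.0
}
```

For try_send_reusable/ThingBuf/100, 100 elements in 12.353 us gives about 8.0952 Melem/s, and a time change of -7.1996% corresponds to a throughput change of about +7.7582%. Because throughput is the inverse of time, the bounds swap: the lower bound of the time interval pairs with the upper bound of the thrpt interval.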

Signed-off-by: Eliza Weisman [email protected]

hawkw added 30 commits December 23, 2021 10:28

wip wait queue rewrite

b26ddbf

Signed-off-by: Eliza Weisman <[email protected]>

actually put the queue back in EMPTY

94e3f4e

Signed-off-by: Eliza Weisman <[email protected]>

misc warnings

51cf1c5

Signed-off-by: Eliza Weisman <[email protected]>

fix missing node registration due to take

c8742e9

Signed-off-by: Eliza Weisman <[email protected]>

add parking_lot support

2d55acf

Signed-off-by: Eliza Weisman <[email protected]>

fixup attributes

7268d53

get rid of closures

323a280

turns out we don't need that

Signed-off-by: Eliza Weisman <[email protected]>

what if i was actually wrong about edge weights

8264cb7

Signed-off-by: Eliza Weisman <[email protected]>

wip

cbf204a

Signed-off-by: Eliza Weisman <[email protected]>

make waiter states atomic

8ed9bd9

Signed-off-by: Eliza Weisman <[email protected]>

put back noinlines

57c7280

Signed-off-by: Eliza Weisman <[email protected]>

tweak what gets inlined a bit

6a3a058

Signed-off-by: Eliza Weisman <[email protected]>

rm bonus fn calls

02082d9

Signed-off-by: Eliza Weisman <[email protected]>

inline

0969062

Signed-off-by: Eliza Weisman <[email protected]>

fix backwards is_linked condition

04a1209

Signed-off-by: Eliza Weisman <[email protected]>

cleanup/add comments

5f26566

Signed-off-by: Eliza Weisman <[email protected]>

queue cleanup/comments

95035d1

Signed-off-by: Eliza Weisman <[email protected]>

queue: remove wrong assertion

068c93a

whoops, my bad --- the queue can be empty _or_ waiting when adding a waiter

Signed-off-by: Eliza Weisman <[email protected]>

cleanup

798daf6

Signed-off-by: Eliza Weisman <[email protected]>

lol we can use the same state enum everywhere

35b8643

Signed-off-by: Eliza Weisman <[email protected]>

remove release mode panics from queue

9f2834e

Signed-off-by: Eliza Weisman <[email protected]>

put back missing break

4831a31

Signed-off-by: Eliza Weisman <[email protected]>

Revert "lol we can use the same state enum everywhere"

ac719d0

This reverts commit 35b8643. apparently the stupid enum really harms performance :(

completely remove stupid enum

a27c3fd

i HATE that this meaningfully improves performance. i thought rust had "zero cost abstractions", wtf.

Signed-off-by: Eliza Weisman <[email protected]>

also cache pad waiter states

c1761aa

Signed-off-by: Eliza Weisman <[email protected]>

smol waitcell optimizations

549b4a6

Signed-off-by: Eliza Weisman <[email protected]>

docs update

820f4c5

Signed-off-by: Eliza Weisman <[email protected]>

typo fix

68c140f

Signed-off-by: Eliza Weisman <[email protected]>

allow re-queueing a WAKING node

17927eb

this might happen if it failed to consume a wakeup

Signed-off-by: Eliza Weisman <[email protected]>

update the synchronous mpsc

08a6b3a

Signed-off-by: Eliza Weisman <[email protected]>

hawkw self-assigned this Dec 24, 2021

fix sync transitioning to continue_wait wrongly

4883012

Signed-off-by: Eliza Weisman <[email protected]>

hawkw merged commit 8c882b0 into main Dec 24, 2021

hawkw merged 31 commits into main from eliza/queue-perf
