Skip to content

Conversation

@josalem
Copy link
Contributor

@josalem josalem commented Apr 6, 2021

<insert associated issues here>

Description

Makes the reservation of a buffer lockless. Allocating the buffer still requires us to take the lock, but now we can fail buffer creation without taking the lock.

Prior to this change, when the buffer limit is reached and the event rate is still high, we would end up having all the writing threads waiting on the buffer manager lock to request a new buffer. The writing threads would wait on that lock, even if the buffer request was going to fail. This would cause a lock caravan slowing down event writing and reducing the number of events that actually get written to the wire.

With this change, the reservation is outside the lock, allowing the saturated scenario above to make more progress. This increases the throughput in high event count and high thread count scenarios.

Performance Changes

The following all show statistics for stress runs with the following characteristics:

  • 32 writing threads
  • Each thread wrote events as fast as possible. (This is a pathological case to exacerbate the performance bottleneck this patch addresses)
  • The reader app used Stream.CopyAsync to copy the EventPipe stream directly to disk
  • 10 iterations
  • 60 second duration
  • 200 byte event payloads
  • no rundown
  • 256 MB buffer limit

macOS

2017 MacBook Pro 17" 2.8 GHz quad core i7 w/ 16GB RAM

Baseline (current main):

|-----------------------------------|--------------------|--------------------|--------------------|--------------------|
| stat                              | Min                | Max                | Average            | Standard Deviation |
|-----------------------------------|--------------------|--------------------|--------------------|--------------------|
| Events Read                       |       11,072,411.00|       13,079,154.00|       12,401,127.50|          643,711.50|
| Events Dropped                    |       32,538,988.00|       35,485,877.00|       34,470,364.50|          795,709.42|
| Throughput Efficiency (%)         |               24.45|               27.93|               26.45|                1.06|
| Event Throughput (events/sec)     |          182,240.01|          215,463.05|          204,987.46|           10,736.56|
| Data Throughput (Bytes/sec)       |       36,448,002.97|       43,092,610.63|       40,997,491.94|        2,147,312.69|
| Duration (seconds)                |           58.767485|           60.757299|           60.501883|            0.578701|
|-----------------------------------|--------------------|--------------------|--------------------|--------------------|

Patch:

|-----------------------------------|--------------------|--------------------|--------------------|--------------------|
| stat                              | Min                | Max                | Average            | Standard Deviation |
|-----------------------------------|--------------------|--------------------|--------------------|--------------------|
| Events Read                       |       19,104,384.00|       21,163,737.00|       20,334,391.00|          557,927.82|
| Events Dropped                    |       43,920,738.00|       45,595,905.00|       44,732,740.60|          574,846.22|
| Throughput Efficiency (%)         |               30.20|               32.40|               31.25|                0.64|
| Event Throughput (events/sec)     |          322,641.99|          348,899.30|          335,552.91|            7,591.05|
| Data Throughput (Bytes/sec)       |       64,528,398.99|       69,779,859.40|       67,110,581.37|        1,518,210.23|
| Duration (seconds)                |           58.975128|           60.899348|           60.594507|            0.543102|
|-----------------------------------|--------------------|--------------------|--------------------|--------------------|

Result: +63.9% events read, +47.9% total events attempted

Windows

4x12 core Intel Xeon E7 v2 2.7 GHz (48 physical cores, 96 logical cores) w/ 128 GB RAM

baseline:

|-----------------------------------|--------------------|--------------------|--------------------|--------------------|
| stat                              | Min                | Max                | Average            | Standard Deviation |
|-----------------------------------|--------------------|--------------------|--------------------|--------------------|
| Events Read                       |       24,523,483.00|       28,032,820.00|       26,366,913.80|          877,321.92|
| Events Dropped                    |       12,221,072.00|       14,262,520.00|       13,237,912.30|          723,179.17|
| Throughput Efficiency (%)         |               65.18|               68.07|               66.59|                0.92|
| Event Throughput (events/sec)     |          404,859.18|          462,877.70|          434,570.92|           14,670.50|
| Data Throughput (Bytes/sec)       |       80,971,836.92|       92,575,540.29|       86,914,184.11|        2,934,099.06|
| Duration (seconds)                |           60.540638|           61.162813|           60.674750|            0.228717|
|-----------------------------------|--------------------|--------------------|--------------------|--------------------|

Patch:

|-----------------------------------|--------------------|--------------------|--------------------|--------------------|
| stat                              | Min                | Max                | Average            | Standard Deviation |
|-----------------------------------|--------------------|--------------------|--------------------|--------------------|
| Events Read                       |       28,808,525.00|       33,337,792.00|       31,523,538.30|        1,478,407.18|
| Events Dropped                    |       14,582,096.00|       25,968,323.00|       20,683,726.20|        3,643,488.96|
| Throughput Efficiency (%)         |               54.09|               69.57|               60.62|                5.05|
| Event Throughput (events/sec)     |          476,325.06|          551,206.96|          519,749.05|           24,465.25|
| Data Throughput (Bytes/sec)       |       95,265,012.24|      110,241,392.72|      103,949,809.85|        4,893,049.37|
| Duration (seconds)                |           60.480809|           61.137813|           60.652365|            0.217341|
|-----------------------------------|--------------------|--------------------|--------------------|--------------------|

result: +20% events read, +32% total events attempted

4 core Intel i7-7700 3.6 GHz w/ 64 GB RAM

baseline:

|-----------------------------------|--------------------|--------------------|--------------------|--------------------|
| stat                              | Min                | Max                | Average            | Standard Deviation |
|-----------------------------------|--------------------|--------------------|--------------------|--------------------|
| Events Read                       |       27,723,669.00|       37,027,704.00|       31,361,220.80|        3,351,993.48|
| Events Dropped                    |       31,783,330.00|       41,237,280.00|       36,677,955.00|        3,375,436.52|
| Throughput Efficiency (%)         |               40.25|               53.81|               46.09|                4.87|
| Event Throughput (events/sec)     |          445,067.36|          592,066.10|          500,029.90|           53,178.85|
| Data Throughput (Bytes/sec)       |       89,013,472.18|      118,413,219.60|      100,005,980.29|       10,635,770.67|
| Duration (seconds)                |           62.110903|           63.407405|           62.714577|            0.412705|
|-----------------------------------|--------------------|--------------------|--------------------|--------------------|

patch:

|-----------------------------------|--------------------|--------------------|--------------------|--------------------|
| stat                              | Min                | Max                | Average            | Standard Deviation |
|-----------------------------------|--------------------|--------------------|--------------------|--------------------|
| Events Read                       |       35,809,958.00|       66,600,955.00|       56,755,392.90|        8,427,113.79|
| Events Dropped                    |       26,834,785.00|       62,535,559.00|       37,961,868.60|        9,799,860.05|
| Throughput Efficiency (%)         |               36.41|               71.28|               60.06|                9.58|
| Event Throughput (events/sec)     |          533,595.16|          991,750.07|          844,763.18|          125,675.32|
| Data Throughput (Bytes/sec)       |      106,719,032.35|      198,350,013.34|      168,952,636.67|       25,135,063.65|
| Duration (seconds)                |           66.828682|           67.733301|           67.186452|            0.250799|
|-----------------------------------|--------------------|--------------------|--------------------|--------------------|

result: +80% events read, +40% total events attempted

CC @noahfalk @sywhang @lateralusX

* prevents needless lock holding for dropping events
  which frees up the lock to make more progress on
  flushing and making new buffers allowing each thread
  to send more events when the buffer limit is reached.
@josalem josalem added this to the 6.0.0 milestone Apr 6, 2021
@josalem josalem self-assigned this Apr 6, 2021
@josalem josalem marked this pull request as ready for review April 13, 2021 02:18
@josalem josalem requested a review from marek-safar as a code owner April 13, 2021 02:18
@josalem josalem requested a review from sywhang April 13, 2021 16:15
Copy link
Contributor

@sywhang sywhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo a couple of minor things!

@josalem josalem merged commit e837eda into dotnet:main Apr 15, 2021
@josalem josalem deleted the dev/josalem/eventpipe-throughput-improvement1 branch April 15, 2021 23:20
@ghost ghost locked as resolved and limited conversation to collaborators May 16, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants