lockless buffer reservation/release in EventPipe #50797

josalem · 2021-04-06T19:33:52Z

Description

Makes the reservation of a buffer lockless. Allocating the buffer still requires us to take the lock, but now we can fail buffer creation without taking the lock.

Prior to this change, when the buffer limit is reached and the event rate is still high, we would end up having all the writing threads waiting on the buffer manager lock to request a new buffer. The writing threads would wait on that lock, even if the buffer request was going to fail. This would cause a lock caravan slowing down event writing and reducing the number of events that actually get written to the wire.

With this change, the reservation is outside the lock, allowing the saturated scenario above to make more progress. This increases the throughput in high event count and high thread count scenarios.

Performance Changes

The following all show statistics for stress runs with the following characteristics:

32 writing threads
Each thread wrote events as fast as possible. (This is a pathological case to exacerbate the performance bottleneck this patch addresses)
The reader app used Stream.CopyAsync to copy the EventPipe stream directly to disk
10 iterations
60 second duration
200 byte event payloads
no rundown
256 MB buffer limit

macOS

2017 MacBook Pro 17" 2.8 GHz quad core i7 w/ 16GB RAM

Baseline (current main):

|-----------------------------------|--------------------|--------------------|--------------------|--------------------|
| stat                              | Min                | Max                | Average            | Standard Deviation |
|-----------------------------------|--------------------|--------------------|--------------------|--------------------|
| Events Read                       |       11,072,411.00|       13,079,154.00|       12,401,127.50|          643,711.50|
| Events Dropped                    |       32,538,988.00|       35,485,877.00|       34,470,364.50|          795,709.42|
| Throughput Efficiency (%)         |               24.45|               27.93|               26.45|                1.06|
| Event Throughput (events/sec)     |          182,240.01|          215,463.05|          204,987.46|           10,736.56|
| Data Throughput (Bytes/sec)       |       36,448,002.97|       43,092,610.63|       40,997,491.94|        2,147,312.69|
| Duration (seconds)                |           58.767485|           60.757299|           60.501883|            0.578701|
|-----------------------------------|--------------------|--------------------|--------------------|--------------------|

Patch:

|-----------------------------------|--------------------|--------------------|--------------------|--------------------|
| stat                              | Min                | Max                | Average            | Standard Deviation |
|-----------------------------------|--------------------|--------------------|--------------------|--------------------|
| Events Read                       |       19,104,384.00|       21,163,737.00|       20,334,391.00|          557,927.82|
| Events Dropped                    |       43,920,738.00|       45,595,905.00|       44,732,740.60|          574,846.22|
| Throughput Efficiency (%)         |               30.20|               32.40|               31.25|                0.64|
| Event Throughput (events/sec)     |          322,641.99|          348,899.30|          335,552.91|            7,591.05|
| Data Throughput (Bytes/sec)       |       64,528,398.99|       69,779,859.40|       67,110,581.37|        1,518,210.23|
| Duration (seconds)                |           58.975128|           60.899348|           60.594507|            0.543102|
|-----------------------------------|--------------------|--------------------|--------------------|--------------------|

Result: +63.9% events read, +47.9% total events attempted

Windows

4x12 core Intel Xeon E7 v2 2.7 GHz (48 physical cores, 96 logical cores) w/ 128 GB RAM

baseline:

|-----------------------------------|--------------------|--------------------|--------------------|--------------------|
| stat                              | Min                | Max                | Average            | Standard Deviation |
|-----------------------------------|--------------------|--------------------|--------------------|--------------------|
| Events Read                       |       24,523,483.00|       28,032,820.00|       26,366,913.80|          877,321.92|
| Events Dropped                    |       12,221,072.00|       14,262,520.00|       13,237,912.30|          723,179.17|
| Throughput Efficiency (%)         |               65.18|               68.07|               66.59|                0.92|
| Event Throughput (events/sec)     |          404,859.18|          462,877.70|          434,570.92|           14,670.50|
| Data Throughput (Bytes/sec)       |       80,971,836.92|       92,575,540.29|       86,914,184.11|        2,934,099.06|
| Duration (seconds)                |           60.540638|           61.162813|           60.674750|            0.228717|
|-----------------------------------|--------------------|--------------------|--------------------|--------------------|

Patch:

|-----------------------------------|--------------------|--------------------|--------------------|--------------------|
| stat                              | Min                | Max                | Average            | Standard Deviation |
|-----------------------------------|--------------------|--------------------|--------------------|--------------------|
| Events Read                       |       28,808,525.00|       33,337,792.00|       31,523,538.30|        1,478,407.18|
| Events Dropped                    |       14,582,096.00|       25,968,323.00|       20,683,726.20|        3,643,488.96|
| Throughput Efficiency (%)         |               54.09|               69.57|               60.62|                5.05|
| Event Throughput (events/sec)     |          476,325.06|          551,206.96|          519,749.05|           24,465.25|
| Data Throughput (Bytes/sec)       |       95,265,012.24|      110,241,392.72|      103,949,809.85|        4,893,049.37|
| Duration (seconds)                |           60.480809|           61.137813|           60.652365|            0.217341|
|-----------------------------------|--------------------|--------------------|--------------------|--------------------|

result: +20% events read, +32% total events attempted

4 core Intel i7-7700 3.6 GHz w/ 64 GB RAM

baseline:

|-----------------------------------|--------------------|--------------------|--------------------|--------------------|
| stat                              | Min                | Max                | Average            | Standard Deviation |
|-----------------------------------|--------------------|--------------------|--------------------|--------------------|
| Events Read                       |       27,723,669.00|       37,027,704.00|       31,361,220.80|        3,351,993.48|
| Events Dropped                    |       31,783,330.00|       41,237,280.00|       36,677,955.00|        3,375,436.52|
| Throughput Efficiency (%)         |               40.25|               53.81|               46.09|                4.87|
| Event Throughput (events/sec)     |          445,067.36|          592,066.10|          500,029.90|           53,178.85|
| Data Throughput (Bytes/sec)       |       89,013,472.18|      118,413,219.60|      100,005,980.29|       10,635,770.67|
| Duration (seconds)                |           62.110903|           63.407405|           62.714577|            0.412705|
|-----------------------------------|--------------------|--------------------|--------------------|--------------------|

patch:

|-----------------------------------|--------------------|--------------------|--------------------|--------------------|
| stat                              | Min                | Max                | Average            | Standard Deviation |
|-----------------------------------|--------------------|--------------------|--------------------|--------------------|
| Events Read                       |       35,809,958.00|       66,600,955.00|       56,755,392.90|        8,427,113.79|
| Events Dropped                    |       26,834,785.00|       62,535,559.00|       37,961,868.60|        9,799,860.05|
| Throughput Efficiency (%)         |               36.41|               71.28|               60.06|                9.58|
| Event Throughput (events/sec)     |          533,595.16|          991,750.07|          844,763.18|          125,675.32|
| Data Throughput (Bytes/sec)       |      106,719,032.35|      198,350,013.34|      168,952,636.67|       25,135,063.65|
| Duration (seconds)                |           66.828682|           67.733301|           67.186452|            0.250799|
|-----------------------------------|--------------------|--------------------|--------------------|--------------------|

result: +80% events read, +40% total events attempted

CC @noahfalk @sywhang @lateralusX

* prevents needless lock holding for dropping events which frees up the lock to make more progress on flushing and making new buffers allowing each thread to send more events when the buffer limit is reached.

src/native/eventpipe/ep-buffer-manager.c

src/native/eventpipe/ep-thread.c

sywhang

LGTM modulo a couple of minor things!

src/native/eventpipe/ep-buffer-manager.c

lockless buffer reservation/release

43c4efe

* prevents needless lock holding for dropping events which frees up the lock to make more progress on flushing and making new buffers allowing each thread to send more events when the buffer limit is reached.

josalem added EventPipe area-Tracing-coreclr labels Apr 6, 2021

josalem added this to the 6.0.0 milestone Apr 6, 2021

josalem self-assigned this Apr 6, 2021

appease mono build requirements

6dc02c2

AaronRobinsonMSFT reviewed Apr 7, 2021

View reviewed changes

src/native/eventpipe/ep-buffer-manager.c Outdated Show resolved Hide resolved

noahfalk reviewed Apr 8, 2021

View reviewed changes

src/native/eventpipe/ep-buffer-manager.c Show resolved Hide resolved

src/native/eventpipe/ep-thread.c Outdated Show resolved Hide resolved

PR feedback

7abf9dd

josalem marked this pull request as ready for review April 13, 2021 02:18

josalem requested a review from marek-safar as a code owner April 13, 2021 02:18

josalem requested a review from sywhang April 13, 2021 16:15

sywhang approved these changes Apr 13, 2021

View reviewed changes

src/native/eventpipe/ep-buffer-manager.c Show resolved Hide resolved

src/native/eventpipe/ep-buffer-manager.c Outdated Show resolved Hide resolved

josalem commented Apr 13, 2021

View reviewed changes

src/native/eventpipe/ep-buffer-manager.c Outdated Show resolved Hide resolved

make reserve release private

dad4d01

josalem mentioned this pull request Apr 14, 2021

EventPipe: delete used buffers outside the lock #51203

Closed

josalem merged commit e837eda into dotnet:main Apr 15, 2021

josalem deleted the dev/josalem/eventpipe-throughput-improvement1 branch April 15, 2021 23:20

ghost locked as resolved and limited conversation to collaborators May 16, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

lockless buffer reservation/release in EventPipe #50797

lockless buffer reservation/release in EventPipe #50797

Uh oh!

josalem commented Apr 6, 2021 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

sywhang left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

lockless buffer reservation/release in EventPipe #50797

lockless buffer reservation/release in EventPipe #50797

Uh oh!

Conversation

josalem commented Apr 6, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Performance Changes

macOS

2017 MacBook Pro 17" 2.8 GHz quad core i7 w/ 16GB RAM

Windows

4x12 core Intel Xeon E7 v2 2.7 GHz (48 physical cores, 96 logical cores) w/ 128 GB RAM

4 core Intel i7-7700 3.6 GHz w/ 64 GB RAM

Uh oh!

Uh oh!

Uh oh!

Uh oh!

sywhang left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

josalem commented Apr 6, 2021 •

edited

Loading