
Conversation

@chrisvest
Member

@chrisvest chrisvest commented Apr 13, 2024

See PR #13075 for the motivations.

The AdaptivePoolingAllocator is modified to work with the ByteBuf API. This required adding another implementation of ByteBuf, because in this API a buffer and its allocator (pooling or otherwise) are tightly coupled in their implementations.

Note that the AdaptivePoolingAllocator requires at least Java 8, because it relies on StampedLock. The constructor performs a version check, and throws an exception on older Java versions.
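A minimal sketch of the kind of runtime version guard described here (hypothetical names; not Netty's actual code):

```java
// Hypothetical sketch: reject pre-Java-8 runtimes before any
// StampedLock-dependent state is built.
public final class JavaVersionGuard {
    private JavaVersionGuard() {
    }

    // "java.specification.version" is "1.6"/"1.7"/"1.8" before Java 9,
    // and "9", "10", ... afterwards.
    static int majorVersion() {
        String spec = System.getProperty("java.specification.version", "1.6");
        String[] parts = spec.split("\\.");
        return parts[0].equals("1") ? Integer.parseInt(parts[1]) : Integer.parseInt(parts[0]);
    }

    public static void requireAtLeastJava8() {
        if (majorVersion() < 8) {
            throw new UnsupportedOperationException(
                    "This allocator requires Java 8 or newer (found specification version "
                            + System.getProperty("java.specification.version") + ')');
        }
    }
}
```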

This allocator, if merged, will be strictly experimental in Netty 4.1.

@chrisvest chrisvest requested a review from normanmaurer April 13, 2024 22:26
@chrisvest chrisvest marked this pull request as draft April 13, 2024 22:26
@chrisvest chrisvest marked this pull request as ready for review April 14, 2024 01:05
@franz1981
Contributor

franz1981 commented Apr 14, 2024

I like the design of this a lot; it resembles what modern GCs do with TLAB allocations:

  • dedicated fast path for bump-only allocation
  • estimation of "burst" capacity based on allocation telemetry

It should also make it easy to implement some sort of soft limits, I think.
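The bump-only fast path mentioned above can be sketched roughly like this (a single-owner chunk with a plain bump pointer; illustrative names, not Netty's):

```java
// Rough sketch of a TLAB-style bump allocation fast path: a chunk owned
// by a single thread, so the allocation pointer needs no atomics.
final class BumpChunk {
    private final byte[] memory;
    private int allocated; // bump pointer; single-owner, so a plain int suffices

    BumpChunk(int capacity) {
        memory = new byte[capacity];
    }

    // Returns the offset of the new allocation, or -1 if the chunk is
    // exhausted and a fresh chunk must be fetched (the slow path).
    int bumpAllocate(int size) {
        if (allocated + size > memory.length) {
            return -1;
        }
        int offset = allocated;
        allocated += size;
        return offset;
    }
}
```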

This should also be very virtual-thread friendly, but the devil is in the details: one thing not addressed here is the Recycler used by pooled allocation, which still relies on ThreadLocal. To fully close the "circle", it would be better to fix that too.

A separate suggestion: if a thread is a fast (event loop) thread, why not pin it to a specific magazine, and let the threads that are not (alien thread pools and virtual threads) compete for a striped set of magazines, using the thread id to pick one?

This would combine the benefits of the two approaches: the allocation statistics of event loop threads would not be disturbed by others, and it would likely save the atomic operations for those threads (effectively implementing TLAB machinery).
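The striping idea could look something like the following sketch, assuming Magazine is an opaque per-thread allocation area (names here are illustrative, not Netty's):

```java
// Sketch of magazine striping: non-event-loop ("alien") threads pick a
// stripe derived from their thread id; event-loop threads would instead
// hold a pinned magazine of their own.
final class MagazineSelector {
    static final class Magazine { } // stand-in for the real allocation area

    private final Magazine[] stripes;

    MagazineSelector(int stripeCount) {
        if (Integer.bitCount(stripeCount) != 1) {
            throw new IllegalArgumentException("stripeCount must be a power of two");
        }
        stripes = new Magazine[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            stripes[i] = new Magazine();
        }
    }

    Magazine selectFor(Thread thread) {
        // Mask works because the stripe count is a power of two.
        int index = (int) (mix(thread.getId()) & (stripes.length - 1));
        return stripes[index];
    }

    private static long mix(long z) { // SplitMix64 finalizer: spreads sequential ids
        z = (z ^ (z >>> 33)) * 0xff51afd7ed558ccdL;
        z = (z ^ (z >>> 33)) * 0xc4ceb9fe1a85ec53L;
        return z ^ (z >>> 33);
    }
}
```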

@chrisvest
Member Author

> A separate suggestion: if a thread is a fast (event loop) thread, why not pin it to a specific magazine, and let the threads that are not (alien thread pools and virtual threads) compete for a striped set of magazines, using the thread id to pick one?
>
> This would combine the benefits of the two approaches: the allocation statistics of event loop threads would not be disturbed by others, and it would likely save the atomic operations for those threads (effectively implementing TLAB machinery).

Allocation needs to coordinate with any concurrent close() call, so saving the atomic operations might not be possible.

@franz1981
Contributor

franz1981 commented Apr 14, 2024

That's OK, but considering that we only care about retiring the current chunk once it is exhausted, maybe there is something that can be stripped out of the hot path for such threads. I have to play with it more before I can suggest something specific; in any case it would be an addition, because as it is, this is already very nice!

@chrisvest chrisvest force-pushed the 41-adaptive-allocator branch from f27032e to cbf0bda Compare April 14, 2024 23:52
@chrisvest chrisvest requested review from franz1981 and jchrys and removed request for jchrys April 15, 2024 19:55
@chrisvest chrisvest force-pushed the 41-adaptive-allocator branch from cbf0bda to f7f3005 Compare April 16, 2024 18:45
@chrisvest
Member Author

The usedHeapMemory() and usedDirectMemory() accounting doesn't quite work. It's somehow missing decrements.

@chrisvest
Member Author

Local benchmark numbers on Java 11:

Benchmark                                                  (size)   Mode  Cnt         Score        Error  Units
ByteBufAllocatorBenchmark.adaptiveDirectAllocAndFree        00256  thrpt   20  14659507.477 ± 165289.355  ops/s
ByteBufAllocatorBenchmark.adaptiveHeapAllocAndFree          00256  thrpt   20  22227938.709 ± 449200.309  ops/s
ByteBufAllocatorBenchmark.defaultPooledDirectAllocAndFree   00256  thrpt   20  13129535.618 ± 108988.642  ops/s
ByteBufAllocatorBenchmark.defaultPooledHeapAllocAndFree     00256  thrpt   20  13023927.401 ± 104843.216  ops/s
ByteBufAllocatorBenchmark.pooledDirectAllocAndFree          00256  thrpt   20  12122912.035 ± 337387.354  ops/s
ByteBufAllocatorBenchmark.pooledHeapAllocAndFree            00256  thrpt   20  12586541.154 ±  58123.116  ops/s
ByteBufAllocatorBenchmark.unpooledDirectAllocAndFree        00256  thrpt   20   2121884.292 ±  23003.444  ops/s
ByteBufAllocatorBenchmark.unpooledHeapAllocAndFree          00256  thrpt   20  19869435.020 ± 122243.355  ops/s

@chrisvest chrisvest marked this pull request as draft April 17, 2024 17:09
Member

@normanmaurer normanmaurer left a comment

First round of review... I am super excited about this @chrisvest

chrisvest and others added 11 commits April 18, 2024 08:38
See PR netty#13075 for the motivations.

The AdaptivePoolingAllocator is modified to work with the ByteBuf API.
This required adding another implementation of ByteBuf, because in this API a buffer and its allocator (pooling or otherwise) are tightly coupled in their implementations.

Note that the AdaptivePoolingAllocator requires at least Java 8, because it relies on StampedLock.
The constructor performs a version check, and throws an exception on older Java versions.
The tests passed with "adaptive" as the default allocator.
@chrisvest
Member Author

And the concurrent allocator benchmark, Java 17:

Benchmark                                                    (size)   Mode  Cnt          Score          Error  Units
ByteBufAllocatorConcurrentBenchmark.allocateReleaseAdaptive   00256  thrpt   20  164186255.001 ± 80355784.409  ops/s
ByteBufAllocatorConcurrentBenchmark.allocateReleasePooled     00256  thrpt   20   73136079.467 ±  2706876.076  ops/s
ByteBufAllocatorConcurrentBenchmark.allocateReleaseUnpooled   00256  thrpt   20    3022716.439 ±   137359.369  ops/s

@chrisvest chrisvest marked this pull request as ready for review April 18, 2024 22:13
@normanmaurer normanmaurer added this to the 4.1.110.Final milestone Apr 19, 2024
@normanmaurer
Member

@chrisvest as this is marked as @UnstableApi we should pull it in once you are happy with it :) I am ....

@chrisvest chrisvest marked this pull request as draft April 19, 2024 20:39
@chrisvest
Member Author

SocketObjectEchoTest and SocketRstTest are consistently complaining. Needs to be looked into.

By having a predictable pattern in the strings, it becomes much easier to reason about the cause of any test failure.
…e lock

We must never touch the chunk-delegate buffer's internal NIO buffer without holding the magazine lock.
Getting the internal NIO buffer involves modifying its position and limit, which are a shared mutable resource.
Therefore, we cannot lazily obtain this buffer, and have to grab it eagerly when we initialize the AdaptiveByteBuf under the magazine lock, which ensures that no other thread is touching that buffer.
This does unfortunately extend the critical section a bit.
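The locking rule above can be illustrated with a small sketch: obtain and slice the shared internal NIO buffer only while holding the magazine's StampedLock write lock (MagazineState here is a hypothetical stand-in, not Netty's actual class):

```java
import java.nio.ByteBuffer;
import java.util.concurrent.locks.StampedLock;

// Illustrative sketch of eager, lock-protected access to a shared NIO buffer.
final class MagazineState {
    private final StampedLock lock = new StampedLock();
    private final ByteBuffer chunkDelegate = ByteBuffer.allocate(1024);

    // Producing a slice mutates position/limit on a duplicate of the shared
    // buffer, so it happens eagerly inside the write lock, never lazily later.
    ByteBuffer sliceUnderLock(int offset, int length) {
        long stamp = lock.writeLock();
        try {
            ByteBuffer dup = chunkDelegate.duplicate(); // shares memory, not position/limit
            dup.position(offset).limit(offset + length);
            return dup.slice(); // captured while no other thread can interfere
        } finally {
            lock.unlockWrite(stamp);
        }
    }
}
```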
The timeout is applied to the combined run time of all the test cases.
Since there are more cases, the test needs more time to complete.
@chrisvest chrisvest marked this pull request as ready for review April 19, 2024 23:38
@chrisvest
Member Author

Reran the benchmarks on Java 11:

Benchmark                                                  (size)   Mode  Cnt         Score        Error  Units
ByteBufAllocatorBenchmark.adaptiveDirectAllocAndFree        00256  thrpt   20  21938808.710 ± 267349.998  ops/s
ByteBufAllocatorBenchmark.adaptiveHeapAllocAndFree          00256  thrpt   20  22934470.112 ± 612294.938  ops/s
ByteBufAllocatorBenchmark.defaultPooledDirectAllocAndFree   00256  thrpt   20  12384652.456 ±  73364.731  ops/s
ByteBufAllocatorBenchmark.defaultPooledHeapAllocAndFree     00256  thrpt   20  12404587.038 ±  54743.049  ops/s
ByteBufAllocatorBenchmark.pooledDirectAllocAndFree          00256  thrpt   20  11271708.987 ±  45600.629  ops/s
ByteBufAllocatorBenchmark.pooledHeapAllocAndFree            00256  thrpt   20  11613996.778 ± 112700.407  ops/s
ByteBufAllocatorBenchmark.unpooledDirectAllocAndFree        00256  thrpt   20   2136907.759 ±  20863.526  ops/s
ByteBufAllocatorBenchmark.unpooledHeapAllocAndFree          00256  thrpt   20  20517192.830 ± 103002.065  ops/s

@chrisvest
Member Author

@normanmaurer @jchrys I think this is ready, now.

Contributor

@jchrys jchrys left a comment

lgtm. I'm genuinely excited about this. 🚀

@normanmaurer
Member

@chrisvest ship it :shipit:

@chrisvest chrisvest merged commit 0dc95ef into netty:4.1 Apr 20, 2024
@chrisvest chrisvest deleted the 41-adaptive-allocator branch April 20, 2024 14:59
Contributor

@franz1981 franz1981 left a comment

Now checking the benchmark: sorry, had some family things going on recently...

LuciferYang pushed a commit to apache/spark that referenced this pull request May 28, 2024
### What changes were proposed in this pull request?
The pr aims to upgrade `netty` from `4.1.109.Final` to `4.1.110.Final`.

### Why are the changes needed?
- https://netty.io/news/2024/05/22/4-1-110-Final.html
  This version has brought some bug fixes and improvements, such as:
  Fix Zstd throws Exception on read-only volumes (netty/netty#13982)
  Add unix domain socket transport in netty 4.x via JDK16+ ([#13965](netty/netty#13965))
  Backport #13075: Add the AdaptivePoolingAllocator ([#13976](netty/netty#13976))
  Add no-value key handling only for form body ([#13998](netty/netty#13998))
  Add support for specifying SecureRandom in SSLContext initialization ([#14058](netty/netty#14058))

- https://github.com/netty/netty/issues?q=milestone%3A4.1.110.Final+is%3Aclosed

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #46744 from panbingkun/SPARK-48420.

Authored-by: panbingkun <[email protected]>
Signed-off-by: yangjie01 <[email protected]>

private final ChunkAllocator chunkAllocator;
private final Queue<ChunkByteBuf> centralQueue;
private final StampedLock magazineExpandLock;
Contributor

Unless I am missing something obvious, this is going to cause the class to not be loadable in Java 6 and 7 and the version check in the constructor won't help because it's executed too late.
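One common way to address this kind of loading problem (a sketch of the general idiom, not necessarily what Netty did): keep the Java-8-only types behind a nested holder class, so they are only loaded after the runtime version check has passed. All class names below are illustrative.

```java
// Sketch of the lazy-holder idiom: StampedLock is only referenced from the
// holder, so older JVMs never resolve it unless the version check passes.
final class AdaptiveAllocatorSketch {
    private static final class StampedLockHolder {
        static final java.util.concurrent.locks.StampedLock LOCK =
                new java.util.concurrent.locks.StampedLock();
    }

    AdaptiveAllocatorSketch() {
        requireAtLeastJava8(); // runs before any Java-8-only class is touched
    }

    long writeLock() {
        return StampedLockHolder.LOCK.writeLock(); // triggers holder loading here
    }

    void unlockWrite(long stamp) {
        StampedLockHolder.LOCK.unlockWrite(stamp);
    }

    private static void requireAtLeastJava8() {
        String spec = System.getProperty("java.specification.version", "1.6");
        if (spec.startsWith("1.") && Integer.parseInt(spec.substring(2)) < 8) {
            throw new UnsupportedOperationException("requires Java 8+");
        }
    }
}
```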

Member Author

I think you're right. I'll put up a PR.

Contributor

👌

@geoand
Contributor

geoand commented Jun 4, 2024

Awesome to see this backported!

We will definitely start using it in Quarkus - I know @franz1981 is eager to see how it performs in real workloads.

@He-Pin
Contributor

He-Pin commented Jun 20, 2024

[attached screenshot: memory usage comparison]

Seems like the new AdaptivePoolingAllocator uses more memory.
Env: Java 21 with G1 and libjemalloc

@chrisvest
Member Author

@He-Pin can you start a discussion and describe the workload? Would also be helpful if you can look into a heap dump and see if there's too many chunks in the central queue, or being held up by long-lived ByteBuf objects, or something else.

@He-Pin
Contributor

He-Pin commented Jun 20, 2024

I will describe it in detail after work today.

@franz1981
Contributor

Many thanks @He-Pin I am super curious as well!
