-
-
Notifications
You must be signed in to change notification settings - Fork 16.3k
Backport #13075: Add the AdaptivePoolingAllocator #13976
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
buffer/src/main/java/io/netty/buffer/AdaptivePoolingAllocator.java
Outdated
Show resolved
Hide resolved
buffer/src/main/java/io/netty/buffer/AdaptivePoolingAllocator.java
Outdated
Show resolved
Hide resolved
buffer/src/main/java/io/netty/buffer/AdaptivePoolingAllocator.java
Outdated
Show resolved
Hide resolved
|
I like a lot the design of this, which resembles what modern GCs does with TLAB allocations:
And it allows implementing some sort of soft-limits with ease too, I think. And this will make this very virtual thread friendly too, but, the devil is in the details...something not addressed by this is the Recycler used by the pooled allocation, which still relies on ThreadLocal: to fully close the "circle" it would be better to fix it too. Separated advice instead: if a thread is a fast (event loop) thread, why not setting a fixed affinity with a specific magazine and let who is not (alien thread pools and virtual threads) "compete" against a striped amount of magazine using the thread id to pick one ? This will unify the benefits of the 2 approaches making the allocation statistics of event loops threads allocation not "disturbed" by others and would likely save any atomic operations for such (actually implementing TLAB machinery). |
Allocation needs to coordinate with any concurrent |
|
It is ok, but considering that only when the current chunk is completed/exhausted we care about retiring it, maybe there is something to strip out of the hot path, for such threads. I have to play with it more to suggest something specific, but anyway; it would be an addition, given that as it is, is already very nice! |
f27032e to
cbf0bda
Compare
cbf0bda to
f7f3005
Compare
|
The |
|
Local benchmark numbers on Java 11: |
normanmaurer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First round of review... I am super excited about this @chrisvest
buffer/src/main/java/io/netty/buffer/AdaptivePoolingAllocator.java
Outdated
Show resolved
Hide resolved
buffer/src/main/java/io/netty/buffer/AdaptivePoolingAllocator.java
Outdated
Show resolved
Hide resolved
buffer/src/main/java/io/netty/buffer/AdaptivePoolingAllocator.java
Outdated
Show resolved
Hide resolved
buffer/src/main/java/io/netty/buffer/AdaptivePoolingAllocator.java
Outdated
Show resolved
Hide resolved
buffer/src/main/java/io/netty/buffer/AdaptivePoolingAllocator.java
Outdated
Show resolved
Hide resolved
buffer/src/main/java/io/netty/buffer/AdaptivePoolingAllocator.java
Outdated
Show resolved
Hide resolved
buffer/src/main/java/io/netty/buffer/AdaptivePoolingAllocator.java
Outdated
Show resolved
Hide resolved
buffer/src/main/java/io/netty/buffer/AdaptivePoolingAllocator.java
Outdated
Show resolved
Hide resolved
See PR netty#13075 for the motivations. The AdaptivePoolingAllocator is modified to work with the ByteBuf API. This required adding another implementation of ByteBuf, because in this API the buffer and their allocator (pooling or otherwise) are tightly coupled in their implementations. Note that the AdaptivePoolingAllocator require at least Java 8, because it relies on StampedLock. The constructor performs a version check, and throws an exception on older Java versions.
The tests passed with "adaptive" as the default allocator.
Co-authored-by: jchrys <[email protected]>
|
And the concurrent allocator benchmark, Java 17: |
|
@chrisvest as this is marked as |
|
|
By having a predictable pattern in the strings, it becomes much easier to reason about the cause of any test failure.
…e lock We must never touch the chunk-delegate buffer's internal NIO buffer without holding the magazine lock. Getting the internal NIO buffer involves modifying its position and limit, which are a shared mutable resource. Therefor, we cannot lazily obtain this buffer, and have to grab it eagerly when the initialize the AdaptiveByteBuf under the magazine lock, which ensures that no other thread is touching that buffer. This does unfortunately extend the critical section a bit.
The timeout is applied to the combined run time of all the test cases. Since there are more cases, the test needs more time to complete.
|
Reran the benchmarks on Java 11: |
|
@normanmaurer @jchrys I think this is ready, now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. I'm genuinely excited about this. 🚀
|
@chrisvest ship it |
franz1981
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now checking the benchmark: sorry, had some family things going on recently...
### What changes were proposed in this pull request? The pr aims to upgrade `netty` from `4.1.109.Final` to `4.1.110.Final`. ### Why are the changes needed? - https://netty.io/news/2024/05/22/4-1-110-Final.html This version has brought some bug fixes and improvements, such as: Fix Zstd throws Exception on read-only volumes (netty/netty#13982) Add unix domain socket transport in netty 4.x via JDK16+ ([#13965](netty/netty#13965)) Backport #13075: Add the AdaptivePoolingAllocator ([#13976](netty/netty#13976)) Add no-value key handling only for form body ([#13998](netty/netty#13998)) Add support for specifying SecureRandom in SSLContext initialization ([#14058](netty/netty#14058)) - https://github.com/netty/netty/issues?q=milestone%3A4.1.110.Final+is%3Aclosed ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46744 from panbingkun/SPARK-48420. Authored-by: panbingkun <[email protected]> Signed-off-by: yangjie01 <[email protected]>
|
|
||
| private final ChunkAllocator chunkAllocator; | ||
| private final Queue<ChunkByteBuf> centralQueue; | ||
| private final StampedLock magazineExpandLock; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless I am missing something obvious, this is going to cause the class to not be loadable in Java 6 and 7 and the version check in the constructor won't help because it's executed too late.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right. I'll put up a PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👌
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
Awesome to see this backported! We will definitely start using it in Quarkus - I know @franz1981 is eager to see how it performs in real workloads. |
|
@He-Pin can you start a discussion and describe the workload? Would also be helpful if you can look into a heap dump and see if there's too many chunks in the central queue, or being held up by long-lived ByteBuf objects, or something else. |
|
I will describe it in detail after work today. |
|
Many thanks @He-Pin I am super curious as well! |

See PR #13075 for the motivations.
The AdaptivePoolingAllocator is modified to work with the ByteBuf API. This required adding another implementation of ByteBuf, because in this API the buffer and their allocator (pooling or otherwise) are tightly coupled in their implementations.
Note that the AdaptivePoolingAllocator require at least Java 8, because it relies on StampedLock. The constructor performs a version check, and throws an exception on older Java versions.
This allocator, if merged, will be strictly experimental in Netty 4.1.