Experimental: introduce mimalloc MiMallocByteBufAllocator #15525

Closed
laosijikaichele wants to merge 121 commits into netty:4.2 from laosijikaichele:4.2-m5

laosijikaichele (Contributor) commented Aug 2, 2025

Motivation:

For use cases with a limited number of threads, including a limited number of long-running virtual threads, we can still use thread locals to improve performance; for example, our existing allocators still use thread locals for allocations on event-loop threads.

According to the mimalloc paper, mimalloc is designed to deal well with reference-counting use cases and has good performance. This seems a natural fit for our ByteBuf allocation.

This PR implements a Netty version of the mimalloc allocator, MiMallocByteBufAllocator, based on https://github.com/microsoft/mimalloc. It shows better performance in initial benchmarks; the benchmark numbers are posted later in this thread.
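A central fast-path idea in the mimalloc paper is free-list sharding: each page keeps one list to allocate from, a second list for frees made by the owning thread, and an atomic list for frees from other threads, which the owner collects in bulk. The Java below is a minimal sketch of that idea only; `ShardedPage` and `Block` are hypothetical names, not the classes added by this PR, and the sketch assumes each page has a single owning thread (for example, reached through a thread-local heap).

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of mimalloc-style free-list sharding for a single page.
// Not the PR's implementation; names are illustrative.
final class ShardedPage {
    static final class Block {
        Block next;        // intrusive link: a free block stores the list pointer itself
        final int offset;  // byte offset of this block inside the page
        Block(int offset) { this.offset = offset; }
    }

    private Block free;       // owner thread allocates from here (no atomics)
    private Block localFree;  // owner thread pushes its own frees here (no atomics)
    private final AtomicReference<Block> threadFree = new AtomicReference<>(); // frees from other threads

    ShardedPage(int blockSize, int blockCount) {
        // Build the initial free list so that offset 0 is handed out first.
        for (int i = blockCount - 1; i >= 0; i--) {
            Block b = new Block(i * blockSize);
            b.next = free;
            free = b;
        }
    }

    /** Owner thread only; returns null when the page has no free block. */
    Block allocate() {
        if (free == null) {
            collect();
        }
        Block b = free;
        if (b != null) {
            free = b.next;
            b.next = null;
        }
        return b;
    }

    /** Owner thread only: a plain push, no atomic operation. */
    void freeLocal(Block b) {
        b.next = localFree;
        localFree = b;
    }

    /** Any non-owner thread: lock-free (Treiber-stack) push with a CAS loop. */
    void freeRemote(Block b) {
        Block head;
        do {
            head = threadFree.get();
            b.next = head;
        } while (!threadFree.compareAndSet(head, b));
    }

    /** Owner thread: refill the allocation list from the local and thread-free lists. */
    private void collect() {
        free = localFree;
        localFree = null;
        Block remote = threadFree.getAndSet(null); // a single atomic op takes the whole list
        while (remote != null) {
            Block next = remote.next;
            remote.next = free;
            free = remote;
            remote = next;
        }
    }
}
```

In this sketch only `freeRemote` and `collect` touch the atomic reference, so allocation and same-thread release avoid atomic instructions entirely; cross-thread release is the case the second benchmark in the next comment exercises.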

Modification:

Added MiMallocByteBufAllocator and related classes.

Result:

Better performance for use cases with a limited number of threads.

laosijikaichele (Contributor Author) commented Aug 2, 2025

To better simulate real-world allocations, the following benchmarks copy the size array WEB_SOCKET_PROXY_PATTERN from AllocationPatternSimulator, flatten it into a sizeList, and shuffle the sizeList.
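The input preparation described above can be sketched as follows. The sketch assumes the pattern is a flat `int[]` of (size, frequency) pairs; that layout, the class name `SizePatterns`, and the example values in the test are illustrative assumptions, so check `AllocationPatternSimulator` for the real format and data.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class SizePatterns {
    /**
     * Expands (size, frequency) pairs into one entry per allocation and shuffles the
     * result with a fixed seed, so every benchmark run replays the same sequence.
     * The flat pair layout is an assumption about how the pattern is stored.
     */
    static List<Integer> flattenAndShuffle(int[] sizeFrequencyPairs, long seed) {
        if (sizeFrequencyPairs.length % 2 != 0) {
            throw new IllegalArgumentException("expected (size, frequency) pairs");
        }
        List<Integer> sizeList = new ArrayList<>();
        for (int i = 0; i < sizeFrequencyPairs.length; i += 2) {
            int size = sizeFrequencyPairs[i];
            int frequency = sizeFrequencyPairs[i + 1];
            for (int n = 0; n < frequency; n++) {
                sizeList.add(size);
            }
        }
        Collections.shuffle(sizeList, new Random(seed));
        return sizeList;
    }
}
```

Using a fixed seed keeps the shuffled order identical across forks, which makes runs of different allocators comparable.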

  1. Allocate and release in the same thread: jmh-link, bench-code

    [Screenshot 2025-08-05 08:57:38: benchmark results]
  2. Allocate in one thread and release in another thread: jmh-link, bench-code

    [Image: benchmark results]

private MiByteBuf allocate(int size, int maxCapacity, MiByteBuf byteBuf) {
    LocalHeap localHeap = THREAD_LOCAL_HEAP.get();
    int wSize = toWordSize(size);
    if (size <= PAGES_FREE_DIRECT_SIZE_MAX) {
Contributor

The fast path here is quite different from adaptive's size classes: it has 128 size classes spaced 8 bytes apart. This avoids the double lookup needed to compute the size class first.

Additionally, this always uses a thread local; to have a fair comparison we should enable adaptive to always use thread local as well (I made that possible at the time, IIRC...)
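To make the size-class point concrete: with 128 classes spaced 8 bytes apart, the request's word count can index the page table directly. mimalloc's heap keeps a similar `pages_free_direct` array; the Java below is only a sketch under that assumption, with hypothetical names (`DirectSizeLookup`, `pageFor`), and its `toWordSize` is not necessarily identical to the PR's helper of the same name.

```java
// Hypothetical sketch of a direct small-size lookup: one slot per 8-byte word size.
final class DirectSizeLookup {
    static final int WORD_BYTES = 8;                          // assumed 64-bit word
    static final int MAX_WORDS = 128;                         // 128 classes, 8 bytes apart
    static final int SMALL_SIZE_MAX = MAX_WORDS * WORD_BYTES; // 1024 bytes

    /** Rounds a byte size up to whole words: 1..8 -> 1, 9..16 -> 2, and so on. */
    static int toWordSize(int size) {
        return (size + WORD_BYTES - 1) / WORD_BYTES;
    }

    // Slot w would reference the thread-local page currently serving w-word blocks;
    // strings stand in for pages here. Slot 0 is unused.
    private final String[] pagesFreeDirect = new String[MAX_WORDS + 1];

    DirectSizeLookup() {
        for (int w = 1; w <= MAX_WORDS; w++) {
            pagesFreeDirect[w] = "page-" + (w * WORD_BYTES);
        }
    }

    /** size -> word count -> page: one arithmetic step and a single table load. */
    String pageFor(int size) {
        if (size <= 0 || size > SMALL_SIZE_MAX) {
            throw new IllegalArgumentException("not a small size: " + size);
        }
        return pagesFreeDirect[toWordSize(size)];
    }
}
```

With non-uniform size classes, an allocator typically first maps the size to a class index and then loads that class's pool; the uniform 8-byte spacing collapses this into one step, at the cost of keeping one page per word size.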

laosijikaichele (Contributor Author), Aug 2, 2025

to have a fair comparison we should enable adaptive to always use thread local as well

Adaptive already uses thread locals in the benchmark too, because the benchmark uses event-loop threads; check the second parameter of the benchmark class constructor:

Contributor

Yeah, but the number of hops/indirections is still different because buffers are recycled differently...

That said, my comment was more of a general one.

Block block = page.freeList;
if (block != null) {
    if (byteBuf == null) {
        byteBuf = block;
franz1981 (Contributor), Aug 2, 2025

In the best-case scenario, buffer pooling does not use separate storage: the pooled block is both of the right size (class) and holds the wrapper too.
This saves additional atomic operations (or using an array deque to hold the empty shells, as adaptive does).
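The point about pooling without separate storage can be illustrated with an intrusive free list in which the pooled block doubles as the buffer wrapper. This is a hypothetical, single-threaded sketch (`IntrusivePool` and `PooledBuf` are illustrative names, not Netty or PR classes); a real allocator would still need a thread-safe path for cross-thread release.

```java
// Hypothetical sketch: the free-list node is itself the buffer object returned to callers,
// so no separate pool of wrapper "shells" is needed. Single-threaded for clarity.
final class IntrusivePool {
    static final class PooledBuf {
        PooledBuf nextFree;        // used only while the buffer sits on the free list
        final IntrusivePool owner;
        final int offset;          // position of this block in the pool's memory
        final int capacity;
        int refCnt;

        PooledBuf(IntrusivePool owner, int offset, int capacity) {
            this.owner = owner;
            this.offset = offset;
            this.capacity = capacity;
        }

        /** Returns true if this call released the last reference. */
        boolean release() {
            if (--refCnt == 0) {
                owner.push(this); // the same object goes back on the free list
                return true;
            }
            return false;
        }
    }

    private PooledBuf freeList;

    IntrusivePool(int blockSize, int blockCount) {
        for (int i = blockCount - 1; i >= 0; i--) {
            push(new PooledBuf(this, i * blockSize, blockSize));
        }
    }

    /** Pops a block; the popped object is the buffer, with no extra wrapper lookup. */
    PooledBuf allocate() {
        PooledBuf b = freeList;
        if (b == null) {
            return null;
        }
        freeList = b.nextFree;
        b.nextFree = null;
        b.refCnt = 1;
        return b;
    }

    void push(PooledBuf b) {
        b.nextFree = freeList;
        freeList = b;
    }
}
```

Compared with keeping empty wrappers in a separate recycler or deque, reusing the node removes the extra step of fetching and returning a shell on every allocate/release pair.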

franz1981 (Contributor)

As explained in a few comments, there are a few key differences that can explain the performance advantage in microbenchmarks (#15509 should help as well, but there is still a double lookup plus recycling cost).
I will reread the paper, but with microbenchmarks the devil is in the details, as usual!

franz1981 (Contributor) commented Aug 3, 2025

After a first round of inspecting the assembly behind the performance results, beware:

  • suspicious non-inlined calls, see [image]. This can likely be fixed by making the base benchmark call non-inlineable, but that is not guaranteed! You can use async-profiler with `cstack vm` to "see" it easily.
  • as said earlier, adaptive performs many more atomic operations in the size-class case. This could easily be fixed (nudge nudge @chrisvest), which would likely put it on par with this allocator for these specific benchmarks (they have high IPC, which makes even uncontended atomic instructions very costly!)

To test how the allocators reuse memory, the benchmark would instead need to "touch" and dirty enough memory to spill into L2 or even the LLC. Then the way each allocator reuses memory would dominate over the orchestration required to get/release buffers (which is still important, but is a different quality we want to improve).
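As a rough illustration of what "touching" memory could mean in such a benchmark, the helpers below dirty every cache line of a buffer and compute how many live buffers are needed to outgrow a given cache. This is a plain-JDK sketch, not Netty benchmark code; the 64-byte line size and any cache size you pass in (for example 2 MiB for an L2) are assumptions about the target machine.

```java
import java.nio.ByteBuffer;

// Hypothetical helpers for making a benchmark's working set exceed a cache level.
final class CacheFootprint {
    static final int CACHE_LINE_BYTES = 64; // assumed cache-line size

    /** Writes one byte per cache line so every line of the buffer becomes dirty; returns lines touched. */
    static int touch(ByteBuffer buf) {
        int lines = 0;
        for (int i = 0; i < buf.capacity(); i += CACHE_LINE_BYTES) {
            buf.put(i, (byte) 1);
            lines++;
        }
        return lines;
    }

    /** Smallest number of live buffers of bufferBytes whose combined size exceeds cacheBytes. */
    static long buffersToExceed(long cacheBytes, int bufferBytes) {
        return cacheBytes / bufferBytes + 1;
    }
}
```

In a benchmark, the buffers would also need to stay reachable across iterations so that the allocator has to reuse memory that has actually been evicted from the faster cache levels.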

lao added 4 commits August 4, 2025 02:59
… list; use one lookup for size, instead of a double data-dependent lookup
…d of many, for each successful queue.offer() operation
laosijikaichele (Contributor Author) commented Aug 3, 2025

There was some incorrect logic in ByteBufAllocatorProducerConsumerBenchmark, which has been corrected by this commit. Some other benchmark code optimizations (thanks @franz1981) have also been made, plus a bug fix.

So I re-ran the benchmarks to check the numbers:

1. Allocate and release in the same thread: jmh-link
[Screenshot 2025-08-04 03:21:59: benchmark results]

2. Allocate in one thread and release in another thread: jmh-link
[Screenshot 2025-08-04 03:22:14: benchmark results]

As we can see, in the second benchmark the adaptive_heap number is now close to mimalloc_heap, which is likely due to this commit.

franz1981 (Contributor) commented Aug 3, 2025

You're welcome 🙏
Do the numbers here include:

  • fixing the inlining problem I mentioned earlier, by using compiler control on the benchmark method to force it not to be inlined?
  • reducing the benchmark input samples to a single, non-data-dependent lookup? i.e. computing the next input sample index and looking it up

The numbers still don't match the ones on my old Xeon (I used numactl with localalloc and a single thread to avoid false-sharing noise). But since atomic ops are a dominant factor in the fast path (we don't do anything with the allocated buffers!) and adaptive performs twice as many of them (buffer ref count + release of the size-class segment id + release of the reusable buffer wrapper), the performance difference is expected.
I haven't yet investigated the heap/direct difference, since in Vert.x/Quarkus we mostly use the latter under the hood, but it is on my to-do list.

laosijikaichele (Contributor Author)

reducing the benchmark input samples to a single, non-data-dependent lookup? i.e. computing the next input sample index and looking it up

Yes, the sizes array is now pre-shuffled in the setup() method and used by a single lookup instead of the earlier double lookup:
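The single-lookup approach can be sketched as a cursor over a pre-shuffled array. The power-of-two length and mask-based wrap are assumptions of this sketch (they are consistent with the `and` against `length - 1` visible in the perfasm output further down, but the real benchmark code may differ); `SizeCursor` is a hypothetical name.

```java
// Hypothetical sketch of a benchmark input cursor over sizes shuffled once in setup().
final class SizeCursor {
    private final int[] sizes; // pre-shuffled; length must be a power of two
    private int next;

    SizeCursor(int[] preShuffledSizes) {
        int n = preShuffledSizes.length;
        if (n == 0 || (n & (n - 1)) != 0) {
            throw new IllegalArgumentException("length must be a power of two: " + n);
        }
        this.sizes = preShuffledSizes.clone();
    }

    /** One array load per call: the index comes from a counter, not from another array. */
    int nextSize() {
        int i = next;
        next = (i + 1) & (sizes.length - 1); // wrap with a mask instead of a modulo
        return sizes[i];
    }

    /** For contrast: the size load must wait for the index load (a data-dependent chain). */
    static int doubleLookup(int[] sizes, int[] shuffledIndexes, int i) {
        return sizes[shuffledIndexes[i]];
    }
}
```

Because the cursor's index is computed arithmetically, the CPU can compute the next address without first waiting for another memory load, which keeps benchmark overhead from masking allocator costs.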

laosijikaichele (Contributor Author) commented Aug 3, 2025

fixing the inlining problem I mentioned earlier, by using compiler control on the benchmark method to force it not to be inlined?

Added @CompilerControl(CompilerControl.Mode.DONT_INLINE) on benchmark classes.

1. Allocate and release in the same thread: jmh-link

[Screenshot 2025-08-04 04:22:54: benchmark results]

2. Allocate in one thread and release in another thread: jmh-link

[Screenshot 2025-08-04 04:22:45: benchmark results]

franz1981 (Contributor)

Thanks. I will check tomorrow whether I still see other suspicious calls in the hot-path assembly 🙏

franz1981 (Contributor) commented Aug 4, 2025

Added @CompilerControl(CompilerControl.Mode.DONT_INLINE) on benchmark classes.

I cannot see it yet..

franz1981 (Contributor) commented Aug 4, 2025

These are the numbers on my machine with JDK 21 (using the commit from #15509):

$ java -Djmh.executor=CUSTOM -Djmh.executor.class=io.netty.microbench.util.AbstractMicrobenchmark\$HarnessExecutor -jar microbench/target/microbenchmarks.jar io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark.*Direct -f 1 -t 1 -prof perfasm

Benchmark                                              Mode  Cnt         Score     Error  Units
ByteBufAllocatorAllocPatternBenchmark.adaptiveDirect  thrpt   10  19872573.143 ± 176.568  ops/s
ByteBufAllocatorAllocPatternBenchmark.mimallocDirect  thrpt   10  24398458.564 ± 130.254  ops/s
ByteBufAllocatorAllocPatternBenchmark.pooledDirect    thrpt   10  13815460.976 ± 238.903  ops/s

The adaptive results are very interesting but, as expected, mostly come down to release being more costly due to the additional atomic operations, i.e.:

  • release of the pooled/recycled adaptive ByteBuf (which has its own reference count)
  • release of the size class chunk
  • release of the segment
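When reading the listing below, the `cmp $0x2` / `lock cmpxchg` sequence and the `cmp $0x4` / `and $0x1` / `shr` checks are consistent with a "raw" reference count in which an even value 2n means n live references and an odd value means released, so the final release is a single CAS from 2 to 1. The class below is a simplified, hypothetical model of that encoding built on `AtomicInteger`, not Netty's `ReferenceCountUpdater`; it only illustrates why each release in the chain above costs at least one atomic instruction.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Simplified model (not Netty's class) of the raw reference-count encoding:
// an even raw value 2*n means n live references; an odd raw value means released.
final class RawRefCnt {
    private final AtomicInteger raw = new AtomicInteger(2); // starts with one live reference

    int refCnt() {
        int r = raw.get();
        return (r & 1) != 0 ? 0 : r >>> 1;
    }

    void retain() {
        raw.getAndAdd(2); // sketch only: no overflow or released-state checks
    }

    /** Returns true when this call dropped the last reference; each successful path is one CAS. */
    boolean release() {
        for (;;) {
            int r = raw.get();
            if ((r & 1) != 0) {
                throw new IllegalStateException("already released");
            }
            if (r == 2) {
                // Final release: flip 2 -> 1 (odd means deallocated).
                if (raw.compareAndSet(2, 1)) {
                    return true;
                }
            } else if (raw.compareAndSet(r, r - 2)) {
                return false;
            }
        }
    }
}
```

In the adaptive path below, this kind of atomic step appears for the `AdaptiveByteBuf`, again for the chunk via `Chunk::release`, and once more in the MPSC queue offer that returns the segment.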
....[Hottest Region 1]..............................................................................
c2, level 4, io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc, version 4, compile id 1607 

             # parm2:    r8:r8     = '[Lio/netty/buffer/ByteBuf;'
             #           [sp+0x70]  (sp of caller)
             0x00007f5b4c2654c0:   mov    0x8(%rsi),%r10d
             0x00007f5b4c2654c4:   movabs $0x7f5acf000000,%r11
             0x00007f5b4c2654ce:   add    %r11,%r10
             0x00007f5b4c2654d1:   cmp    %r10,%rax
             0x00007f5b4c2654d4:   jne    0x00007f5b4baad280           ;   {runtime_call ic_miss_stub}
             0x00007f5b4c2654da:   xchg   %ax,%ax
             0x00007f5b4c2654dc:   nopl   0x0(%rax)
           [Verified Entry Point]
   0.54%     0x00007f5b4c2654e0:   mov    %eax,-0x14000(%rsp)
             0x00007f5b4c2654e7:   push   %rbp
             0x00007f5b4c2654e8:   sub    $0x60,%rsp
             0x00007f5b4c2654ec:   cmpl   $0x1,0x20(%r15)
             0x00007f5b4c2654f4:   jne    0x00007f5b4c266590           ;*synchronization entry
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@-1 (line 121)
             0x00007f5b4c2654fa:   mov    %r8,%rbx
   0.47%     0x00007f5b4c2654fd:   mov    %rcx,0x8(%rsp)
             0x00007f5b4c265502:   mov    %rdx,(%rsp)
             0x00007f5b4c265506:   mov    0x34(%rsi),%r11d             ;*getfield sizes {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@1 (line 121)
             0x00007f5b4c26550a:   mov    0xc(%r12,%r11,8),%r10d       ; implicit exception: dispatches to 0x00007f5b4c266454
                                                                       ;*arraylength {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::getNextSizeIndex@16 (line 116)
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@5 (line 121)
   0.02%     0x00007f5b4c26550f:   mov    0x14(%rsi),%ecx              ;*getfield nextSizeIndex {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::getNextSizeIndex@1 (line 115)
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@5 (line 121)
             0x00007f5b4c265512:   lea    -0x1(%r10),%r9d
             0x00007f5b4c265516:   lea    0x1(%rcx),%edi
             0x00007f5b4c265519:   and    %r9d,%edi
   0.44%     0x00007f5b4c26551c:   mov    %edi,0x14(%rsi)              ;*putfield nextSizeIndex {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::getNextSizeIndex@20 (line 116)
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@5 (line 121)
   0.01%     0x00007f5b4c26551f:   cmp    %r10d,%ecx
             0x00007f5b4c265522:   jae    0x00007f5b4c265cc1
             0x00007f5b4c265528:   mov    0x30(%rsi),%r8d              ;*getfield releaseIndexes {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::getNextReleaseIndex@13 (line 110)
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@12 (line 122)
             0x00007f5b4c26552c:   mov    0xc(%r12,%r8,8),%r10d        ; implicit exception: dispatches to 0x00007f5b4c266468
                                                                       ;*arraylength {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::getNextReleaseIndex@16 (line 110)
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@12 (line 122)
   0.01%     0x00007f5b4c265531:   mov    0x10(%rsi),%ebp              ;*getfield nextReleaseIndex {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::getNextReleaseIndex@1 (line 109)
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@12 (line 122)
             0x00007f5b4c265534:   shl    $0x3,%r11
             0x00007f5b4c265538:   mov    0x10(%r11,%rcx,4),%r14d      ;*iaload {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@8 (line 121)
             0x00007f5b4c26553d:   lea    0x1(%rbp),%r11d
   0.47%     0x00007f5b4c265541:   lea    -0x1(%r10),%r9d
   0.01%     0x00007f5b4c265545:   and    %r9d,%r11d
             0x00007f5b4c265548:   mov    %r11d,0x10(%rsi)             ;*putfield nextReleaseIndex {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::getNextReleaseIndex@20 (line 110)
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@12 (line 122)
   0.01%     0x00007f5b4c26554c:   cmp    %r10d,%ebp
             0x00007f5b4c26554f:   jae    0x00007f5b4c265ce4
   0.01%     0x00007f5b4c265555:   lea    (%r12,%r8,8),%r10
             0x00007f5b4c265559:   mov    0x10(%r10,%rbp,4),%r11d      ;*iaload {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::getNextReleaseIndex@28 (line 111)
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@12 (line 122)
   0.02%     0x00007f5b4c26555e:   mov    %r11d,0x30(%rsp)
             0x00007f5b4c265563:   mov    %rbx,%r13
   0.32%     0x00007f5b4c265566:   mov    0xc(%rbx),%r11d              ; implicit exception: dispatches to 0x00007f5b4c26647c
             0x00007f5b4c26556a:   mov    0x30(%rsp),%r10d
   0.08%     0x00007f5b4c26556f:   cmp    %r11d,%r10d
             0x00007f5b4c265572:   jae    0x00007f5b4c265d08
   0.03%     0x00007f5b4c265578:   mov    %r10d,%r11d
   0.13%     0x00007f5b4c26557b:   lea    0x10(%rbx,%r11,4),%r8
             0x00007f5b4c265580:   mov    (%r8),%ebp                   ;*aaload {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@20 (line 123)
   0.76%     0x00007f5b4c265583:   test   %ebp,%ebp
             0x00007f5b4c265585:   je     0x00007f5b4c26587b           ;*ifnull {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@25 (line 124)
   0.11%     0x00007f5b4c26558b:   mov    0x8(%r12,%rbp,8),%r10d
   11.01%     0x00007f5b4c265590:   cmp    $0x109f400,%r10d             ;   {metadata('io/netty/buffer/AdaptivePoolingAllocator$AdaptiveByteBuf')}
             0x00007f5b4c265597:   jne    0x00007f5b4c265d44
   0.36%     0x00007f5b4c26559d:   lea    (%r12,%rbp,8),%r10           ;*invokevirtual release {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@30 (line 125)
             0x00007f5b4c2655a1:   mov    0x20(%r10),%eax              ;*invokevirtual getInt {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - sun.misc.Unsafe::getInt@5 (line 164)
                                                                       ; - io.netty.util.internal.PlatformDependent0::getInt@5 (line 679)
                                                                       ; - io.netty.util.internal.PlatformDependent::getInt@2 (line 690)
                                                                       ; - io.netty.util.internal.UnsafeReferenceCountUpdater::getRawRefCnt@5 (line 39)
                                                                       ; - io.netty.util.internal.ReferenceCountUpdater::release@2 (line 130)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::release@5 (line 139)
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@30 (line 125)
   1.05%     0x00007f5b4c2655a5:   cmp    $0x2,%eax
             0x00007f5b4c2655a8:   jne    0x00007f5b4c2659b4           ;*ifeq {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - io.netty.util.concurrent.MpscIntQueue$MpscAtomicIntegerArrayQueue::offer@105 (line 148)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::releaseSegment@10 (line 1438)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::deallocate@53 (line 1932)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::handleRelease@5 (line 149)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::release@8 (line 139)
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@30 (line 125)
   0.41%     0x00007f5b4c2655ae:   mov    0x388(%r15),%r11
             0x00007f5b4c2655b5:   mov    (%r11),%rbx
             0x00007f5b4c2655b8:   mov    $0x2,%eax
             0x00007f5b4c2655bd:   mov    $0x1,%r11d
   0.08%     0x00007f5b4c2655c3:   lock cmpxchg %r11d,0x20(%r10)
  11.46%     0x00007f5b4c2655c9:   sete   %bpl
   0.49%     0x00007f5b4c2655cd:   movzbl %bpl,%ebp                    ;*invokevirtual compareAndSetInt {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - sun.misc.Unsafe::compareAndSwapInt@9 (line 922)
                                                                       ; - io.netty.util.internal.PlatformDependent0::compareAndSwapInt@8 (line 695)
                                                                       ; - io.netty.util.internal.PlatformDependent::compareAndSwapInt@5 (line 702)
                                                                       ; - io.netty.util.internal.UnsafeReferenceCountUpdater::casRawRefCnt@7 (line 54)
                                                                       ; - io.netty.util.internal.ReferenceCountUpdater::tryFinalRelease0@4 (line 143)
                                                                       ; - io.netty.util.internal.ReferenceCountUpdater::release@14 (line 131)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::release@5 (line 139)
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@30 (line 125)
             0x00007f5b4c2655d1:   test   %ebp,%ebp
             0x00007f5b4c2655d3:   je     0x00007f5b4c265edc           ;*ifne {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - io.netty.util.internal.ReferenceCountUpdater::release@17 (line 131)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::release@5 (line 139)
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@30 (line 125)
   0.34%     0x00007f5b4c2655d9:   mov    0x3c(%r10),%r9d              ;*getfield chunk {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::deallocate@39 (line 1931)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::handleRelease@5 (line 149)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::release@8 (line 139)
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@30 (line 125)
   0.10%     0x00007f5b4c2655dd:   mov    0x8(%r12,%r9,8),%ebp         ; implicit exception: dispatches to 0x00007f5b4c266508
                                                                       ;*invokevirtual releaseSegment {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::deallocate@53 (line 1932)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::handleRelease@5 (line 149)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::release@8 (line 139)
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@30 (line 125)
   0.07%     0x00007f5b4c2655e2:   mov    0x24(%r10),%r11d             ;*getfield startIndex {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::deallocate@50 (line 1932)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::handleRelease@5 (line 149)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::release@8 (line 139)
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@30 (line 125)
             0x00007f5b4c2655e6:   lea    (%r12,%r9,8),%rcx
   0.41%     0x00007f5b4c2655ea:   cmp    $0x10a46b8,%ebp              ;   {metadata('io/netty/buffer/AdaptivePoolingAllocator$SizeClassedChunk')}
             0x00007f5b4c2655f0:   jne    0x00007f5b4c265c5d           ;*invokevirtual releaseSegment {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::deallocate@53 (line 1932)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::handleRelease@5 (line 149)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::release@8 (line 139)
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@30 (line 125)
             0x00007f5b4c2655f6:   mov    0x14(%rcx),%eax              ;*invokevirtual getInt {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - sun.misc.Unsafe::getInt@5 (line 164)
                                                                       ; - io.netty.util.internal.PlatformDependent0::getInt@5 (line 679)
                                                                       ; - io.netty.util.internal.PlatformDependent::getInt@2 (line 690)
                                                                       ; - io.netty.util.internal.UnsafeReferenceCountUpdater::getRawRefCnt@5 (line 39)
                                                                       ; - io.netty.util.internal.ReferenceCountUpdater::release@2 (line 130)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$Chunk::release@4 (line 1238)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::releaseSegment@1 (line 1437)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::deallocate@53 (line 1932)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::handleRelease@5 (line 149)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::release@8 (line 139)
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@30 (line 125)
             0x00007f5b4c2655f9:   cmp    $0x2,%eax
             0x00007f5b4c2655fc:   je     0x00007f5b4c265948           ;*if_icmpne {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - io.netty.util.internal.ReferenceCountUpdater::release@8 (line 131)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$Chunk::release@4 (line 1238)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::releaseSegment@1 (line 1437)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::deallocate@53 (line 1932)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::handleRelease@5 (line 149)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::release@8 (line 139)
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@30 (line 125)
   0.04%     0x00007f5b4c265602:   cmp    $0x4,%eax
          ╭  0x00007f5b4c265605:   je     0x00007f5b4c265621           ;*if_icmpeq {reexecute=0 rethrow=0 return_oop=0}
          │                                                            ; - io.netty.util.internal.ReferenceCountUpdater::toLiveRealRefCnt@7 (line 70)
          │                                                            ; - io.netty.util.internal.ReferenceCountUpdater::release@43 (line 132)
          │                                                            ; - io.netty.buffer.AdaptivePoolingAllocator$Chunk::release@4 (line 1238)
          │                                                            ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::releaseSegment@1 (line 1437)
          │                                                            ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::deallocate@53 (line 1932)
          │                                                            ; - io.netty.buffer.AbstractReferenceCountedByteBuf::handleRelease@5 (line 149)
          │                                                            ; - io.netty.buffer.AbstractReferenceCountedByteBuf::release@8 (line 139)
          │                                                            ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@30 (line 125)
   0.32%  │  0x00007f5b4c265607:   mov    %eax,%ebp
          │  0x00007f5b4c265609:   and    $0x1,%ebp                    ;*iand {reexecute=0 rethrow=0 return_oop=0}
          │                                                            ; - io.netty.util.internal.ReferenceCountUpdater::toLiveRealRefCnt@12 (line 70)
          │                                                            ; - io.netty.util.internal.ReferenceCountUpdater::release@43 (line 132)
          │                                                            ; - io.netty.buffer.AdaptivePoolingAllocator$Chunk::release@4 (line 1238)
          │                                                            ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::releaseSegment@1 (line 1437)
          │                                                            ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::deallocate@53 (line 1932)
          │                                                            ; - io.netty.buffer.AbstractReferenceCountedByteBuf::handleRelease@5 (line 149)
          │                                                            ; - io.netty.buffer.AbstractReferenceCountedByteBuf::release@8 (line 139)
          │                                                            ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@30 (line 125)
          │  0x00007f5b4c26560c:   test   %ebp,%ebp
          │  0x00007f5b4c26560e:   jne    0x00007f5b4c266039           ;*ifne {reexecute=0 rethrow=0 return_oop=0}
          │                                                            ; - io.netty.util.internal.ReferenceCountUpdater::toLiveRealRefCnt@13 (line 70)
          │                                                            ; - io.netty.util.internal.ReferenceCountUpdater::release@43 (line 132)
          │                                                            ; - io.netty.buffer.AdaptivePoolingAllocator$Chunk::release@4 (line 1238)
          │                                                            ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::releaseSegment@1 (line 1437)
          │                                                            ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::deallocate@53 (line 1932)
          │                                                            ; - io.netty.buffer.AbstractReferenceCountedByteBuf::handleRelease@5 (line 149)
          │                                                            ; - io.netty.buffer.AbstractReferenceCountedByteBuf::release@8 (line 139)
          │                                                            ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@30 (line 125)
   0.01%  │  0x00007f5b4c265614:   mov    %eax,%ebp
   0.04%  │  0x00007f5b4c265616:   shr    %ebp                         ;*iushr {reexecute=0 rethrow=0 return_oop=0}
          │                                                            ; - io.netty.util.internal.ReferenceCountUpdater::toLiveRealRefCnt@18 (line 71)
          │                                                            ; - io.netty.util.internal.ReferenceCountUpdater::release@43 (line 132)
          │                                                            ; - io.netty.buffer.AdaptivePoolingAllocator$Chunk::release@4 (line 1238)
          │                                                            ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::releaseSegment@1 (line 1437)
          │                                                            ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::deallocate@53 (line 1932)
          │                                                            ; - io.netty.buffer.AbstractReferenceCountedByteBuf::handleRelease@5 (line 149)
          │                                                            ; - io.netty.buffer.AbstractReferenceCountedByteBuf::release@8 (line 139)
          │                                                            ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@30 (line 125)
          │  0x00007f5b4c265618:   cmp    $0x1,%ebp
          │  0x00007f5b4c26561b:   jle    0x00007f5b4c266080
   0.03%  ↘  0x00007f5b4c265621:   lea    -0x2(%rax),%ebp
             0x00007f5b4c265624:   lock cmpxchg %ebp,0x14(%rcx)
   8.06%     0x00007f5b4c265629:   sete   %bpl
   0.55%     0x00007f5b4c26562d:   movzbl %bpl,%ebp                    ;*invokevirtual compareAndSetInt {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - sun.misc.Unsafe::compareAndSwapInt@9 (line 922)
                                                                       ; - io.netty.util.internal.PlatformDependent0::compareAndSwapInt@8 (line 695)
                                                                       ; - io.netty.util.internal.PlatformDependent::compareAndSwapInt@5 (line 702)
                                                                       ; - io.netty.util.internal.UnsafeReferenceCountUpdater::casRawRefCnt@7 (line 54)
                                                                       ; - io.netty.util.internal.ReferenceCountUpdater::nonFinalRelease0@14 (line 149)
                                                                       ; - io.netty.util.internal.ReferenceCountUpdater::release@46 (line 132)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$Chunk::release@4 (line 1238)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::releaseSegment@1 (line 1437)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::deallocate@53 (line 1932)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::handleRelease@5 (line 149)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::release@8 (line 139)
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@30 (line 125)
   0.01%     0x00007f5b4c265631:   test   %ebp,%ebp
             0x00007f5b4c265633:   je     0x00007f5b4c2660c4           ;*ifeq {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - io.netty.util.internal.ReferenceCountUpdater::nonFinalRelease0@17 (line 149)
                                                                       ; - io.netty.util.internal.ReferenceCountUpdater::release@46 (line 132)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$Chunk::release@4 (line 1238)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::releaseSegment@1 (line 1437)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::deallocate@53 (line 1932)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::handleRelease@5 (line 149)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::release@8 (line 139)
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@30 (line 125)
   0.38%     0x00007f5b4c265639:   mov    0x30(%rcx),%ebp              ;*getfield freeList {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::releaseSegment@6 (line 1438)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::deallocate@53 (line 1932)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::handleRelease@5 (line 149)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::release@8 (line 139)
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@30 (line 125)
             0x00007f5b4c26563c:   mov    0x8(%r12,%rbp,8),%ecx        ; implicit exception: dispatches to 0x00007f5b4c2664dc
              0x00007f5b4c265641:   cmp    $0x10b30d0,%ecx              ;   {metadata('io/netty/util/concurrent/MpscIntQueue$MpscAtomicIntegerArrayQueue')}
             0x00007f5b4c265647:   jne    0x00007f5b4c265e64
   0.01%     0x00007f5b4c26564d:   lea    (%r12,%rbp,8),%rcx           ;*invokeinterface offer {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::releaseSegment@10 (line 1438)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::deallocate@53 (line 1932)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::handleRelease@5 (line 149)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::release@8 (line 139)
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@30 (line 125)
   0.39%     0x00007f5b4c265651:   mov    0x2c(%rcx),%ebp              ;*getfield emptyValue {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - io.netty.util.concurrent.MpscIntQueue$MpscAtomicIntegerArrayQueue::offer@2 (line 127)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::releaseSegment@10 (line 1438)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::deallocate@53 (line 1932)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::handleRelease@5 (line 149)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::release@8 (line 139)
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@30 (line 125)
   0.01%     0x00007f5b4c265654:   cmp    %ebp,%r11d
             0x00007f5b4c265657:   je     0x00007f5b4c266108           ;*if_icmpne {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - io.netty.util.concurrent.MpscIntQueue$MpscAtomicIntegerArrayQueue::offer@5 (line 127)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::releaseSegment@10 (line 1438)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::deallocate@53 (line 1932)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::handleRelease@5 (line 149)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::release@8 (line 139)
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@30 (line 125)
   0.01%     0x00007f5b4c26565d:   mov    0x28(%rcx),%r9d              ;*getfield mask {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - io.netty.util.concurrent.MpscIntQueue$MpscAtomicIntegerArrayQueue::offer@39 (line 131)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::releaseSegment@10 (line 1438)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::deallocate@53 (line 1932)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::handleRelease@5 (line 149)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::release@8 (line 139)
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@30 (line 125)
   0.01%     0x00007f5b4c265661:   mov    0x18(%rcx),%rdx              ;*getfield producerLimit {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - io.netty.util.concurrent.MpscIntQueue$MpscAtomicIntegerArrayQueue::offer@44 (line 132)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::releaseSegment@10 (line 1438)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::deallocate@53 (line 1932)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::handleRelease@5 (line 149)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::release@8 (line 139)
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@30 (line 125)
   0.28%     0x00007f5b4c265665:   mov    0x10(%rcx),%rdi              ;*getfield producerIndex {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - io.netty.util.concurrent.MpscIntQueue$MpscAtomicIntegerArrayQueue::offer@49 (line 135)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::releaseSegment@10 (line 1438)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::deallocate@53 (line 1932)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::handleRelease@5 (line 149)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::release@8 (line 139)
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@30 (line 125)
             0x00007f5b4c265669:   movslq %r9d,%rsi                    ;*i2l {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - io.netty.util.concurrent.MpscIntQueue$MpscAtomicIntegerArrayQueue::offer@70 (line 138)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::releaseSegment@10 (line 1438)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::deallocate@53 (line 1932)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::handleRelease@5 (line 149)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::release@8 (line 139)
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@30 (line 125)
             0x00007f5b4c26566c:   cmp    %rdx,%rdi
             0x00007f5b4c26566f:   jge    0x00007f5b4c265a6b           ;*getstatic PRODUCER_INDEX {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - io.netty.util.concurrent.MpscIntQueue$MpscAtomicIntegerArrayQueue::offer@92 (line 148)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::releaseSegment@10 (line 1438)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::deallocate@53 (line 1932)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::handleRelease@5 (line 149)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::release@8 (line 139)
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@30 (line 125)
             0x00007f5b4c265675:   vmovd  %r14d,%xmm1
   0.06%     0x00007f5b4c26567a:   lea    0x1(%rdi),%rbp
             0x00007f5b4c26567e:   mov    %rdi,%rax
             0x00007f5b4c265681:   lock cmpxchg %rbp,0x10(%rcx)
   7.22%     0x00007f5b4c265687:   sete   %bpl
   0.36%     0x00007f5b4c26568b:   movzbl %bpl,%ebp                    ;*invokevirtual compareAndSetLong {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - java.util.concurrent.atomic.AtomicLongFieldUpdater$CASUpdater::compareAndSet@16 (line 464)
                                                                       ; - io.netty.util.concurrent.MpscIntQueue$MpscAtomicIntegerArrayQueue::offer@102 (line 148)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::releaseSegment@10 (line 1438)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::deallocate@53 (line 1932)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::handleRelease@5 (line 149)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::release@8 (line 139)
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@30 (line 125)
             0x00007f5b4c26568f:   test   %ebp,%ebp
             0x00007f5b4c265691:   je     0x00007f5b4c266148           ;*ifeq {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - io.netty.util.concurrent.MpscIntQueue$MpscAtomicIntegerArrayQueue::offer@105 (line 148)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::releaseSegment@10 (line 1438)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::deallocate@53 (line 1932)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::handleRelease@5 (line 149)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::release@8 (line 139)
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@30 (line 125)
   0.35%     0x00007f5b4c265697:   mov    0xc(%rcx),%ecx               ;*getfield array {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - java.util.concurrent.atomic.AtomicIntegerArray::lazySet@4 (line 118)
                                                                       ; - io.netty.util.concurrent.MpscIntQueue$MpscAtomicIntegerArrayQueue::offer@120 (line 155)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::releaseSegment@10 (line 1438)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::deallocate@53 (line 1932)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::handleRelease@5 (line 149)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::release@8 (line 139)
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@30 (line 125)
             0x00007f5b4c26569a:   mov    0xc(%r12,%rcx,8),%r9d        ; implicit exception: dispatches to 0x00007f5b4c2664f4
                                                                       ;*arraylength {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - java.lang.invoke.VarHandleInts$Array::setRelease@20 (line 811)
                                                                       ; - java.lang.invoke.VarHandleGuards::guard_LII_V@50 (line 757)
                                                                       ; - java.util.concurrent.atomic.AtomicIntegerArray::lazySet@9 (line 118)
                                                                       ; - io.netty.util.concurrent.MpscIntQueue$MpscAtomicIntegerArrayQueue::offer@120 (line 155)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::releaseSegment@10 (line 1438)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::deallocate@53 (line 1932)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::handleRelease@5 (line 149)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::release@8 (line 139)
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@30 (line 125)
             0x00007f5b4c26569f:   and    %rsi,%rdi
             0x00007f5b4c2656a2:   mov    %edi,%ebp                    ;*l2i {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - io.netty.util.concurrent.MpscIntQueue$MpscAtomicIntegerArrayQueue::offer@113 (line 154)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::releaseSegment@10 (line 1438)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::deallocate@53 (line 1932)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::handleRelease@5 (line 149)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::release@8 (line 139)
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@30 (line 125)
   0.42%     0x00007f5b4c2656a4:   cmp    %r9d,%ebp
             0x00007f5b4c2656a7:   jae    0x00007f5b4c265e20
             0x00007f5b4c2656ad:   lea    (%r12,%rcx,8),%r9
             0x00007f5b4c2656b1:   mov    %r11d,0x10(%r9,%rbp,4)
             0x00007f5b4c2656b6:   cmpb   $0x0,0x40(%r15)
   0.51%     0x00007f5b4c2656bb:   jne    0x00007f5b4c265b1c
             0x00007f5b4c2656c1:   cmpb   $0x0,0x40(%r15)
             0x00007f5b4c2656c6:   jne    0x00007f5b4c265b53           ;*ifeq {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - io.netty.util.concurrent.MpscIntQueue$MpscAtomicIntegerArrayQueue::offer@105 (line 148)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::releaseSegment@10 (line 1438)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::deallocate@53 (line 1932)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::handleRelease@5 (line 149)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::release@8 (line 139)
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@30 (line 125)
             0x00007f5b4c2656cc:   cmpb   $0x0,0x40(%r15)
   0.05%     0x00007f5b4c2656d1:   jne    0x00007f5b4c265b8a
             0x00007f5b4c2656d7:   mov    %r12d,0x38(%r10)             ;*putfield rootParent {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::deallocate@69 (line 1936)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::handleRelease@5 (line 149)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::release@8 (line 139)
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@30 (line 125)
             0x00007f5b4c2656db:   mov    0x34(%r10),%ebp              ;*getfield handle {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::deallocate@73 (line 1937)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::handleRelease@5 (line 149)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::release@8 (line 139)
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@30 (line 125)
   0.01%     0x00007f5b4c2656df:   mov    %r12d,0x3c(%r10)             ;*putfield chunk {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::deallocate@64 (line 1935)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::handleRelease@5 (line 149)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::release@8 (line 139)
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@30 (line 125)
   0.40%     0x00007f5b4c2656e3:   mov    %r12d,0x40(%r10)             ;*putfield tmpNioBuf {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::deallocate@59 (line 1934)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::handleRelease@5 (line 149)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::release@8 (line 139)
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@30 (line 125)
             0x00007f5b4c2656e7:   mov    0x8(%r12,%rbp,8),%ecx        ; implicit exception: dispatches to 0x00007f5b4c2664ac
                                                                       ;*invokeinterface recycle {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::deallocate@103 (line 1941)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::handleRelease@5 (line 149)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::release@8 (line 139)
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@30 (line 125)
             0x00007f5b4c2656ec:   lea    (%r12,%rbp,8),%r11
              0x00007f5b4c2656f0:   movabs $0x7f5ad00a4018,%r9          ;   {metadata('io/netty/util/Recycler$EnhancedHandle')}
   0.02%     0x00007f5b4c2656fa:   movabs $0x7f5acf000000,%rdi
             0x00007f5b4c265704:   add    %rcx,%rdi
             0x00007f5b4c265707:   mov    0x38(%rdi),%rdi
   0.05%     0x00007f5b4c26570b:   cmp    %r9,%rdi
             0x00007f5b4c26570e:   jne    0x00007f5b4c265a18           ;*checkcast {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::deallocate@86 (line 1938)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::handleRelease@5 (line 149)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::release@8 (line 139)
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@30 (line 125)
   0.29%     0x00007f5b4c265714:   mov    0x8(%r11),%r9d
              0x00007f5b4c265718:   cmp    $0x10a4250,%r9d              ;   {metadata('io/netty/util/Recycler$1')}
             0x00007f5b4c26571f:   je     0x00007f5b4c265876
              0x00007f5b4c265725:   cmp    $0x10abe10,%r9d              ;   {metadata('io/netty/util/Recycler$DefaultHandle')}
             0x00007f5b4c26572c:   jne    0x00007f5b4c2661e8           ;*invokevirtual unguardedRecycle {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::deallocate@92 (line 1939)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::handleRelease@5 (line 149)
....................................................................................................
  49.07%  <total for region 1>

....[Hottest Region 2]..............................................................................
c2, level 4, io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate, version 7, compile id 1642 

               # parm3:    r9        = boolean
               #           [sp+0x70]  (sp of caller)
               0x00007f5b4c2805a0:   mov    0x8(%rsi),%r10d
               0x00007f5b4c2805a4:   movabs $0x7f5acf000000,%r11
               0x00007f5b4c2805ae:   add    %r11,%r10
               0x00007f5b4c2805b1:   cmp    %r10,%rax
               0x00007f5b4c2805b4:   jne    0x00007f5b4baad280           ;   {runtime_call ic_miss_stub}
               0x00007f5b4c2805ba:   xchg   %ax,%ax
               0x00007f5b4c2805bc:   nopl   0x0(%rax)
             [Verified Entry Point]
   0.36%       0x00007f5b4c2805c0:   mov    %eax,-0x14000(%rsp)
               0x00007f5b4c2805c7:   push   %rbp
               0x00007f5b4c2805c8:   sub    $0x60,%rsp
               0x00007f5b4c2805cc:   cmpl   $0x1,0x20(%r15)
   0.01%       0x00007f5b4c2805d4:   jne    0x00007f5b4c281c0a           ;*synchronization entry
                                                                         ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@-1 (line 912)
               0x00007f5b4c2805da:   mov    %r8,0x30(%rsp)
   0.42%       0x00007f5b4c2805df:   mov    %ecx,0x2c(%rsp)
               0x00007f5b4c2805e3:   mov    %edx,0x14(%rsp)
               0x00007f5b4c2805e7:   mov    %rsi,0x40(%rsp)              ;*invokevirtual releaseFromMagazine {reexecute=0 rethrow=0 return_oop=0}
                                                                         ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@236 (line 979)
               0x00007f5b4c2805ec:   mov    0x18(%rsi),%r10d             ;*getfield chunkController {reexecute=0 rethrow=0 return_oop=0}
                                                                         ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@1 (line 912)
   0.03%       0x00007f5b4c2805f0:   mov    0x8(%r12,%r10,8),%r8d        ; implicit exception: dispatches to 0x00007f5b4c28193c
               0x00007f5b4c2805f5:   lea    (%r12,%r10,8),%r11
               0x00007f5b4c2805f9:   cmp    $0x10a4488,%r8d              ;*invokevirtual releaseFromMagazine {reexecute=0 rethrow=0 return_oop=0}
                                                                         ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@236 (line 979)
                                                                          ;   {metadata('io/netty/buffer/AdaptivePoolingAllocator$SizeClassChunkController')}
               0x00007f5b4c280600:   jne    0x00007f5b4c280f29           ;*invokeinterface computeBufferCapacity {reexecute=0 rethrow=0 return_oop=0}
                                                                         ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@8 (line 912)
   0.48%       0x00007f5b4c280606:   mov    0xc(%r11),%r10d
               0x00007f5b4c28060a:   cmp    %ecx,%r10d                   ;*invokevirtual releaseFromMagazine {reexecute=0 rethrow=0 return_oop=0}
                                                                         ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@236 (line 979)
               0x00007f5b4c28060d:   cmovg  %ecx,%r10d                   ;*invokestatic min {reexecute=0 rethrow=0 return_oop=0}
                                                                         ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassChunkController::computeBufferCapacity@5 (line 575)
                                                                         ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@8 (line 912)
               0x00007f5b4c280611:   mov    %r10d,0x48(%rsp)             ;*invokeinterface computeBufferCapacity {reexecute=0 rethrow=0 return_oop=0}
                                                                         ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@8 (line 912)
               0x00007f5b4c280616:   mov    0x40(%rsp),%r10              ;*invokevirtual releaseFromMagazine {reexecute=0 rethrow=0 return_oop=0}
                                                                         ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@236 (line 979)
   0.05%       0x00007f5b4c28061b:   mov    0xc(%r10),%r10d              ;*getfield current {reexecute=0 rethrow=0 return_oop=0}
                                                                         ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@16 (line 913)
   0.02%       0x00007f5b4c28061f:   test   %r10d,%r10d
               0x00007f5b4c280622:   je     0x00007f5b4c28091c           ;*invokevirtual releaseFromMagazine {reexecute=0 rethrow=0 return_oop=0}
                                                                         ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@236 (line 979)
               0x00007f5b4c280628:   mov    0x8(%r12,%r10,8),%r8d        ;*invokevirtual remainingCapacity {reexecute=0 rethrow=0 return_oop=0}
                                                                         ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@28 (line 916)
   0.45%       0x00007f5b4c28062d:   lea    (%r12,%r10,8),%r9
               0x00007f5b4c280631:   cmp    $0x10a46b8,%r8d              ;*invokevirtual releaseFromMagazine {reexecute=0 rethrow=0 return_oop=0}
                                                                         ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@236 (line 979)
                                                                          ;   {metadata('io/netty/buffer/AdaptivePoolingAllocator$SizeClassedChunk')}
               0x00007f5b4c280638:   jne    0x00007f5b4c280e58           ;*invokevirtual remainingCapacity {reexecute=0 rethrow=0 return_oop=0}
                                                                         ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@28 (line 916)
   0.01%       0x00007f5b4c28063e:   mov    0xc(%r9),%r11d
               0x00007f5b4c280642:   sub    0x10(%r9),%r11d
   0.06%       0x00007f5b4c280646:   mov    0x2c(%r9),%r8d
               0x00007f5b4c28064a:   cmp    %r8d,%r11d
          ╭    0x00007f5b4c28064d:   jle    0x00007f5b4c28083d           ;*invokevirtual releaseFromMagazine {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@236 (line 979)
   0.03%  │    0x00007f5b4c280653:   mov    0x8(%r12,%r10,8),%ebp        ;*invokevirtual readInitInto {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@47 (line 918)
   0.01%  │    0x00007f5b4c280658:   lea    (%r12,%r10,8),%rbx
   0.43%  │    0x00007f5b4c28065c:   cmp    0x48(%rsp),%r11d
          │    0x00007f5b4c280661:   jle    0x00007f5b4c280c21
   0.08%  │    0x00007f5b4c280667:   cmp    $0x10a46b8,%ebp              ;*invokevirtual releaseFromMagazine {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@236 (line 979)
           │                                                              ;   {metadata('io/netty/buffer/AdaptivePoolingAllocator$SizeClassedChunk')}
          │    0x00007f5b4c28066d:   jne    0x00007f5b4c280e72           ;*invokevirtual readInitInto {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@47 (line 918)
   0.01%  │    0x00007f5b4c280673:   mov    0x30(%rbx),%ebp              ;*getfield freeList {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::readInitInto@1 (line 1387)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@47 (line 918)
          │    0x00007f5b4c280676:   mov    0x8(%r12,%rbp,8),%r10d       ; implicit exception: dispatches to 0x00007f5b4c28195c
   0.16%  │    0x00007f5b4c28067b:   cmp    $0x10b30d0,%r10d             ;*invokevirtual releaseFromMagazine {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@236 (line 979)
           │                                                              ;   {metadata('io/netty/util/concurrent/MpscIntQueue$MpscAtomicIntegerArrayQueue')}
          │    0x00007f5b4c280682:   jne    0x00007f5b4c2813dc           ;*synchronization entry
          │                                                              ; - io.netty.util.concurrent.MpscIntQueue$MpscAtomicIntegerArrayQueue::poll@-1 (line 162)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::readInitInto@4 (line 1387)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@47 (line 918)
   0.03%  │    0x00007f5b4c280688:   lea    (%r12,%rbp,8),%rdi           ;*invokeinterface poll {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::readInitInto@4 (line 1387)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@47 (line 918)
   0.01%  │    0x00007f5b4c28068c:   mov    0x20(%rdi),%r11              ;*getfield consumerIndex {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.util.concurrent.MpscIntQueue$MpscAtomicIntegerArrayQueue::poll@1 (line 162)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::readInitInto@4 (line 1387)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@47 (line 918)
   0.04%  │    0x00007f5b4c280690:   mov    0xc(%rdi),%r8d               ;*getfield array {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - java.util.concurrent.atomic.AtomicIntegerArray::get@4 (line 95)
          │                                                              ; - io.netty.util.concurrent.MpscIntQueue$MpscAtomicIntegerArrayQueue::poll@16 (line 165)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::readInitInto@4 (line 1387)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@47 (line 918)
   0.48%  │    0x00007f5b4c280694:   mov    0xc(%r12,%r8,8),%ecx         ; implicit exception: dispatches to 0x00007f5b4c281978
          │                                                              ;*invokevirtual releaseFromMagazine {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@236 (line 979)
   0.58%  │    0x00007f5b4c280699:   mov    0x28(%rdi),%r9d              ;*getfield mask {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.util.concurrent.MpscIntQueue$MpscAtomicIntegerArrayQueue::poll@7 (line 163)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::readInitInto@4 (line 1387)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@47 (line 918)
          │    0x00007f5b4c28069d:   movslq %r9d,%r9
          │    0x00007f5b4c2806a0:   and    %r11,%r9                     ;*invokevirtual releaseFromMagazine {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@236 (line 979)
   0.04%  │    0x00007f5b4c2806a3:   mov    %r9d,%r9d                    ;*l2i {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.util.concurrent.MpscIntQueue$MpscAtomicIntegerArrayQueue::poll@12 (line 163)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::readInitInto@4 (line 1387)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@47 (line 918)
   0.07%  │    0x00007f5b4c2806a6:   cmp    %ecx,%r9d
          │    0x00007f5b4c2806a9:   jae    0x00007f5b4c281303           ;*invokevirtual getIntVolatile {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - java.lang.invoke.VarHandleInts$Array::getVolatile@38 (line 769)
          │                                                              ; - java.lang.invoke.VarHandleGuards::guard_LI_I@45 (line 163)
          │                                                              ; - java.util.concurrent.atomic.AtomicIntegerArray::get@8 (line 95)
          │                                                              ; - io.netty.util.concurrent.MpscIntQueue$MpscAtomicIntegerArrayQueue::poll@16 (line 165)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::readInitInto@4 (line 1387)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@47 (line 918)
   0.02%  │    0x00007f5b4c2806af:   lea    (%r12,%r8,8),%r10            ;*invokestatic checkIndex {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - java.lang.invoke.VarHandleInts$Array::getVolatile@23 (line 770)
          │                                                              ; - java.lang.invoke.VarHandleGuards::guard_LI_I@45 (line 163)
          │                                                              ; - java.util.concurrent.atomic.AtomicIntegerArray::get@8 (line 95)
          │                                                              ; - io.netty.util.concurrent.MpscIntQueue$MpscAtomicIntegerArrayQueue::poll@16 (line 165)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::readInitInto@4 (line 1387)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@47 (line 918)
          │    0x00007f5b4c2806b3:   mov    0x10(%r10,%r9,4),%r13d       ;*invokevirtual getIntVolatile {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - java.lang.invoke.VarHandleInts$Array::getVolatile@38 (line 769)
          │                                                              ; - java.lang.invoke.VarHandleGuards::guard_LI_I@45 (line 163)
          │                                                              ; - java.util.concurrent.atomic.AtomicIntegerArray::get@8 (line 95)
          │                                                              ; - io.netty.util.concurrent.MpscIntQueue$MpscAtomicIntegerArrayQueue::poll@16 (line 165)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::readInitInto@4 (line 1387)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@47 (line 918)
   0.86%  │    0x00007f5b4c2806b8:   mov    0x2c(%rdi),%r10d             ;*getfield emptyValue {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.util.concurrent.MpscIntQueue$MpscAtomicIntegerArrayQueue::poll@22 (line 166)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::readInitInto@4 (line 1387)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@47 (line 918)
   0.01%  │    0x00007f5b4c2806bc:   cmp    %r13d,%r10d
          │    0x00007f5b4c2806bf:   je     0x00007f5b4c2814c8           ;*invokevirtual releaseFromMagazine {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@236 (line 979)
   0.16%  │    0x00007f5b4c2806c5:   mov    0xc(%rdi),%ecx               ;*getfield array {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - java.util.concurrent.atomic.AtomicIntegerArray::lazySet@4 (line 118)
          │                                                              ; - io.netty.util.concurrent.MpscIntQueue$MpscAtomicIntegerArrayQueue::poll@69 (line 180)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::readInitInto@4 (line 1387)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@47 (line 918)
          │    0x00007f5b4c2806c8:   mov    0xc(%r12,%rcx,8),%ebp        ; implicit exception: dispatches to 0x00007f5b4c281994
   0.03%  │    0x00007f5b4c2806cd:   cmp    %ebp,%r9d
          │    0x00007f5b4c2806d0:   jae    0x00007f5b4c28134c           ;*invokevirtual putIntRelease {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - java.lang.invoke.VarHandleInts$Array::setRelease@42 (line 811)
          │                                                              ; - java.lang.invoke.VarHandleGuards::guard_LII_V@50 (line 757)
          │                                                              ; - java.util.concurrent.atomic.AtomicIntegerArray::lazySet@9 (line 118)
          │                                                              ; - io.netty.util.concurrent.MpscIntQueue$MpscAtomicIntegerArrayQueue::poll@69 (line 180)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::readInitInto@4 (line 1387)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@47 (line 918)
   0.03%  │    0x00007f5b4c2806d6:   add    $0x1,%r11
   0.02%  │    0x00007f5b4c2806da:   lea    (%r12,%rcx,8),%r8            ;*invokevirtual releaseFromMagazine {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@236 (line 979)
          │    0x00007f5b4c2806de:   mov    %r10d,0x10(%r8,%r9,4)
   0.31%  │    0x00007f5b4c2806e3:   mov    %r11,0x20(%rdi)              ;*invokevirtual putLongRelease {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - java.util.concurrent.atomic.AtomicLongFieldUpdater$CASUpdater::lazySet@14 (line 479)
          │                                                              ; - io.netty.util.concurrent.MpscIntQueue$MpscAtomicIntegerArrayQueue::poll@79 (line 181)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::readInitInto@4 (line 1387)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@47 (line 918)
   0.01%  │    0x00007f5b4c2806e7:   cmp    $0xffffffff,%r13d
          │    0x00007f5b4c2806eb:   je     0x00007f5b4c281510
   0.08%  │    0x00007f5b4c2806f1:   mov    0x2c(%rbx),%r10d             ;*invokevirtual releaseFromMagazine {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@236 (line 979)
          │    0x00007f5b4c2806f5:   add    %r10d,0x10(%rbx)
   0.02%  │    0x00007f5b4c2806f9:   mov    $0x2,%r10d
   0.01%  │    0x00007f5b4c2806ff:   lock xadd %r10d,0x14(%rbx)          ;*invokevirtual getAndAddInt {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - sun.misc.Unsafe::getAndAddInt@7 (line 1150)
          │                                                              ; - io.netty.util.internal.PlatformDependent0::getAndAddInt@6 (line 691)
          │                                                              ; - io.netty.util.internal.PlatformDependent::getAndAddInt@3 (line 698)
          │                                                              ; - io.netty.util.internal.UnsafeReferenceCountUpdater::getAndAddRawRefCnt@6 (line 34)
          │                                                              ; - io.netty.util.internal.ReferenceCountUpdater::retain0@3 (line 115)
          │                                                              ; - io.netty.util.internal.ReferenceCountUpdater::retain@4 (line 104)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Chunk::retain@4 (line 1223)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::readInitInto@45 (line 1393)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@47 (line 918)
   9.22%  │    0x00007f5b4c280705:   cmp    $0x2,%r10d
          │    0x00007f5b4c280709:   je     0x00007f5b4c280ccb
   0.36%  │    0x00007f5b4c28070f:   cmp    $0x4,%r10d
          │    0x00007f5b4c280713:   je     0x00007f5b4c280cdc
          │    0x00007f5b4c280719:   mov    %r10d,%ebp
          │    0x00007f5b4c28071c:   and    $0x1,%ebp
   0.03%  │    0x00007f5b4c28071f:   test   %ebp,%ebp
          │    0x00007f5b4c280721:   jne    0x00007f5b4c28156c
   0.38%  │    0x00007f5b4c280727:   test   %r10d,%r10d                  ;*invokevirtual releaseFromMagazine {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@236 (line 979)
          │    0x00007f5b4c28072a:   jle    0x00007f5b4c2815a4
   0.05%  │    0x00007f5b4c280730:   lea    0x2(%r10),%r11d              ;*if_icmpeq {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.util.internal.ReferenceCountUpdater::retain0@17 (line 116)
          │                                                              ; - io.netty.util.internal.ReferenceCountUpdater::retain@4 (line 104)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Chunk::retain@4 (line 1223)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::readInitInto@45 (line 1393)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@47 (line 918)
   0.04%  │    0x00007f5b4c280734:   cmp    %r10d,%r11d
          │    0x00007f5b4c280737:   jl     0x00007f5b4c281528           ;*invokevirtual releaseFromMagazine {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@236 (line 979)
   0.22%  │    0x00007f5b4c28073d:   mov    0x1c(%rbx),%ebp              ;*getfield delegate {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::readInitInto@51 (line 1395)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@47 (line 918)
   0.25%  │    0x00007f5b4c280740:   mov    0x30(%rsp),%r14
   0.01%  │    0x00007f5b4c280745:   test   %r14,%r14                    ;*invokevirtual releaseFromMagazine {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@236 (line 979)
          │    0x00007f5b4c280748:   je     0x00007f5b4c2813a8           ;*putfield chunk {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::init@8 (line 1466)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::readInitInto@64 (line 1395)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@47 (line 918)
          │    0x00007f5b4c28074e:   cmpb   $0x0,0x40(%r15)              ;*invokevirtual releaseFromMagazine {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@236 (line 979)
   0.15%  │    0x00007f5b4c280753:   jne    0x00007f5b4c280ced           ;*putfield chunk {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::init@8 (line 1466)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::readInitInto@64 (line 1395)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@47 (line 918)
   0.09%  │    0x00007f5b4c280759:   mov    %r13d,0x24(%r14)             ;*putfield startIndex {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::init@3 (line 1465)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::readInitInto@64 (line 1395)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@47 (line 918)
          │    0x00007f5b4c28075d:   mov    %rbx,%r11
          │    0x00007f5b4c280760:   shr    $0x3,%r11                    ;*invokevirtual releaseFromMagazine {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@236 (line 979)
   0.10%  │    0x00007f5b4c280764:   mov    %r11d,0x3c(%r14)             ;*putfield chunk {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::init@8 (line 1466)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::readInitInto@64 (line 1395)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@47 (line 918)
   0.18%  │    0x00007f5b4c280768:   mov    %rbx,%r10                    ;*invokevirtual releaseFromMagazine {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@236 (line 979)
          │    0x00007f5b4c28076b:   mov    %r14,%r11                    ;*putfield chunk {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::init@8 (line 1466)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::readInitInto@64 (line 1395)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@47 (line 918)
          │    0x00007f5b4c28076e:   xor    %r11,%r10
   0.09%  │    0x00007f5b4c280771:   shr    $0x17,%r10
   0.02%  │    0x00007f5b4c280775:   test   %r10,%r10
          │╭   0x00007f5b4c280778:   je     0x00007f5b4c280794
          ││   0x00007f5b4c28077a:   shr    $0x9,%r11                    ;   {no_reloc}
          ││   0x00007f5b4c28077e:   movabs $0x7f5b3dbb5000,%rdi
   0.04%  ││   0x00007f5b4c280788:   add    %r11,%rdi
   0.08%  ││   0x00007f5b4c28078b:   cmpb   $0x2,(%rdi)                  ;*invokevirtual releaseFromMagazine {reexecute=0 rethrow=0 return_oop=0}
          ││                                                             ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@236 (line 979)
   0.03%  ││   0x00007f5b4c28078e:   jne    0x00007f5b4c280d92           ;*putfield chunk {reexecute=0 rethrow=0 return_oop=0}
          ││                                                             ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::init@8 (line 1466)
          ││                                                             ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::readInitInto@64 (line 1395)
          ││                                                             ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@47 (line 918)
          │↘   0x00007f5b4c280794:   mov    0x2c(%rsp),%r11d             ;*invokevirtual releaseFromMagazine {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@236 (line 979)
   0.01%  │    0x00007f5b4c280799:   mov    %r11d,0x1c(%r14)             ;*putfield maxCapacity {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.buffer.AbstractByteBuf::maxCapacity@2 (line 102)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::init@26 (line 1469)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::readInitInto@64 (line 1395)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@47 (line 918)
   0.11%  │    0x00007f5b4c28079d:   mov    0x48(%rsp),%r11d             ;*invokevirtual releaseFromMagazine {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@236 (line 979)
   0.10%  │    0x00007f5b4c2807a2:   mov    %r11d,0x2c(%r14)             ;*putfield maxFastCapacity {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::init@20 (line 1468)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::readInitInto@64 (line 1395)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@47 (line 918)
          │    0x00007f5b4c2807a6:   mov    0x14(%rsp),%r10d             ;*invokevirtual releaseFromMagazine {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@236 (line 979)
   0.06%  │    0x00007f5b4c2807ab:   mov    %r10d,0x28(%r14)             ;*putfield length {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::init@14 (line 1467)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::readInitInto@64 (line 1395)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@47 (line 918)
   0.15%  │    0x00007f5b4c2807af:   mov    %r12d,0x10(%r14)             ;*putfield writerIndex {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.buffer.AbstractByteBuf::setIndex0@7 (line 1486)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::init@33 (line 1470)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::readInitInto@64 (line 1395)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@47 (line 918)
          │    0x00007f5b4c2807b3:   mov    %r12d,0xc(%r14)              ;*putfield readerIndex {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.buffer.AbstractByteBuf::setIndex0@2 (line 1485)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::init@33 (line 1470)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::readInitInto@64 (line 1395)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@47 (line 918)
   0.04%  │    0x00007f5b4c2807b7:   mov    0x8(%r12,%rbp,8),%r11d       ; implicit exception: dispatches to 0x00007f5b4c2819b0
   0.04%  │    0x00007f5b4c2807bc:   cmp    $0x1091cb0,%r11d             ;*invokevirtual releaseFromMagazine {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@236 (line 979)
          │                                                              ;   {metadata('io/netty/buffer/UnpooledUnsafeDirectByteBuf')}
          │    0x00007f5b4c2807c3:   jne    0x00007f5b4c281408           ;*putfield chunk {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::init@8 (line 1466)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::readInitInto@64 (line 1395)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@47 (line 918)
   0.05%  │    0x00007f5b4c2807c9:   cmpb   $0x0,0x40(%r15)              ;*invokevirtual releaseFromMagazine {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@236 (line 979)
   0.10%  │    0x00007f5b4c2807ce:   jne    0x00007f5b4c280d24           ;*putfield rootParent {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::init@54 (line 1473)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::readInitInto@64 (line 1395)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@47 (line 918)
   0.01%  │    0x00007f5b4c2807d4:   movb   $0x1,0x31(%r14)              ;*putfield hasMemoryAddress {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::init@49 (line 1472)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::readInitInto@64 (line 1395)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@47 (line 918)
          │    0x00007f5b4c2807d9:   mov    %r12b,0x30(%r14)             ;*putfield hasArray {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::init@41 (line 1471)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::readInitInto@64 (line 1395)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@47 (line 918)
   0.11%  │    0x00007f5b4c2807dd:   lea    (%r12,%rbp,8),%r10           ;*invokevirtual hasArray {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::init@38 (line 1471)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::readInitInto@64 (line 1395)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@47 (line 918)
   0.02%  │    0x00007f5b4c2807e1:   mov    %r14,%r11
          │    0x00007f5b4c2807e4:   mov    %r10,%r8
   0.01%  │    0x00007f5b4c2807e7:   shr    $0x3,%r8
   0.16%  │    0x00007f5b4c2807eb:   mov    %r8d,0x38(%r14)
   0.12%  │    0x00007f5b4c2807ef:   xor    %r11,%r10
          │    0x00007f5b4c2807f2:   shr    $0x17,%r10
   0.07%  │    0x00007f5b4c2807f6:   test   %r10,%r10
          │ ╭  0x00007f5b4c2807f9:   je     0x00007f5b4c280816
   0.12%  │ │  0x00007f5b4c2807fb:   shr    $0x9,%r11
          │ │  0x00007f5b4c2807ff:   movabs $0x7f5b3dbb5000,%r8
          │ │  0x00007f5b4c280809:   add    %r11,%r8
          │ │  0x00007f5b4c28080c:   cmpb   $0x2,(%r8)                   ;*invokevirtual releaseFromMagazine {reexecute=0 rethrow=0 return_oop=0}
          │ │                                                            ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@236 (line 979)
          │ │  0x00007f5b4c280810:   jne    0x00007f5b4c280de0           ;*putfield chunk {reexecute=0 rethrow=0 return_oop=0}
          │ │                                                            ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::init@8 (line 1466)
          │ │                                                            ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::readInitInto@64 (line 1395)
          │ │                                                            ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@47 (line 918)
   0.04%  │ ↘  0x00007f5b4c280816:   cmpb   $0x0,0x40(%r15)              ;*invokevirtual releaseFromMagazine {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@236 (line 979)
          │    0x00007f5b4c28081b:   jne    0x00007f5b4c280d5b
   0.08%  │    0x00007f5b4c280821:   mov    %r12d,0x40(%r14)             ;*invokevirtual readInitInto {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@47 (line 918)
   0.07%  │    0x00007f5b4c280825:   mov    $0x1,%eax
   0.10%  │    0x00007f5b4c28082a:   add    $0x60,%rsp
   0.16%  │    0x00007f5b4c28082e:   pop    %rbp
   0.02%  │    0x00007f5b4c28082f:   cmp    0x450(%r15),%rsp             ;   {poll_return}
          │    0x00007f5b4c280836:   ja     0x00007f5b4c281bf4
   0.07%  │    0x00007f5b4c28083c:   ret                                 ;*invokevirtual releaseFromMagazine {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@236 (line 979)
   0.03%  ↘    0x00007f5b4c28083d:   mov    0x30(%r9),%ebp               ;*getfield freeList {reexecute=0 rethrow=0 return_oop=0}
                                                                         ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::remainingCapacity@16 (line 1414)
                                                                         ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@28 (line 916)
   0.06%       0x00007f5b4c280841:   mov    0x8(%r12,%rbp,8),%r8d        ; implicit exception: dispatches to 0x00007f5b4c2819cc
   0.05%       0x00007f5b4c280846:   cmp    $0x10b30d0,%r8d              ;*invokevirtual releaseFromMagazine {reexecute=0 rethrow=0 return_oop=0}
                                                                         ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@236 (line 979)
                                                                       ;   {metadata('io/netty/util/concurrent/MpscIntQueue$MpscAtomicIntegerArrayQueue')}
               0x00007f5b4c28084d:   jne    0x00007f5b4c2815e4
               0x00007f5b4c280853:   mov    %r11d,0x28(%rsp)
               0x00007f5b4c280858:   mov    %r9,0x38(%rsp)
               0x00007f5b4c28085d:   mov    %r10d,0x24(%rsp)
               0x00007f5b4c280862:   mov    %rax,-0x8(%rsp)
   0.03%       0x00007f5b4c280867:   mov    0x2c(%rsp),%eax
               0x00007f5b4c28086b:   mov    %eax,0x20(%rsp)
               0x00007f5b4c28086f:   mov    -0x8(%rsp),%rax
....................................................................................................
  18.77%  <total for region 2>

....[Hottest Regions]...............................................................................
  49.07%         c2, level 4  io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc, version 4, compile id 1607 
  18.77%         c2, level 4  io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate, version 7, compile id 1642 
   7.22%         c2, level 4  io.netty.buffer.AdaptiveByteBufAllocator::newDirectBuffer, version 2, compile id 1558 
   5.87%         c2, level 4  io.netty.buffer.AdaptiveByteBufAllocator::newDirectBuffer, version 2, compile id 1558 
   3.13%         c2, level 4  io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc, version 4, compile id 1607 
   2.63%         c2, level 4  io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc, version 4, compile id 1607 
   2.40%         c2, level 4  io.netty.buffer.AdaptiveByteBufAllocator::newDirectBuffer, version 2, compile id 1558 
   2.40%         c2, level 4  java.util.concurrent.atomic.AtomicIntegerFieldUpdater$AtomicIntegerFieldUpdaterImpl::lazySet, version 2, compile id 1598 
   1.16%         c2, level 4  io.netty.microbench.buffer.jmh_generated.ByteBufAllocatorAllocPatternBenchmark_adaptiveDirect_jmhTest::adaptiveDirect_thrpt_jmhStub, version 5, compile id 1667 
   1.03%   [kernel.kallsyms]  native_write_msr 
   0.72%         c2, level 4  io.netty.buffer.AdaptiveByteBufAllocator::newDirectBuffer, version 2, compile id 1558 
   0.55%         c2, level 4  io.netty.buffer.AdaptiveByteBufAllocator::newDirectBuffer, version 2, compile id 1558 
   0.34%         c2, level 4  io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::readInitInto, version 2, compile id 1585 
   0.32%         c2, level 4  io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate, version 7, compile id 1642 
   0.22%         c2, level 4  io.netty.util.internal.shaded.org.jctools.queues.MpmcArrayQueue::poll, version 2, compile id 1616 
   0.21%         c2, level 4  java.util.concurrent.atomic.AtomicReferenceFieldUpdater$AtomicReferenceFieldUpdaterImpl::getAndSet, version 2, compile id 1638 
   0.18%         c2, level 4  io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::releaseFromMagazine, version 2, compile id 1640 
   0.18%         c2, level 4  io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::releaseFromMagazine, version 2, compile id 1640 
   0.16%         c2, level 4  io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::readInitInto, version 2, compile id 1585 
   0.13%           libjvm.so  ElfSymbolTable::lookup(unsigned char*, int*, int*, int*, ElfFuncDescTable*) 
   3.29%  <...other 204 warm regions...>
....................................................................................................
  99.99%  <totals>

....[Hottest Methods (after inlining)]..............................................................
  54.87%         c2, level 4  io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc, version 4, compile id 1607 
  19.28%         c2, level 4  io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate, version 7, compile id 1642 
  16.75%         c2, level 4  io.netty.buffer.AdaptiveByteBufAllocator::newDirectBuffer, version 2, compile id 1558 
   2.40%         c2, level 4  java.util.concurrent.atomic.AtomicIntegerFieldUpdater$AtomicIntegerFieldUpdaterImpl::lazySet, version 2, compile id 1598 
   1.16%         c2, level 4  io.netty.microbench.buffer.jmh_generated.ByteBufAllocatorAllocPatternBenchmark_adaptiveDirect_jmhTest::adaptiveDirect_thrpt_jmhStub, version 5, compile id 1667 
   1.03%   [kernel.kallsyms]  native_write_msr 
   0.56%         c2, level 4  io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::readInitInto, version 2, compile id 1585 
   0.40%         c2, level 4  io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::releaseFromMagazine, version 2, compile id 1640 
   0.33%         c2, level 4  java.util.concurrent.atomic.AtomicReferenceFieldUpdater$AtomicReferenceFieldUpdaterImpl::getAndSet, version 2, compile id 1638 
   0.26%         c2, level 4  io.netty.util.internal.shaded.org.jctools.queues.MpmcArrayQueue::poll, version 2, compile id 1616 
   0.16%                      <unknown> 
   0.13%         c2, level 4  io.netty.buffer.AdaptivePoolingAllocator$Chunk::attachToMagazine, version 2, compile id 1615 
   0.13%           libjvm.so  ElfSymbolTable::lookup(unsigned char*, int*, int*, int*, ElfFuncDescTable*) 
   0.12%         c2, level 4  io.netty.util.concurrent.MpscIntQueue$MpscAtomicIntegerArrayQueue::size, version 2, compile id 1609 
   0.09%   [kernel.kallsyms]  clear_bhb_loop 
   0.07%   [kernel.kallsyms]  psi_account_irqtime 
   0.06%   [kernel.kallsyms]  asm_sysvec_apic_timer_interrupt 
   0.06%           libjvm.so  fileStream::write(char const*, unsigned long) 
   0.05%   [kernel.kallsyms]  mutex_unlock 
   0.05%   [kernel.kallsyms]  entry_SYSCALL_64 
   2.02%  <...other 151 warm methods...>
....................................................................................................
  99.99%  <totals>

@laosijikaichele
Contributor Author

Added @CompilerControl(CompilerControl.Mode.DONT_INLINE) on benchmark classes.

I cannot see it yet.

I added it locally and will push it soon.

@laosijikaichele
Contributor Author

laosijikaichele commented Aug 4, 2025

$ java -Djmh.executor=CUSTOM '-Djmh.executor.class=io.netty.microbench.util.AbstractMicrobenchmark$HarnessExecutor' -jar microbench/target/microbenchmarks.jar 'io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark.*Direct' -f 1 -t 1 -prof perfasm

Benchmark                                             Mode   Cnt         Score     Error  Units
ByteBufAllocatorAllocPatternBenchmark.adaptiveDirect  thrpt   10  19872573.143 ± 176.568  ops/s
ByteBufAllocatorAllocPatternBenchmark.mimallocDirect  thrpt   10  24398458.564 ± 130.254  ops/s
ByteBufAllocatorAllocPatternBenchmark.pooledDirect    thrpt   10  13815460.976 ± 238.903  ops/s

I used JDK 17; I will also test it with JDK 21.

@franz1981
Contributor

franz1981 commented Aug 4, 2025

Another thing that requires some attention in the current reference-counting scheme is:

   0.02%  │    0x00007f5b4c2806f9:   mov    $0x2,%r10d
   0.01%  │    0x00007f5b4c2806ff:   lock xadd %r10d,0x14(%rbx)          ;*invokevirtual getAndAddInt {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - sun.misc.Unsafe::getAndAddInt@7 (line 1150)
          │                                                              ; - io.netty.util.internal.PlatformDependent0::getAndAddInt@6 (line 691)
          │                                                              ; - io.netty.util.internal.PlatformDependent::getAndAddInt@3 (line 698)
          │                                                              ; - io.netty.util.internal.UnsafeReferenceCountUpdater::getAndAddRawRefCnt@6 (line 34)
          │                                                              ; - io.netty.util.internal.ReferenceCountUpdater::retain0@3 (line 115)
          │                                                              ; - io.netty.util.internal.ReferenceCountUpdater::retain@4 (line 104)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Chunk::retain@4 (line 1223)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::readInitInto@45 (line 1393)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@47 (line 918)
   9.22%  │    0x00007f5b4c280705:   cmp    $0x2,%r10d
          │    0x00007f5b4c280709:   je     0x00007f5b4c280ccb
   0.36%  │    0x00007f5b4c28070f:   cmp    $0x4,%r10d
          │    0x00007f5b4c280713:   je     0x00007f5b4c280cdc
          │    0x00007f5b4c280719:   mov    %r10d,%ebp
          │    0x00007f5b4c28071c:   and    $0x1,%ebp
   0.03%  │    0x00007f5b4c28071f:   test   %ebp,%ebp
          │    0x00007f5b4c280721:   jne    0x00007f5b4c28156c
   0.38%  │    0x00007f5b4c280727:   test   %r10d,%r10d                  ;*invokevirtual releaseFromMagazine {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@236 (line 979)
          │    0x00007f5b4c28072a:   jle    0x00007f5b4c2815a4
   0.05%  │    0x00007f5b4c280730:   lea    0x2(%r10),%r11d              ;*if_icmpeq {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.util.internal.ReferenceCountUpdater::retain0@17 (line 116)
          │                                                              ; - io.netty.util.internal.ReferenceCountUpdater::retain@4 (line 104)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Chunk::retain@4 (line 1223)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::readInitInto@45 (line 1393)
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@47 (line 918)
   0.04%  │    0x00007f5b4c280734:   cmp    %r10d,%r11d
          │    0x00007f5b4c280737:   jl     0x00007f5b4c281528           ;*invokevirtual releaseFromMagazine {reexecute=0 rethrow=0 return_oop=0}
          │                                                              ; - io.netty.buffer.AdaptivePoolingAllocator$Magazine::allocate@236 (line 979)

This doesn't look well optimized, and it comes from calling this on the chunk:

    private T retain0(T instance, final int increment, final int rawIncrement) {
        int oldRef = getAndAddRawRefCnt(instance, rawIncrement);
        if (oldRef != 2 && oldRef != 4 && (oldRef & 1) != 0) {
            throw new IllegalReferenceCountException(0, increment);
        }
        // don't pass 0!
        if ((oldRef <= 0 && oldRef + rawIncrement >= 0)
                || (oldRef >= 0 && oldRef + rawIncrement < oldRef)) {
            // overflow case
            getAndAddRawRefCnt(instance, -rawIncrement);
            throw new IllegalReferenceCountException(realRefCnt(oldRef), increment);
        }
        return instance;
    }

with the parameters retain0(instance, 1, 2).
Luckily, this is inlined in the allocation path, but it still produces:

  • a few unpredictable branches for oldRef != 2 && oldRef != 4 && (oldRef & 1) != 0 (a chunk's ref count does not behave like that of the "usual" unshared adaptive ByteBufs, since it depends on the allocation patterns across different size classes, which further confuses the branch predictor)
  • an additional (but predictable) overflow check, oldRef >= 0 && oldRef + rawIncrement < oldRef, which has to be performed (see lea 0x2(%r10),%r11d) and costs an addition (with a small immediate; Intel processors have an edge here, but that is kind of cheating...). Ideally we would just perform 2 predictable pre-checks, one on rawIncrement and one on oldRef, where the first becomes a constant because the call is inlined with rawIncrement == 2, e.g. (rawIncrement == 2 && oldRef >= (Integer.MAX_VALUE - 1))
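A minimal, standalone sketch of the pre-check suggested above, assuming an AtomicIntegerFieldUpdater-backed raw counter. RetainSketch and its methods are hypothetical (this is not Netty's ReferenceCountUpdater), and IllegalStateException stands in for IllegalReferenceCountException. When rawIncrement == 2, the overflow test collapses to a single comparison against Integer.MAX_VALUE - 1; other increments fall back to the generic two-sided check from the quoted retain0.

```java
import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;

// Hypothetical standalone counter; not Netty's ReferenceCountUpdater.
final class RetainSketch {
    private static final AtomicIntegerFieldUpdater<RetainSketch> RAW_REF_CNT =
            AtomicIntegerFieldUpdater.newUpdater(RetainSketch.class, "rawRefCnt");

    // Even raw values are live (raw == 2 * refCnt); odd values mean released.
    private volatile int rawRefCnt;

    RetainSketch(int initialRawRefCnt) {
        this.rawRefCnt = initialRawRefCnt;
    }

    int rawRefCnt() {
        return rawRefCnt;
    }

    RetainSketch retain0(int increment, int rawIncrement) {
        int oldRef = RAW_REF_CNT.getAndAdd(this, rawIncrement);
        // Same released-object check as the quoted retain0.
        if (oldRef != 2 && oldRef != 4 && (oldRef & 1) != 0) {
            throw new IllegalStateException("refCnt: 0, increment: " + increment);
        }
        // Predictable pre-checks: when inlined with rawIncrement == 2 the
        // ternary condition is a constant and one comparison on oldRef remains.
        boolean overflow = rawIncrement == 2
                ? oldRef >= Integer.MAX_VALUE - 1
                : (oldRef <= 0 && oldRef + rawIncrement >= 0)
                        || (oldRef >= 0 && oldRef + rawIncrement < oldRef);
        if (overflow) {
            // Undo the speculative add before reporting the failure.
            RAW_REF_CNT.getAndAdd(this, -rawIncrement);
            throw new IllegalStateException(
                    "refCnt: " + (oldRef >>> 1) + ", increment: " + increment);
        }
        return this;
    }
}
```

Whether the JIT actually emits a single compare here would need to be confirmed with perfasm, as done for the original code.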

A similar problem happens with a segment's release (which performs a chunk release too!):

    public final boolean release(T instance) {
        int rawCnt = getRawRefCnt(instance);
        return rawCnt == 2 ? tryFinalRelease0(instance, 2) || retryRelease0(instance, 1)
                : nonFinalRelease0(instance, 1, rawCnt, toLiveRealRefCnt(rawCnt, 1));
    }

For many ByteBuf use cases we have a fast path for a raw refCnt of 2, but chunks mostly have a raw refCnt >= 4, since they are released only when a rotation is needed. That is a rather different case: the release path has been heavily optimized for normal buffers, but not for cases like chunks.
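To make the raw/real distinction concrete, here is a small standalone sketch of the encoding the quoted release code appears to rely on: a live object stores twice its reference count (always even), and a released one stores an odd value. The class and helper names are hypothetical; toLiveRealRefCnt only mirrors the shape of the helper visible in the asm, not its exact Netty source.

```java
// Hypothetical helpers illustrating the raw <-> real ref count encoding.
final class RawRefCnt {
    private RawRefCnt() {
    }

    // A live object with real count n stores 2 * n, which is always even.
    static int toRaw(int realRefCnt) {
        return realRefCnt << 1;
    }

    // Real count of a live object; an odd raw value means "already released".
    static int toLiveRealRefCnt(int rawCnt, int decrement) {
        if (rawCnt == 2 || rawCnt == 4 || (rawCnt & 1) == 0) {
            return rawCnt >>> 1;
        }
        throw new IllegalStateException("refCnt: 0, decrement: " + decrement);
    }

    // Which branch of the quoted release() a given raw count takes.
    static String releasePath(int rawCnt) {
        return rawCnt == 2
                ? "tryFinalRelease0 (fast path)"
                : "nonFinalRelease0 (needs toLiveRealRefCnt)";
    }

    public static void main(String[] args) {
        for (int real = 1; real <= 3; real++) {
            int raw = toRaw(real);
            System.out.println("refCnt=" + real + " raw=" + raw + " -> " + releasePath(raw));
        }
    }
}
```

A chunk held by the magazine plus one in-flight segment would presumably sit at raw 4 (real 2), so its release always takes the second branch.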

This case could be improved (assuming there is a single in-flight segment allocated from the chunk) as follows:

    public final boolean release(T instance) {
        int rawCnt = getRawRefCnt(instance);
        if (rawCnt == 2) {
            return tryFinalRelease0(instance, 2) || retryRelease0(instance, 1);
        }
        // this is a fast-path useful for the adaptive chunk case
        if (rawCnt == 4) {
            // this is saving an expensive computation (using lea  -0x2(%rax),%ebp) to the new ref cnt
            return nonFinalRelease0(instance, 1, 4, 2);
        }
        return nonFinalRelease0(instance, 1, rawCnt, toLiveRealRefCnt(rawCnt, 1));
    }

Sadly, this is not a great solution, since the number of in-flight segments usually exceeds 1.
The asm for the original code is:

             0x00007f5b4c2655f6:   mov    0x14(%rcx),%eax              ;*invokevirtual getInt {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - sun.misc.Unsafe::getInt@5 (line 164)
                                                                       ; - io.netty.util.internal.PlatformDependent0::getInt@5 (line 679)
                                                                       ; - io.netty.util.internal.PlatformDependent::getInt@2 (line 690)
                                                                       ; - io.netty.util.internal.UnsafeReferenceCountUpdater::getRawRefCnt@5 (line 39)
                                                                       ; - io.netty.util.internal.ReferenceCountUpdater::release@2 (line 130)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$Chunk::release@4 (line 1238)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::releaseSegment@1 (line 1437)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::deallocate@53 (line 1932)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::handleRelease@5 (line 149)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::release@8 (line 139)
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@30 (line 125)
             0x00007f5b4c2655f9:   cmp    $0x2,%eax
             0x00007f5b4c2655fc:   je     0x00007f5b4c265948           ;*if_icmpne {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - io.netty.util.internal.ReferenceCountUpdater::release@8 (line 131)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$Chunk::release@4 (line 1238)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::releaseSegment@1 (line 1437)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::deallocate@53 (line 1932)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::handleRelease@5 (line 149)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::release@8 (line 139)
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@30 (line 125)
   0.04%     0x00007f5b4c265602:   cmp    $0x4,%eax
          ╭  0x00007f5b4c265605:   je     0x00007f5b4c265621           ;*if_icmpeq {reexecute=0 rethrow=0 return_oop=0}
          │                                                            ; - io.netty.util.internal.ReferenceCountUpdater::toLiveRealRefCnt@7 (line 70)
          │                                                            ; - io.netty.util.internal.ReferenceCountUpdater::release@43 (line 132)
          │                                                            ; - io.netty.buffer.AdaptivePoolingAllocator$Chunk::release@4 (line 1238)
          │                                                            ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::releaseSegment@1 (line 1437)
          │                                                            ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::deallocate@53 (line 1932)
          │                                                            ; - io.netty.buffer.AbstractReferenceCountedByteBuf::handleRelease@5 (line 149)
          │                                                            ; - io.netty.buffer.AbstractReferenceCountedByteBuf::release@8 (line 139)
          │                                                            ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@30 (line 125)
   0.32%  │  0x00007f5b4c265607:   mov    %eax,%ebp
          │  0x00007f5b4c265609:   and    $0x1,%ebp                    ;*iand {reexecute=0 rethrow=0 return_oop=0}
          │                                                            ; - io.netty.util.internal.ReferenceCountUpdater::toLiveRealRefCnt@12 (line 70)
          │                                                            ; - io.netty.util.internal.ReferenceCountUpdater::release@43 (line 132)
          │                                                            ; - io.netty.buffer.AdaptivePoolingAllocator$Chunk::release@4 (line 1238)
          │                                                            ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::releaseSegment@1 (line 1437)
          │                                                            ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::deallocate@53 (line 1932)
          │                                                            ; - io.netty.buffer.AbstractReferenceCountedByteBuf::handleRelease@5 (line 149)
          │                                                            ; - io.netty.buffer.AbstractReferenceCountedByteBuf::release@8 (line 139)
          │                                                            ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@30 (line 125)
          │  0x00007f5b4c26560c:   test   %ebp,%ebp
          │  0x00007f5b4c26560e:   jne    0x00007f5b4c266039           ;*ifne {reexecute=0 rethrow=0 return_oop=0}
          │                                                            ; - io.netty.util.internal.ReferenceCountUpdater::toLiveRealRefCnt@13 (line 70)
          │                                                            ; - io.netty.util.internal.ReferenceCountUpdater::release@43 (line 132)
          │                                                            ; - io.netty.buffer.AdaptivePoolingAllocator$Chunk::release@4 (line 1238)
          │                                                            ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::releaseSegment@1 (line 1437)
          │                                                            ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::deallocate@53 (line 1932)
          │                                                            ; - io.netty.buffer.AbstractReferenceCountedByteBuf::handleRelease@5 (line 149)
          │                                                            ; - io.netty.buffer.AbstractReferenceCountedByteBuf::release@8 (line 139)
          │                                                            ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@30 (line 125)
   0.01%  │  0x00007f5b4c265614:   mov    %eax,%ebp
   0.04%  │  0x00007f5b4c265616:   shr    %ebp                         ;*iushr {reexecute=0 rethrow=0 return_oop=0}
          │                                                            ; - io.netty.util.internal.ReferenceCountUpdater::toLiveRealRefCnt@18 (line 71)
          │                                                            ; - io.netty.util.internal.ReferenceCountUpdater::release@43 (line 132)
          │                                                            ; - io.netty.buffer.AdaptivePoolingAllocator$Chunk::release@4 (line 1238)
          │                                                            ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::releaseSegment@1 (line 1437)
          │                                                            ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::deallocate@53 (line 1932)
          │                                                            ; - io.netty.buffer.AbstractReferenceCountedByteBuf::handleRelease@5 (line 149)
          │                                                            ; - io.netty.buffer.AbstractReferenceCountedByteBuf::release@8 (line 139)
          │                                                            ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@30 (line 125)
          │  0x00007f5b4c265618:   cmp    $0x1,%ebp
          │  0x00007f5b4c26561b:   jle    0x00007f5b4c266080
   0.03%  ↘  0x00007f5b4c265621:   lea    -0x2(%rax),%ebp
             0x00007f5b4c265624:   lock cmpxchg %ebp,0x14(%rcx)
   8.06%     0x00007f5b4c265629:   sete   %bpl
   0.55%     0x00007f5b4c26562d:   movzbl %bpl,%ebp                    ;*invokevirtual compareAndSetInt {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - sun.misc.Unsafe::compareAndSwapInt@9 (line 922)
                                                                       ; - io.netty.util.internal.PlatformDependent0::compareAndSwapInt@8 (line 695)
                                                                       ; - io.netty.util.internal.PlatformDependent::compareAndSwapInt@5 (line 702)
                                                                       ; - io.netty.util.internal.UnsafeReferenceCountUpdater::casRawRefCnt@7 (line 54)
                                                                       ; - io.netty.util.internal.ReferenceCountUpdater::nonFinalRelease0@14 (line 149)
                                                                       ; - io.netty.util.internal.ReferenceCountUpdater::release@46 (line 132)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$Chunk::release@4 (line 1238)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::releaseSegment@1 (line 1437)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::deallocate@53 (line 1932)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::handleRelease@5 (line 149)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::release@8 (line 139)
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@30 (line 125)
   0.01%     0x00007f5b4c265631:   test   %ebp,%ebp
             0x00007f5b4c265633:   je     0x00007f5b4c2660c4           ;*ifeq {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - io.netty.util.internal.ReferenceCountUpdater::nonFinalRelease0@17 (line 149)
                                                                       ; - io.netty.util.internal.ReferenceCountUpdater::release@46 (line 132)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$Chunk::release@4 (line 1238)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$SizeClassedChunk::releaseSegment@1 (line 1437)
                                                                       ; - io.netty.buffer.AdaptivePoolingAllocator$AdaptiveByteBuf::deallocate@53 (line 1932)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::handleRelease@5 (line 149)
                                                                       ; - io.netty.buffer.AbstractReferenceCountedByteBuf::release@8 (line 139)
                                                                       ; - io.netty.microbench.buffer.ByteBufAllocatorAllocPatternBenchmark::directAlloc@30 (line 125)

where the "slow path" release requires computing the "live real ref cnt", which is not needed when refCnt == 2.
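The fast/slow distinction can be made concrete with a simplified Java sketch. It assumes an even/odd encoding similar to Netty's reference counting: the raw value is 2 × the real refCnt while the object is live, and any odd raw value means "released". The class and method names are hypothetical and this is not Netty's ReferenceCountUpdater. It only shows why `raw == 2` can be finished with a single CAS, while other values first decode the live real refCnt.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Simplified sketch, NOT Netty's ReferenceCountUpdater: the raw counter stores
 * 2 * realRefCnt while the object is live, and any odd raw value means "released".
 */
final class RawRefCntSketch {
    private final AtomicInteger rawCnt = new AtomicInteger(2); // realRefCnt == 1

    void retain() {
        rawCnt.getAndAdd(2); // sketch: no overflow or "already released" checks
    }

    /** Returns true if this call released the last reference. */
    boolean release() {
        int raw = rawCnt.get();
        if (raw == 2 && rawCnt.compareAndSet(2, 1)) {
            // Fast path: raw == 2 already means realRefCnt == 1, so a single CAS
            // to an odd value finishes the release without decoding anything.
            return true;
        }
        return releaseSlow();
    }

    private boolean releaseSlow() {
        for (;;) {
            int raw = rawCnt.get();
            if ((raw & 1) != 0) {
                throw new IllegalStateException("refCnt is already 0");
            }
            // Slow path: decode the live real refCnt before deciding what to write.
            int real = raw >>> 1;
            int next = real == 1 ? 1 : raw - 2;
            if (rawCnt.compareAndSet(raw, next)) {
                return real == 1;
            }
        }
    }

    int refCnt() {
        int raw = rawCnt.get();
        return (raw & 1) != 0 ? 0 : raw >>> 1;
    }
}
```

In the sketch, a release from refCnt 2 goes through `releaseSlow()` (raw 4 to 2), and the final release takes the single-CAS fast path (raw 2 to 1).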

@laosijikaichele
Contributor Author

laosijikaichele commented Feb 15, 2026

MiMallocByteBufAllocator now supports allocation from non-event-loop threads (normal platform threads), which use a group of shared heaps.
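A minimal sketch of how a group of shared heaps could serve non-event-loop threads. It assumes that event-loop threads keep a private thread-local heap and every other thread is mapped by thread id onto one of N lock-protected shared heaps. `Heap`, `SharedHeapSketch` and the `callerIsEventLoop` flag are hypothetical names, not the PR's classes, and the PR's actual heap-selection policy may differ.

```java
import java.util.concurrent.locks.ReentrantLock;

final class SharedHeapSketch {
    /** Hypothetical stand-in for a mimalloc-style heap; it only counts allocations. */
    static final class Heap {
        final ReentrantLock lock = new ReentrantLock();
        private long allocations;

        long allocate(int size) {
            // A real heap would carve `size` bytes out of a page for this size class.
            return ++allocations;
        }
    }

    private final ThreadLocal<Heap> ownedHeap = ThreadLocal.withInitial(Heap::new);
    private final Heap[] sharedHeaps;

    SharedHeapSketch(int sharedHeapCount) {
        sharedHeaps = new Heap[sharedHeapCount];
        for (int i = 0; i < sharedHeapCount; i++) {
            sharedHeaps[i] = new Heap();
        }
    }

    /** `callerIsEventLoop` is a hypothetical flag; a real allocator would detect this itself. */
    long allocate(int size, boolean callerIsEventLoop) {
        if (callerIsEventLoop) {
            // Owner thread: the heap is thread-local, so no locking is needed.
            return ownedHeap.get().allocate(size);
        }
        // Any other thread: map it onto one of the shared heaps and serialize access.
        long id = Thread.currentThread().getId();
        Heap heap = sharedHeaps[(int) Math.floorMod(id, (long) sharedHeaps.length)];
        heap.lock.lock();
        try {
            return heap.allocate(size);
        } finally {
            heap.lock.unlock();
        }
    }
}
```

The design trade-off in this sketch: shared heaps bound the number of heaps for an unbounded set of platform threads, at the cost of a lock per allocation, while owned heaps stay lock-free.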

I ran some benchmarks on normal platform threads.

Benchmark code: MiMallocByteBufAllocator.

Test server: ARM, 8 cores, 32 GB RAM.

JDK: OpenJDK-21.0.7.

The latest 4.2 branch has been merged into this PR.

We use 16 threads to simulate a common configuration (8 cores × 2).

First round:

  1. JMH results (lower is better):
     [Screenshot 2026-02-15 10 06 53: JMH results]
  2. RSS:
     [Chart: app_mem_usage (5)]

The adaptive allocator shows some huge RSS spikes. Based on an earlier #15525 (comment), this might be related to the chunk reuse queue size, so I increased AdaptivePoolingAllocator.CHUNK_REUSE_QUEUE to 1024 and ran a second round.
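Why a small reuse queue can produce RSS spikes can be seen in a toy, single-threaded model. This is not AdaptivePoolingAllocator's code: the real queue is concurrent and real chunks are large allocations. The assumption modeled here is that when a burst of chunks is released at once, anything beyond the queue capacity is freed and has to be allocated again on the next burst.

```java
import java.util.ArrayDeque;

/** Toy, single-threaded model of a bounded chunk reuse queue (not Netty code). */
final class ChunkReuseSketch {
    private final ArrayDeque<Object> reuseQueue = new ArrayDeque<>();
    private final int capacity;
    int freshAllocations; // chunks that had to be newly allocated
    int droppedChunks;    // released chunks freed because the queue was full

    ChunkReuseSketch(int capacity) {
        this.capacity = capacity;
    }

    Object acquireChunk() {
        Object chunk = reuseQueue.pollFirst();
        if (chunk == null) {
            freshAllocations++;
            chunk = new Object(); // stands in for a newly allocated (large) chunk
        }
        return chunk;
    }

    void releaseChunk(Object chunk) {
        if (reuseQueue.size() < capacity) {
            reuseQueue.addLast(chunk); // kept for reuse
        } else {
            droppedChunks++;           // freed now, must be re-allocated later
        }
    }

    /** Each round acquires `burst` chunks at once and then releases all of them. */
    static ChunkReuseSketch simulate(int capacity, int burst, int rounds) {
        ChunkReuseSketch s = new ChunkReuseSketch(capacity);
        for (int r = 0; r < rounds; r++) {
            Object[] live = new Object[burst];
            for (int i = 0; i < burst; i++) {
                live[i] = s.acquireChunk();
            }
            for (Object chunk : live) {
                s.releaseChunk(chunk);
            }
        }
        return s;
    }
}
```

With `simulate(8, 32, 10)` the model allocates 248 fresh chunks and drops 240, while `simulate(1024, 32, 10)` allocates only the initial 32 and drops none. These numbers describe the toy model only, not the JMH runs.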

Second round:

  1. JMH results (lower is better):
     [Screenshot 2026-02-15 10 17 34: JMH results]
  2. RSS:
     [Chart: app_mem_usage (6)]

The huge RSS spike disappears, but adaptive's performance regresses. Checking the adaptive code, I found that its magazine expansion strategy is conservative, so I increased AdaptivePoolingAllocator.INITIAL_MAGAZINES to NettyRuntime.availableProcessors() * 2 and ran a third round (note: CHUNK_REUSE_QUEUE was kept at 1024).
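The effect of a conservative magazine expansion can be illustrated with a generic lock-striping sketch. It assumes each magazine is guarded by a lock, threads are mapped onto magazines by thread id, and a failed `tryLock()` triggers a doubling of the magazine array up to a cap. This is not AdaptivePoolingAllocator's real expansion policy, and `MagazineStripesSketch` and its methods are hypothetical. With an initial count of 1, all 16 benchmark threads start out contending on a single lock until enough contention has been observed; starting at `availableProcessors() * 2` skips that ramp-up.

```java
import java.util.Arrays;
import java.util.concurrent.locks.ReentrantLock;

/** Generic lock-striping sketch of "magazines"; not AdaptivePoolingAllocator's code. */
final class MagazineStripesSketch {
    private volatile ReentrantLock[] magazines;
    private final int maxMagazines;

    MagazineStripesSketch(int initialMagazines, int maxMagazines) {
        ReentrantLock[] mags = new ReentrantLock[initialMagazines];
        for (int i = 0; i < mags.length; i++) {
            mags[i] = new ReentrantLock();
        }
        this.magazines = mags;
        this.maxMagazines = maxMagazines;
    }

    int magazineCount() {
        return magazines.length;
    }

    /** Runs `work` while holding one magazine; contention triggers an expansion attempt. */
    void withMagazine(Runnable work) {
        ReentrantLock[] mags = magazines;
        long id = Thread.currentThread().getId();
        ReentrantLock lock = mags[(int) Math.floorMod(id, (long) mags.length)];
        if (!lock.tryLock()) {
            expand(mags.length); // contended: add stripes for future calls
            lock.lock();
        }
        try {
            work.run();
        } finally {
            lock.unlock();
        }
    }

    /** Doubles the stripe count, unless another thread already did or the cap is reached. */
    synchronized void expand(int observedCount) {
        ReentrantLock[] mags = magazines;
        if (mags.length != observedCount || mags.length >= maxMagazines) {
            return;
        }
        ReentrantLock[] grown = Arrays.copyOf(mags, Math.min(mags.length * 2, maxMagazines));
        for (int i = mags.length; i < grown.length; i++) {
            grown[i] = new ReentrantLock();
        }
        magazines = grown;
    }
}
```

Existing locks are carried over into the grown array, so a thread that is already holding a magazine keeps mutual exclusion across an expansion.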

Third round:

  1. JMH results (lower is better):
     [Screenshot 2026-02-15 10 24 44: JMH results]
  2. RSS:
     [Chart: app_mem_usage (7)]

Adaptive's performance improves and its RSS drops further; in some cases its RSS is lower than mimalloc's, but mimalloc still shows the best performance.

Observation:

  1. On platform threads, MiMallocByteBufAllocator still showed the best performance.
  2. For adaptive, might it be worth making AdaptivePoolingAllocator.INITIAL_MAGAZINES (default 1) configurable?
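If INITIAL_MAGAZINES were made configurable, one possible shape is a system property with a default of 1. The property name `io.netty.allocator.initialMagazines` and the `InitialMagazinesConfig` class are hypothetical. Inside Netty this would more likely go through `SystemPropertyUtil.getInt`; the sketch sticks to the JDK's `Integer.getInteger` to stay self-contained.

```java
/** Sketch of a configurable initial magazine count; the property name is hypothetical. */
final class InitialMagazinesConfig {
    // Hypothetical system property; AdaptivePoolingAllocator does not read it.
    static final String PROPERTY = "io.netty.allocator.initialMagazines";

    static int initialMagazines() {
        // Default 1 matches the current hard-coded value; values below 1 are clamped.
        int configured = Integer.getInteger(PROPERTY, 1);
        return Math.max(1, configured);
    }
}
```

With such a knob, the third-round setting on the 8-core test server (availableProcessors() * 2) would correspond to `-Dio.netty.allocator.initialMagazines=16`.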

@franz1981
Contributor

franz1981 commented Feb 15, 2026

@laosijikaichele is the benchmark configured to run on event loop threads?
Last but not least: the number of atomic ops can severely impact this use case, and I haven't yet sent the changes to get rid of the ref cnt for the size-classed chunks...
Since you are interested in improving allocation, WDYT about giving it a shot?
That would save us from having to maintain yet another allocator algorithm.

@franz1981
Contributor

See #15741 (comment): the only unchecked point, in case you want to try it 🙏

@laosijikaichele
Contributor Author

the benchmark is configured to run on event loop threads?

It runs on normal platform threads (non-event-loop).

See #15741 (comment) the only unchecked point, in case you want to try it

Will look into it.

@laosijikaichele
Contributor Author

laosijikaichele commented Feb 15, 2026

to get rid of ref cnt for the size classes chunks...

To quickly check the performance impact of the atomic refCnt on size-classed chunks, I made a temporary commit, 5ad6dec, which uses plain reads/writes on the refCnt of size-classed chunks.

For ByteBufAllocatorAllocPatternBenchmark running on event-loop threads, the allocator uses thread-locals and a thread-safe chunk-reuse queue, so I think this temporary commit is fine for a quick performance check.
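The difference being measured can be sketched as two ways of tracking live segments in a size-classed chunk: an atomic counter that tolerates releases from any thread, and a plain counter that is only correct under the assumption described above, i.e. one thread touches the chunk at a time and ownership moves only through a thread-safe queue that provides the happens-before edge. This is an illustration with hypothetical names (`SegmentCounter` and its implementations), not the contents of commit 5ad6dec.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Counts live segments of a size-classed chunk; release() returns true when it becomes empty. */
interface SegmentCounter {
    void acquire();

    boolean release();
}

/** Safe when segments may be released from any thread: every update is an atomic RMW. */
final class AtomicSegmentCounter implements SegmentCounter {
    private final AtomicInteger live = new AtomicInteger();

    public void acquire() {
        live.incrementAndGet();
    }

    public boolean release() {
        return live.decrementAndGet() == 0;
    }
}

/**
 * Only correct if a single thread touches the counter at a time, and the chunk moves
 * between threads only through a thread-safe hand-off (for example a concurrent queue),
 * which supplies the happens-before edge that plain fields lack.
 */
final class PlainSegmentCounter implements SegmentCounter {
    private int live;

    public void acquire() {
        live++;
    }

    public boolean release() {
        return --live == 0;
    }
}
```

The atomic version pays an atomic read-modify-write per update (a `lock`-prefixed instruction on x86), while the plain version is an ordinary load and store; that per-release cost is what the temporary commit removes.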

We use 16 event-loop threads:

  1. Adaptive code identical to the 4.2 branch, except that AdaptivePoolingAllocator.CHUNK_REUSE_QUEUE is set to 1024 to avoid the huge RSS spike. JMH results:
     [Screenshot 2026-02-15 21 12 22: JMH results]
  2. With commit 5ad6dec (plain reads/writes on the refCnt of size-classed chunks), again with AdaptivePoolingAllocator.CHUNK_REUSE_QUEUE set to 1024. JMH results:
     [Screenshot 2026-02-15 21 13 12: JMH results]

Observation:

  • Adaptive's performance improves when using plain reads/writes on the refCnt of size-classed chunks.
  • mimalloc still shows better performance.

@franz1981

@franz1981
Contributor

It's a bit odd that adaptive performs better at 256 than at 128; I would profile what's going on there ❤️
But yes, the point is that the performance deficiencies can be addressed while still allowing a single allocator to rule 'em all

@laosijikaichele
Contributor Author

But yes, the point is that the performance deficiencies can be addressed and I still allow having a single allocator to rule em all

I'd also be interested in trying to improve adaptive's performance; I'll look into the code...


3 participants