Skip to content

Skip finalization for PoolThreadCache instances without small/normal caches#13408

Merged
chrisvest merged 1 commit intonetty:4.1from
franz1981:4.1_no_finalizers
Jun 6, 2023
Merged

Skip finalization for PoolThreadCache instances without small/normal caches#13408
chrisvest merged 1 commit intonetty:4.1from
franz1981:4.1_no_finalizers

Conversation

@franz1981
Copy link
Contributor

Motivation:

PoolThreadCache doesn't need to be finalized if it doesn't contain small/normal caches

Modifications:

Create a base class for PoolThreadCache that won't be registered for finalization

Result:

Less GC activity and reduced footprint for short/medium living threads

…caches

Motivation:

PoolThreadCache doesn't need to be finalized if it doesn't contain small/normal caches

Modifications:

Create a base class for PoolThreadCache that won't be registered for finalization

Result:

Less GC activity and reduced footprint for short/medium living threads
@franz1981
Copy link
Contributor Author

This is the first step in the direction to make Netty more "loom-friendly" too: now that we can generate non-finalizable lightweight instances, we could find a way to associate them to a virtual thread without using any thread local (that is NOT a good idea to use with short living millions of virtual threads :"( )

@normanmaurer
Copy link
Member

@chrisvest PTAL.

Copy link
Member

@chrisvest chrisvest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice

@chrisvest chrisvest added this to the 4.1.94.Final milestone Jun 6, 2023
@chrisvest chrisvest merged commit 9412424 into netty:4.1 Jun 6, 2023
pan3793 added a commit to apache/kyuubi that referenced this pull request Jun 26, 2023
### _Why are the changes needed?_

Upgrade Netty to the latest Arrow-compatible version: 4.1.93.Final

Currently, we can not upgrade to 4.1.94.Final or above because of netty/netty#13408

```
java.lang.NoSuchMethodError: 'io.netty.buffer.PoolThreadCache io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.threadCache()'
	at io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.newDirectBufferL(PooledByteBufAllocatorL.java:164)
	at io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.directBuffer(PooledByteBufAllocatorL.java:214)
	at io.netty.buffer.PooledByteBufAllocatorL.allocate(PooledByteBufAllocatorL.java:58)
	at org.apache.arrow.memory.NettyAllocationManager.<init>(NettyAllocationManager.java:77)
	at org.apache.arrow.memory.NettyAllocationManager.<init>(NettyAllocationManager.java:84)
	at org.apache.arrow.memory.NettyAllocationManager$1.create(NettyAllocationManager.java:34)
	at org.apache.arrow.memory.BaseAllocator.newAllocationManager(BaseAllocator.java:354)
	at org.apache.arrow.memory.BaseAllocator.newAllocationManager(BaseAllocator.java:349)
	at org.apache.arrow.memory.BaseAllocator.bufferWithoutReservation(BaseAllocator.java:337)
	at org.apache.arrow.memory.BaseAllocator.buffer(BaseAllocator.java:315)
	at org.apache.arrow.memory.BaseAllocator.buffer(BaseAllocator.java:279)
	at org.apache.arrow.vector.BaseVariableWidthVector.allocateBytes(BaseVariableWidthVector.java:462)
	at org.apache.arrow.vector.BaseVariableWidthVector.allocateNew(BaseVariableWidthVector.java:420)
	at org.apache.arrow.vector.BaseVariableWidthVector.allocateNew(BaseVariableWidthVector.java:380)
	at org.apache.spark.sql.execution.arrow.ArrowWriter$.$anonfun$create$1(ArrowWriter.scala:42)
```

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request

Closes #4992 from pan3793/netty.

Closes #4992

9dd8f9a [Cheng Pan] nit
e695314 [Cheng Pan] Bump Netty 4.1.93.Final

Authored-by: Cheng Pan <[email protected]>
Signed-off-by: Cheng Pan <[email protected]>
@slandelle
Copy link
Contributor

@chrisvest @franz1981 Ping, just in case you're not aware of the performance regression introduced by this commit.

@normanmaurer
Copy link
Member

@slandelle thanks a lot... @ejona86 I still don't get why this would cause such a problem

@franz1981
Copy link
Contributor Author

@normanmaurer given that this one was an "addition" but not required (for loom I will poet the allocator from Chris), we can revert and Investigate (without hurry). I don't want anyone to regress for something which is not a bug fix

@chrisvest
Copy link
Member

@franz1981 Could the regression be type pollution? I don't see anything suspicious in this other than two types visiting code paths where there previously was only one.

@franz1981
Copy link
Contributor Author

franz1981 commented Jul 23, 2023

Good point @chrisvest yep, it could be a funny (and scary, as usual :/ ) possibility...
If I can run their microbench, both the profiler and the agent can show that (@ejona86 too): https://github.com/RedHatPerf/type-pollution-agent/blob/master/benchmarks/WHATIF.md

At the same time the 2 concrete types doesn't implement any interface...mmm

@chrisvest
Copy link
Member

@franz1981 It's not necessarily related to instanceof. Type pollution can impede JIT optimizations, when a method is called with more than one concrete type in an argument.

@chrisvest
Copy link
Member

Instead of splitting out PoolArenasCache, we could move the finalize() method to a new class that PoolThreadCache then may-or-may-not reference from an instance field.

@franz1981
Copy link
Contributor Author

@chrisvest you mean profile type pollution then, yep. But with concrete types (not interfaces) bi-morphism should still allow inlining and save megamorphism.
Without profiling data is difficult to say what's going on, but my educated guess is that pool thread cache is not used for small/normal types (due to some logic error where I have enabled it - I have added a small condition there

5a7197f#diff-c24e989820523c812b36dea1eb81cbbde7822d060eac7bf136fa96be00956e3fR521

Which wasn't present before and it should be trivial to check if reverting just the condition fix it.
If the regression is in a jmh benchmark which cause hitting the condition (due to the jmh thread executor..?) That would explain why I didn't spotted in our perf rig.

And still I like your idea of an inner reference which can be finalized...so if you feel to do it, go for it +100
If we can have monomorphism is always a good idea

chrisvest added a commit to chrisvest/netty that referenced this pull request Jul 23, 2023
chrisvest added a commit to chrisvest/netty that referenced this pull request Jul 23, 2023
Motivation:
PoolThreadCache objects are created even when they are not meant to pool anything.
In such a case, there is no point in giving them a finalizer() method.

Modification:
The finalizer method is moved to a separate object, which is conditionally referenced from the PoolThreadCache.
Non-pooling PoolThreadCache objects will not create their FreeOnFinalize objects.

This is an alternative implementation to netty#13408, which is also reverted by this PR.

Additionally, netty#13408 added a condition to create non-caching PoolThreadCache when the size of the small and normal caches were zero.
However, it turned out that even when these were requested to be zero, a single-element cache would be created for them.
This PR also reverts the logic to the old behaviour with the single-element cache for small and normal sizes.

Result:
We avoid creating finalizable objects when we don't pool.
normanmaurer pushed a commit that referenced this pull request Jul 24, 2023
Motivation:
PoolThreadCache objects are created even when they are not meant to pool
anything.
In such a case, there is no point in giving them a finalizer() method.

Modification:
The finalizer method is moved to a separate object, which is
conditionally referenced from the PoolThreadCache.
Non-pooling PoolThreadCache objects will not create their FreeOnFinalize
objects.

This is an alternative implementation to #13408, which is also reverted
by this PR.

Additionally, #13408 added a condition to create non-caching
PoolThreadCache when the size of the small and normal caches were zero.
However, it turned out that even when these were requested to be zero, a
single-element cache would be created for them.
This PR also reverts the logic to the old behaviour with the
single-element cache for small and normal sizes.

Result:
We avoid creating finalizable objects when we don't pool.
@chrisvest
Copy link
Member

@franz1981 Apparently the extra condition check is not the culprit: grpc/grpc-java#10401 (comment)

So the most convenient explanation is ruled out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants