-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New buffers #13304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New buffers #13304
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #13304 +/- ##
============================================
+ Coverage 61.75% 62.01% +0.25%
+ Complexity 207 198 -9
============================================
Files 2436 2557 +121
Lines 133233 141190 +7957
Branches 20636 21913 +1277
============================================
+ Hits 82274 87553 +5279
- Misses 44911 46986 +2075
- Partials 6048 6651 +603
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Jackie-Jiang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks very promising in general!
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/memory/DataBuffer.java
Outdated
Show resolved
Hide resolved
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/memory/DataBuffer.java
Show resolved
Hide resolved
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/memory/DataBuffer.java
Outdated
Show resolved
Hide resolved
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/memory/CompoundDataBuffer.java
Outdated
Show resolved
Hide resolved
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/memory/CompoundDataBuffer.java
Outdated
Show resolved
Hide resolved
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/memory/SeekableInputStream.java
Outdated
Show resolved
Hide resolved
...egment-spi/src/main/java/org/apache/pinot/segment/spi/memory/DataBufferPinotInputStream.java
Outdated
Show resolved
Hide resolved
Methods have been moved to PinotInputStream and PinotOutputStream
Jackie-Jiang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome!
| private static final AtomicLong ALLOCATION_FAILURE_COUNT = new AtomicLong(); | ||
| private static final Map<PinotDataBuffer, BufferContext> BUFFER_CONTEXT_MAP = new WeakHashMap<>(); | ||
| // we need to use MapMaker instead of WeakHashMap because we want to use identity comparison for the keys | ||
| private static final Map<PinotDataBuffer, BufferContext> BUFFER_CONTEXT_MAP = new MapMaker().weakKeys().makeMap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the returned map can be accessed concurrently. We might be able to remove the synchronized block
| _bufferOffsets = new long[buffers.length]; | ||
| _order = order; | ||
| long offset = 0; | ||
| for (int i = 0; i < buffers.length; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can combine the 2 for loops. Cache buffers[i].size() which is used multiple times
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm splitting it in two loops on purpose. The idea (not backed by data) is that by doing so the JIT will have an easier time to optimize the second loop:
for (int i = 0; i < buffers.length; i++) {
_bufferOffsets[i] = offset;
long size = buffers[i].size();
offset += size;
}
TBH I don't think one or the other alternative is going to be specially faster, but I think this way is easier to read. The first loop is checking preconditions while the second initialize a couple of complex variables (_bufferOffsets and _size)
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/memory/CompoundDataBuffer.java
Outdated
Show resolved
Hide resolved
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/memory/CompoundDataBuffer.java
Outdated
Show resolved
Hide resolved
| @Override | ||
| public void readFrom(long offset, byte[] input, int srcOffset, int size) { | ||
| if (offset + size > _size) { | ||
| throw new BufferUnderflowException(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be BufferOverflowException, same for other write methods
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah! I always think readFrom is a read operation when it is a write operation
Add some new IO and buffer classes: - DataBuffer, a new interface that contains most (if not all) methods of PinotDataBuffer, which now implements this interface - CompoundDataBuffer, a new class that wraps a list of DataBuffers and offers a continuous DataBuffer withouth copying them. - PinotOutputStream, an OutputStream whose cursor can be moved. - PinotInputStream, the same, but for InputStream. - PagedPinotOutputStream, a PinotOutputStream that stores data into a list of ByteBuffers. When the last ByteBuffer in the list is filled, a new one is created. - DataBufferPinotInputStream, an adaptor that takes a DataBuffer and shows it as a PinotInputStream.
This PR includes some changes on the buffers and adds a couple of stream classes. Nobody is actually using the newly introduced classes here, but these classes are used in #13303 to reduce the allocation rate when building, serializing and deserializing instances of DataBlock.
The main reason to create this PR as independent from #13303 is to make the later easier to read if we decide to merge this first.
TODO: