Skip to content

Conversation

@gortiz
Copy link
Contributor

@gortiz gortiz commented Jun 3, 2024

This PR includes some changes on the buffers and adds a couple of stream classes. Nobody is actually using the newly introduced classes here, but these classes are used in #13303 to reduce the allocation rate when building, serializing and deserializing instances of DataBlock.

The main reason to create this PR as independent from #13303 is to make the later easier to read if we decide to merge this first.

TODO:

  • Clean up code
  • Properly tests new classes
  • Add javadoc

@codecov-commenter
Copy link

codecov-commenter commented Jun 3, 2024

Codecov Report

Attention: Patch coverage is 65.22811% with 282 lines in your changes missing coverage. Please review.

Project coverage is 62.01%. Comparing base (59551e4) to head (08abb41).
Report is 666 commits behind head on master.

Files Patch % Lines
...e/pinot/segment/spi/memory/CompoundDataBuffer.java 75.57% 71 Missing and 14 partials ⚠️
...segment/spi/memory/DataBufferPinotInputStream.java 52.27% 25 Missing and 17 partials ⚠️
...not/segment/spi/memory/PagedPinotOutputStream.java 76.02% 37 Missing and 4 partials ⚠️
...rg/apache/pinot/segment/spi/memory/DataBuffer.java 51.38% 32 Missing and 3 partials ⚠️
...ache/pinot/segment/spi/memory/PinotDataBuffer.java 45.31% 30 Missing and 5 partials ⚠️
...he/pinot/segment/spi/memory/PinotOutputStream.java 36.95% 29 Missing ⚠️
...che/pinot/segment/spi/memory/PinotInputStream.java 16.66% 15 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #13304      +/-   ##
============================================
+ Coverage     61.75%   62.01%   +0.25%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2557     +121     
  Lines        133233   141190    +7957     
  Branches      20636    21913    +1277     
============================================
+ Hits          82274    87553    +5279     
- Misses        44911    46986    +2075     
- Partials       6048     6651     +603     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 61.97% <65.22%> (+0.26%) ⬆️
java-21 61.87% <65.22%> (+0.25%) ⬆️
skip-bytebuffers-false 61.99% <65.22%> (+0.24%) ⬆️
skip-bytebuffers-true 61.86% <65.22%> (+34.13%) ⬆️
temurin 62.01% <65.22%> (+0.25%) ⬆️
unittests 62.00% <65.22%> (+0.25%) ⬆️
unittests1 46.71% <65.22%> (-0.19%) ⬇️
unittests2 27.51% <0.00%> (-0.23%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@gortiz gortiz marked this pull request as ready for review June 6, 2024 08:32
@gortiz gortiz requested review from Jackie-Jiang and xiangfu0 June 6, 2024 08:32
Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very promising in general!

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome!

private static final AtomicLong ALLOCATION_FAILURE_COUNT = new AtomicLong();
private static final Map<PinotDataBuffer, BufferContext> BUFFER_CONTEXT_MAP = new WeakHashMap<>();
// we need to use MapMaker instead of WeakHashMap because we want to use identity comparison for the keys
private static final Map<PinotDataBuffer, BufferContext> BUFFER_CONTEXT_MAP = new MapMaker().weakKeys().makeMap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the returned map can be accessed concurrently. We might be able to remove the synchronized block

_bufferOffsets = new long[buffers.length];
_order = order;
long offset = 0;
for (int i = 0; i < buffers.length; i++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can combine the 2 for loops. Cache buffers[i].size() which is used multiple times

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm splitting it in two loops on purpose. The idea (not backed by data) is that by doing so the JIT will have an easier time to optimize the second loop:

    for (int i = 0; i < buffers.length; i++) {
      _bufferOffsets[i] = offset;
      long size = buffers[i].size();
      offset += size;
    }

TBH I don't think one or the other alternative is going to be specially faster, but I think this way is easier to read. The first loop is checking preconditions while the second initialize a couple of complex variables (_bufferOffsets and _size)

@Override
public void readFrom(long offset, byte[] input, int srcOffset, int size) {
if (offset + size > _size) {
throw new BufferUnderflowException();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be BufferOverflowException, same for other write methods

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah! I always think readFrom is a read operation when it is a write operation

@gortiz gortiz merged commit 66676d9 into apache:master Jun 24, 2024
@gortiz gortiz deleted the new-buffers branch June 24, 2024 11:01
suyashpatel98 pushed a commit to suyashpatel98/pinot that referenced this pull request Jul 6, 2024
Add some new IO and buffer classes:
- DataBuffer, a new interface that contains most (if not all) methods of PinotDataBuffer, which now implements this interface
- CompoundDataBuffer, a new class that wraps a list of DataBuffers and offers a continuous DataBuffer withouth copying them.
- PinotOutputStream, an OutputStream whose cursor can be moved.
- PinotInputStream, the same, but for InputStream.
- PagedPinotOutputStream, a PinotOutputStream that stores data into a list of ByteBuffers. When the last ByteBuffer in the list is filled, a new one is created.
- DataBufferPinotInputStream, an adaptor that takes a DataBuffer and shows it as a PinotInputStream.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants