New buffers #13304

gortiz · 2024-06-03T14:35:00Z

This PR includes some changes on the buffers and adds a couple of stream classes. Nobody is actually using the newly introduced classes here, but these classes are used in #13303 to reduce the allocation rate when building, serializing and deserializing instances of DataBlock.

The main reason to create this PR as independent from #13303 is to make the later easier to read if we decide to merge this first.

TODO:

Clean up code
Properly tests new classes
Add javadoc

codecov-commenter · 2024-06-03T15:14:14Z

Codecov Report

Attention: Patch coverage is 65.22811% with 282 lines in your changes missing coverage. Please review.

Project coverage is 62.01%. Comparing base (59551e4) to head (08abb41).
Report is 666 commits behind head on master.

Files	Patch %	Lines
...e/pinot/segment/spi/memory/CompoundDataBuffer.java	75.57%	71 Missing and 14 partials ⚠️
...segment/spi/memory/DataBufferPinotInputStream.java	52.27%	25 Missing and 17 partials ⚠️
...not/segment/spi/memory/PagedPinotOutputStream.java	76.02%	37 Missing and 4 partials ⚠️
...rg/apache/pinot/segment/spi/memory/DataBuffer.java	51.38%	32 Missing and 3 partials ⚠️
...ache/pinot/segment/spi/memory/PinotDataBuffer.java	45.31%	30 Missing and 5 partials ⚠️
...he/pinot/segment/spi/memory/PinotOutputStream.java	36.95%	29 Missing ⚠️
...che/pinot/segment/spi/memory/PinotInputStream.java	16.66%	15 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##             master   #13304      +/-   ##
============================================
+ Coverage     61.75%   62.01%   +0.25%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2557     +121     
  Lines        133233   141190    +7957     
  Branches      20636    21913    +1277     
============================================
+ Hits          82274    87553    +5279     
- Misses        44911    46986    +2075     
- Partials       6048     6651     +603

Flag	Coverage Δ
custom-integration1	`<0.01% <0.00%> (-0.01%)`	⬇️
integration	`<0.01% <0.00%> (-0.01%)`	⬇️
integration1	`<0.01% <0.00%> (-0.01%)`	⬇️
integration2	`0.00% <0.00%> (ø)`
java-11	`61.97% <65.22%> (+0.26%)`	⬆️
java-21	`61.87% <65.22%> (+0.25%)`	⬆️
skip-bytebuffers-false	`61.99% <65.22%> (+0.24%)`	⬆️
skip-bytebuffers-true	`61.86% <65.22%> (+34.13%)`	⬆️
temurin	`62.01% <65.22%> (+0.25%)`	⬆️
unittests	`62.00% <65.22%> (+0.25%)`	⬆️
unittests1	`46.71% <65.22%> (-0.19%)`	⬇️
unittests2	`27.51% <0.00%> (-0.23%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Jackie-Jiang

Looks very promising in general!

pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/memory/DataBuffer.java

pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/memory/CompoundDataBuffer.java

pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/memory/SeekableInputStream.java

...egment-spi/src/main/java/org/apache/pinot/segment/spi/memory/DataBufferPinotInputStream.java

Methods have been moved to PinotInputStream and PinotOutputStream

Jackie-Jiang

Awesome!

Jackie-Jiang · 2024-06-20T16:46:33Z

pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/memory/PinotDataBuffer.java

  private static final AtomicLong ALLOCATION_FAILURE_COUNT = new AtomicLong();
-  private static final Map<PinotDataBuffer, BufferContext> BUFFER_CONTEXT_MAP = new WeakHashMap<>();
+  // we need to use MapMaker instead of WeakHashMap because we want to use identity comparison for the keys
+  private static final Map<PinotDataBuffer, BufferContext> BUFFER_CONTEXT_MAP = new MapMaker().weakKeys().makeMap();


Seems like the returned map can be accessed concurrently. We might be able to remove the synchronized block

Jackie-Jiang · 2024-06-20T16:51:15Z

pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/memory/CompoundDataBuffer.java

+    _bufferOffsets = new long[buffers.length];
+    _order = order;
+    long offset = 0;
+    for (int i = 0; i < buffers.length; i++) {


We can combine the 2 for loops. Cache buffers[i].size() which is used multiple times

I'm splitting it in two loops on purpose. The idea (not backed by data) is that by doing so the JIT will have an easier time to optimize the second loop:

for (int i = 0; i < buffers.length; i++) { _bufferOffsets[i] = offset; long size = buffers[i].size(); offset += size; }

TBH I don't think one or the other alternative is going to be specially faster, but I think this way is easier to read. The first loop is checking preconditions while the second initialize a couple of complex variables (_bufferOffsets and _size)

pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/memory/CompoundDataBuffer.java

Jackie-Jiang · 2024-06-20T16:58:52Z

pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/memory/CompoundDataBuffer.java

+  @Override
+  public void readFrom(long offset, byte[] input, int srcOffset, int size) {
+    if (offset + size > _size) {
+      throw new BufferUnderflowException();


I think this should be BufferOverflowException, same for other write methods

Yeah! I always think readFrom is a read operation when it is a write operation

Add some new IO and buffer classes: - DataBuffer, a new interface that contains most (if not all) methods of PinotDataBuffer, which now implements this interface - CompoundDataBuffer, a new class that wraps a list of DataBuffers and offers a continuous DataBuffer withouth copying them. - PinotOutputStream, an OutputStream whose cursor can be moved. - PinotInputStream, the same, but for InputStream. - PagedPinotOutputStream, a PinotOutputStream that stores data into a list of ByteBuffers. When the last ByteBuffer in the list is filled, a new one is created. - DataBufferPinotInputStream, an adaptor that takes a DataBuffer and shows it as a PinotInputStream.

gortiz added 4 commits May 30, 2024 15:52

Add new buffers and streams

18d0865

Extra improvements on buffers

30b4359

CompoundDataBuffer.Builder will skip empty buffers

0dd8edc

Fixed a bug in PagedPinotOutputStream.write(byte[]...)

3ecff36

gortiz mentioned this pull request Jun 3, 2024

Improve performance of DataBlock serde #13303

Merged

2 tasks

Add tests and fixed some bugs

fe46889

gortiz marked this pull request as ready for review June 6, 2024 08:32

gortiz requested review from Jackie-Jiang and xiangfu0 June 6, 2024 08:32

Jackie-Jiang added enhancement performance labels Jun 11, 2024

Jackie-Jiang reviewed Jun 11, 2024

View reviewed changes

gortiz added 5 commits June 14, 2024 12:14

improve DataBuffer.sameContent

a11654f

improve code style in DataBuffer.commonHashCode

d6f1d60

Improve performance in CompoundDataBuffer constructors

a22b114

Remove SeekableInputStream and SeekableOutputStream

8bc191e

Methods have been moved to PinotInputStream and PinotOutputStream

Fix format

42e7067

Jackie-Jiang approved these changes Jun 20, 2024

View reviewed changes

gortiz added 2 commits June 24, 2024 10:30

Fix comments in PR

0d6b72c

Merge remote-tracking branch master into new-buffers

08abb41

gortiz merged commit 66676d9 into apache:master Jun 24, 2024

gortiz deleted the new-buffers branch June 24, 2024 11:01

New buffers #13304

New buffers #13304

Uh oh!

Conversation

gortiz commented Jun 3, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov-commenter commented Jun 3, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Jackie-Jiang left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Jackie-Jiang left a comment

Choose a reason for hiding this comment

Uh oh!

Jackie-Jiang Jun 20, 2024

Choose a reason for hiding this comment

Uh oh!

Jackie-Jiang Jun 20, 2024

Choose a reason for hiding this comment

Uh oh!

gortiz Jun 24, 2024

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Jackie-Jiang Jun 20, 2024

Choose a reason for hiding this comment

Uh oh!

gortiz Jun 24, 2024

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

gortiz commented Jun 3, 2024 •

edited

Loading

codecov-commenter commented Jun 3, 2024 •

edited

Loading