
Conversation

@richardstartin (Member) commented Feb 5, 2022

Aggregation functions aggregate as double to avoid numeric overflow, but converting blocks to double[] too soon has two drawbacks:

  • double[] is twice the size of float[] and int[], so the heap footprint may double when these arrays are retained by the DataBlockCache
  • If the values are read directly from a ForwardIndexReader, reading int/long/float values as double prevents autovectorization, resulting in a performance penalty.
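For a sense of the footprint point, here is a back-of-the-envelope sketch. The 16-byte array header is a typical 64-bit JVM figure and the helper name is illustrative, not anything from this PR:

```java
public class FootprintSketch {
  // Approximate heap size of a primitive array on a typical 64-bit JVM:
  // ~16-byte object header plus the packed elements (padding ignored).
  static long arrayBytes(int length, int bytesPerElement) {
    return 16L + (long) length * bytesPerElement;
  }

  public static void main(String[] args) {
    int n = 1_000_000;
    // int[]/float[] use 4 bytes per element, double[] uses 8, so an eagerly
    // converted block retained by a cache costs roughly twice the heap.
    System.out.println("int[]    ~" + arrayBytes(n, 4) + " bytes");
    System.out.println("double[] ~" + arrayBytes(n, 8) + " bytes");
  }
}
```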

Avoiding Premature Type Conversion

Type conversion slows down bulk reads in ForwardIndexReader. For example, compare reading contiguous long values as long versus as double (this benchmark is in the project and can be run by anyone):

Benchmark                                                     (_blockSize)  (_numBlocks)  Mode  Cnt      Score      Error  Units
BenchmarkFixedByteSVForwardIndexReader.readDoublesBatch              10000          1000  avgt    5  33931.164 ± 4052.520  us/op
BenchmarkFixedByteSVForwardIndexReader.readLongsBatch                10000          1000  avgt    5  14003.773 ± 1625.357  us/op

The root cause is that when values are read into an array from disk, the endianness needs to be swapped. HotSpot has an efficient implementation of this operation, Copy::conjoint_swap, which can't be used when type conversion also needs to be performed, so reading long values as long is more efficient than reading them as double, despite the same amount of data being copied. This can be seen by profiling the benchmark:

....[Hottest Regions]...............................................................................
 62.36%           libjvm.so  Copy::conjoint_swap (21 bytes) 
 30.25%         c2, level 4  org.apache.pinot.perf.BenchmarkFixedByteSVForwardIndexReader::readLongsBatch, version 1447 (101 bytes) 
 
....[Hottest Regions]...............................................................................
 81.73%         c2, level 4  org.apache.pinot.perf.BenchmarkFixedByteSVForwardIndexReader::readDoublesBatch, version 1428 (78 bytes) 
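The difference can be reproduced in miniature with NIO buffers. This is a hedged sketch, not the Pinot reader: the bulk LongBuffer.get is the kind of copy-with-byte-swap the JVM can service with an optimized routine like Copy::conjoint_swap, while per-element widening to double rules that out:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.LongBuffer;

public class SwapSketch {
  // Bulk read of big-endian longs: a single copy that only swaps bytes,
  // eligible for the JVM's optimized swap routine.
  static void readLongs(ByteBuffer raw, long[] out) {
    raw.duplicate().order(ByteOrder.BIG_ENDIAN).asLongBuffer().get(out);
  }

  // Byte swap plus long->double widening per element: the extra conversion
  // means the bulk-swap fast path no longer applies.
  static void readAsDoubles(ByteBuffer raw, double[] out) {
    LongBuffer longs = raw.duplicate().order(ByteOrder.BIG_ENDIAN).asLongBuffer();
    for (int i = 0; i < out.length; i++) {
      out[i] = longs.get(i);
    }
  }
}
```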

This means that a simple sum over a raw column is more efficient when the type conversion is delayed. This can be seen (in a benchmark to be contributed in another PR) when computing a sum over a raw INT column:

master

Benchmark               (_intBaseValue)  (_numRows)                              (_query)  Mode  Cnt      Score     Error  Units
BenchmarkQueries.query                0     1500000  SELECT SUM(RAW_INT_COL) FROM MyTable  avgt    5  17719.453 ± 171.739  us/op

branch

Benchmark               (_intBaseValue)  (_numRows)                              (_query)  Mode  Cnt      Score     Error  Units
BenchmarkQueries.query                0     1500000  SELECT SUM(RAW_INT_COL) FROM MyTable  avgt    5  15475.992 ± 537.057  us/op
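The shape of the delayed conversion is roughly the following. This is an illustrative sketch, not the actual SumAggregationFunction code: the stored type is switched on once per block, the loop reads the native array type, and widening to double happens only in the accumulator rather than by materializing a double[]:

```java
public class DelayedConversionSum {
  // One switch per block of values; each case runs a tight loop over the
  // block's native array, widening to double only in the accumulator.
  static double sumBlock(Object values, int length, String storedType) {
    double sum = 0;
    switch (storedType) {
      case "INT": {
        int[] ints = (int[]) values;  // no double[] copy created
        for (int i = 0; i < length; i++) {
          sum += ints[i];
        }
        break;
      }
      case "LONG": {
        long[] longs = (long[]) values;
        for (int i = 0; i < length; i++) {
          sum += longs[i];
        }
        break;
      }
      case "DOUBLE": {
        double[] doubles = (double[]) values;
        for (int i = 0; i < length; i++) {
          sum += doubles[i];
        }
        break;
      }
      default:
        throw new IllegalStateException("Cannot compute sum for non-numeric type: " + storedType);
    }
    return sum;
  }
}
```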

@codecov-commenter commented Feb 5, 2022

Codecov Report

Merging #8139 (9d67cc1) into master (8bbf93a) will increase coverage by 0.06%.
The diff coverage is 91.78%.


@@             Coverage Diff              @@
##             master    #8139      +/-   ##
============================================
+ Coverage     71.39%   71.46%   +0.06%     
+ Complexity     4303     4302       -1     
============================================
  Files          1624     1624              
  Lines         84198    84254      +56     
  Branches      12602    12612      +10     
============================================
+ Hits          60116    60208      +92     
+ Misses        19970    19935      -35     
+ Partials       4112     4111       -1     
Flag Coverage Δ
integration1 28.93% <91.78%> (+0.06%) ⬆️
integration2 27.67% <91.78%> (+<0.01%) ⬆️
unittests1 67.92% <43.83%> (-0.05%) ⬇️
unittests2 14.19% <0.00%> (+<0.01%) ⬆️


Impacted Files Coverage Δ
...y/aggregation/function/SumAggregationFunction.java 95.45% <89.47%> (-4.55%) ⬇️
...y/aggregation/function/MaxAggregationFunction.java 96.36% <92.59%> (-3.64%) ⬇️
...y/aggregation/function/MinAggregationFunction.java 96.36% <92.59%> (-3.64%) ⬇️
...ache/pinot/core/operator/docidsets/OrDocIdSet.java 86.36% <0.00%> (-11.37%) ⬇️
...inot/core/util/SegmentCompletionProtocolUtils.java 57.69% <0.00%> (-7.70%) ⬇️
...a/org/apache/pinot/common/utils/ServiceStatus.java 60.00% <0.00%> (-7.15%) ⬇️
.../pinot/core/query/scheduler/PriorityScheduler.java 80.82% <0.00%> (-2.74%) ⬇️
...not/broker/broker/helix/ClusterChangeMediator.java 77.65% <0.00%> (-2.13%) ⬇️
.../pinot/server/starter/helix/BaseServerStarter.java 57.98% <0.00%> (-1.97%) ⬇️
...controller/helix/core/minion/PinotTaskManager.java 67.51% <0.00%> (-1.83%) ⬇️
... and 26 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

double value = valueArray[i];
if (value > max) {
  max = value;
}

BlockValSet blockValSet = blockValSetMap.get(_expression);
A Contributor commented:

I understand how delaying conversion to double will help with heap usage. A couple of questions on performance:

  • Now every aggregate() call has to execute the switch once. Will there be any perf penalty for this?
  • Why does conversion to double[] prevent auto-vectorization?

Member Author replied:

You can see some numbers on #7920 (read longs as longs vs convert to double) but I’ll pull out some disassembly to demonstrate, as well as attach some numbers here.

Member Author replied:

Regarding the switch statement, there is one per block, so its cost is amortised (just like all the virtual calls we do).

Member Author replied:

@siddharthteotia I've added some rationale and numbers to the PR description, please take a look.

Contributor replied:

Thanks for sharing perf numbers, @richardstartin. I was also curious to see the disassembled code, since you said auto-vectorization is more likely when the values are kept as long. Please share if you have it.

I am good with the PR. Please also check in the benchmark.

Member Author replied:

The vectorized method is Copy::conjoint_swap, as mentioned above, which hsdis does not disassemble.

@richardstartin richardstartin force-pushed the aggregation-delay-conversion-to-double branch from 9ea5254 to 9d67cc1 Compare February 6, 2022 22:07
@siddharthteotia siddharthteotia merged commit 1684aee into apache:master Feb 7, 2022
@Jackie-Jiang (Contributor) left a comment:

This is a great optimization. We should also add it to the group-by and MV aggregation functions.

        break;
      }
      default:
        throw new IllegalStateException("Cannot compute min for non-numeric type: " + blockValSet.getValueType());
Contributor commented:

Fix the exception message to reflect the correct aggregation

Member Author replied:

Will follow up
