Aggregation delay conversion to double #8139
Conversation
Codecov Report
@@ Coverage Diff @@
## master #8139 +/- ##
============================================
+ Coverage 71.39% 71.46% +0.06%
+ Complexity 4303 4302 -1
============================================
Files 1624 1624
Lines 84198 84254 +56
Branches 12602 12612 +10
============================================
+ Hits 60116 60208 +92
+ Misses 19970 19935 -35
+ Partials 4112 4111 -1
double value = valueArray[i];
if (value > max) {
  max = value;
BlockValSet blockValSet = blockValSetMap.get(_expression);
I understand how delaying the conversion to double will help with heap usage. A couple of questions on performance:
- Now every aggregate() call has to execute the switch condition once. Will there be any perf penalty for this?
- Why does conversion to double[] prevent auto-vectorization?
You can see some numbers on #7920 (read longs as longs vs convert to double) but I’ll pull out some disassembly to demonstrate, as well as attach some numbers here.
Regarding the switch statement, there is one per block, so its cost is amortised (just like all the virtual calls we do).
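For illustration, here is a minimal sketch of the pattern being discussed (this is not Pinot's actual MaxAggregationFunction; the class, enum and parameter names are made up): the switch on the stored type is evaluated once per block, and each case runs its own tight loop over the block's native array, so no intermediate double[] is materialised and the branch cost is spread over every value in the block.

```java
// Minimal sketch, not Pinot's actual code: one type switch per block,
// one tight type-specific loop per case, no intermediate double[].
public final class MaxOverBlockSketch {

  // Hypothetical stand-in for a block's stored value type.
  enum StoredType { INT, LONG, DOUBLE }

  static double maxOverBlock(StoredType storedType, int length,
      int[] intValues, long[] longValues, double[] doubleValues) {
    double max = Double.NEGATIVE_INFINITY;
    switch (storedType) {                    // evaluated once per block
      case INT:
        for (int i = 0; i < length; i++) {   // values stay in the int[]
          if (intValues[i] > max) {
            max = intValues[i];
          }
        }
        break;
      case LONG:
        for (int i = 0; i < length; i++) {   // values stay in the long[]
          if (longValues[i] > max) {
            max = longValues[i];
          }
        }
        break;
      case DOUBLE:
        for (int i = 0; i < length; i++) {
          if (doubleValues[i] > max) {
            max = doubleValues[i];
          }
        }
        break;
      default:
        throw new IllegalStateException("Unsupported stored type: " + storedType);
    }
    return max;
  }

  public static void main(String[] args) {
    int[] block = {3, 9, 4};
    System.out.println(maxOverBlock(StoredType.INT, block.length, block, null, null)); // 9.0
  }
}
```

In the real code the dispatch would be driven by the block's value type and the loops would read from the forward index; the sketch only shows why the per-block branch is cheap relative to the per-value work.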
@siddharthteotia I've added some rationale and numbers to the PR description, please take a look.
Thanks for sharing the perf numbers, @richardstartin. I was also curious to see the disassembled code, since you said the likelihood of auto-vectorization increases when the values continue to be treated as long. Please share it if you have it.
I am good with the PR. Please also check in the benchmark.
The vectorized method is Copy::conjoint_swap, as mentioned above, which hsdis does not disassemble.
...e/src/main/java/org/apache/pinot/core/query/aggregation/function/MaxAggregationFunction.java
Force-pushed from 9ea5254 to 9d67cc1
Jackie-Jiang left a comment
This is a great optimization. We should also add this to the group-by and MV functions.
...e/src/main/java/org/apache/pinot/core/query/aggregation/function/MinAggregationFunction.java
    break;
  }
  default:
    throw new IllegalStateException("Cannot compute min for non-numeric type: " + blockValSet.getValueType());
Fix the exception message to reflect the correct aggregation
Will follow up
Aggregation functions aggregate as `double` to avoid numeric overflow, but converting blocks to `double[]` too soon has two drawbacks:
- `double[]` are larger than `float[]` and `int[]`, and this may result in twice the footprint on heap when these arrays are retained by the `DataBlockCache`.
- In `ForwardIndexReader`, reading `int`/`long`/`float` values as `double` prevents auto-vectorization, resulting in a performance penalty.

Avoiding Premature Type Conversion
Type conversion slows down bulk reads in `ForwardIndexReader` - e.g. compare reading contiguous `long` values as `long` and as `double` (this benchmark is in the project and can be run by anyone):

The root cause is that when values are read into an array from disk, the endianness needs to be swapped. Hotspot has an efficient implementation of this operation, `Copy::conjoint_swap`, which can't be used when type conversion also needs to be performed, so reading `long` values as `long`s is more efficient than reading `long`s as `double`s, despite the same amount of data being copied. This can be seen by profiling the benchmark:

This means that a simple sum over a raw column is more efficient when the type conversion is delayed. This can be seen (in a benchmark to be contributed in another PR) when computing a sum over a raw INT column:
[benchmark numbers for master and branch not shown]
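As a self-contained illustration of the read paths compared above (this is not the project's actual ForwardIndexReader; the class and method names here are invented), the bulk path below keeps the stored longs as longs, which HotSpot can service with its optimised byte-swapping copy when the buffer's byte order differs from the platform's, while the converting path must swap and widen each value to double individually:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch only: contrasts a bulk long read with a per-value long-to-double read.
public final class ReadPathSketch {

  // Bulk read: the values stay as longs; a direct buffer with a non-native byte
  // order can let the JVM use its optimised swapping copy for the whole array.
  static void readAsLongs(ByteBuffer buf, long[] out) {
    buf.asLongBuffer().get(out);
  }

  // Converting read: each long is byte-swapped and widened to double one at a time.
  static void readAsDoubles(ByteBuffer buf, double[] out) {
    for (int i = 0; i < out.length; i++) {
      out[i] = buf.getLong(i * Long.BYTES);
    }
  }

  public static void main(String[] args) {
    int n = 1_000_000;
    ByteBuffer buf = ByteBuffer.allocateDirect(n * Long.BYTES).order(ByteOrder.BIG_ENDIAN);
    for (int i = 0; i < n; i++) {
      buf.putLong(i * Long.BYTES, i);
    }
    long[] longs = new long[n];
    double[] doubles = new double[n];
    readAsLongs(buf, longs);
    readAsDoubles(buf, doubles);
    System.out.println(longs[n - 1] + " " + doubles[n - 1]);
  }
}
```

Under these assumptions the long path benefits from the bulk swapping copy, which is the effect the Copy::conjoint_swap profile mentioned above points at; the converting path pays for the per-value swap plus the widening.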