-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Proper null handling in Aggregation functions for SV data types #9086
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #9086 +/- ##
============================================
+ Coverage 68.39% 68.68% +0.29%
+ Complexity 4757 4672 -85
============================================
Files 1848 1848
Lines 97900 98386 +486
Branches 14746 14932 +186
============================================
+ Hits 66960 67579 +619
+ Misses 26251 26008 -243
- Partials 4689 4799 +110
Flags with carried forward coverage won't be shown. Click here to find out more.
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
Jackie-Jiang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest breaking it into 2 parts, one for the filter and one for the aggregation
| for (Expression operand : operands) { | ||
| arguments.add(getExpression(operand)); | ||
| } | ||
| if (arguments.size() == 0 && functionName.equalsIgnoreCase(AggregationFunctionType.COUNT.getName())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arguments should never be empty
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Count(*) has no operands. Since I allow now count(colName), I removed the if in lines 84-88 and add this if here instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
count(*) should have a single identifier operand with name "*"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, it is never empty except in one test: testSqlWithFinalRowCountChecker (part of pinot-query-runtime tests).
Basically, pinot-query-runtime causes this issue and send count() without any argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@walterddr I think this is a bug in the multi-stage engine where the argument is not passed for COUNT. Can you please take a look?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok yeah so that pinot-server expects a * on count. got it I can make the fix.
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/operator/dociditerators/SVScanDocIdIterator.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/operator/dociditerators/SVScanDocIdIterator.java
Outdated
Show resolved
Hide resolved
...-core/src/main/java/org/apache/pinot/core/operator/filter/RangeIndexBasedFilterOperator.java
Outdated
Show resolved
Hide resolved
...t-core/src/main/java/org/apache/pinot/core/operator/filter/predicate/PredicateEvaluator.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/plan/AggregationPlanNode.java
Outdated
Show resolved
Hide resolved
...src/main/java/org/apache/pinot/core/query/aggregation/function/CountAggregationFunction.java
Outdated
Show resolved
Hide resolved
...e/src/main/java/org/apache/pinot/core/query/aggregation/function/MaxAggregationFunction.java
Outdated
Show resolved
Hide resolved
pinot-common/src/main/java/org/apache/pinot/common/request/context/RequestContextUtils.java
Outdated
Show resolved
Hide resolved
...e/src/main/java/org/apache/pinot/core/query/aggregation/function/AvgAggregationFunction.java
Show resolved
Hide resolved
...e/src/main/java/org/apache/pinot/core/query/aggregation/function/AvgAggregationFunction.java
Outdated
Show resolved
Hide resolved
| if (blockValSetMap.isEmpty()) { | ||
| aggregationResultHolder.setValue(aggregationResultHolder.getDoubleResult() + length); | ||
| } else if (!blockValSetMap.containsKey(STAR_TREE_COUNT_STAR_EXPRESSION)) { | ||
| if (!_nullHandlingEnabled) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this branch be hit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest changing the branching logic as following for clarity:
if (blockValSetMap.isEmpty()) {
...
} else if (_nullHandlingEnabled) {
...
} else {
// Star-tree pre-aggregated values
...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still need to check for: !blockValSetMap.containsKey(STAR_TREE_COUNT_STAR_EXPRESSION)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is that? When null handling is enabled, we cannot use star-tree, so we shouldn't need this check right? Basically when the blockValSetMap is empty, either null handling is enabled or star-tree is used, and they should be mutual exclusive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Jackie-Jiang We have three branches:
1- blockValSetMap.isEmpty() for count(*), and no StarTree.
2- !blockValSetMap.containsKey(STAR_TREE_COUNT_STAR_EXPRESSION) for count(columnName), and no StarTree (please see the ! at the beginning). Example: count(bigDecimalColumn). Note: we can have count(bigDecimalColumn) and nullHandling is either enable or not enabled.
3- StarTree.
pinot-core/src/main/java/org/apache/pinot/core/plan/AggregationPlanNode.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/query/selection/SelectionOperatorUtils.java
Outdated
Show resolved
Hide resolved
pinot-core/src/test/java/org/apache/pinot/queries/BigDecimalQueriesTest.java
Show resolved
Hide resolved
pinot-core/src/test/java/org/apache/pinot/queries/BooleanNullEnabledQueriesTest.java
Show resolved
Hide resolved
pinot-perf/src/main/java/org/apache/pinot/perf/BenchmarkScanDocIdIterators.java
Outdated
Show resolved
Hide resolved
...al/src/main/java/org/apache/pinot/segment/local/indexsegment/mutable/MutableSegmentImpl.java
Outdated
Show resolved
Hide resolved
pinot-common/src/main/java/org/apache/pinot/common/request/context/RequestContextUtils.java
Show resolved
Hide resolved
Jackie-Jiang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost ready to be merged
| if (blockValSetMap.isEmpty()) { | ||
| aggregationResultHolder.setValue(aggregationResultHolder.getDoubleResult() + length); | ||
| } else if (!blockValSetMap.containsKey(STAR_TREE_COUNT_STAR_EXPRESSION)) { | ||
| if (!_nullHandlingEnabled) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is that? When null handling is enabled, we cannot use star-tree, so we shouldn't need this check right? Basically when the blockValSetMap is empty, either null handling is enabled or star-tree is used, and they should be mutual exclusive
| public void aggregateGroupBySV(int length, int[] groupKeyArray, GroupByResultHolder groupByResultHolder, | ||
| Map<ExpressionContext, BlockValSet> blockValSetMap) { | ||
| if (blockValSetMap.isEmpty()) { | ||
| if (blockValSetMap.isEmpty() || !blockValSetMap.containsKey(STAR_TREE_COUNT_STAR_EXPRESSION)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar here, we should be able to simplify it into 3 conditions:
- no value set
- null handling enabled
- star-tree
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Jackie-Jiang We have three branches:
1- blockValSetMap.isEmpty() for count(*), and no StarTree.
2- !blockValSetMap.containsKey(STAR_TREE_COUNT_STAR_EXPRESSION) for count(columnName), and no StarTree (please see the ! at the beginning). Example: count(bigDecimalColumn). Note: we can have count(bigDecimalColumn) and nullHandling is either enable or not enabled.
3- StarTree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When both null handling and star-tree are not enabled, will it pass in non-empty blockValSetMap? Ideally it should pass in empty blockValSetMap, or it could add extra overhead for existing use cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I find the reason. When null handling is not enabled, ideally getInputExpressions() should return empty list to preserve the existing behavior, which is changed because we changed it to extend BaseSingleInputAggregationFunction. Since most methods in BaseSingleInputAggregationFunction cannot be reused, suggest not extending it, and keep the existing behavior when null handling is not enabled
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applied a commit to demonstrate the idea: b684534
Let's see if it can pass all the tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The overhead is should be minimal. I don't have strong opinion either way. Will apply your commit.
...e/src/main/java/org/apache/pinot/core/query/aggregation/function/SumAggregationFunction.java
Outdated
Show resolved
Hide resolved
...n/java/org/apache/pinot/core/query/aggregation/function/SumPrecisionAggregationFunction.java
Outdated
Show resolved
Hide resolved
pinot-core/src/test/java/org/apache/pinot/queries/BigDecimalQueriesTest.java
Show resolved
Hide resolved
pinot-core/src/test/java/org/apache/pinot/queries/BooleanNullEnabledQueriesTest.java
Show resolved
Hide resolved
...al/src/main/java/org/apache/pinot/segment/local/indexsegment/mutable/MutableSegmentImpl.java
Outdated
Show resolved
Hide resolved
...n/java/org/apache/pinot/core/query/aggregation/function/SumPrecisionAggregationFunction.java
Outdated
Show resolved
Hide resolved
| for (int i = 0; i < length; i++) { | ||
| if (!nullBitmap.contains(i)) { | ||
| // TODO(nhejazi): throw an exception here instead of ignoring infinite values? | ||
| if (Double.isFinite(doubleValues[i])) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(minor) Suggest not having this check to keep the behavior consistent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fyi: this is gonna fail the whole SumPrecisionAgg function. So the previous behaviour is just throwing an exception and the operation failing.
@Jackie-Jiang Do you wanna still remove this check? I
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently the behavior is inconsistent without null handling (throw exception) and with null handling (values ignored). We may revisit this since you already added a TODO here
...e/src/main/java/org/apache/pinot/core/query/aggregation/function/SumAggregationFunction.java
Outdated
Show resolved
Hide resolved
...e/src/main/java/org/apache/pinot/core/query/aggregation/function/SumAggregationFunction.java
Outdated
Show resolved
Hide resolved
...e/src/main/java/org/apache/pinot/core/query/aggregation/function/SumAggregationFunction.java
Outdated
Show resolved
Hide resolved
Jackie-Jiang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! This is a very important feature. Thanks for contributing it!
|
Related to #8697 |
Proper null handling for SV data types in the following aggregation functions:
Added many unit tests (for different data types) for diff. aggregation functions.
feature,release-notes