Skip to content

Conversation

@Jackie-Jiang
Copy link
Contributor

  • Use argument types instead of argument count to lookup functions in FunctionRegistry to support polymorphism
  • Extract common code for extracting literal type and value from thrift Literal
  • Fix bug on array type response type
  • Format the literal to comply with broker response format

@Jackie-Jiang Jackie-Jiang added enhancement bugfix multi-stage Related to the multi-stage query engine labels Jul 21, 2024
@Jackie-Jiang Jackie-Jiang requested review from gortiz and xiangfu0 July 21, 2024 22:16
@codecov-commenter
Copy link

codecov-commenter commented Jul 21, 2024

Codecov Report

Attention: Patch coverage is 50.98039% with 50 lines in your changes missing coverage. Please review.

Project coverage is 57.53%. Comparing base (59551e4) to head (007c9c2).
Report is 954 commits behind head on master.

Files with missing lines Patch % Lines
...pache/pinot/common/utils/request/RequestUtils.java 14.89% 39 Missing and 1 partial ⚠️
...e/pinot/common/request/context/LiteralContext.java 0.00% 5 Missing ⚠️
...sthandler/BaseSingleStageBrokerRequestHandler.java 88.46% 1 Missing and 2 partials ⚠️
.../parsers/rewriter/CompileTimeFunctionsInvoker.java 91.66% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #13673      +/-   ##
============================================
- Coverage     61.75%   57.53%   -4.23%     
+ Complexity      207      197      -10     
============================================
  Files          2436     2580     +144     
  Lines        133233   142424    +9191     
  Branches      20636    22117    +1481     
============================================
- Hits          82274    81939     -335     
- Misses        44911    53985    +9074     
- Partials       6048     6500     +452     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 57.51% <50.98%> (-4.20%) ⬇️
java-21 57.42% <50.98%> (-4.21%) ⬇️
skip-bytebuffers-false 57.52% <50.98%> (-4.22%) ⬇️
skip-bytebuffers-true 57.39% <50.98%> (+29.66%) ⬆️
temurin 57.53% <50.98%> (-4.23%) ⬇️
unittests 57.52% <50.98%> (-4.22%) ⬇️
unittests1 40.21% <38.15%> (-6.69%) ⬇️
unittests2 27.84% <22.54%> (+0.11%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.



public class CompileTimeFunctionsInvoker implements QueryRewriter {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If #13711 is merged first, can we update the test for this query rewriter to also include cases where the polymorphic scalar comparison functions are invoked?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added, and it works as expected ;-)

operands.set(i, operand);
Literal literal = operand.getLiteral();
if (compilable && literal != null) {
RequestUtils.getLiteralTypeAndValue(literal, argumentTypes, arguments, i);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be better to return a pair of {argumentType, argument} from RequestUtils.getLiteralTypeAndValue instead of modifying the array arguments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially did that, then trying to safe a Object by passing in the arrays (also explains the unused import :P). I guess it might be over optimization. Changing it back to using pair to be more readable

if (operand.getLiteral() == null) {
operands.set(i, operand);
Literal literal = operand.getLiteral();
if (compilable && literal != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just directly return expression where we set compilable to false?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a recursive call (DFS). Even if we find the current level is not compilable, we still want to check the child level. Added some comments

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that totally makes sense, not sure how I missed that earlier.

Comment on lines 364 to 370
List<Integer> intList = literal.getIntArrayValue();
int numInts = intList.size();
int[] intArray = new int[numInts];
for (int i = 0; i < numInts; i++) {
intArray[i] = intList.get(i);
}
return intArray;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: these could be added as util methods in ArrayListUtils.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like there's similar logic duplicated in LiteralContext as well -

case INT_ARRAY_VALUE: {
_type = DataType.INT;
List<Integer> valueList = literal.getIntArrayValue();
int numValues = valueList.size();
int[] values = new int[numValues];
for (int i = 0; i < numValues; i++) {
values[i] = valueList.get(i);
}
_value = values;
_pinotDataType = PinotDataType.PRIMITIVE_INT_ARRAY;
break;
}
case LONG_ARRAY_VALUE: {
_type = DataType.LONG;
List<Long> valueList = literal.getLongArrayValue();
int numValues = valueList.size();
long[] values = new long[numValues];
for (int i = 0; i < numValues; i++) {
values[i] = valueList.get(i);
}
_value = values;
_pinotDataType = PinotDataType.PRIMITIVE_LONG_ARRAY;
break;
}
case FLOAT_ARRAY_VALUE: {
_type = DataType.FLOAT;
List<Integer> valueList = literal.getFloatArrayValue();
int numValues = valueList.size();
float[] values = new float[numValues];
for (int i = 0; i < numValues; i++) {
values[i] = Float.intBitsToFloat(valueList.get(i));
}
_value = values;
_pinotDataType = PinotDataType.PRIMITIVE_FLOAT_ARRAY;
break;
}
case DOUBLE_ARRAY_VALUE: {
_type = DataType.DOUBLE;
List<Double> valueList = literal.getDoubleArrayValue();
int numValues = valueList.size();
double[] values = new double[numValues];
for (int i = 0; i < numValues; i++) {
values[i] = valueList.get(i);
}
_value = values;
_pinotDataType = PinotDataType.PRIMITIVE_DOUBLE_ARRAY;
break;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I didn't move it to ArrayListUtils because of the special handling for float array (we use int array to send float array), but expose these methods as util and simplified LiteralContext

return expression;
}
String canonicalName = FunctionRegistry.canonicalize(function.getOperator());
FunctionInfo functionInfo = FunctionRegistry.lookupFunctionInfo(canonicalName, argumentTypes);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we might run into some type conversion issues here. For instance, a polymorphic scalar comparison function may implement support for comparing two arguments of the same type (see #13711 for instance). Earlier with the argument count based function lookup, if the two arguments were of a different type, the scalar comparison function invocation might still have worked based on the type conversion in FunctionInvoker. But now, each such polymorphic scalar function implementation would need to either account for all these different type combinations or else fall back to a default implementation and then rely on the FunctionInvoker's type conversion in case of heterogeneous argument types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scalar function should take care of all possible combinations of argument types, similar to the backward compatible logic in PolymorphicComparisonScalarFunction when argument type is different

@Jackie-Jiang Jackie-Jiang force-pushed the literal_only_argument_type branch 2 times, most recently from 3d10494 to 091a535 Compare August 21, 2024 23:23
Comment on lines 1516 to 1519
if (expression.getType() == ExpressionType.LITERAL) {
computeResultsForLiteral(expression.getLiteral(), columnNames, columnTypes, values, index);
}
if (expression.getType() == ExpressionType.FUNCTION) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: AFAIU computeResultsForLiteral cannot change expressiong.getType(). Am I right? In that case can we add a else to this if (expression.getType() == ExpressionType.FUNCTION) { to improve readability?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.

@gortiz
Copy link
Contributor

gortiz commented Aug 28, 2024

Question: This PR modifies the way functions are resolved, but do not change semantics because right now we don't have overload functions right? ie we have a single sum function for doubles.

Once we add overload functions into our registry (ie sum for int, sum for longs, etc) queries that previously returned a double may return different types. Am I right?

@Jackie-Jiang
Copy link
Contributor Author

Question: This PR modifies the way functions are resolved, but do not change semantics because right now we don't have overload functions right? ie we have a single sum function for doubles.

Once we add overload functions into our registry (ie sum for int, sum for longs, etc) queries that previously returned a double may return different types. Am I right?

This PR only affects scalar functions. Aggregates are handled differently. Currently in multi-stage engine, results are converted into the correct type (e.g. result type of SUM(longCol) is converted from double to long before sending to the next stage).
See the tests to see the behavior change on comparisons for different data types

@Jackie-Jiang Jackie-Jiang force-pushed the literal_only_argument_type branch from 091a535 to 007c9c2 Compare August 28, 2024 17:34
@Jackie-Jiang Jackie-Jiang merged commit f8de958 into apache:master Aug 28, 2024
@Jackie-Jiang Jackie-Jiang deleted the literal_only_argument_type branch August 28, 2024 18:50
rajagopr pushed a commit to rajagopr/pinot that referenced this pull request Sep 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bugfix enhancement multi-stage Related to the multi-stage query engine

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants