Skip to content

Conversation

@61yao
Copy link
Contributor

@61yao 61yao commented Mar 4, 2023

This PR preserves the type information of literal in LiteralContext and LiteralTransform.

This means numerical literals will be passed down from broker either as a long or double except for big decimal.

String version of numerical values are not longer accepted. For example, "123" won't be treated as a numerical anymore.

@codecov-commenter
Copy link

codecov-commenter commented Mar 4, 2023

Codecov Report

Merging #10380 (0835a37) into master (d1227e4) will increase coverage by 0.03%.
The diff coverage is 90.30%.

@@             Coverage Diff              @@
##             master   #10380      +/-   ##
============================================
+ Coverage     70.25%   70.29%   +0.03%     
+ Complexity     6117     6114       -3     
============================================
  Files          2070     2070              
  Lines        111953   111966      +13     
  Branches      17052    17051       -1     
============================================
+ Hits          78652    78701      +49     
+ Misses        27760    27728      -32     
+ Partials       5541     5537       -4     
Flag Coverage Δ
integration1 24.48% <34.54%> (-0.05%) ⬇️
integration2 24.32% <35.15%> (+0.06%) ⬆️
unittests1 67.93% <88.48%> (+0.02%) ⬆️
unittests2 13.98% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...inot/common/request/context/ExpressionContext.java 84.61% <ø> (+1.28%) ⬆️
.../pinot/core/query/reduce/GapfillFilterHandler.java 85.71% <0.00%> (ø)
...ot/core/operator/filter/H3IndexFilterOperator.java 82.47% <50.00%> (ø)
...form/function/MultiplicationTransformFunction.java 93.10% <50.00%> (ø)
...ansform/function/SubtractionTransformFunction.java 89.39% <50.00%> (ø)
...ache/pinot/core/query/reduce/GapfillProcessor.java 90.86% <50.00%> (ø)
...transform/function/DateTruncTransformFunction.java 83.87% <75.00%> (ø)
.../transform/function/CoalesceTransformFunction.java 85.20% <80.00%> (+0.96%) ⬆️
...r/transform/function/LiteralTransformFunction.java 63.09% <81.81%> (-8.06%) ⬇️
...e/pinot/common/request/context/LiteralContext.java 85.00% <91.48%> (+24.47%) ⬆️
... and 39 more

... and 31 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@Jackie-Jiang Jackie-Jiang added feature backward-incompat Referenced by PRs that introduce or fix backward compat issues enhancement labels Mar 7, 2023
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

if (left.isNumeric() && right.isNumeric())

then remove Preconditions.checkState on the next line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: new line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: new line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: testCaseTransformFunctionWithFloatResults -> testCaseTransformFunctionWithDoubleResults

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@gortiz
Copy link
Contributor

gortiz commented Mar 21, 2023

These changes already include the ones in #10376, right?

Copy link
Contributor

@gortiz gortiz Mar 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not clear to me what is the difference between _value and _bigDecimalValue in each case.

  • If literal.getSetField is DOUBLE_VALUE or LONG_VALUE, _value is either a Double or a Long.
  • If literal.getSetField is STRING_VALUE:
    • And the literal is an actual number (like "123" or "1.3"), then _value is the more specific implementation Number possible.
    • And the literal is not an actual number:
      * If it is a boolean, _value is a String like "true" or "false" and _bigDecimanValue is 1 or 0.
      * If it is a timestamp, _value is a String (why?) like "21312312" and _bigDecimanValue is the representation as a BigDecimal.

Then _bigDecimalValue is only used in getXValue where X is Int, Double or BigDecimal.

For the context, BigDecimal instances are quite large. They contain several attributes and therefore they consume a significant amount of memory. I guess normal queries should not have tons of instances of LiteralContext, but we will receive degenerated queries like where X in [list with hundreds/thousand of elements] or where X = L1 or X = L2 or ... X = L1000. I had to deal with these cases in other databases and suddenly inefficiencies that looked innocent imply OOMs.

It also seems that getXValue are always being called in the initialization of the transformation functions. Therefore they are not in the hotpath and catching the BigDecimal doesn't seem to be that useful. This is why I would recommend to:

  • In getIntValue and getDoubleValue:
    • When _value is instance of Number, cast it and call the appropriate method.
    • Otherwise, calculate the value lazily each time the method is called.
  • In getBigDecimalValue case I would just calculate the value lazily each time the method is called.

By doing that we may reduce the amount of memory used in degenerated queries like the ones I listed above.

PS: I edited the comment because in the original text I was assuming getXValue was called in the hotpath. After reading the rest of the PR I would say it is not the case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_bigDecimalValue is a cached representation for numerical types. We did the same cache in LiteralTransform func so we don't need to calculate numerical representation of value multiple times.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct me if I'm wrong, but LiteralContext.getXValue is only called once from LiteralTransform constructor/init, right? Then we don't need to catch it.

I think LiteralTransform shouldn't catch all the values just in case someone needs them. I'm not saying we should change LiteralTransform asap, but it would be great to do not replicate antipatterns

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not really. for example, LiteralContext.getIntValue() gets called in HistogramAggregationFunction. that's why I cache it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change in the hash code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because originally we are not table to tell "123" and 123. After the type reservation, "123" and 123 are different. thus we add the type encoding into the hashcode.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because originally we are not table to tell "123" and 123. After the type reservation

Yeah, but I mean... why do we add the hash of types? That isn't usual when producing hashes. In fact if you do return Objects.hashCode(_value, _type) it will produce better hashCode (at the cost of creating a short living array).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point. changed to Objects.hash(...)

Copy link
Contributor

@gortiz gortiz Mar 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: In order to reduce the number of changes... why not just change this code to call _literal.getStringValue()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because we want caller to explicitly specify what type they want to use. so they have to use getLiteral().getTypeLiteral()

Copy link
Contributor

@gortiz gortiz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would change the HashCode function, but apart from that, it looks fine to me.

@61yao 61yao force-pushed the null2 branch 2 times, most recently from 9728bbc to 4547195 Compare March 25, 2023 04:26
@61yao
Copy link
Contributor Author

61yao commented Mar 25, 2023

I would change the HashCode function, but apart from that, it looks fine to me.
Thanks a lot for the review!

@walterddr walterddr merged commit a44d535 into apache:master Mar 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backward-incompat Referenced by PRs that introduce or fix backward compat issues enhancement feature null support

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants