-
Notifications
You must be signed in to change notification settings - Fork 1.4k
[feature] [backward-incompat] [null support # 2] Preserve null literal information in literal context and literal transform #10380
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #10380 +/- ##
============================================
+ Coverage 70.25% 70.29% +0.03%
+ Complexity 6117 6114 -3
============================================
Files 2070 2070
Lines 111953 111966 +13
Branches 17052 17051 -1
============================================
+ Hits 78652 78701 +49
+ Misses 27760 27728 -32
+ Partials 5541 5537 -4
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 31 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
if (left.isNumeric() && right.isNumeric())
then remove Preconditions.checkState on the next line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: new line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: new line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: testCaseTransformFunctionWithFloatResults -> testCaseTransformFunctionWithDoubleResults
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
These changes already include the ones in #10376, right? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not clear to me what is the difference between _value and _bigDecimalValue in each case.
- If literal.getSetField is
DOUBLE_VALUEorLONG_VALUE,_valueis either aDoubleor aLong. - If literal.getSetField is
STRING_VALUE:- And the literal is an actual number (like
"123"or"1.3"), then_valueis the more specific implementationNumberpossible. - And the literal is not an actual number:
* If it is a boolean,_valueis a String like"true"or"false"and_bigDecimanValueis1or0.
* If it is a timestamp,_valueis a String (why?) like "21312312" and_bigDecimanValueis the representation as a BigDecimal.
- And the literal is an actual number (like
Then _bigDecimalValue is only used in getXValue where X is Int, Double or BigDecimal.
For the context, BigDecimal instances are quite large. They contain several attributes and therefore they consume a significant amount of memory. I guess normal queries should not have tons of instances of LiteralContext, but we will receive degenerated queries like where X in [list with hundreds/thousand of elements] or where X = L1 or X = L2 or ... X = L1000. I had to deal with these cases in other databases and suddenly inefficiencies that looked innocent imply OOMs.
It also seems that getXValue are always being called in the initialization of the transformation functions. Therefore they are not in the hotpath and catching the BigDecimal doesn't seem to be that useful. This is why I would recommend to:
- In
getIntValueandgetDoubleValue:- When
_valueis instance ofNumber, cast it and call the appropriate method. - Otherwise, calculate the value lazily each time the method is called.
- When
- In
getBigDecimalValuecase I would just calculate the value lazily each time the method is called.
By doing that we may reduce the amount of memory used in degenerated queries like the ones I listed above.
PS: I edited the comment because in the original text I was assuming getXValue was called in the hotpath. After reading the rest of the PR I would say it is not the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_bigDecimalValue is a cached representation for numerical types. We did the same cache in LiteralTransform func so we don't need to calculate numerical representation of value multiple times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct me if I'm wrong, but LiteralContext.getXValue is only called once from LiteralTransform constructor/init, right? Then we don't need to catch it.
I think LiteralTransform shouldn't catch all the values just in case someone needs them. I'm not saying we should change LiteralTransform asap, but it would be great to do not replicate antipatterns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not really. for example, LiteralContext.getIntValue() gets called in HistogramAggregationFunction. that's why I cache it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change in the hash code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because originally we are not table to tell "123" and 123. After the type reservation, "123" and 123 are different. thus we add the type encoding into the hashcode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because originally we are not table to tell "123" and 123. After the type reservation
Yeah, but I mean... why do we add the hash of types? That isn't usual when producing hashes. In fact if you do return Objects.hashCode(_value, _type) it will produce better hashCode (at the cost of creating a short living array).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point. changed to Objects.hash(...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: In order to reduce the number of changes... why not just change this code to call _literal.getStringValue()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because we want caller to explicitly specify what type they want to use. so they have to use getLiteral().getTypeLiteral()
gortiz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would change the HashCode function, but apart from that, it looks fine to me.
9728bbc to
4547195
Compare
|
This PR preserves the type information of literal in LiteralContext and LiteralTransform.
This means numerical literals will be passed down from broker either as a long or double except for big decimal.
String version of numerical values are not longer accepted. For example, "123" won't be treated as a numerical anymore.