Skip to content

Conversation

@richardstartin
Copy link
Member

@richardstartin richardstartin commented Apr 13, 2022

Fixes #8534

Allows CompileTimeFunctionsInvoker to perform casts on literals and constant functions of literals, which ensures indexes and metadata can be used in queries.

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to see we can leverage the PinotDataType to make it so simple

@codecov-commenter
Copy link

codecov-commenter commented Apr 13, 2022

Codecov Report

Merging #8535 (6e1f35d) into master (900f01f) will decrease coverage by 6.62%.
The diff coverage is 35.57%.

❗ Current head 6e1f35d differs from pull request most recent head bab4481. Consider uploading reports for the commit bab4481 to get more accurate results

@@             Coverage Diff              @@
##             master    #8535      +/-   ##
============================================
- Coverage     70.65%   64.03%   -6.63%     
- Complexity     4286     4292       +6     
============================================
  Files          1674     1636      -38     
  Lines         87743    86054    -1689     
  Branches      13279    13115     -164     
============================================
- Hits          61999    55102    -6897     
- Misses        21439    26956    +5517     
+ Partials       4305     3996     -309     
Flag Coverage Δ
integration1 ?
integration2 ?
unittests1 67.02% <64.53%> (+0.03%) ⬆️
unittests2 14.05% <5.03%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...g/apache/pinot/common/metrics/ControllerMeter.java 0.00% <0.00%> (-100.00%) ⬇️
...apache/pinot/common/utils/SqlResultComparator.java 0.00% <0.00%> (-19.09%) ⬇️
.../apache/pinot/sql/parsers/parser/UnparseUtils.java 0.00% <0.00%> (ø)
...roller/api/exception/NoTaskScheduledException.java 0.00% <0.00%> (ø)
...ller/api/exception/TaskAlreadyExistsException.java 0.00% <0.00%> (ø)
...roller/api/exception/UnknownTaskTypeException.java 0.00% <0.00%> (ø)
...roller/api/resources/PinotTaskRestletResource.java 0.00% <0.00%> (ø)
...lix/core/minion/PinotHelixTaskResourceManager.java 2.30% <0.00%> (-38.86%) ⬇️
...controller/helix/core/minion/PinotTaskManager.java 40.25% <0.00%> (-27.27%) ⬇️
...helix/core/minion/generator/BaseTaskGenerator.java 57.14% <0.00%> (-32.86%) ⬇️
... and 442 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 900f01f...bab4481. Read the comment docs.

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently CastTransformFunction doesn't support multi-valued column casting. We should keep the scalar function and transform function consistent (either both support MV or both not support MV). IMO multi-value handling is a little bit tricky because of the array type (primitive vs boxed), and we should add some queries test to ensure it works end-to-end and covers ingestion, compilation, aggregation and post-aggregation. I'd suggest first adding the basic support for SV, and then having a separate PR to add the MV support

@amrishlal
Copy link
Contributor

Some thoughts:

  1. one option may be to drop the cast function if it is casting something to a numeric type. For example intColumn > cast(10.3 as long) could just be rewritten to intColumn > 10.3 and things would still work fine.
  2. we should probably throw a syntax error if user tries to do something like intColumn > '12.3' or intColumn > cast('10.3' as INT) (casting a string literal to an int value) and that will encourage the user to write query with proper literal values.
  3. A cast function is probably more useful during run time if for example the user wants to force cast string column values into numerical values (for example WHERE CAST(stringColumn AS INTEGER) > intColumn. Otherwise, as far as compile time literals are concerned, it seems like just replacing the CAST function with literal would work? If the literal type is not compatible, then an exception will be thrown which will encourage user to fix the literal in the query?

@richardstartin
Copy link
Member Author

Currently CastTransformFunction doesn't support multi-valued column casting. We should keep the scalar function and transform function consistent (either both support MV or both not support MV). IMO multi-value handling is a little bit tricky because of the array type (primitive vs boxed), and we should add some queries test to ensure it works end-to-end and covers ingestion, compilation, aggregation and post-aggregation. I'd suggest first adding the basic support for SV, and then having a separate PR to add the MV support

I agree that dropping MV support would simplify this a lot.

@richardstartin
Copy link
Member Author

richardstartin commented Apr 14, 2022

Some thoughts:

  1. one option may be to drop the cast function if it is casting something to a numeric type. For example intColumn > cast(10.3 as long) could just be rewritten to intColumn > 10.3 and things would still work fine.
  2. we should probably throw a syntax error if user tries to do something like intColumn > '12.3' or intColumn > cast('10.3' as INT) (casting a string literal to an int value) and that will encourage the user to write query with proper literal values.
  3. A cast function is probably more useful during run time if for example the user wants to force cast string column values into numerical values (for example WHERE CAST(stringColumn AS INTEGER) > intColumn. Otherwise, as far as compile time literals are concerned, it seems like just replacing the CAST function with literal would work? If the literal type is not compatible, then an exception will be thrown which will encourage user to fix the literal in the query?

I agree, this is essentially what adding this scalar function allows an existing mechanism to do. This change allows replacing the cast function over a literal or over another function which can be calculated at compile time with the literal, by making a function available during query compilation - see CompileTimeFunctionsInvoker.

@richardstartin
Copy link
Member Author

Dropped MV support.

Testing for:

  • unit
  • calcite sql compilation
  • use post aggregation, in filters
  • consistency with transform function used for projections

}

@Test
public void testCastSum() {
Copy link
Contributor

@walterddr walterddr Apr 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CAST is not being tested in this SQL because the operator chain compiled in server side doesn't do the casting (it is actually done in PostAggregationFunction.java)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I realise it isn't being called here.

}

@ScalarFunction
public static Object cast(Object value, String targetTypeLiteral) {
Copy link
Contributor

@walterddr walterddr Apr 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ScalarFunction / FunctionInvoker doesn't provide a dynamic return type resolution at runtime --> e.g. determine what's being returned based on the argument passed in. which is exactly what's needed here.

The SQL in the non-literal test cases are actually compiled into CastTransformFunction which does support result metadata --> that's dynamically determined.

What's missing here is a "CAST" that's not a transform function, which require changes to function invoker.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This this works fine when invoked in a filter to handle the case this PR targets - compile time replacement. You're right that it's not getting invoked after aggregation functions though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can follow up with a separate fix if you agree @richardstartin

@walterddr
Copy link
Contributor

walterddr commented Apr 14, 2022

and we should add some queries test to ensure it works end-to-end and covers ingestion, compilation, aggregation and post-aggregation. I'd suggest first adding the basic support for SV, and then having a separate PR to add the MV support

Agreed @Jackie-Jiang
but post-aggregation is not supported (described in #8424) which should be a separate fix IMO

@richardstartin
Copy link
Member Author

Can we get this merged then? @Jackie-Jiang @walterddr

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CAST function is not calculated at compile time which prevents usage of index/metadata

6 participants