@yashmayya
Contributor

  • Currently, a query like SELECT JSON_EXTRACT_SCALAR(payload_commits, '$[*].author.name', 'STRING_ARRAY') FROM github_events fails on the multi-stage query engine with an error like: Invalid type: STRING_ARRAY.
  • This is because the type inference from string literals in TransformFunctionType doesn't cover all the types supported by JSON_EXTRACT_SCALAR / JSON_EXTRACT_INDEX.
  • This patch fixes the issue by covering all the supported types, which are taken from the following code:
    DataType dataType;
    try {
      dataType = DataType.valueOf(isSingleValue ? resultsType : resultsType.substring(0, resultsType.length() - 6));
    } catch (Exception e) {
      throw new IllegalArgumentException(String.format(
          "Unsupported results type: %s for jsonExtractScalar function. Supported types are: "
              + "INT/LONG/FLOAT/DOUBLE/BOOLEAN/BIG_DECIMAL/TIMESTAMP/STRING/INT_ARRAY/LONG_ARRAY/FLOAT_ARRAY"
              + "/DOUBLE_ARRAY/STRING_ARRAY", resultsType));
    }
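The results-type parsing quoted above can be sketched in isolation as follows. This is a minimal illustration, not Pinot's code: `ResultDataType`, `ResultsTypeParser`, and `Parsed` are hypothetical names, and deriving `isSingleValue` from the `_ARRAY` suffix is an assumption based on the `- 6` substring in the snippet. The sketch also does not restrict which element types may appear in array form.

```java
// Hypothetical stand-in for Pinot's DataType enum, limited to the scalar
// types named in the error message quoted above.
enum ResultDataType { INT, LONG, FLOAT, DOUBLE, BOOLEAN, BIG_DECIMAL, TIMESTAMP, STRING }

final class ResultsTypeParser {
    // "_ARRAY" is 6 characters long, matching the "- 6" in the quoted snippet.
    private static final String ARRAY_SUFFIX = "_ARRAY";

    /** Parsed form of a results-type literal such as "STRING_ARRAY". */
    static final class Parsed {
        final ResultDataType elementType;
        final boolean singleValue;

        Parsed(ResultDataType elementType, boolean singleValue) {
            this.elementType = elementType;
            this.singleValue = singleValue;
        }
    }

    static Parsed parse(String resultsType) {
        // Assumption: a literal is multi-valued exactly when it ends in "_ARRAY".
        boolean singleValue = !resultsType.endsWith(ARRAY_SUFFIX);
        String baseName = singleValue
            ? resultsType
            : resultsType.substring(0, resultsType.length() - ARRAY_SUFFIX.length());
        try {
            return new Parsed(ResultDataType.valueOf(baseName), singleValue);
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException("Unsupported results type: " + resultsType, e);
        }
    }
}
```

With this sketch, `ResultsTypeParser.parse("STRING_ARRAY")` yields element type `STRING` with `singleValue` set to `false`, while an unknown literal raises `IllegalArgumentException`.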

…for JSON extract functions in the multi-stage query engine
@yashmayya added the bugfix and multi-stage labels Oct 24, 2024

codecov-commenter commented Oct 24, 2024

Codecov Report

Attention: Patch coverage is 85.71429% with 1 line in your changes missing coverage. Please review.

Project coverage is 63.77%. Comparing base (59551e4) to head (b987037).
Report is 1267 commits behind head on master.

Files with missing lines Patch % Lines
...e/pinot/common/function/TransformFunctionType.java 85.71% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #14289      +/-   ##
============================================
+ Coverage     61.75%   63.77%   +2.02%     
- Complexity      207     1556    +1349     
============================================
  Files          2436     2660     +224     
  Lines        133233   145837   +12604     
  Branches      20636    22313    +1677     
============================================
+ Hits          82274    93012   +10738     
- Misses        44911    45961    +1050     
- Partials       6048     6864     +816     
Flag                      Coverage Δ
custom-integration1       100.00% <ø> (+99.99%) ⬆️
integration               100.00% <ø> (+99.99%) ⬆️
integration1              100.00% <ø> (+99.99%) ⬆️
integration2              0.00% <ø> (ø)
java-11                   63.75% <85.71%> (+2.04%) ⬆️
java-21                   63.66% <85.71%> (+2.04%) ⬆️
skip-bytebuffers-false    63.76% <85.71%> (+2.02%) ⬆️
skip-bytebuffers-true     63.65% <85.71%> (+35.92%) ⬆️
temurin                   63.77% <85.71%> (+2.02%) ⬆️
unittests                 63.77% <85.71%> (+2.02%) ⬆️
unittests1                55.47% <85.71%> (+8.57%) ⬆️
unittests2                34.16% <0.00%> (+6.43%) ⬆️


}

private static RelDataType inferTypeFromStringLiteral(String operandTypeStr, RelDataTypeFactory typeFactory) {
switch (operandTypeStr) {
Contributor

Shouldn't the list also contain the following?

  • BOOLEAN
  • TIMESTAMP
  • FLOAT
  • DOUBLE

Contributor Author

Those are handled in the default branch.
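To illustrate what "handled in the default branch" can look like, here is a minimal sketch of a switch that lists only the literals whose names differ from the target type names and resolves everything else by name. It does not use Calcite: `SketchSqlType` is a hypothetical stand-in for Calcite's `SqlTypeName`, the explicit cases shown are assumptions about which names differ, and the array cases are omitted for brevity.

```java
// Hypothetical stand-in for Calcite's SqlTypeName, trimmed to what this sketch needs.
enum SketchSqlType { INTEGER, BIGINT, VARCHAR, DECIMAL, BOOLEAN, TIMESTAMP, FLOAT, DOUBLE }

final class LiteralTypeInference {
    static SketchSqlType inferFromLiteral(String operandTypeStr) {
        switch (operandTypeStr) {
            // Explicit cases: the literal's name differs from the target type name.
            case "INT":
                return SketchSqlType.INTEGER;
            case "LONG":
                return SketchSqlType.BIGINT;
            case "STRING":
                return SketchSqlType.VARCHAR;
            case "BIG_DECIMAL":
                return SketchSqlType.DECIMAL;
            default:
                // Fallback: names such as BOOLEAN, TIMESTAMP, and DOUBLE match
                // the target enum directly, so a lookup by name is enough.
                try {
                    return SketchSqlType.valueOf(operandTypeStr);
                } catch (IllegalArgumentException e) {
                    throw new IllegalArgumentException("Invalid type: " + operandTypeStr, e);
                }
        }
    }
}
```

In this sketch FLOAT also resolves by name. The review further down explains why FLOAT needs an explicit mapping to REAL instead.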

"SELECT /*+ aggOptions(is_skip_leaf_stage_group_by='true') */ a.col2, a.col3 FROM a JOIN b "
+ "ON a.col1 = b.col1 WHERE a.col3 >= 0 GROUP BY a.col2, a.col3"
},
new Object[]{"SELECT ROUND(ts_timestamp, 10000) FROM a"}
Contributor

I think it'd be good to add tests for other result types.

Contributor

@Jackie-Jiang left a comment

Good catch!

In Calcite, FLOAT is a synonym for DOUBLE (see here), so we'll need to map FLOAT in Pinot to REAL.
Suggest adding a test to verify all supported types.
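The FLOAT point can be sketched as a small name mapping. This is an illustration only: `FloatMapping` and `toCalciteTypeName` are hypothetical names, the `" ARRAY"` suffix is just a readable notation for an array of the element type, and every name other than FLOAT is passed through unchanged to keep the focus on this one rule.

```java
final class FloatMapping {
    private static final String ARRAY_SUFFIX = "_ARRAY";

    /**
     * Maps a Pinot type literal to a Calcite-style type name, applying only
     * the FLOAT -> REAL rule to the scalar type or the array element type.
     */
    static String toCalciteTypeName(String pinotTypeName) {
        boolean isArray = pinotTypeName.endsWith(ARRAY_SUFFIX);
        String element = isArray
            ? pinotTypeName.substring(0, pinotTypeName.length() - ARRAY_SUFFIX.length())
            : pinotTypeName;
        // Per the review comment, Calcite treats FLOAT as a synonym for DOUBLE,
        // so Pinot's FLOAT is mapped to REAL.
        String mapped = "FLOAT".equals(element) ? "REAL" : element;
        return isArray ? mapped + " ARRAY" : mapped;
    }
}
```

The rule has to be applied to both the scalar literal and the array element type, otherwise FLOAT and FLOAT_ARRAY would be inferred inconsistently.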

@yashmayya
Contributor Author

In Calcite, FLOAT is a synonym for DOUBLE (see here), and we'll need to map FLOAT in Pinot to REAL

Ah yeah, I did do this for FLOAT_ARRAY, but missed adding it for FLOAT.

Suggest adding a test to verify all supported types

I've added test cases for all supported types (from here) to QueryCompilationTest, which verify that the Calcite type inference is valid and that the query compiles. There are already tests for the transform function itself, so I don't think we need to validate that separately for the multi-stage engine.
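A test matrix of this kind could be generated along the following lines. This is a hedged sketch, not the content of QueryCompilationTest: `JsonExtractQueryMatrix` and `buildQueries` are hypothetical names, the type list is copied from the error message in the PR description, the table and column come from the example query there, and the `$[0]` path used for scalar types is an illustrative assumption. Each generated query would then be handed to the query compiler under test.

```java
import java.util.ArrayList;
import java.util.List;

final class JsonExtractQueryMatrix {
    // Supported results types, as listed in the jsonExtractScalar error message.
    static final String[] SUPPORTED_TYPES = {
        "INT", "LONG", "FLOAT", "DOUBLE", "BOOLEAN", "BIG_DECIMAL", "TIMESTAMP", "STRING",
        "INT_ARRAY", "LONG_ARRAY", "FLOAT_ARRAY", "DOUBLE_ARRAY", "STRING_ARRAY"
    };

    /** Builds one JSON_EXTRACT_SCALAR query per supported results type. */
    static List<String> buildQueries() {
        List<String> queries = new ArrayList<>();
        for (String type : SUPPORTED_TYPES) {
            // Illustrative paths: a wildcard for array results, a single element otherwise.
            String path = type.endsWith("_ARRAY") ? "$[*].author.name" : "$[0].author.name";
            queries.add(String.format(
                "SELECT JSON_EXTRACT_SCALAR(payload_commits, '%s', '%s') FROM github_events",
                path, type));
        }
        return queries;
    }
}
```

Driving the test from the same list that the error message advertises helps keep the inference code and the documented set of types from drifting apart.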

@Jackie-Jiang Jackie-Jiang merged commit e61d503 into apache:master Oct 31, 2024