
Conversation

@yashmayya
Contributor

@yashmayya yashmayya added the dependencies, performance, and multi-stage labels Mar 13, 2025
@yashmayya yashmayya linked an issue Mar 13, 2025 that may be closed by this pull request
@codecov-commenter

codecov-commenter commented Mar 13, 2025

Codecov Report

Attention: Patch coverage is 66.66667% with 11 lines in your changes missing coverage. Please review.

Project coverage is 62.80%. Comparing base (2d8d8c3) to head (cd76846).
Report is 9 commits behind head on master.

Files with missing lines Patch % Lines
...ot/calcite/rel/rules/PinotEvaluateLiteralRule.java 57.14% 2 Missing and 4 partials ⚠️
...inot/query/planner/logical/RexExpressionUtils.java 33.33% 4 Missing ⚠️
...n/java/org/apache/pinot/query/type/TypeSystem.java 75.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #15263      +/-   ##
============================================
- Coverage     62.82%   62.80%   -0.02%     
- Complexity     1384     1386       +2     
============================================
  Files          2864     2864              
  Lines        162671   163677    +1006     
  Branches      24902    25065     +163     
============================================
+ Hits         102191   102804     +613     
- Misses        52782    53157     +375     
- Partials       7698     7716      +18     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 62.74% <66.66%> (-0.03%) ⬇️
java-21 62.79% <66.66%> (-0.02%) ⬇️
skip-bytebuffers-false 62.80% <66.66%> (-0.01%) ⬇️
skip-bytebuffers-true 62.72% <66.66%> (-0.05%) ⬇️
temurin 62.80% <66.66%> (-0.02%) ⬇️
unittests 62.80% <66.66%> (-0.02%) ⬇️
unittests1 55.69% <66.66%> (-0.09%) ⬇️
unittests2 33.73% <18.18%> (+0.20%) ⬆️

Flags with carried forward coverage won't be shown.


@yashmayya yashmayya force-pushed the upgrade-calcite-1.39.0 branch from 76c137f to 87cf58a Compare March 13, 2025 15:37
@ankitsultana
Contributor

@yashmayya: I suppose you folks must have considered this already, but for the large IN clause optimization, have you thought of rewriting the IN to a UDF in perhaps the first optimization rule that runs in the optimization phase?

The UDF could be expanded back to an IN clause in the end to ensure there are no behavior changes.

@yashmayya
Contributor Author

@ankitsultana the large IN clause compilation issue comes from two sources. One is certain optimization rules like CoreRules.FILTER_REDUCE_EXPRESSIONS and PinotSortExchangeNodeInsertRule when run after PinotFilterExpandSearchRule. This was fixed in #14615 (after the original fix in #13614 was unintentionally undone in #14448). The other source was the SqlToRelConverter phase, when converting a tree of SqlNodes to a tree of RelNodes, which happens even before any of our optimization rules are applied.

You can attach a profiler to this test on master to see this -

[flamegraph screenshot]

This issue has been fixed in Calcite 1.39.0. In the long run, though, we need to explore using the default value for expand here, since the value true that we're using has been deprecated. Setting expand to false keeps IN clauses with literals as IN clauses when we also set InSubQueryThreshold to 0 (i.e., no conversion to a JOIN or to an OR list of = expressions), although it also introduces a bunch of things that Pinot can't currently handle, for instance planning a correlated subquery using a SCALAR_QUERY filter.
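For reference, a minimal sketch (not part of this PR) of the relevant Calcite knobs; the class name is illustrative, and where exactly we'd apply this config in Pinot is an open question:

import org.apache.calcite.sql2rel.SqlToRelConverter;

public class InClauseConversionConfigSketch {
  // Per the discussion above: expand = false keeps IN lists of literals as IN clauses during
  // the SQL -> Rel conversion, and inSubQueryThreshold = 0 disables rewriting them into a
  // JOIN against an inline table or into an OR chain of '=' predicates.
  static final SqlToRelConverter.Config CONFIG = SqlToRelConverter.config()
      .withExpand(false)
      .withInSubQueryThreshold(0);

  public static void main(String[] args) {
    System.out.println(CONFIG);
  }
}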

@yashmayya yashmayya force-pushed the upgrade-calcite-1.39.0 branch from 87cf58a to 91e29be Compare March 17, 2025 04:59
@yashmayya yashmayya marked this pull request as ready for review March 17, 2025 05:32
Contributor Author

@yashmayya yashmayya left a comment


We don't currently have a framework for actual performance regression tests, but I was wondering whether it would be worth modifying this test to add some assertions on compilation time (we could increase the number of values in the IN clause and use a fairly conservative bound on the expected time to hopefully prevent CI flakiness).
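Purely as a sketch of what I mean (the planQuery helper and the bound below are placeholders, not existing test utilities):

import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class LargeInClauseCompileTimeSketch {
  public static void main(String[] args) {
    // Build a query with a very large IN list of literals.
    String inList = IntStream.range(0, 50_000)
        .mapToObj(i -> Integer.toString(i))
        .collect(Collectors.joining(", "));
    String sql = "SELECT col1 FROM mytable WHERE col2 IN (" + inList + ")";

    long start = System.nanoTime();
    planQuery(sql);  // placeholder for whatever compiles / plans the query in the test
    long elapsedMs = (System.nanoTime() - start) / 1_000_000;

    // Conservative bound: catches order-of-magnitude regressions without being flaky on slow CI hosts.
    if (elapsedMs > 10_000) {
      throw new AssertionError("Query compilation took " + elapsedMs + " ms");
    }
  }

  private static void planQuery(String sql) {
    // no-op stand-in for the real planner call
  }
}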

Contributor

@gortiz gortiz left a comment


Added some comments. I think we need to study the default scale, but apart from that it's fine.

"\n LogicalJoin(condition=[AND(=($0, $3), =($1, $4), >($2, $5))], joinType=[inner])",
"\n PinotLogicalExchange(distribution=[hash[0, 1]])",
"\n LogicalProject(col1=[$0], col2=[$1], col4=[$3])",
"\n LogicalProject(col1=[$0], col2=[$1], EXPR$0=[CAST($3):DECIMAL(1000, 982) NOT NULL])",
Contributor


Do we know why the precision is 982?

Contributor Author


Seems somehow related to this (#11151), because the values in the cast in this plan change depending on the values there. But even removing those overrides altogether (deriveDecimal*Type) results in a cast to DECIMAL(1000, 980) being added, so there's something else going on from the Calcite side as well. a.col4 is a BIG_DECIMAL column to which we're applying the default scale and precision of 1000 (each); if we use an INTEGER column instead (a.col3), the cast used is DECIMAL(19, 1), which makes sense because on the other side we're multiplying 0.5 with a BIGINT value (the result of the sum).

Contributor Author

@yashmayya yashmayya Mar 18, 2025


The scale is 982 btw, the precision is 1000 here.

I found out that this is actually coming from type coercion during the SQL -> Rel conversion phase, where the casts are being applied because the comparison (>) is being made on different types. One is DECIMAL(1000, 1000) (the BigDecimal column using default scale / precision) and the other is DECIMAL(19, 1) (0.5 is DECIMAL(2, 1) and we're multiplying it with a BIGINT value). The logic that computes the resultant precision and scale when coercing two decimal types is here - https://github.com/apache/calcite/blob/9b51667eebc825d5d3ece72cfcdb5d2efaa2d56f/core/src/main/java/org/apache/calcite/sql/type/SqlTypeFactoryImpl.java#L460-L498.

It's a little convoluted, but the resultant scale of 982 is arrived at like this:

Math.min(maxScale, Math.min(Math.max(s1, s2), maxPrecision - Math.min(maxPrecision, Math.max(p1 - s1, p2 - s2))))
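Plugging in the values from this plan (DECIMAL(1000, 1000) for the BIG_DECIMAL column, DECIMAL(19, 1) for the 0.5 * BIGINT side, and maxPrecision = maxScale = 1000 from our TypeSystem), the arithmetic works out like this:

public class CoercedScaleSketch {
  public static void main(String[] args) {
    int p1 = 1000, s1 = 1000;  // BIG_DECIMAL column with the default precision / scale
    int p2 = 19, s2 = 1;       // 0.5 (DECIMAL(2, 1)) multiplied with a BIGINT
    int maxPrecision = 1000, maxScale = 1000;

    // max(p1 - s1, p2 - s2) = max(0, 18) = 18 integer digits must be preserved,
    // which leaves 1000 - 18 = 982 digits for the fractional part.
    int scale = Math.min(maxScale,
        Math.min(Math.max(s1, s2),
            maxPrecision - Math.min(maxPrecision, Math.max(p1 - s1, p2 - s2))));
    System.out.println(scale);  // 982
  }
}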

So this unexpected value does seem to come from using the same max and default value for scale and precision, but I'm not convinced that this is an issue given the large values we're using. For reference, Calcite seems to use 0 as the default scale for DECIMAL types and the max precision as the default precision (which is 19 in Calcite).


  @Override
  public int getDefaultPrecision(SqlTypeName typeName) {
    return typeName == SqlTypeName.DECIMAL ? MAX_DECIMAL_PRECISION : super.getDefaultPrecision(typeName);
  }
Contributor


I don't think this is correct. IIUC this means our default is DECIMAL(1000, 2000), which means using 1000 digits as the total number of digits (both before and after the decimal point) and 1000 digits to the right of the decimal point. Therefore we could only store values in the range of (-1, 1).

Also, I don't know what implications these digits may have, but in BigDecimal, scales from 1 to 10 (or -1 to -10, but we don't care about those) are more efficient than the rest.

Contributor Author


This is a good question - TBH, I just copied this part from @Jackie-Jiang's draft PR for the 1.38.0 upgrade - #15141. I can spend some time studying all the implications here and figure out the best way forward.

Contributor Author

@yashmayya yashmayya Mar 18, 2025


Okay, so using a smaller default scale / precision in the range of 1-10 doesn't seem like a great idea: it would truncate values unnecessarily. For instance, if we used a default scale of 2, SELECT CAST(123.4567 AS DECIMAL) FROM mytable LIMIT 1; would return 123.45 (Postgres returns 123.4567, for reference). So I think it makes sense to use our max scale / precision as the default scale / precision as well, to avoid unnecessarily truncating / rounding values?

Also using a max / default precision and scale of 1000 doesn't mean we're limited to values in the range of (-1, 1). That would only be the case if we're using 1000 digits after the decimal point. But for instance, a value like 1234567890.0123456789 that has precision 20 and scale 10 can be represented just fine as well.

On a side note, the weird and IMO inexplicable logic we have here causes strange and unexpected results: if a decimal literal's precision is <= 30 (with non-zero scale), we use Double instead of BigDecimal, which leads to imprecision, so we actually see perfectly accurate results only when using numbers with precision > 30. But this is orthogonal to the Calcite upgrade and we can discuss / solve it separately.
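To illustrate the kind of imprecision the Double path introduces (plain java.math.BigDecimal here, not Pinot code):

import java.math.BigDecimal;

public class DoubleVsBigDecimal {
  public static void main(String[] args) {
    // A literal like 1.1 has precision 2 (<= 30), so it would travel as a double; the binary
    // double representation cannot hold 1.1 exactly.
    double asDouble = 1.1;
    System.out.println(new BigDecimal(asDouble));  // 1.100000000000000088817841970012523233890533447265625
    System.out.println(new BigDecimal("1.1"));     // 1.1 (exact when kept as a BigDecimal)
  }
}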

Edit: Another important point is that Calcite's default DECIMAL scale of 0 actually does mean a scale of 0, not unlimited - i.e., CAST(123.4567 AS DECIMAL) would become 123.

Contributor Author

@yashmayya yashmayya Mar 19, 2025


Btw our default here is actually DECIMAL(1000, 1000) - a precision of 1000 and a scale of 1000. DECIMAL(1000, 2000) is impossible because the scale can never be larger than the precision. Our default means that we can have a total of 1000 significant digits, but also up to 1000 digits after the decimal point. We don't allow specifying precision and scale for our BIG_DECIMAL columns, and FWIW Postgres also allows NUMERIC without explicit precision and scale to mean an "unconstrained" numeric - https://www.postgresql.org/docs/current/datatype-numeric.html#DATATYPE-NUMERIC-DECIMAL. Postgres also supports a maximum specified precision of 1000 (and the scale can be between -1000 and 1000).

Contributor


Sorry, I meant to say DECIMAL(1000, 1000).

Also using a max / default precision and scale of 1000 doesn't mean we're limited to values in the range of (-1, 1).

I think that is wrong. For example, in Postgres:

postgres=# select cast(10 as DECIMAL(10, 10));
ERROR:  numeric field overflow
DETAIL:  A field with precision 10, scale 10 must round to an absolute value less than 1.
postgres=# select cast(10 as DECIMAL(10, 0));
 numeric 
---------
      10
(1 row)

postgres=# select cast(10 as DECIMAL(1, 0));
ERROR:  numeric field overflow
DETAIL:  A field with precision 1, scale 0 must round to an absolute value less than 10^1.
postgres=# select cast(10 as DECIMAL(2, 0));
 numeric 
---------
      10
(1 row)

postgres=# select cast(10 as DECIMAL(2, 1));
ERROR:  numeric field overflow
DETAIL:  A field with precision 2, scale 1 must round to an absolute value less than 10^1.

AFAIU DECIMAL(1000, 1000) can only represent values from -0.999... to 0.999... with 1000 digits to the right of the decimal point. If that is the case, we must change the default used for scale. And even if it isn't... shouldn't we use a smaller scale by default? That would be more efficient once implemented as BigDecimal.

Contributor Author


You're right; the reason it sort of works is due to further Pinot-specific oddities. For small precisions / scales, we use DOUBLE instead of BIG_DECIMAL for DECIMAL -

} else {
  // NOTE: Do not use FLOAT to represent DECIMAL to be consistent with single-stage engine behavior.
  // See {@link RequestUtils#getLiteralExpression(SqlLiteral)}.
  if (precision <= 30) {
    return isArray ? ColumnDataType.DOUBLE_ARRAY : ColumnDataType.DOUBLE;

So here, the precision / scale won't be effectively enforced because we're using floating point representations. And for larger precisions / scales, we don't actually enforce them while casting either: the cast just converts whatever the input type is to the corresponding BigDecimal value without applying the specified precision / scale. It's basically yet another place where our support for the SQL DECIMAL type is lacking.
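Roughly the difference I mean, illustrated with plain java.math.BigDecimal (this is not Pinot's actual cast code, just a sketch of enforced vs. unenforced precision / scale):

import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

public class DecimalCastEnforcementSketch {
  public static void main(String[] args) {
    BigDecimal value = new BigDecimal("123.4567");

    // What effectively happens today: a plain value conversion, independent of the
    // DECIMAL(p, s) declared in the CAST expression.
    BigDecimal unenforced = value;

    // What enforcing e.g. DECIMAL(5, 2) would look like using java.math.BigDecimal.
    BigDecimal enforced = value.setScale(2, RoundingMode.HALF_UP).round(new MathContext(5));

    System.out.println(unenforced);  // 123.4567
    System.out.println(enforced);    // 123.46
  }
}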

However, using a smaller default scale for DECIMAL will break backward compatibility for us. Take this example from one of our tests - SELECT CAST('12345678901234567890123456789.1234567890123456789' AS DECIMAL) FROM a. This currently returns 12345678901234567890123456789.1234567890123456789, but if we reduce the default scale to 10, this will become 12345678901234567890123456789.1234567890 because Calcite seems to be evaluating such calls by itself in SqlToRelConverter and Pinot's cast function won't be called at all. Are we okay with such a change?

cc - @Jackie-Jiang for an additional opinion as well.

Contributor


Then let's use values larger than 10, without using the same value for precision and scale. I suggest using a precision that is at least twice the scale so we have enough room both before and after the decimal point.

Then, after this PR is merged, we can test the performance implications of using smaller precision and scale. If it is significantly better to use smaller values, open a new PR where the default precision and scale can be changed at startup time.

Contributor Author


Yeah, that makes sense. I also discussed this with @Jackie-Jiang offline, and in his opinion the original intention of (1000, 1000) was probably to have a precision of 2000 and a scale of 1000. I've updated this, but it does mean that Calcite now forces some literals (a direct cast to DECIMAL without explicit precision / scale, or a comparison between a BIG_DECIMAL column and literals) to have a scale of 1000, which, as you've pointed out, might not be very efficient for Java's BigDecimal that both Calcite and Pinot use internally to represent SQL DECIMAL types.

@Jackie-Jiang
Contributor

Jackie-Jiang commented Mar 27, 2025

Per PostgreSQL:

The maximum precision that can be explicitly specified in a numeric type declaration is 1000.
PostgreSQL permits the scale in a numeric type declaration to be any value in the range -1000 to 1000. However, the SQL standard requires the scale to be in the range 0 to precision. Using scales outside that range may not be portable to other database systems.

So (2000, 1000) should be a good enough bound as long as it doesn't add too much overhead.
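For reference, a rough sketch of what such defaults could look like on top of Calcite's RelDataTypeSystemImpl; the class name and constants are illustrative rather than the exact Pinot TypeSystem code, and getMaxNumericPrecision / getMaxNumericScale are deprecated in newer Calcite versions in favor of the per-type variants:

import org.apache.calcite.rel.type.RelDataTypeSystemImpl;
import org.apache.calcite.sql.type.SqlTypeName;

public class DecimalTypeSystemSketch extends RelDataTypeSystemImpl {
  private static final int MAX_DECIMAL_PRECISION = 2000;
  private static final int MAX_DECIMAL_SCALE = 1000;

  @Override
  public int getMaxNumericPrecision() {
    return MAX_DECIMAL_PRECISION;
  }

  @Override
  public int getMaxNumericScale() {
    return MAX_DECIMAL_SCALE;
  }

  @Override
  public int getDefaultPrecision(SqlTypeName typeName) {
    // Fall back to Calcite's defaults for every type other than DECIMAL.
    return typeName == SqlTypeName.DECIMAL ? MAX_DECIMAL_PRECISION : super.getDefaultPrecision(typeName);
  }
}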

@yashmayya yashmayya force-pushed the upgrade-calcite-1.39.0 branch 2 times, most recently from cac81fb to cd76846 Compare April 22, 2025 10:51
@yashmayya yashmayya merged commit ceee8c5 into apache:master Apr 22, 2025
47 checks passed
leujean02 pushed a commit to leujean02/pinot that referenced this pull request Apr 22, 2025
@yashmayya yashmayya mentioned this pull request Apr 24, 2025

Labels

dependencies, multi-stage, performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Multi-stage: Perf issue when IN expression has a lot of entries

5 participants