
Conversation

@yashmayya
Contributor

@yashmayya yashmayya added the dependencies, performance, and multi-stage labels Mar 13, 2025
@yashmayya yashmayya linked an issue Mar 13, 2025 that may be closed by this pull request
@codecov-commenter

codecov-commenter commented Mar 13, 2025

Codecov Report

Attention: Patch coverage is 66.66667% with 11 lines in your changes missing coverage. Please review.

Project coverage is 62.80%. Comparing base (2d8d8c3) to head (cd76846).
Report is 9 commits behind head on master.

Files with missing lines Patch % Lines
...ot/calcite/rel/rules/PinotEvaluateLiteralRule.java 57.14% 2 Missing and 4 partials ⚠️
...inot/query/planner/logical/RexExpressionUtils.java 33.33% 4 Missing ⚠️
...n/java/org/apache/pinot/query/type/TypeSystem.java 75.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #15263      +/-   ##
============================================
- Coverage     62.82%   62.80%   -0.02%     
- Complexity     1384     1386       +2     
============================================
  Files          2864     2864              
  Lines        162671   163677    +1006     
  Branches      24902    25065     +163     
============================================
+ Hits         102191   102804     +613     
- Misses        52782    53157     +375     
- Partials       7698     7716      +18     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 62.74% <66.66%> (-0.03%) ⬇️
java-21 62.79% <66.66%> (-0.02%) ⬇️
skip-bytebuffers-false 62.80% <66.66%> (-0.01%) ⬇️
skip-bytebuffers-true 62.72% <66.66%> (-0.05%) ⬇️
temurin 62.80% <66.66%> (-0.02%) ⬇️
unittests 62.80% <66.66%> (-0.02%) ⬇️
unittests1 55.69% <66.66%> (-0.09%) ⬇️
unittests2 33.73% <18.18%> (+0.20%) ⬆️

Flags with carried forward coverage won't be shown.


@yashmayya yashmayya force-pushed the upgrade-calcite-1.39.0 branch from 76c137f to 87cf58a Compare March 13, 2025 15:37
@ankitsultana
Contributor

@yashmayya: I suppose you folks must have considered this already, but for the large IN clause optimization, have you thought of rewriting the IN to a UDF in perhaps the first optimization rule that runs in the optimization phase?

The UDF could be expanded back to an IN clause in the end to ensure there are no behavior changes.

@yashmayya
Contributor Author

@ankitsultana the large IN clause compilation issue comes from two sources. One is certain optimization rules like CoreRules.FILTER_REDUCE_EXPRESSIONS and PinotSortExchangeNodeInsertRule when run after PinotFilterExpandSearchRule. This was fixed in #14615 (after the original fix in #13614 was unintentionally undone in #14448). The other source was the SqlToRelConverter phase, when converting a tree of SqlNodes to a tree of RelNodes, which happens even before any of our optimization rules are applied.

You can attach a profiler to this test on master to see this -

[flamegraph screenshot]

This issue has been fixed in Calcite 1.39.0. In the long run, though, we need to explore using the default value for expand here, since the value true that we're using has been deprecated. Setting expand to false keeps IN clauses with literals as IN clauses when we also set InSubQueryThreshold to 0 (i.e., no conversion to a JOIN or to an OR list of = expressions), although it also introduces a bunch of things that Pinot can't currently handle, for instance planning a correlated subquery using a SCALAR_QUERY filter.
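For reference, a minimal sketch (not part of this PR) of the relevant Calcite knobs; the class name is illustrative, and where exactly we'd apply this config in Pinot is an open question:

import org.apache.calcite.sql2rel.SqlToRelConverter;

public class InClauseConversionConfigSketch {
  // Per the discussion above: expand = false keeps IN lists of literals as IN clauses during
  // the SQL -> Rel conversion, and inSubQueryThreshold = 0 disables rewriting them into a
  // JOIN against an inline table or into an OR chain of '=' predicates.
  static final SqlToRelConverter.Config CONFIG = SqlToRelConverter.config()
      .withExpand(false)
      .withInSubQueryThreshold(0);

  public static void main(String[] args) {
    System.out.println(CONFIG);
  }
}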

@yashmayya yashmayya force-pushed the upgrade-calcite-1.39.0 branch from 87cf58a to 91e29be Compare March 17, 2025 04:59
@yashmayya yashmayya marked this pull request as ready for review March 17, 2025 05:32
Contributor Author

@yashmayya yashmayya left a comment


We don't currently have a framework for actual performance regression tests, but I was wondering whether it would be worth modifying this test to add some assertions on compilation time (we could increase the number of values in the IN clause and use a fairly conservative bound on the expected time to hopefully prevent CI flakiness).
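Purely as a sketch of what I mean (the planQuery helper and the bound below are placeholders, not existing test utilities):

import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class LargeInClauseCompileTimeSketch {
  public static void main(String[] args) {
    // Build a query with a very large IN list of literals.
    String inList = IntStream.range(0, 50_000)
        .mapToObj(i -> Integer.toString(i))
        .collect(Collectors.joining(", "));
    String sql = "SELECT col1 FROM mytable WHERE col2 IN (" + inList + ")";

    long start = System.nanoTime();
    planQuery(sql);  // placeholder for whatever compiles / plans the query in the test
    long elapsedMs = (System.nanoTime() - start) / 1_000_000;

    // Conservative bound: catches order-of-magnitude regressions without being flaky on slow CI hosts.
    if (elapsedMs > 10_000) {
      throw new AssertionError("Query compilation took " + elapsedMs + " ms");
    }
  }

  private static void planQuery(String sql) {
    // no-op stand-in for the real planner call
  }
}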

Contributor

@gortiz gortiz left a comment


Added some comments. I think we need to study the default scale, but apart from that it's fine.

"\n LogicalJoin(condition=[AND(=($0, $3), =($1, $4), >($2, $5))], joinType=[inner])",
"\n PinotLogicalExchange(distribution=[hash[0, 1]])",
"\n LogicalProject(col1=[$0], col2=[$1], col4=[$3])",
"\n LogicalProject(col1=[$0], col2=[$1], EXPR$0=[CAST($3):DECIMAL(1000, 982) NOT NULL])",
Contributor


Do we know why the precision is 982?

Contributor Author


Seems somehow related to this (#11151), because the values in the cast in this plan change depending on the values there. But even removing those overrides altogether (deriveDecimal*Type) results in a cast to DECIMAL(1000, 980) being added, so there's something else going on from the Calcite side as well. a.col4 is a BIG_DECIMAL column to which we're applying the default scale and precision of 1000 (each); if we use an INTEGER column instead (a.col3), the cast used is DECIMAL(19, 1), which makes sense because on the other side we're multiplying 0.5 with a BIGINT value (the result of the sum).

Contributor Author

@yashmayya yashmayya Mar 18, 2025


The scale is 982 btw, the precision is 1000 here.

I found out that this is actually coming from type coercion during the SQL -> Rel conversion phase, where the casts are being applied because the comparison (>) is being made on different types. One is DECIMAL(1000, 1000) (the BigDecimal column using default scale / precision) and the other is DECIMAL(19, 1) (0.5 is DECIMAL(2, 1) and we're multiplying it with a BIGINT value). The logic that computes the resultant precision and scale when coercing two decimal types is here - https://github.com/apache/calcite/blob/9b51667eebc825d5d3ece72cfcdb5d2efaa2d56f/core/src/main/java/org/apache/calcite/sql/type/SqlTypeFactoryImpl.java#L460-L498.

It's a little convoluted, but the resultant scale of 982 is arrived at like this:

Math.min(maxScale, Math.min(Math.max(s1, s2), maxPrecision - Math.min(maxPrecision, Math.max(p1 - s1, p2 - s2))))
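Plugging in the values from this plan (DECIMAL(1000, 1000) for the BIG_DECIMAL column, DECIMAL(19, 1) for the 0.5 * BIGINT side, and maxPrecision = maxScale = 1000 from our TypeSystem), the arithmetic works out like this:

public class CoercedScaleSketch {
  public static void main(String[] args) {
    int p1 = 1000, s1 = 1000;  // BIG_DECIMAL column with the default precision / scale
    int p2 = 19, s2 = 1;       // 0.5 (DECIMAL(2, 1)) multiplied with a BIGINT
    int maxPrecision = 1000, maxScale = 1000;

    // max(p1 - s1, p2 - s2) = max(0, 18) = 18 integer digits must be preserved,
    // which leaves 1000 - 18 = 982 digits for the fractional part.
    int scale = Math.min(maxScale,
        Math.min(Math.max(s1, s2),
            maxPrecision - Math.min(maxPrecision, Math.max(p1 - s1, p2 - s2))));
    System.out.println(scale);  // 982
  }
}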

So this unexpected value does seem to come from using the same max and default value for scale and precision, but I'm not convinced that this is an issue given the large values we're using. For reference, Calcite seems to use 0 as the default scale for DECIMAL types and the max precision as the default precision (which is 19 in Calcite).


  @Override
  public int getDefaultPrecision(SqlTypeName typeName) {
    return typeName == SqlTypeName.DECIMAL ? MAX_DECIMAL_PRECISION : super.getDefaultPrecision(typeName);
  }
Contributor


I don't think this is correct. IIUC this means our default is DECIMAL(1000, 2000), which means using 1000 digits as the total number of digits (both before and after the decimal point) and 1000 digits to the right of the decimal point. Therefore we could only store values in the range of (-1, 1).

Also, I don't know what implications these digits may have, but in BigDecimal, scales from 1 to 10 (or -1 to -10, but we don't care about those) are more efficient than the rest.

Contributor Author


This is a good question - TBH, I just copied this part from @Jackie-Jiang's draft PR for the 1.38.0 upgrade - #15141. I can spend some time studying all the implications here and figure out the best way forward.

Contributor Author

@yashmayya yashmayya Mar 18, 2025


Okay, so using a smaller default scale / precision in the range of 1-10 doesn't seem like a great idea: it would truncate values unnecessarily. For instance, if we used a default scale of 2, SELECT CAST(123.4567 AS DECIMAL) FROM mytable LIMIT 1; would return 123.45 (Postgres returns 123.4567, for reference). So I think it makes sense to use our max scale / precision as the default scale / precision as well, to avoid unnecessarily truncating / rounding values?

Also using a max / default precision and scale of 1000 doesn't mean we're limited to values in the range of (-1, 1). That would only be the case if we're using 1000 digits after the decimal point. But for instance, a value like 1234567890.0123456789 that has precision 20 and scale 10 can be represented just fine as well.

On a side note, the weird and IMO inexplicable logic we have here causes strange and unexpected results: if a decimal literal's precision is <= 30 (with non-zero scale), we use Double instead of BigDecimal, which leads to imprecision, so we actually see perfectly accurate results only when using numbers with precision > 30. But this is orthogonal to the Calcite upgrade and we can discuss / solve it separately.
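To illustrate the kind of imprecision the Double path introduces (plain java.math.BigDecimal here, not Pinot code):

import java.math.BigDecimal;

public class DoubleVsBigDecimal {
  public static void main(String[] args) {
    // A literal like 1.1 has precision 2 (<= 30), so it would travel as a double; the binary
    // double representation cannot hold 1.1 exactly.
    double asDouble = 1.1;
    System.out.println(new BigDecimal(asDouble));  // 1.100000000000000088817841970012523233890533447265625
    System.out.println(new BigDecimal("1.1"));     // 1.1 (exact when kept as a BigDecimal)
  }
}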

Edit: Another important point is that Calcite's default DECIMAL scale of 0 actually does mean a scale of 0, not unlimited - i.e., CAST(123.4567 AS DECIMAL) would become 123.

Contributor Author

@yashmayya yashmayya Mar 19, 2025


Btw our default here is actually DECIMAL(1000, 1000) - a precision of 1000 and a scale of 1000. DECIMAL(1000, 2000) is impossible because the scale can never be larger than the precision. Our default means that we can have a total of 1000 significant digits, but also up to 1000 digits after the decimal point. We don't allow specifying precision and scale for our BIG_DECIMAL columns, and FWIW Postgres also allows NUMERIC without explicit precision and scale to mean an "unconstrained" numeric - https://www.postgresql.org/docs/current/datatype-numeric.html#DATATYPE-NUMERIC-DECIMAL. Postgres also supports a maximum specified precision of 1000 (and the scale can be between -1000 and 1000).

Contributor


Sorry, I meant to say DECIMAL(1000, 1000).

Also using a max / default precision and scale of 1000 doesn't mean we're limited to values in the range of (-1, 1).

I think that is wrong. For example, in Postgres:

postgres=# select cast(10 as DECIMAL(10, 10));
ERROR:  numeric field overflow
DETAIL:  A field with precision 10, scale 10 must round to an absolute value less than 1.
postgres=# select cast(10 as DECIMAL(10, 0));
 numeric 
---------
      10
(1 row)

postgres=# select cast(10 as DECIMAL(1, 0));
ERROR:  numeric field overflow
DETAIL:  A field with precision 1, scale 0 must round to an absolute value less than 10^1.
postgres=# select cast(10 as DECIMAL(2, 0));
 numeric 
---------
      10
(1 row)

postgres=# select cast(10 as DECIMAL(2, 1));
ERROR:  numeric field overflow
DETAIL:  A field with precision 2, scale 1 must round to an absolute value less than 10^1.

AFAIU DECIMAL(1000, 1000) can only represent values from -0.999... to 0.999... with 1000 digits to the right of the decimal point. If that is the case, we must change the default used for scale. And even if it isn't... shouldn't we use a smaller scale by default? That would be more efficient once implemented as BigDecimal.

Contributor Author


You're right; the reason it sort of works is due to further Pinot-specific oddities. For small precisions / scales, we use DOUBLE instead of BIG_DECIMAL for DECIMAL -

} else {
  // NOTE: Do not use FLOAT to represent DECIMAL to be consistent with single-stage engine behavior.
  // See {@link RequestUtils#getLiteralExpression(SqlLiteral)}.
  if (precision <= 30) {
    return isArray ? ColumnDataType.DOUBLE_ARRAY : ColumnDataType.DOUBLE;

So here, the precision / scale won't be effectively enforced because we're using floating point representations. And for larger precisions / scales, we don't actually enforce them while casting either: the cast just converts whatever the input type is to the corresponding BigDecimal value without applying the specified precision / scale. It's basically yet another place where our support for the SQL DECIMAL type is lacking.
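Roughly the difference I mean, illustrated with plain java.math.BigDecimal (this is not Pinot's actual cast code, just a sketch of enforced vs. unenforced precision / scale):

import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

public class DecimalCastEnforcementSketch {
  public static void main(String[] args) {
    BigDecimal value = new BigDecimal("123.4567");

    // What effectively happens today: a plain value conversion, independent of the
    // DECIMAL(p, s) declared in the CAST expression.
    BigDecimal unenforced = value;

    // What enforcing e.g. DECIMAL(5, 2) would look like using java.math.BigDecimal.
    BigDecimal enforced = value.setScale(2, RoundingMode.HALF_UP).round(new MathContext(5));

    System.out.println(unenforced);  // 123.4567
    System.out.println(enforced);    // 123.46
  }
}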

However, using a smaller default scale for DECIMAL will break backward compatibility for us. Take this example from one of our tests - SELECT CAST('12345678901234567890123456789.1234567890123456789' AS DECIMAL) FROM a. This currently returns 12345678901234567890123456789.1234567890123456789, but if we reduce the default scale to 10, this will become 12345678901234567890123456789.1234567890 because Calcite seems to be evaluating such calls by itself in SqlToRelConverter and Pinot's cast function won't be called at all. Are we okay with such a change?

cc - @Jackie-Jiang for an additional opinion as well.

Contributor


Then let's use values larger than 10, without using the same value for precision and scale. I suggest using a precision that is at least twice the scale so we have enough room both before and after the decimal point.

Then, after this PR is merged, we can test the performance implications of using smaller precision and scale. If it is significantly better to use smaller values, open a new PR where the default precision and scale can be changed at startup time.

Contributor Author


Yeah, that makes sense. I also discussed this with @Jackie-Jiang offline, and in his opinion the original intention of (1000, 1000) was probably to have a precision of 2000 and a scale of 1000. I've updated this, but it does mean that Calcite now forces some literals (a direct cast to DECIMAL without explicit precision / scale, or a comparison between a BIG_DECIMAL column and literals) to have a scale of 1000, which, as you've pointed out, might not be very efficient for Java's BigDecimal that both Calcite and Pinot use internally to represent SQL DECIMAL types.

@Jackie-Jiang
Contributor

Jackie-Jiang commented Mar 27, 2025

Per PostgreSQL:

The maximum precision that can be explicitly specified in a numeric type declaration is 1000.
PostgreSQL permits the scale in a numeric type declaration to be any value in the range -1000 to 1000. However, the SQL standard requires the scale to be in the range 0 to precision. Using scales outside that range may not be portable to other database systems.

So (2000, 1000) should be a good enough bound as long as it doesn't add too much overhead.
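For reference, a rough sketch of what such defaults could look like on top of Calcite's RelDataTypeSystemImpl; the class name and constants are illustrative rather than the exact Pinot TypeSystem code, and getMaxNumericPrecision / getMaxNumericScale are deprecated in newer Calcite versions in favor of the per-type variants:

import org.apache.calcite.rel.type.RelDataTypeSystemImpl;
import org.apache.calcite.sql.type.SqlTypeName;

public class DecimalTypeSystemSketch extends RelDataTypeSystemImpl {
  private static final int MAX_DECIMAL_PRECISION = 2000;
  private static final int MAX_DECIMAL_SCALE = 1000;

  @Override
  public int getMaxNumericPrecision() {
    return MAX_DECIMAL_PRECISION;
  }

  @Override
  public int getMaxNumericScale() {
    return MAX_DECIMAL_SCALE;
  }

  @Override
  public int getDefaultPrecision(SqlTypeName typeName) {
    // Fall back to Calcite's defaults for every type other than DECIMAL.
    return typeName == SqlTypeName.DECIMAL ? MAX_DECIMAL_PRECISION : super.getDefaultPrecision(typeName);
  }
}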

@yashmayya yashmayya force-pushed the upgrade-calcite-1.39.0 branch 2 times, most recently from cac81fb to cd76846 Compare April 22, 2025 10:51
@yashmayya yashmayya merged commit ceee8c5 into apache:master Apr 22, 2025
47 checks passed
leujean02 pushed a commit to leujean02/pinot that referenced this pull request Apr 22, 2025
@yashmayya yashmayya mentioned this pull request Apr 24, 2025

Labels

dependencies, multi-stage, performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Multi-stage: Perf issue when IN expression has a lot of entries

5 participants