Skip to content

Conversation

@songwdfu
Copy link
Contributor

@songwdfu songwdfu commented Jun 4, 2025

Implemented knobs as queryOption that controls turning on / off rules per query.
Currently supported all rules in optProgram. One can specify disabling rule by setting a comma separated list of rule names to be skipped (no spaces), like:

SET skipPlannerRules='FilterProjectTranspose,PruneEmptySort,EvaluateProjectLiteral,AggregateRemove,AggregateProjectMerge,PruneEmptyAggregate';

The purpose of these knobs are for users to turn off rules that are taking too long / produces suboptimal plan. All rules are enabled by default.

The rule name here is consistent with the rule name shown on query console from EXPLAIN PLAN FOR (which is the description of the rule)

A major change to QueryEnvironment is, the optProgram in PlannerContext is now constructed per-query (to contain only enabled rules), this is acceptable as creating HepProgram is not expensive. TraitProgram is already created per-query before this PR.

Another major change is, a convention is added that every rule used in optProgram in PinotQueryRuleSets is now instantiated with a defined constant description. These descriptions would be used to show rule in EXPLAIN_PLAN_FOR and also for the knob. This allows us to manage and identify rule instances more easily.

Several tests are added to QueryCompilationTest and QueryPlannerRuleOptionsTest to test the new rule and knobs.

A logger is added to QueryEnvironment, which will be used to debug parsed / decorrelated / trimmed plans in the future.

@Jackie-Jiang Jackie-Jiang added feature multi-stage Related to the multi-stage query engine labels Jun 5, 2025
@ankitsultana
Copy link
Contributor

The purpose of these knobs are for users to turn off rules that are taking too long / produces suboptimal plan. All rules are enabled by default.

I suppose these knobs are not just for debugging. It is expected that users may use this in their production workloads. Correct?

The naming convention here is consistent with the rule name shown on query console from EXPLAIN PLAN FOR, it's just plannerRule.skip.

I think this makes sense to some extent for rules in Calcite. But for rules implemented by us, we should instead use shorter names. We could create a new abstract class that overrides RelOptRule and adds an abstract method to define a name to be used for this purpose.

@songwdfu
Copy link
Contributor Author

songwdfu commented Jun 5, 2025

I suppose these knobs are not just for debugging. It is expected that users may use this in their production workloads. Correct?

Yes.
One intended use case is that, user sees the rule timing by running EXPLAIN PLAN FOR, sees specific rule taking too long, then simply copy the rule name and prepend it with plannerRule.skip to disable it. The current naming convention accounts for this scenario as well.
For users who knows which plan resulted in the bad plan, they could go ahead and disable it as well.

I think this makes sense to some extent for rules in Calcite. But for rules implemented by us, we should instead use shorter names. We could create a new abstract class that overrides RelOptRule and adds an abstract method to define a name to be used for this purpose.

Certainly, there are rules whose name is too long / doesn't make sense. I found out our EXPLAIN PLAN FOR query timing simply uses rule's description (just that for Calcite CORE_RULES the description is the same as class name), so we could "rename" rules just using withDescription() when creating them. This applies to both Calcite rules and our custom rules.
This comes with three benefits:

  1. it preserves naming consistency between EXPLAIN PLAN FOR and the knob, both are just the description of the rule.
  2. we could now theoretically seperate same rules in different phases, e.g. FilterAggregateTransposeRule in BASIC_RULES vs FILTER_PUSHDOWN_RULES, if it's needed.
  3. no special cases are needed now, checking logic in QueryEnvironment is simplified

Will update this now if it makes sense

@ankitsultana
Copy link
Contributor

so we could "rename" rules just using withDescription() when creating them.

That may not be guaranteed to be the same across Calcite version upgrades. I suppose Calcite won't change class names since that would be a breaking change, but descriptions could be changed.

@songwdfu
Copy link
Contributor Author

songwdfu commented Jun 5, 2025

That may not be guaranteed to be the same across Calcite version upgrades.

I see. Our current explain plan uses RelOptRule.toString() as the display name, which returns the rule description. This is also subject to change if the default description is changed in Calcite.

So I'm proposing to explicitly assign a description with a defined constant when instantiating each rule. In this way the identifier of the rule would always be controlled by us (which will always be its description), renaming rules would be easy by just changing the constant, and we can upgrade / replace certain rules with Pinot version while keeping the same name. I think this is similar to the idea of defining a wrapper around rules, except we utilize the description field of RelOptRules, which is intended for this.

Assigning descriptions to rules should be a good practice as it enables us to distinguish instances of same rule class under different context, plus we would have a nice list of all rule name constants used in CommonConstants, which is easier to maintain.

For example, in PinotQueryRuleSets.BASIC_RULES, we instantiate rule like

JoinPushExpressionsRule.Config.DEFAULT
    .withDescription(PlannerRuleNames.JOIN_PUSH_EXPRESSIONS).toRule()

Now we can rename it easily or upgrade it to something like PinotJoinPushExpressionsRule with the same description. It also won't be effected by Calcite change. If we were to add another instance of JoinPushExpressionsRule somewhere else, we can provide a different description to distinguish. Also all used names would be shown under PlannerRuleNames.

Please let me know if this makes sense?

@songwdfu songwdfu force-pushed the rule-switches branch 5 times, most recently from 65ce02b to beefa20 Compare June 13, 2025 06:45
@codecov-commenter
Copy link

codecov-commenter commented Jun 13, 2025

Codecov Report

Attention: Patch coverage is 96.93878% with 3 lines in your changes missing coverage. Please review.

Project coverage is 63.22%. Comparing base (1a476de) to head (0828211).
Report is 288 commits behind head on master.

Files with missing lines Patch % Lines
...va/org/apache/pinot/spi/utils/CommonConstants.java 0.00% 2 Missing ⚠️
.../java/org/apache/pinot/query/QueryEnvironment.java 97.05% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #15999      +/-   ##
============================================
+ Coverage     62.90%   63.22%   +0.31%     
+ Complexity     1386     1352      -34     
============================================
  Files          2867     2951      +84     
  Lines        163354   169639    +6285     
  Branches      24952    25943     +991     
============================================
+ Hits         102755   107249    +4494     
- Misses        52847    54279    +1432     
- Partials       7752     8111     +359     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.19% <96.93%> (+0.32%) ⬆️
java-21 63.17% <96.93%> (+0.35%) ⬆️
skip-bytebuffers-false ?
skip-bytebuffers-true ?
temurin 63.22% <96.93%> (+0.31%) ⬆️
unittests 63.21% <96.93%> (+0.31%) ⬆️
unittests1 64.71% <96.93%> (+8.89%) ⬆️
unittests2 33.32% <73.46%> (-0.25%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly good

.defaultSchema(rootSchema.plus()).sqlToRelConverterConfig(PinotRuleUtils.PINOT_SQL_TO_REL_CONFIG).build();
_catalogReader = new PinotCatalogReader(
rootSchema, List.of(database), _typeFactory, CONNECTION_CONFIG, config.isCaseSensitive());
_optProgram = getOptProgram();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still want to keep this default _optProgram, so that for majority of the cases (no rule override) we don't need to construct a new one

@gortiz
Copy link
Contributor

gortiz commented Jun 16, 2025

IMHO, it would be a better idea to introduce a single skipPlannerRules=[PinotAggregateReduceFunctionsRule, ProjectJoinTransposeRule, ...]

@songwdfu
Copy link
Contributor Author

songwdfu commented Jun 16, 2025

changed option format to
SET skipPlannerRules='FilterProjectTransposeRule,PruneEmptySort';

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done!

assertEquals(response1Json.get("rows").get(0).get(2).asText(), "Rule Execution Times\n"
+ "Rule: AggregateProjectMergeRule -> Time:*\n"
+ "Rule: Project -> Time:*\n"
+ "Rule: EvaluateProjectLiteralRule -> Time:*\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(minor) Consider changing the HashMap into a LinkedHashMap to preserve the order of rule application, and also make this deterministic

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do this in a seperate PR since this might change a dozen tests

songwdfu and others added 8 commits June 17, 2025 12:48
Implemented knobs as queryOption that controls turning on / off rules
per query. Currently supported all rules in optProgram. All rules are
enbaled by default. One can specify disabling rule like:
```
SET plannerRule_skipPinotAggregateReduceFunctionsRule=true;
SET plannerRule_skipAggregateCaseToFilterRule=true;
```
The naming convention here is consistent with the rule name shown on
query console from `EXPLAIN PLAN FOR`, it's just `plannerRule_skip<name
as shown on console>`. The name shown on console is the description of
the rule.

This commit introduced a convention that all rules when instantiated
should be explicitly provided with a description. This is added to all
rules used in optProgram now. The rule description will serve as its
identifier in both `EXPLAIN PLAN FOR` and for turning on and off using
knobs.

Relevant constructors for Pinot rules that passes the description as arg
were added.

All constants introduced
are declared in pinot-spi/src/main/java/org/apache/pinot/spi/utils/CommonConstants.java.

A major change to QueryEnvironment is, the optProgram in the QueryEnvironment is now
constructed per-query (to contain only enabled rules), this is
acceptable as creating HepProgram is not expensive. TraitProgram is
already created per-query before this PR.

Several tests are added to QueryCompilationTest and
QueryPlannerRuleOptionsTest.
…cuase of the description change

This is due to that we use a HashMap with rule name as the key to record the rule timings, and the
display order is the traversal order of the hash map.
…Environment.java

Co-authored-by: Xiaotian (Jackie) Jiang <[email protected]>
Now disable rules by setting skipPlannerRules to a comma separated list
of rule names. For example

```
SET skipPlannerRules='FilterProjectTransposeRule,PruneEmptySort,EvaluateProjectLiteralRule';
```
…Environment.java

Co-authored-by: Xiaotian (Jackie) Jiang <[email protected]>
…Environment.java

Co-authored-by: Xiaotian (Jackie) Jiang <[email protected]>
@Jackie-Jiang Jackie-Jiang merged commit 95c0bb3 into apache:master Jun 18, 2025
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation feature multi-stage Related to the multi-stage query engine query release-notes Referenced by PRs that need attention when compiling the next release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants