Skip to content

feat(sql): speed-up queries with multiple IN predicates on indexed columns#6112

Merged
bluestreak01 merged 8 commits intomasterfrom
jh_indexes_in_selectivty_with_binding_vars
Oct 2, 2025
Merged

feat(sql): speed-up queries with multiple IN predicates on indexed columns#6112
bluestreak01 merged 8 commits intomasterfrom
jh_indexes_in_selectivty_with_binding_vars

Conversation

@jerrinot
Copy link
Copy Markdown
Contributor

@jerrinot jerrinot commented Sep 8, 2025

This speeds up PGWire queries with IN() predicates on multiple indexed columns.

Example:

[...]
where symA in (?, ?) and symB in (?, ?)
[...]

Details:
When both symA and symbB are indexed then the optimizer should plan index-scan on the column with higher cardinality. Since this column will likely result in a greater selectivity. However, before this PR the optimize would refuse to consider column cardinality in the presence of binding variables. Thus yielding a suboptimal plan.

This change it more relaxed. Reasoning: We cannot merge keys when binding variables are used. But if a column is 'new' (=different from the current key column) then we can still analyze it and possibly swap it for the existing key column. This does not involve any key merging => it's safe with binding variables.

TODO:

  • Proper testing, come up with scenarios where this could back-fire
  • User-centric PR description :)

@jerrinot jerrinot added the Performance Performance improvements label Sep 8, 2025
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Sep 8, 2025

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Updates WhereClauseParser to conditionally allow merging list-of-values with bind variables based on whether a new column is being created. Adds a PG wire test verifying that query plans with bind variables favor the high-cardinality index regardless of predicate order.

Changes

Cohort / File(s) Summary
Parser: in-list merge logic with bind variables
core/src/main/java/io/questdb/griffin/WhereClauseParser.java
Adjusts conditions in analyzeListOfValues: merging is now gated by both key-value knowledge and whether it’s a new column. Bind-variable handling in the merge loop is conditioned on !newColumn. Adds clarifying comments.
Tests: PG wire plan with binding variables
core/src/test/java/io/questdb/test/cutlass/pgwire/PGJobContextTest.java
Adds testPlanWithIndexAndBindingVariables: creates table with two symbol indexes, inserts rows, runs EXPLAIN with bind variables for IN lists, and asserts index scan on high-cardinality column for both predicate orders.

Sequence Diagram(s)

sequenceDiagram
  participant Query as Query (IN-list with binds)
  participant Parser as WhereClauseParser
  participant Analyzer as analyzeListOfValues
  participant Planner as Planner

  Query->>Parser: Parse WHERE with IN-list
  Parser->>Analyzer: analyzeListOfValues(nodes, newColumn)
  alt all key values known OR newColumn
    Analyzer->>Analyzer: Merge/validate values
    alt existing column AND bind variable present
      note over Analyzer: Skip merging when !newColumn and bind seen with prior values
    else new column or all known
      Analyzer->>Analyzer: Allow merge
    end
  else
    Analyzer->>Analyzer: Do not merge lists
  end
  Parser->>Planner: Emit analyzed predicates
  Planner->>Planner: Choose index based on cardinality
  Planner-->>Query: Plan (index scan on high-cardinality col)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The PR title succinctly identifies the feature and performance improvement by stating it speeds up SQL queries with multiple IN predicates on indexed columns, which directly matches the main change in the optimizer for PGWire queries.
Description Check ✅ Passed The author’s description directly discusses speeding up PGWire queries with IN predicates on multiple indexed columns, explains the optimizer behavior with binding variables, and includes examples and testing notes, making it clearly related to the changeset.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@jerrinot jerrinot marked this pull request as draft September 8, 2025 13:40
@jerrinot
Copy link
Copy Markdown
Contributor Author

jerrinot commented Sep 8, 2025

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Sep 8, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@jerrinot jerrinot marked this pull request as ready for review September 16, 2025 08:50
@jerrinot jerrinot changed the title feat(sql): improve execution plan when using multiple IN predicates on indexed columns feat(sql): speed-up queries with multiple IN predicates on indexed columns Sep 17, 2025
@puzpuzpuz puzpuzpuz self-requested a review September 25, 2025 11:42
@puzpuzpuz
Copy link
Copy Markdown
Contributor

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Sep 25, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
core/src/main/java/io/questdb/griffin/WhereClauseParser.java (1)

885-887: Use short-circuit boolean operator (&&) instead of bitwise &.

Replace & with && for clarity and to avoid unnecessary RHS evaluation.

-            if (!allKeyValuesAreKnown & !newColumn) {
+            if (!allKeyValuesAreKnown && !newColumn) {
                 return false;
             }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7eb1a51 and 27bea2c.

📒 Files selected for processing (2)
  • core/src/main/java/io/questdb/griffin/WhereClauseParser.java (2 hunks)
  • core/src/test/java/io/questdb/test/cutlass/pgwire/PGJobContextTest.java (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
  • GitHub Check: New pull request (Coverage Report Coverage Report)
  • GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests on linux-arm64)
  • GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests on linux-x64-zfs)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-arm64)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-x64-zfs)
  • GitHub Check: New pull request (Hosted Running tests on windows-other)
  • GitHub Check: New pull request (Hosted Running tests on windows-pgwire)
  • GitHub Check: New pull request (Hosted Running tests on windows-cairo)
  • GitHub Check: New pull request (Hosted Running tests on windows-fuzz2)
  • GitHub Check: New pull request (Hosted Running tests on windows-fuzz1)
  • GitHub Check: New pull request (Hosted Running tests on windows-griffin)
  • GitHub Check: New pull request (Hosted Running tests on mac-other)
  • GitHub Check: New pull request (Hosted Running tests on mac-pgwire)
  • GitHub Check: New pull request (Hosted Running tests on mac-cairo-fuzz)
  • GitHub Check: New pull request (SelfHosted Other tests Start X64Zfs Agent)
  • GitHub Check: New pull request (Hosted Running tests on mac-cairo)
  • GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests Start ARM Agent)
  • GitHub Check: New pull request (SelfHosted Other tests Start ARM Agent)
  • GitHub Check: New pull request (Hosted Running tests on mac-griffin)
  • GitHub Check: New pull request (Rust Test and Lint on linux-jdk17)
  • GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests Start X64Zfs Agent)
  • GitHub Check: New pull request (Hosted Running tests with cover on linux-other)
  • GitHub Check: New pull request (Hosted Running tests with cover on linux-pgwire)
  • GitHub Check: New pull request (Hosted Running tests with cover on linux-cairo)
  • GitHub Check: New pull request (Hosted Running tests with cover on linux-fuzz)
  • GitHub Check: New pull request (Hosted Running tests with cover on linux-griffin)
  • GitHub Check: New pull request (Check Changes Check changes)

@puzpuzpuz puzpuzpuz self-requested a review October 1, 2025 15:55
@glasstiger
Copy link
Copy Markdown
Contributor

[PR Coverage check]

😍 pass : 3 / 3 (100.00%)

file detail

path covered line new line coverage
🔵 io/questdb/griffin/WhereClauseParser.java 3 3 100.00%

@bluestreak01 bluestreak01 merged commit 9ebcd3d into master Oct 2, 2025
35 checks passed
@bluestreak01 bluestreak01 deleted the jh_indexes_in_selectivty_with_binding_vars branch October 2, 2025 09:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Performance Performance improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants