Skip to content

perf(sql): improve performance of UNION and friends by pushing down timestamp filter into subqueries#6745

Merged
bluestreak01 merged 21 commits intomasterfrom
mt_ts-pushdown-union
Feb 13, 2026
Merged

perf(sql): improve performance of UNION and friends by pushing down timestamp filter into subqueries#6745
bluestreak01 merged 21 commits intomasterfrom
mt_ts-pushdown-union

Conversation

@mtopolnik
Copy link
Copy Markdown
Contributor

@mtopolnik mtopolnik commented Feb 4, 2026

Fixes #2302

Summary

Push designated-timestamp filters from outside a UNION / UNION ALL / EXCEPT / INTERSECT subquery into each branch of the set operation. This enables per-branch partition pruning on time-series tables.

The optimizer only pushes filters that exclusively reference the designated timestamp column. It deliberately excludes non-timestamp filters (value columns, expressions, mixed references) — they risk type mismatches and column resolution issues across branches with different schemas, and don't unlock partition pruning anyway. The outer filter always stays on the wrapper model, so correctness is unchanged.

Production code change (SqlOptimiser.java)

The substantive logic is small — two sites:

Site 1 — the else if arm that routes union-bearing nested models to the new method instead of blocking pushdown:

} else if (nested.getUnionModel() != null) {
    tryPushFilterIntoSetOperationBranches(node, parent, nested);
    addWhereNode(parent, node);
}

addWhereNode(parent, node) always keeps the parent filter, so correctness holds regardless of what happens in the per-branch push.

Site 2 — tryPushFilterIntoSetOperationBranches: Gates on the designated timestamp, then walks the union chain, deep-clones the filter for each branch, rewrites column references via the parent's alias maps, and adds the filter to the branch. Safety mechanisms:

  1. Timestamp-only gate: findTimestamp + referencesOnlyTimestampAliasRecursive reject any filter that references non-timestamp columns.
  2. Per-branch semantic guards: skip branches with LATEST BY, LIMIT, or non-pushable SAMPLE BY — mirrors the checks for the first branch.
  3. Column existence check: allLiteralsMatchBranch skips branches where rewritten column names don't exist (e.g. branches with different or missing aliases).
  4. NonLiteralException catch: skips branches where the timestamp column maps to a computed expression (e.g., dateadd(...) on the timestamp).
  5. deepClone: gives each branch its own expression copy, preventing cross-branch mutation.

Composite filters

For WHERE ts IN '2025-12-01T01;2h' AND x > 5, the optimizer splits the top-level AND into separate conjuncts and processes each independently: it pushes the timestamp conjunct into branches, while x > 5 stays at the parent level only.

Test plan

  • 8 new SqlParserTest plan-level tests covering timestamp pushdown through UNION, UNION ALL, EXCEPT, INTERSECT, 3-branch chains, non-literal branches, same-table branches, and mixed timestamp/non-timestamp filters
  • 4 SqlParserTest cases verifying that non-timestamp filters are NOT pushed into branches
  • 1 new UnionTest runtime regression test verifying that filter pushdown into a UNION ALL branch with SAMPLE BY does not change aggregation semantics
  • Existing UnionAllCastTest cases pass (type mismatches and column resolution across branches with different schemas)

🤖 Generated with Claude Code

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Feb 4, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch mt_ts-pushdown-union

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@mtopolnik
Copy link
Copy Markdown
Contributor Author

@CodeRabbit Please review this PR. One of commits is just auto-reordering of code. Please try to ignore the changes of that commit in your review. Focus your review on all possible edge cases you can think of that would make this change break SQL.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Feb 4, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash

# First, let's see the commits in this PR
gh pr view 6745 --repo questdb/questdb --json commits --jq '.commits[] | "\(.oid[0:7]) \(.messageHeadline)"'

echo "---"
echo "Getting PR diff stats to identify the reordering commit..."
gh pr diff 6745 --repo questdb/questdb --name-only

Length of output: 494


#!/bin/bash

# Get detailed info about each commit to understand which is the auto-reorder
echo "=== Commit details ==="
for commit in 7bbeed0 9089994 c964298 22cf0b8 1cd5565; do
  echo "--- Commit $commit ---"
  git show --stat $commit | head -20
  echo ""
done

🧠 Learnings used
Learnt from: jerrinot
Repo: questdb/questdb PR: 0
File: :0-0
Timestamp: 2026-02-04T11:18:36.475Z
Learning: For PRs by jerrinot, always use a thorough and critical review mode: question design decisions, challenge assumptions, scrutinize implementations for edge cases and potential issues, and avoid superficial validation. They prefer hard-core reviews that push back on the code.

@mtopolnik mtopolnik changed the title perf(sql): push down filters into set operation subqueries (UNION, INTERSECT etc.) perf(sql): push down timestamp filters into set operation branches for partition pruning Feb 5, 2026
@mtopolnik mtopolnik changed the title perf(sql): push down timestamp filters into set operation branches for partition pruning perf(sql): push down timestamp filter into set operations (UNION etc) Feb 5, 2026
@mtopolnik mtopolnik changed the title perf(sql): push down timestamp filter into set operations (UNION etc) perf(sql): improve performance of UNION and friends by pushing down timestamp filter into subqueries Feb 5, 2026
@mtopolnik mtopolnik added the Performance Performance improvements label Feb 6, 2026
mtopolnik and others added 13 commits February 11, 2026 09:35
When a NonLiteralException prevents pushing a timestamp filter
into a set operation branch, log the filter expression and the
branch table name at DEBUG level. This replaces a silent catch
block and aids diagnosis of why partition pruning was not applied
to a particular union branch.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
UNION ALL matches columns by position, not by name. When a
WHERE clause alias maps to different columns in different
branches (e.g., "ts" is the timestamp in branch 1 but a
symbol in branch 2), the filter must be remapped by positional
index before being pushed into non-first branches.

Add collectPositionalAliasRemap() which resolves each literal's
positional index in the first branch, then maps it to the alias
at the same index in the current branch. Apply this remapping
after the existing literalRewritingVisitor pass for all non-first
UNION branches.

This fixes testUnionAllWithFilterUsesAliasFromFirstBranch which
was returning one row instead of two because the filter resolved
to the wrong column in the second branch.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
The findTimestamp() method in SqlOptimiser failed to resolve
timestamp columns through CTE wrapper models because these
models have an empty columnNameToAliasMap. This prevented
timestamp filters from being pushed into UNION ALL branches
when the branches were defined as separate CTEs (e.g.,
WITH l AS (...), r AS (...) SELECT ... FROM (l UNION ALL r)
WHERE ts IN '...').

Fix findTimestamp() to fall back to aliasToColumnMap when the
columnNameToAliasMap lookup returns null, which correctly
handles CTE wrapper models.

Add tests verifying that filter pushdown is blocked by
semantic barriers (LATEST ON, LIMIT) in UNION ALL branches,
and a model-level test confirming pushdown works through
both single-CTE and two-CTE forms.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
When the optimizer successfully pushes a timestamp filter into every
branch of a UNION/UNION ALL/INTERSECT/EXCEPT, it now removes the
parent-level copy of that filter. Previously it always retained the
parent filter as a safety net, causing an unnecessary re-check on the
combined result set.

Three production changes in SqlOptimiser:
- tryPushFilterIntoSetOperationBranches returns a boolean (true when
  all branches received the filter), and the call site keeps the
  parent filter only when pushdown was incomplete.
- The call site evaluates the union check before the blanket barrier
  check (LATEST ON / LIMIT / non-pushable SAMPLE BY), so UNION models
  with a barrier on the first branch still enter per-branch pushdown.
- findTimestamp iterates branches to resolve the timestamp alias when
  the first branch obscures it (e.g. SAMPLE BY wraps it in
  timestamp_floor), mapping back by column position.

All filter-pushdown-into-union tests move into the new
FilterPushdownIntoUnionTest file, including four new partial-pushdown
tests that verify the optimizer retains the parent filter when
SAMPLE BY blocks pushdown on specific branches.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@glasstiger
Copy link
Copy Markdown
Contributor

[PR Coverage check]

😍 pass : 178 / 202 (88.12%)

file detail

path covered line new line coverage
🔵 io/questdb/griffin/SqlOptimiser.java 170 194 87.63%
🔵 io/questdb/griffin/SqlKeywords.java 8 8 100.00%

@bluestreak01 bluestreak01 merged commit f644994 into master Feb 13, 2026
44 checks passed
@bluestreak01 bluestreak01 deleted the mt_ts-pushdown-union branch February 13, 2026 18:24
maciulis pushed a commit to maciulis/questdb that referenced this pull request Feb 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Performance Performance improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SQL: push UNION result filters into each part

4 participants