perf(sql): improve performance of UNION and friends by pushing down timestamp filter into subqueries#6745
Conversation
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@CodeRabbit Please review this PR. One of commits is just auto-reordering of code. Please try to ignore the changes of that commit in your review. Focus your review on all possible edge cases you can think of that would make this change break SQL. |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 494 #!/bin/bash
# Get detailed info about each commit to understand which is the auto-reorder
echo "=== Commit details ==="
for commit in 7bbeed0 9089994 c964298 22cf0b8 1cd5565; do
echo "--- Commit $commit ---"
git show --stat $commit | head -20
echo ""
done🧠 Learnings used |
Co-Authored-By: Claude Opus 4.6 <[email protected]>
Co-Authored-By: Claude Opus 4.6 <[email protected]>
Co-Authored-By: Claude Opus 4.6 <[email protected]>
When a NonLiteralException prevents pushing a timestamp filter into a set operation branch, log the filter expression and the branch table name at DEBUG level. This replaces a silent catch block and aids diagnosis of why partition pruning was not applied to a particular union branch. Co-Authored-By: Claude Opus 4.6 <[email protected]>
UNION ALL matches columns by position, not by name. When a WHERE clause alias maps to different columns in different branches (e.g., "ts" is the timestamp in branch 1 but a symbol in branch 2), the filter must be remapped by positional index before being pushed into non-first branches. Add collectPositionalAliasRemap() which resolves each literal's positional index in the first branch, then maps it to the alias at the same index in the current branch. Apply this remapping after the existing literalRewritingVisitor pass for all non-first UNION branches. This fixes testUnionAllWithFilterUsesAliasFromFirstBranch which was returning one row instead of two because the filter resolved to the wrong column in the second branch. Co-Authored-By: Claude Opus 4.6 <[email protected]>
The findTimestamp() method in SqlOptimiser failed to resolve timestamp columns through CTE wrapper models because these models have an empty columnNameToAliasMap. This prevented timestamp filters from being pushed into UNION ALL branches when the branches were defined as separate CTEs (e.g., WITH l AS (...), r AS (...) SELECT ... FROM (l UNION ALL r) WHERE ts IN '...'). Fix findTimestamp() to fall back to aliasToColumnMap when the columnNameToAliasMap lookup returns null, which correctly handles CTE wrapper models. Add tests verifying that filter pushdown is blocked by semantic barriers (LATEST ON, LIMIT) in UNION ALL branches, and a model-level test confirming pushdown works through both single-CTE and two-CTE forms. Co-Authored-By: Claude Opus 4.6 <[email protected]>
When the optimizer successfully pushes a timestamp filter into every branch of a UNION/UNION ALL/INTERSECT/EXCEPT, it now removes the parent-level copy of that filter. Previously it always retained the parent filter as a safety net, causing an unnecessary re-check on the combined result set. Three production changes in SqlOptimiser: - tryPushFilterIntoSetOperationBranches returns a boolean (true when all branches received the filter), and the call site keeps the parent filter only when pushdown was incomplete. - The call site evaluates the union check before the blanket barrier check (LATEST ON / LIMIT / non-pushable SAMPLE BY), so UNION models with a barrier on the first branch still enter per-branch pushdown. - findTimestamp iterates branches to resolve the timestamp alias when the first branch obscures it (e.g. SAMPLE BY wraps it in timestamp_floor), mapping back by column position. All filter-pushdown-into-union tests move into the new FilterPushdownIntoUnionTest file, including four new partial-pushdown tests that verify the optimizer retains the parent filter when SAMPLE BY blocks pushdown on specific branches. Co-Authored-By: Claude Opus 4.6 <[email protected]>
[PR Coverage check]😍 pass : 178 / 202 (88.12%) file detail
|
…imestamp filter into subqueries (questdb#6745)
Fixes #2302
Summary
Push designated-timestamp filters from outside a UNION / UNION ALL / EXCEPT / INTERSECT subquery into each branch of the set operation. This enables per-branch partition pruning on time-series tables.
The optimizer only pushes filters that exclusively reference the designated timestamp column. It deliberately excludes non-timestamp filters (value columns, expressions, mixed references) — they risk type mismatches and column resolution issues across branches with different schemas, and don't unlock partition pruning anyway. The outer filter always stays on the wrapper model, so correctness is unchanged.
Production code change (
SqlOptimiser.java)The substantive logic is small — two sites:
Site 1 — the
else ifarm that routes union-bearing nested models to the new method instead of blocking pushdown:addWhereNode(parent, node)always keeps the parent filter, so correctness holds regardless of what happens in the per-branch push.Site 2 —
tryPushFilterIntoSetOperationBranches: Gates on the designated timestamp, then walks the union chain, deep-clones the filter for each branch, rewrites column references via the parent's alias maps, and adds the filter to the branch. Safety mechanisms:findTimestamp+referencesOnlyTimestampAliasRecursivereject any filter that references non-timestamp columns.allLiteralsMatchBranchskips branches where rewritten column names don't exist (e.g. branches with different or missing aliases).NonLiteralExceptioncatch: skips branches where the timestamp column maps to a computed expression (e.g.,dateadd(...)on the timestamp).deepClone: gives each branch its own expression copy, preventing cross-branch mutation.Composite filters
For
WHERE ts IN '2025-12-01T01;2h' AND x > 5, the optimizer splits the top-level AND into separate conjuncts and processes each independently: it pushes the timestamp conjunct into branches, whilex > 5stays at the parent level only.Test plan
SqlParserTestplan-level tests covering timestamp pushdown through UNION, UNION ALL, EXCEPT, INTERSECT, 3-branch chains, non-literal branches, same-table branches, and mixed timestamp/non-timestamp filtersSqlParserTestcases verifying that non-timestamp filters are NOT pushed into branchesUnionTestruntime regression test verifying that filter pushdown into a UNION ALL branch with SAMPLE BY does not change aggregation semanticsUnionAllCastTestcases pass (type mismatches and column resolution across branches with different schemas)🤖 Generated with Claude Code