fix(core): fix assertion errors in UNION ALL with column count mismatch#6744
fix(core): fix assertion errors in UNION ALL with column count mismatch#6744bluestreak01 merged 13 commits intomasterfrom
Conversation
…sed columns
In SqlOptimiser.propagateTopDownColumns0(), WHERE clause and timestamp
column literals were emitted to union model branches by name resolution.
Since UNION matches columns by position, not name, the same alias (e.g.
"ts") can resolve to different column indices in different branches. This
caused one branch to receive extra top-down columns, producing a column
count mismatch that triggered the assert in
SqlCodeGenerator.checkIfSetCastIsRequired().
For example, given:
select ts from (
select name1, ts1, sym1 ts from t1
union all
select name2, ts2 ts, sym2 from t2
) where ts in '2025-12-01T01;2h'
The alias "ts" maps to position 2 (sym1) in the left branch but
position 1 (ts2) in the right branch. The name-based emission added the
wrong positional column to the right branch, while the subsequent
index-based propagation added the correct one, leaving the right branch
with one extra column.
The fix removes the name-based emission loops for both WHERE clause and
explicit timestamp literals. The existing index-based propagation lower
in the method already correctly handles union branches by deriving
positional indices from the left branch's top-down columns.
Co-Authored-By: Claude Opus 4.5 <[email protected]>
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@mtopolnik this does NOT fix the referenced issue |
|
Reference removed, the issue solved by this PR is related, but different. This PR doesn't have an open issue it fixes; I discovered this problem and fixed it immediately without creating an issue. |
When a GROUP BY/SAMPLE BY model is the first UNION member and the outer query selects a subset of columns, the optimizer adds non- aggregate key columns to topDownColumns but doesn't propagate them to union siblings, causing a column count mismatch assertion in SqlCodeGenerator.checkIfSetCastIsRequired. Propagate the same key columns to all union siblings whose topDownColumns are already populated. Co-Authored-By: Claude Opus 4.6 <[email protected]>
|
Update to previous comment: added another fix, so now the referenced issue #5751 is fixed as well. |
core/src/test/java/io/questdb/test/griffin/engine/SqlCodeGeneratorTest.java
Outdated
Show resolved
Hide resolved
core/src/test/java/io/questdb/test/griffin/engine/SqlCodeGeneratorTest.java
Outdated
Show resolved
Hide resolved
core/src/test/java/io/questdb/test/griffin/engine/SqlCodeGeneratorTest.java
Outdated
Show resolved
Hide resolved
core/src/test/java/io/questdb/test/griffin/engine/SqlCodeGeneratorTest.java
Outdated
Show resolved
Hide resolved
The previous approach propagated GROUP BY key columns forward through the union chain from inside the GROUP BY branch. This was order-dependent: it only worked when the GROUP BY branch came first, since later branches could not propagate backward. Move the logic to the outer query level, before the existing indexed propagation loop. Pre-scan all UNION branches for any GROUP BY model and add its key column positions to nested's topDownColumns. The indexed loop then distributes them to all siblings by position, regardless of where the GROUP BY branch sits in the chain. Co-Authored-By: Claude Opus 4.6 <[email protected]>
[PR Coverage check]😍 pass : 11 / 11 (100.00%) file detail
|
Fixes #5751
Two related bugs in
SqlOptimiser.propagateTopDownColumns0()cause the sameAssertionErrorinSqlCodeGenerator.checkIfSetCastIsRequired— both stemfrom UNION sibling models ending up with different
topDownColumnscounts.Bug 1: WHERE clause on aliased columns
WHERE clause and timestamp column literals were emitted to union model
branches by name resolution. Since UNION matches columns by position, not
name, the same alias (e.g. "ts") can resolve to different column indices in
different branches. This caused one branch to receive extra top-down columns.
Reproducer:
Fix: Remove the name-based emission loops for both WHERE clause and
explicit timestamp literals. The existing index-based propagation lower in
the method already correctly handles union branches by deriving positional
indices from the left branch's top-down columns.
Bug 2: GROUP BY / SAMPLE BY key column propagation
When a GROUP BY/SAMPLE BY model is the first UNION member and the outer
query selects only a subset of columns, the GROUP BY model adds its
non-aggregate key columns (e.g.
timestamp) to its owntopDownColumnsbutdoesn't propagate them to union siblings.
Reproducer (from #5751):
Fix: After the GROUP BY model adds non-aggregate key columns to its own
topDownColumns, propagate the same columns to all union siblings whosetopDownColumnsare already populated.Test plan
SqlCodeGeneratorTest#testUnionAllAssertionError— regression test for Bug 1SqlCodeGeneratorTest#testUnionAllSampleByAssertionError— regression test for Bug 2UnionTest— full suite passesNestedSetOperationTest— full suite passes🤖 Generated with Claude Code