fix(core): fix assertion errors in UNION ALL with column count mismatch #6744

Merged
bluestreak01 merged 13 commits into master from mt_internal-error-fix on Feb 11, 2026

Conversation

@mtopolnik
Contributor

@mtopolnik mtopolnik commented Feb 4, 2026

Fixes #5751

Two related bugs in SqlOptimiser.propagateTopDownColumns0() cause the same
AssertionError in SqlCodeGenerator.checkIfSetCastIsRequired — both stem
from UNION sibling models ending up with different topDownColumns counts.

Bug 1: WHERE clause on aliased columns

WHERE clause and timestamp column literals were emitted to union model
branches by name resolution. Since UNION matches columns by position, not
name, the same alias (e.g. "ts") can resolve to different column indices in
different branches. This caused one branch to receive extra top-down columns.

Reproducer:

SELECT ts FROM (
  SELECT name1, ts1, sym1 ts FROM t1
  UNION ALL
  SELECT name2, ts2 ts, sym2 FROM t2
) WHERE ts IN '2025-12-01T01;2h'

Fix: Remove the name-based emission loops for both WHERE clause and
explicit timestamp literals. The existing index-based propagation lower in
the method already correctly handles union branches by deriving positional
indices from the left branch's top-down columns.
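
A minimal sketch of why name-based emission breaks here, using the column lists from the reproducer. The class and method names are hypothetical and the model is deliberately simplified (plain lists and sets instead of QuestDB's query models); it only mirrors the behaviour described above.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class UnionNameVsIndex {
    // Output column names of each branch in the reproducer, aliases applied.
    static final List<String> LEFT = List.of("name1", "ts1", "ts");   // sym1 aliased to ts
    static final List<String> RIGHT = List.of("name2", "ts", "sym2"); // ts2 aliased to ts

    // Name-based resolution: the result depends on each branch's own aliases.
    static int resolveByName(List<String> branchColumns, String name) {
        return branchColumns.indexOf(name);
    }

    // Models the buggy flow for the right branch: the name is resolved in the
    // right branch itself, then the position taken from the left branch is added.
    static Set<Integer> buggyRightBranch(String name) {
        Set<Integer> requested = new TreeSet<>();
        requested.add(resolveByName(RIGHT, name)); // name-based emission
        requested.add(resolveByName(LEFT, name));  // index-based propagation
        return requested;
    }

    // Models the fixed flow: only the position derived from the left branch is used.
    static Set<Integer> fixedRightBranch(String name) {
        Set<Integer> requested = new TreeSet<>();
        requested.add(resolveByName(LEFT, name));
        return requested;
    }
}
```

In this model `buggyRightBranch("ts")` requests two positions while the left branch requests one, which is the count mismatch; `fixedRightBranch("ts")` requests exactly one.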

Bug 2: GROUP BY / SAMPLE BY key column propagation

When a GROUP BY/SAMPLE BY model is the first UNION member and the outer
query selects only a subset of columns, the GROUP BY model adds its
non-aggregate key columns (e.g. timestamp) to its own topDownColumns but
doesn't propagate them to union siblings.

Reproducer (from #5751):

SELECT high FROM (
  SELECT timestamp, max(price) AS high FROM trades SAMPLE BY 1m
  UNION ALL
  SELECT timestamp, high FROM trades_agg
)

Fix: After the GROUP BY model adds non-aggregate key columns to its own
topDownColumns, propagate the same columns to all union siblings whose
topDownColumns are already populated.
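
The mismatch and this fix can be sketched as follows. The code is a simplified model with hypothetical names, not the optimiser's implementation: each branch is reduced to the set of column positions requested from it, and branch 0 plays the role of the GROUP BY/SAMPLE BY model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class GroupByKeyPropagation {
    // Returns the positions requested from each union branch.
    // Branch 0 is assumed to be the GROUP BY / SAMPLE BY model.
    static List<Set<Integer>> request(
            Set<Integer> outerPositions,
            Set<Integer> groupByKeyPositions,
            int branchCount,
            boolean propagateToSiblings
    ) {
        List<Set<Integer>> perBranch = new ArrayList<>();
        for (int i = 0; i < branchCount; i++) {
            perBranch.add(new TreeSet<>(outerPositions));
        }
        // The GROUP BY model always needs its own key columns.
        perBranch.get(0).addAll(groupByKeyPositions);
        if (propagateToSiblings) {
            // The fix: give the same keys to siblings that are already populated.
            for (int i = 1; i < branchCount; i++) {
                if (!perBranch.get(i).isEmpty()) {
                    perBranch.get(i).addAll(groupByKeyPositions);
                }
            }
        }
        return perBranch;
    }
}
```

For the reproducer, the outer query requests position 1 (`high`) and the key is position 0 (`timestamp`). Without propagation the branches request two and one positions respectively; with propagation both request positions 0 and 1.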

Test plan

  • SqlCodeGeneratorTest#testUnionAllAssertionError — regression test for Bug 1
  • SqlCodeGeneratorTest#testUnionAllSampleByAssertionError — regression test for Bug 2
  • UnionTest — full suite passes
  • NestedSetOperationTest — full suite passes

🤖 Generated with Claude Code

…sed columns

In SqlOptimiser.propagateTopDownColumns0(), WHERE clause and timestamp
column literals were emitted to union model branches by name resolution.
Since UNION matches columns by position, not name, the same alias (e.g.
"ts") can resolve to different column indices in different branches. This
caused one branch to receive extra top-down columns, producing a column
count mismatch that triggered the assert in
SqlCodeGenerator.checkIfSetCastIsRequired().

For example, given:
  select ts from (
    select name1, ts1, sym1 ts from t1
    union all
    select name2, ts2 ts, sym2 from t2
  ) where ts in '2025-12-01T01;2h'

The alias "ts" maps to zero-based position 2 (sym1) in the left branch but
position 1 (ts2) in the right branch. The name-based emission added the
wrong positional column (ts2) to the right branch, while the subsequent
index-based propagation added the correct one (sym2), leaving the right
branch with one extra column.

The fix removes the name-based emission loops for both WHERE clause and
explicit timestamp literals. The existing index-based propagation lower
in the method already correctly handles union branches by deriving
positional indices from the left branch's top-down columns.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@mtopolnik mtopolnik added the "Bug" label (Incorrect or unexpected behavior) Feb 4, 2026
@bluestreak01
Member

@mtopolnik this does NOT fix the referenced issue

@mtopolnik
Contributor Author

Reference removed; the issue solved by this PR is related but different. There is no open issue for what this PR fixes: I discovered the problem and fixed it immediately without filing one.

When a GROUP BY/SAMPLE BY model is the first UNION member and the
outer query selects a subset of columns, the optimizer adds
non-aggregate key columns to topDownColumns but doesn't propagate
them to union siblings, causing a column count mismatch assertion in
SqlCodeGenerator.checkIfSetCastIsRequired.

Propagate the same key columns to all union siblings whose
topDownColumns are already populated.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@mtopolnik mtopolnik changed the title from "fix(core): fix internal error in UNION ALL with WHERE clause on aliased columns" to "fix(core): fix assertion errors in UNION ALL with column count mismatch" Feb 9, 2026
@mtopolnik
Contributor Author

Update to previous comment: added another fix, so now the referenced issue #5751 is fixed as well.

@puzpuzpuz puzpuzpuz self-requested a review February 10, 2026 08:45
mtopolnik and others added 8 commits February 10, 2026 10:45
The previous approach propagated GROUP BY key columns forward
through the union chain from inside the GROUP BY branch. This
was order-dependent: it only worked when the GROUP BY branch
came first, since later branches could not propagate backward.

Move the logic to the outer query level, before the existing
indexed propagation loop. Pre-scan all UNION branches for any
GROUP BY model and add its key column positions to nested's
topDownColumns. The indexed loop then distributes them to all
siblings by position, regardless of where the GROUP BY branch
sits in the chain.
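
The order independence of the pre-scan can be sketched as follows. This is a simplified model with hypothetical names, not the optimiser's code: each branch is described only by the positions of its GROUP BY keys (an empty set for a branch that is not a GROUP BY model).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class UnionPreScan {
    // keyPositionsPerBranch.get(i) holds the GROUP BY key positions of branch i,
    // or an empty set when branch i is not a GROUP BY model.
    static List<Set<Integer>> distribute(
            Set<Integer> outerPositions,
            List<Set<Integer>> keyPositionsPerBranch
    ) {
        // Pre-scan: collect key positions from every branch, wherever it sits.
        Set<Integer> shared = new TreeSet<>(outerPositions);
        for (Set<Integer> keys : keyPositionsPerBranch) {
            shared.addAll(keys);
        }
        // Indexed distribution: every branch receives the same positions.
        List<Set<Integer>> perBranch = new ArrayList<>();
        for (int i = 0; i < keyPositionsPerBranch.size(); i++) {
            perBranch.add(new TreeSet<>(shared));
        }
        return perBranch;
    }
}
```

Because the key positions are gathered before anything is distributed, the result is the same whether the GROUP BY branch comes first or last in the chain.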

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@glasstiger
Contributor

[PR Coverage check]

😍 pass : 11 / 11 (100.00%)

file detail

| path | covered line | new line | coverage |
|---|---|---|---|
| 🔵 io/questdb/griffin/SqlOptimiser.java | 11 | 11 | 100.00% |

@puzpuzpuz puzpuzpuz added the "SQL" label (Issues or changes relating to SQL execution) Feb 11, 2026
@bluestreak01 bluestreak01 merged commit 4bf5f9b into master Feb 11, 2026
44 checks passed
@bluestreak01 bluestreak01 deleted the mt_internal-error-fix branch February 11, 2026 10:24
maciulis pushed a commit to maciulis/questdb that referenced this pull request Feb 19, 2026
Labels

Bug: Incorrect or unexpected behavior
SQL: Issues or changes relating to SQL execution

Development

Successfully merging this pull request may close these issues.

AssertionError in UNION ALL query

4 participants