fix(sql): breaking change 💥- invalid column error returned from GROUP BY on joined tables#6275
Conversation
WalkthroughShifts select metadata handling to a new queryMetadata in SqlCodeGenerator and revises designated timestamp selection to consider ORDER BY/aliases. In SqlOptimiser, refactors select-column/alias propagation across translating/inner/group-by/window/outer models and adds a rewrite for timestamps mixed with aggregates. Updates tests to new planner/formatter outputs and expands SAMPLE BY coverage. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant C as Client
participant P as SqlParser
participant O as SqlOptimiser
participant G as SqlCodeGenerator
participant E as Execution
C->>P: SQL text
P->>O: QueryModel (with select/group-by/window/sample-by)
note right of O: Propagate/assign aliases across translating/inner/group-by/window/outer\nRewrite timestamp mixed with aggregates (sample-by)
O-->>G: Optimised QueryModel
G->>G: Build queryMetadata (columns)
G->>G: Determine designated timestamp\n- Prefer first ORDER BY column\n- Else base timestamp/alias match\n- Else previous fallback
G-->>E: CursorFactory with queryMetadata
E-->>C: Result cursor
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Pre-merge checks and finishing touches❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…_select_and_sample_by_rewrites
…ltiple timestamp aliases
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java (1)
5869-5914: Use base metadata, not queryMeta, for LATEST BY key resolutionprepareLatestByColumnIndexes(latestBy, …) and all index/type/symbol checks in generateLatestBy (isColumnIndexed, getColumnType, isSymbolTableStatic) as well as RecordSinkFactory.getInstance calls currently use queryMeta, forcing LATEST BY keys into the SELECT list. Change to:
- Call prepareLatestByColumnIndexes(latestBy, metadata) instead of queryMeta
- Perform all index/type/symbol checks (including RecordSinkFactory.getInstance) on metadata
- Keep queryMeta strictly as the output schema and for timestampIndex
Add a regression test:
create table t(ts timestamp, sym symbol, v int) timestamp(ts); select ts, v from t latest by sym;to verify behavior when the LATEST BY key isn’t projected.
🧹 Nitpick comments (13)
core/src/test/java/io/questdb/test/griffin/SqlCompilerImplTest.java (1)
5267-5269: Unquoted alias variant covered.Same outcome without quotes; good negative/positive coverage of alias normalization.
If alias-generation rules evolve, these hard-coded names may churn. Consider a small helper to derive/verify de-duplicated names from metadata to reduce brittleness.
core/src/test/java/io/questdb/test/griffin/ExplainPlanTest.java (3)
7237-7271: Good addition: verifies SAMPLE BY dedup of duplicate keysCovers both alias-first and alias-second permutations; expected keys use a single k/k1 alongside ts. Plan formatting matches existing tests. Consider adding a calendar/time zone align variant to ensure dedup holds across both SAMPLE BY paths.
10678-10697: ORDER BY with alias and projection preserved correctlyPlan shows partiallySorted Sort with keys [ts, l1] and filter applied above. Reads consistent with earlier reorder tests. Optionally add a case ordering by explicit aliases (e.g., “order by ts1, l1”) to complement the ordinal tests.
10721-10739: ORDER BY by alias ordinal validated“order by 2, 3” mapping to [ts1, l1] is correct and aligns with the alias propagation refactor. Consider adding a mixed-direction variant (e.g., “order by 2 desc, 3 asc”) to guard against direction handling regressions.
core/src/test/java/io/questdb/test/griffin/SqlParserTest.java (5)
4526-4529: GROUP BY alias/key dedup test updates LGTM; add a regression for referencing deduped keysThe updated expectation removes duplicate keys (k/k1) under SAMPLE BY. Consider adding a follow-up test that orders/filters by both the original expr and the deduped alias to ensure name resolution remains stable after rewrite.
4544-4547: Consistent SAMPLE BY output after virtual + group-by rewriteExpectation now aligns with “align to first observation.” Looks good. Please also add one variant with ALIGN TO CALENDAR to cover both branches of the planner rewrite here.
9536-9564: Aliased column in SAMPLE BY: verify dedup under different align modesThese three tests cover dedup behavior for aliased k/k1 across:
- ALIGN TO FIRST OBSERVATION
- default
- ALIGN TO CALENDAR
Suggestion: include an assertion on column ordering (b, sum, k, k1) to guard against non-deterministic reordering in future refactors.
10429-10451: SAMPLE BY + UNION ALL: column shape and determinismThe UNION ALL expectations for SAMPLE BY plan output look correct. Please add one mixed case where the two branches have differing align modes to verify the normalizer makes them compatible or produces a clear error.
10606-10624: Extend collision alias tests for other join types
Existing tests already verify multi-collision renames (TS, ts1, ts11) and case-insensitive aliasing. Add a complementary test using a right-hand ASOF/LT join to confirm star expansion order and deterministic suffixing under that join direction.core/src/test/java/io/questdb/test/griffin/engine/groupby/SampleByTest.java (2)
6223-6244: Plan assertion is brittle: avoid hard‑coding “workers: 1”.“Async Group By workers: 1” may vary with configuration. Prefer asserting the presence of “Async Group By” (or set max workers to 1 via test config) to keep this stable across environments.
6280-6345: Freeze time to avoid flakiness from now().The last two queries divide by max(i) using datediff(..., now()). Unless the test clock is frozen globally, these are time‑dependent. Either pin “now” in the test base or use a fixed timestamp literal in the queries.
Suggested query tweaks (adjust expected values accordingly):
- SELECT ts, datediff('h', ts, now()) / max(i) diff1, datediff('d', ts, now()) / MaX(i) diff2 + SELECT ts, datediff('h', ts, timestamp '2025-01-01T00:00:00.000000Z') / max(i) diff1, datediff('d', ts, timestamp '2025-01-01T00:00:00.000000Z') / MaX(i) diff2- SELECT ts, max(i), datediff('h', ts, now()) / max(i) diff1, datediff('d', ts, now()) / MaX(i) diff2 + SELECT ts, max(i), datediff('h', ts, timestamp '2025-01-01T00:00:00.000000Z') / max(i) diff1, datediff('d', ts, timestamp '2025-01-01T00:00:00.000000Z') / MaX(i) diff2core/src/main/java/io/questdb/griffin/SqlOptimiser.java (2)
6634-6688: Aggregate aliasing in SAMPLE BY rewrite — small improvementCurrent aliasing uses only the function name (e.g., "sum"), which may reduce readability when multiple aggregates exist. Prefer basing the alias on the full expression node to get more descriptive, yet still unique, aliases.
Apply this minimal change:
- aggregateAlias = createColumnAlias(node.rhs.token, sampleByModel); + aggregateAlias = createColumnAlias(node.rhs, sampleByModel);and similarly for the lhs branch:
- aggregateAlias = createColumnAlias(node.lhs.token, sampleByModel); + aggregateAlias = createColumnAlias(node.lhs, sampleByModel);
1585-1619: Alias source for new inner aliases (optional)When the column isn’t yet referenced by the translating model,
innerAliasis sometimes derived from the group-by model’s map. Consider always deriving new aliases from the translatingModel to avoid rare cross‑model alias collisions, then wiring into groupBy/inner accordingly.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java(34 hunks)core/src/main/java/io/questdb/griffin/SqlOptimiser.java(12 hunks)core/src/test/java/io/questdb/test/griffin/ExplainPlanTest.java(2 hunks)core/src/test/java/io/questdb/test/griffin/GroupByRewriteTest.java(1 hunks)core/src/test/java/io/questdb/test/griffin/SampleBySqlParserTest.java(11 hunks)core/src/test/java/io/questdb/test/griffin/SqlCompilerImplTest.java(1 hunks)core/src/test/java/io/questdb/test/griffin/SqlParserTest.java(7 hunks)core/src/test/java/io/questdb/test/griffin/engine/groupby/SampleByTest.java(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
- GitHub Check: New pull request (SelfHosted Other tests on linux-x64-zfs)
- GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests on linux-arm64)
- GitHub Check: New pull request (SelfHosted Other tests on linux-arm64)
- GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests on linux-x64-zfs)
- GitHub Check: New pull request (SelfHosted Other tests Start X64Zfs Agent)
- GitHub Check: New pull request (Hosted Running tests on windows-other)
- GitHub Check: New pull request (Hosted Running tests on windows-pgwire)
- GitHub Check: New pull request (Hosted Running tests on windows-cairo)
- GitHub Check: New pull request (Hosted Running tests on windows-fuzz2)
- GitHub Check: New pull request (Hosted Running tests on windows-fuzz1)
- GitHub Check: New pull request (Hosted Running tests on windows-griffin)
- GitHub Check: New pull request (Hosted Running tests with cover on linux-other)
- GitHub Check: New pull request (Hosted Running tests on mac-other)
- GitHub Check: New pull request (Hosted Running tests with cover on linux-pgwire)
- GitHub Check: New pull request (Hosted Running tests on mac-pgwire)
- GitHub Check: New pull request (Hosted Running tests with cover on linux-cairo)
- GitHub Check: New pull request (Hosted Running tests on mac-cairo-fuzz)
- GitHub Check: New pull request (Hosted Running tests with cover on linux-fuzz)
- GitHub Check: New pull request (Hosted Running tests on mac-cairo)
- GitHub Check: New pull request (Rust Test and Lint on linux-jdk17)
- GitHub Check: New pull request (Hosted Running tests with cover on linux-griffin)
- GitHub Check: New pull request (Hosted Running tests on mac-griffin)
- GitHub Check: New pull request (SelfHosted Other tests Start ARM Agent)
- GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests Start ARM Agent)
- GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests Start X64Zfs Agent)
- GitHub Check: New pull request (Check Changes Check changes)
- GitHub Check: build
🔇 Additional comments (14)
core/src/test/java/io/questdb/test/griffin/SqlCompilerImplTest.java (3)
5250-5251: Alias de-dup looks correct (case-insensitive).Expecting TS, ts2, ts1 matches the new unique-name policy when mixing explicit alias + unaliased + conflicting alias.
5256-5257: Wildcard + explicit alias ordering validated.Headers TS, ts1, ts2, x, ts3, x1 are consistent with stable, unique auto‑rename under star expansion.
5261-5263: t1. + trailing alias conflict handled as expected.*ts1 for t1.ts and ts11 for the trailing t2.ts alias confirm deterministic conflict resolution.
core/src/test/java/io/questdb/test/griffin/ExplainPlanTest.java (1)
10700-10718: Ordinal ORDER BY resolution looks right“order by 1, 3” maps to [ts, l1] as expected; plan structure is consistent.
core/src/test/java/io/questdb/test/griffin/SampleBySqlParserTest.java (2)
32-376: Issue #6273 already covered by existing test A test at core/src/test/java/io/questdb/test/griffin/GroupByRewriteTest.java:192 verifies the query involvingt2.price / sum(t1.amount); no further tests needed.
203-203: Manual test verification needed
The test run produced no output; please ensure the Gradle wrapper (gradlew) is present/executable and thatSampleBySqlParserTestcompiles and passes all assertions with the updated expected SQL strings.core/src/test/java/io/questdb/test/griffin/GroupByRewriteTest.java (2)
32-212: Excellent comprehensive test coverage for duplicate key prevention.This test method thoroughly exercises the query rewrite logic to ensure duplicate keys are not created when the same column appears multiple times in different positions or with aliases. The test at lines 190-211 is particularly important as it directly addresses the bug fix for issue #6273, verifying that columns from different tables (t1.price and t2.price) are correctly kept as separate keys rather than being incorrectly deduplicated.
214-233: LGTM! Focused test for constant key extraction.This test correctly verifies that constant literals in the SELECT clause are not treated as group keys, ensuring optimal execution plan generation. Only the non-constant, non-aggregate column
amountis correctly identified as a key.core/src/test/java/io/questdb/test/griffin/engine/groupby/SampleByTest.java (2)
6207-6213: Validate expected rows for the first query (3 inserts → 3 buckets).You insert 3 rows spanning different days/years; SAMPLE BY 1d ALIGN TO CALENDAR (TZ 'Europe/Copenhagen') should produce 3 buckets. Please confirm the expected block includes all rows (the snippet shows only one line).
6246-6278: LGTM: rewrite path and plan look correct.The plan shows Async Group By with keys [ts] and values [count(*), max(ts)], and the projection computes max::long/count as expected.
core/src/main/java/io/questdb/griffin/SqlOptimiser.java (4)
1538-1583: Group-by key de‑duplication and alias propagation look solidReusing existing translated/group-by aliases and exposing them through outer/inner/window models avoids duplicate GBY keys and fixes JOIN ambiguity. Good improvement.
Also applies to: 1601-1619
1621-1632: Wildcard expansion correctly respects group-by contextThreading the isGroupBy flag through wildcard expansion ensures columns are pushed to the right models. Looks good.
Also applies to: 1641-1653, 1673-1684, 1702-1713
3009-3012: Helper overload for position-aware columnsGood extraction. Using SqlUtil.nextColumn with position centralizes column creation.
4078-4116: Avoid duplicate GBY keys by reusing selected columnsRecognizing when a literal is already selected (via translating→groupBy maps) and substituting the alias prevents extra GBY keys. This should address the “invalid column” on JOINs with aggregates.
|
nit: you can remove the ignore tag from |
Thanks for the heads-up. I didn't check the disabled tests. Enabled in 6dc3685 |
36e7a15 to
6dc3685
Compare
ideoma
left a comment
There was a problem hiding this comment.
Still reviewing, just wanted to ask to boost the overall test coverage, please. I do not see exception handling that is hard to hit; this is query logic, which should be fully testable.
…_select_and_sample_by_rewrites
Covered one branch and removed two never-taken in 7372ad2. Everything else seems covered. |
core/src/test/java/io/questdb/test/griffin/engine/groupby/SampleByTest.java
Outdated
Show resolved
Hide resolved
[PR Coverage check]😍 pass : 120 / 121 (99.17%) file detail
|
Fixes #6273
Other than that, SAMPLE BY queries with selected expressions mixing timestamp and aggregate functions now use multi-threaded execution plan. Example of such query:
Also improves the way we choose execution plan for SAMPLE BY and GROUP BY queries. Namely, now we avoid adding duplicate group keys and, instead, reuse the existing ones. This means that in a query like the following one:
we now end up with
symbol, priceas the group keys instead ofsymbol, price, price price1(notice the extraprice1key).Breaking change
When designated timestamp is selected multiple times, under different aliases, the ordered (sorted in asc order) column picked up by the query engine will be the first one. For instance, in the below example column
ts1will be marked as the ordered column:If one of the selected columns is used as the first ORDER BY column, it's always preferred, no matter what its position is in the
SELECTclause:Otherwise, unaliased column, i.e. the selected column matching with the designated column name, is preferred.