Skip to content

fix(sql): fix SELECT DISTINCT queries with table name prefixed columns#6061

Merged
bluestreak01 merged 11 commits intomasterfrom
puzpuzpuz_select_distinct_broken_aliases
Aug 24, 2025
Merged

fix(sql): fix SELECT DISTINCT queries with table name prefixed columns#6061
bluestreak01 merged 11 commits intomasterfrom
puzpuzpuz_select_distinct_broken_aliases

Conversation

@puzpuzpuz
Copy link
Copy Markdown
Contributor

@puzpuzpuz puzpuzpuz commented Aug 18, 2025

Fixes #6058
Fixes #6066

Queries like

select distinct sensors.sensor_id, sensors.apptype
from sensors;

and any queries that involve table name-prefixed columns in SELECT weren't compiling. That's due to select distinct to group by rewrite combined with expression alias generation. The alias generation kept sensors.sensor_id as the column alias, without any quotes, and later SqlOptimiser removed the sensors. leading to broken column references between the outer and the nested query models.

The fix is to use sensor_id instead of sensors.sensor_id as the implicit alias.

@puzpuzpuz puzpuzpuz self-assigned this Aug 18, 2025
@puzpuzpuz puzpuzpuz added Bug Incorrect or unexpected behavior SQL Issues or changes relating to SQL execution labels Aug 18, 2025
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Aug 18, 2025

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Updates adjust alias generation to handle dotted identifiers and disallowed names case-insensitively, add a ranged contains() to LowerCaseCharSequenceHashSet, and refine local immutability/comments in SqlOptimiser. Tests are expanded for DISTINCT with joins and alias behaviors, including toggling column-alias-expression settings.

Changes

Cohort / File(s) Summary
Alias generation and disallowed set
core/src/main/java/io/questdb/griffin/SqlUtil.java
Reworked createExprColumnAlias to be dot-aware; switched disallowedAliases to LowerCaseCharSequenceHashSet; updated quoting/length/substring rules; adjusted early-exit and duplicate handling.
LowerCase set API
core/src/main/java/io/questdb/std/LowerCaseCharSequenceHashSet.java, core/src/test/.../LowerCaseCharSequenceHashSetTest.java
Added public contains(CharSequence, int, int); extended tests to validate ranged contains.
Optimizer local immutability and comments
core/src/main/java/io/questdb/griffin/SqlOptimiser.java
Marked locals final in DISTINCT rewrite path; corrected typos and comments; no logic changes.
DISTINCT and join alias tests
core/src/test/java/io/questdb/test/griffin/DistinctTest.java
Added tests for DISTINCT with table.column aliases across inner joins; variants with column-alias-expression enabled/disabled; helper methods; note duplication present.
SqlUtil alias tests
core/src/test/java/io/questdb/test/griffin/SqlUtilTest.java
Expanded/renamed tests: duplicates, trim-end, non-literal handling, prefixed table.column behavior, trimming; aligned maps/imports.

Sequence Diagram(s)

sequenceDiagram
  participant Parser
  participant SqlUtil as SqlUtil.createExprColumnAlias
  participant Set as LowerCaseCharSequenceHashSet

  Parser->>SqlUtil: base, aliasMap, nonLiteral, maxLen
  alt base contains '.'
    SqlUtil->>Set: contains(base, dot+1, end)
    Set-->>SqlUtil: true/false
    SqlUtil->>SqlUtil: decide quoting, take substring after dot
  else no dot
    SqlUtil->>Set: contains(base)
    Set-->>SqlUtil: true/false
    SqlUtil->>SqlUtil: decide quoting, use full base
  end
  SqlUtil->>SqlUtil: trim/limit, dedupe with suffixes
  SqlUtil-->>Parser: alias
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Assessment against linked issues

Objective Addressed Explanation
Fix “Invalid table name or alias” when selecting DISTINCT with table-prefixed columns in INNER JOIN [#6058]

Assessment against linked issues: Out-of-scope changes

None found.

✨ Finishing Touches
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch puzpuzpuz_select_distinct_broken_aliases

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@puzpuzpuz puzpuzpuz marked this pull request as ready for review August 18, 2025 13:39
@ideoma
Copy link
Copy Markdown
Collaborator

ideoma commented Aug 19, 2025

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Aug 19, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (4)
core/src/main/java/io/questdb/griffin/SqlUtil.java (3)

106-110: Optional: early-exit for prefixed literals (micro-optimization)

You can extend the early-exit to return the unprefixed column substring when not quoting and constraints are met. This avoids allocating a CharacterStoreEntry for simple, non-conflicting cases.

Non-blocking; current logic is correct.


133-139: Fix trimming logic for prefixed literals (trims the wrong slice today)

When prefixedLiteral is true, the trailing-space check uses base.charAt(len - 1) and Chars.lastIndexOfDifferent(base, 0, len, ' '), which index into the start of the full base rather than the substring after the dot. This can produce incorrect trimming if the column segment has trailing spaces.

Adjust the indices to operate within the substring range:

-            // We don't want the alias to finish with a space.
-            if (!quote && len > 0 && base.charAt(len - 1) == ' ') {
-                final int lastSpace = Chars.lastIndexOfDifferent(base, 0, len, ' ');
-                if (lastSpace > 0) {
-                    len = lastSpace + 1;
-                }
-            }
+            // We don't want the alias to finish with a space.
+            if (!quote && len > 0) {
+                final int baseStart = prefixedLiteral ? indexOfDot + 1 : 0;
+                final int baseEnd = baseStart + len; // exclusive
+                if (base.charAt(baseEnd - 1) == ' ') {
+                    final int lastNonSpace = Chars.lastIndexOfDifferent(base, baseStart, baseEnd, ' ');
+                    if (lastNonSpace >= baseStart) {
+                        len = lastNonSpace - baseStart + 1;
+                    } else {
+                        len = 0;
+                    }
+                }
+            }

Low impact in practice (parsing typically strips trailing spaces), but this makes the logic correct for all inputs, including quoted or unusual spacing.


99-104: Non-blocking: consider last dot for multi-part identifiers

Currently the substring is taken after the first dot. If multi-part identifiers (e.g., schema.table.column) ever appear in base, using the last dot would yield the column segment. QuestDB doesn’t use multi-part names today, so this is a future-proofing note only.

core/src/test/java/io/questdb/test/griffin/DistinctTest.java (1)

335-356: Good reproduction of the original failure mode; minor DRY opportunity

  • This precisely matches the reported failure (table-prefixed columns with DISTINCT).
  • Optional: you could factor table setup/insert into a small helper to reduce duplication across the three helper methods.

Non-blocking.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 330ac8b and 31efe4a.

📒 Files selected for processing (6)
  • core/src/main/java/io/questdb/griffin/SqlOptimiser.java (5 hunks)
  • core/src/main/java/io/questdb/griffin/SqlUtil.java (3 hunks)
  • core/src/main/java/io/questdb/std/LowerCaseCharSequenceHashSet.java (1 hunks)
  • core/src/test/java/io/questdb/test/griffin/DistinctTest.java (2 hunks)
  • core/src/test/java/io/questdb/test/griffin/SqlUtilTest.java (2 hunks)
  • core/src/test/java/io/questdb/test/std/LowerCaseCharSequenceHashSetTest.java (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: New pull request (Coverage Report Coverage Report)
  • GitHub Check: New pull request (Hosted Running tests with cover on linux-other)
  • GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests on linux-x64-zfs)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-x64-zfs)
  • GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests on linux-arm64)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-arm64)
  • GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests Start X64Zfs Agent)
  • GitHub Check: New pull request (SelfHosted Other tests Start ARM Agent)
  • GitHub Check: New pull request (SelfHosted Other tests Start X64Zfs Agent)
  • GitHub Check: New pull request (Hosted Running tests on windows-other)
  • GitHub Check: New pull request (Hosted Running tests on windows-pgwire)
  • GitHub Check: New pull request (Hosted Running tests on windows-cairo)
  • GitHub Check: New pull request (Hosted Running tests on windows-fuzz2)
  • GitHub Check: New pull request (Hosted Running tests on windows-fuzz1)
  • GitHub Check: New pull request (Hosted Running tests on windows-griffin)
  • GitHub Check: New pull request (Hosted Running tests on mac-other)
  • GitHub Check: New pull request (Hosted Running tests with cover on linux-other)
  • GitHub Check: New pull request (Hosted Running tests on mac-pgwire)
  • GitHub Check: New pull request (Hosted Running tests with cover on linux-pgwire)
  • GitHub Check: New pull request (Hosted Running tests on mac-cairo-fuzz)
  • GitHub Check: New pull request (Hosted Running tests with cover on linux-cairo)
  • GitHub Check: New pull request (Hosted Running tests on mac-cairo)
  • GitHub Check: New pull request (Hosted Running tests with cover on linux-fuzz)
  • GitHub Check: New pull request (Hosted Running tests on mac-griffin)
  • GitHub Check: New pull request (Hosted Running tests with cover on linux-griffin)
  • GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests Start ARM Agent)
  • GitHub Check: New pull request (Rust Test and Lint on linux-jdk17)
  • GitHub Check: New pull request (Check Changes Check changes)
🔇 Additional comments (14)
core/src/main/java/io/questdb/griffin/SqlOptimiser.java (2)

4340-4346: Immutability in rewriteDistinct: good cleanup and zero semantic risk

Making wrapperNested/wrapperModel/bottomUpColumns and loop locals (qc, ast, alias) final, as well as countAst/countColumn, improves readability and guards against accidental reassignment without altering behavior. This aligns with the rest of the method’s intent and should be a no-op semantically.

Also applies to: 4351-4354, 4373-4386


6754-6762: validateColumnAndGetModelIndex: finalized locals and clearer comment — LGTM

  • final joinModels improves clarity.
  • The updated comment clarifies the special-casing for group-by and projection references; it matches the logic that returns -1 only when the projection match is allowed (and groupByCall is false).
    No functional changes observed.
core/src/test/java/io/questdb/test/std/LowerCaseCharSequenceHashSetTest.java (1)

88-88: Good addition: range-based contains() is covered

The new assertion exercises the contains(CharSequence, int, int) overload and strengthens the saturation verification.

core/src/main/java/io/questdb/std/LowerCaseCharSequenceHashSet.java (1)

70-72: LGTM: range-based contains() overload

The overload mirrors existing contains(key) semantics by delegating to keyIndex(key, lo, hi) < 0. This is consistent with the set’s negative-index “found” convention and supports efficient subrange checks upstream.

core/src/main/java/io/questdb/griffin/SqlUtil.java (2)

45-46: Case-insensitive disallowed alias handling is a solid improvement

Switching disallowedAliases to LowerCaseCharSequenceHashSet aligns with how identifiers are compared elsewhere and fixes uppercase keyword cases without extra normalization.

Also applies to: 68-69


99-104: Dot-aware aliasing and conditional quoting logic look correct

  • prefixedLiteral correctly distinguishes literal table-prefixed identifiers.
  • Quoting rules handle non-literals with dots and reserved words, and only quote the column part for prefixed literals that would otherwise conflict.

This addresses the DISTINCT rewrite aliasing issue described in #6058.

core/src/test/java/io/questdb/test/griffin/DistinctTest.java (3)

36-45: LGTM: coverage for alias-expression toggle

These entry-point tests exercise both configurations by delegating to shared helpers, keeping duplication low while validating behavior under both settings.


358-387: LGTM: inner-join DISTINCT with mixed table prefixes

Validates alias resolution across join boundaries and ensures the unprefixed aliasing doesn’t break inter-model references.


389-422: LGTM: duplicate column name across tables with DISTINCT

Good assertion on header naming differences when alias-expression is enabled vs. disabled (apptype_2 vs apptype1). This guards the exact behavior the optimizer and alias generator should produce.

core/src/test/java/io/questdb/test/griffin/SqlUtilTest.java (5)

70-75: LGTM: reserved-word aliases are quoted case-insensitively

The added uppercase assertions confirm the new lower-casing set behavior.


79-93: LGTM: duplicate base alias suffixing

Verifies sequential _2, _3 behavior and guards against regressions in suffixing logic.


105-113: LGTM: trimming trailing spaces for simple bases

Confirms that the trailing space removal preserves leading spaces while trimming the tail only.


114-122: LGTM: max-length trimming

Ensures aliases respect max length limits without breaking early-exit cases.


125-136: LGTM: non-literal quoting coverage

Covers both pre-quoted and dot-containing non-literals; aligns with the new quote decision logic.

@glasstiger
Copy link
Copy Markdown
Contributor

[PR Coverage check]

😍 pass : 35 / 35 (100.00%)

file detail

path covered line new line coverage
🔵 io/questdb/std/LowerCaseCharSequenceHashSet.java 1 1 100.00%
🔵 io/questdb/griffin/SqlOptimiser.java 9 9 100.00%
🔵 io/questdb/griffin/SqlUtil.java 25 25 100.00%

@bluestreak01
Copy link
Copy Markdown
Member

really solid PR, thanks @puzpuzpuz 👍

@bluestreak01 bluestreak01 merged commit c8f236e into master Aug 24, 2025
35 checks passed
@bluestreak01 bluestreak01 deleted the puzpuzpuz_select_distinct_broken_aliases branch August 24, 2025 21:57
@puzpuzpuz
Copy link
Copy Markdown
Contributor Author

@bluestreak01 thanks for the review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Bug Incorrect or unexpected behavior SQL Issues or changes relating to SQL execution

Projects

None yet

Development

Successfully merging this pull request may close these issues.

join goes AWOL in select distinct query questdb 9.0.2: invalid table name or alias

4 participants