Skip to content

fix(sql): reject trailing content after valid query#6751

Merged
bluestreak01 merged 15 commits intomasterfrom
mt_syntax-fix
Feb 11, 2026
Merged

fix(sql): reject trailing content after valid query#6751
bluestreak01 merged 15 commits intomasterfrom
mt_syntax-fix

Conversation

@mtopolnik
Copy link
Copy Markdown
Contributor

@mtopolnik mtopolnik commented Feb 5, 2026

Fixes #6726

Summary

  • Add post-compilation validation in compile() to reject unconsumed tokens after a valid statement, instead of silently ignoring them
  • Fix PG compatibility no-op handlers (RESET, CLOSE, UNLISTEN, DISCARD) to require their expected argument instead of silently accepting bare keywords
  • Fix BEGIN, COMMIT, ROLLBACK to consume the optional TRANSACTION keyword
  • Validate SET statement syntax (SET [SESSION | LOCAL] name { = | TO } value [, value]*), rejecting malformed forms like missing name, invalid operator, missing value, and dangling commas

Test plan

  • testTrailingContentAfterDdlRejected — DDL followed by trailing token is rejected
  • testTrailingContentAfterNoOpRejectedRESET ALL extra is rejected
  • testTrailingContentAfterSelectRejectedselect...select is rejected
  • testTrailingContentAfterSetRejectedSET x = y extra_token is rejected
  • testTrailingSemicolonAllowed — trailing ; is accepted
  • testTrailingTokenAbsentIsOk — no trailing content is accepted
  • testCloseMissingArgRejected — bare CLOSE without argument is rejected
  • testDiscardMissingArgRejected — bare DISCARD without argument is rejected
  • testResetMissingArgRejected — bare RESET without argument is rejected
  • testUnlistenMissingArgRejected — bare UNLISTEN without argument is rejected
  • testCompileSetMissingNameSET with no name is rejected
  • testCompileSetMissingOperatorSET x with no =/TO is rejected
  • testCompileSetInvalidOperatorSET x GARBAGE y is rejected
  • testCompileSetMissingValueSET x = with no value is rejected
  • testCompileSetDanglingCommaSET x = y, with trailing comma is rejected
  • testCompileSetWithToSET x TO y is accepted
  • testCompileSetMultipleValuesSET x TO y, z is accepted
  • testCompileSetWithSessionSET SESSION x = y is accepted
  • testCompileSetWithLocalSET LOCAL x = y is accepted
  • testCompileSetQuotedValueSET x = 'quoted' is accepted
  • Existing testCompileSet, testCompileResetDoesNothing, testCompileCloseDoesNothing, testCompileUnlistenDoesNothing still pass

🤖 Generated with Claude Code

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Feb 5, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch mt_syntax-fix

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@mtopolnik mtopolnik added the Bug Incorrect or unexpected behavior label Feb 6, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@core/src/main/java/io/questdb/griffin/SqlCompilerImpl.java`:
- Around line 392-401: The current post-compile check in SqlCompilerImpl (after
calling compileInner) only inspects one token via SqlUtil.fetchNext(lexer) and
treats a single semicolon as OK, allowing inputs like "select 1; select 2";
change the logic to consume and ignore any number of trailing semicolon tokens
(use isSemicolon(tok) in a loop) and only treat a non-null, non-semicolon token
as an unexpected-token error (throw SqlException using
lexer.lastTokenPosition()), while still skipping the entire check when
executionContext.isValidationOnly() is true; also add a unit test that verifies
"semicolon + extra token" is rejected.
- Around line 3158-3162: In compileNoOp, SqlUtil.fetchNext(lexer) currently
accepts a bare ';' as an argument; change it to validate the fetched token from
the lexer (via SqlUtil.fetchNext or by inspecting lexer state) and if the next
token is a semicolon, EOF or otherwise empty/missing argument, throw a
SqlException indicating a missing/expected argument for the PG no-op statements
instead of silently calling compiledQuery.ofSet(); ensure the validation
references the lexer token returned by SqlUtil.fetchNext(lexer) and only call
compiledQuery.ofSet() after a successful non-empty-argument check.
- Around line 3328-3345: The compileSet method currently accepts malformed SET
statements because it doesn't validate presence of name/operator/value or detect
trailing commas; update compileSet to check each SqlUtil.fetchNext(lexer) result
(name token, operator token, first value, and each value after a comma) and
throw a SqlException when any required token is null or a semicolon (i.e.,
missing name/operator/value or trailing comma). Specifically, in compileSet,
validate the initial tok for name, then validate the operator tok (must not be
null/semicolon), then validate the first value tok, and inside the while loop
ensure the token after a comma is present and valid before continuing; only call
lexer.unparseLast() when the loop was stopped by a non-null non-comma token, and
include clear error messages referencing the offending part (e.g., "missing
value after ',' in SET" or "missing operator/value in SET").

In `@core/src/test/java/io/questdb/test/griffin/engine/SqlCompilerImplTest.java`:
- Around line 7190-7200: The test testTrailingContentAfterSelectRejected
currently triggers the "keyword-as-identifier" path; change the exercised SQL to
be a syntactically complete SELECT followed by extra tokens so the
trailing-token validator fails (for example use "select x from tab where x=0
extra" instead of "select x from tab select x from tab"), and update the
assertExceptionNoLeakCheck call (the SQL string, expected error position and
expected message) to assert the trailing-token validation error from the
validator invoked after compile instead of the keyword-as-identifier message.

Copy link
Copy Markdown
Member

@bluestreak01 bluestreak01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mtopolnik Here's a detailed review of the PR:

1. Critical: SET TIME ZONE will break

The compileSet method assumes the grammar SET [SESSION | LOCAL] name { = | TO } value [, ...]. But PostgreSQL's SET TIME ZONE 'UTC' uses a two-word compound parameter name (TIME ZONE), which doesn't fit this grammar. Tracing through:

SET TIME ZONE 'UTC'
         ↑
         compileSet reads "TIME" as parameter name (not SESSION/LOCAL)
         then reads "ZONE" expecting '=' or 'TO' → throws "'=' or 'TO' expected"

Some PG JDBC drivers send SET TIME ZONE during connection initialization. Before this PR, compileSet was a pure no-op that consumed nothing, so any SET statement was silently accepted. Now it will reject this form. A test for SET TIME ZONE 'UTC' would demonstrate the failure.


2. Use the existing SqlException.unexpectedToken() helper

At SqlCompilerImpl.java:399, the trailing content error is manually constructed:

throw SqlException.$(lexer.lastTokenPosition(), "unexpected token [").put(tok).put(']');

But SqlException.java:152 already has a helper for this exact pattern:

public static SqlException unexpectedToken(int position, CharSequence token) {
    return position(position).put("unexpected token [").put(token).put(']');
}

Should use SqlException.unexpectedToken(lexer.lastTokenPosition(), tok) instead.


3. compileSet semicolon handling: inconsistency with compileBegin/compileCommit/compileRollback

In compileSet, when the do-while loop exits on a semicolon, tok != null is true, so unparseLast() is called — pushing the ; back. In compile() (single-query mode), the trailing content check then re-fetches it and accepts it. This is correct.

However, the BEGIN/COMMIT/ROLLBACK methods have the opposite behavior — they consume the semicolon (no unparseLast when isSemicolon(tok) is true), relying on goToQueryEnd()'s initial unparseLast() to recover in batch mode. The inconsistency between how compileSet and compileBegin handle ; is confusing and fragile.


4. Stale comment in compileBatch

SqlCompilerImpl.java:465:

// We've to move lexer because some query handlers don't consume all tokens (e.g. SET )

After this PR, SET now fully consumes its tokens. This comment is stale.


5. Test coverage gaps

Missing happy-path tests:

  • BEGIN TRANSACTION, COMMIT TRANSACTION, ROLLBACK TRANSACTION — the new TRANSACTION-keyword consumption is untested
  • SET TIME ZONE 'UTC' — common PG statement (will reveal bug #1 above)

Missing error-path tests:

  • BEGIN extra_garbage — should be rejected by trailing content check, but no test verifies this
  • COMMIT extra_garbage / ROLLBACK extra_garbage — same gap
  • CLOSE ALL extra, DISCARD ALL extra, UNLISTEN * extra — only RESET ALL extra is tested for no-op trailing content

6. Latent test fixes are correct

The PR fixes genuinely broken test SQL that was previously silently accepted:

  • AlterTableDropPartitionTest.java:917: Stray ) removed — to_timestamp('2020', 'yyyy'))to_timestamp('2020', 'yyyy')
  • MatViewTest.java:1871: Stray ;) removed — full;)full
  • LimitTest.java:1224: Semicolon misused as comma — limit :lo; :hilimit :lo, :hi

These are valuable finds.


7. Minor: Qodana exclusion scope

The addition to qodana.yaml excludes SqlCompilerImpl.java from the unused inspection at the file level. It would be better to suppress at the specific method/field level with @SuppressWarnings("unused") to avoid masking future dead code in this large file.


8. Zero-GC compliance

The new code is zero-GC safe — all token operations use existing CharSequence references from the lexer, no allocations on the data path. Looks good.


Summary

Severity Issue
Critical SET TIME ZONE 'UTC' will break — compound parameter names not supported
Medium Should use SqlException.unexpectedToken() helper instead of manual construction
Medium Missing tests for BEGIN/COMMIT/ROLLBACK TRANSACTION and trailing garbage
Low Stale comment in compileBatch about SET not consuming tokens
Low Inconsistent semicolon handling between compileSet and compileBegin
Low Qodana file-level exclusion could mask future issues
Positive Latent syntax errors in tests correctly fixed
Positive Zero-GC compliance maintained
Positive Error positions verified correct for all tested cases

- Use SqlException.unexpectedToken() helper instead of manual string
  construction in compile() trailing content check
- Remove stale "(e.g. SET )" from compileBatch comment since SET now
  consumes its tokens
- Make compileBegin/compileCommit/compileRollback unparse semicolons
  instead of consuming them, consistent with compileSet behavior
- Add lenient fallback in compileSet for non-standard forms like
  SET TIME ZONE 'UTC' that don't use = or TO operators
- Fix testTrailingContentAfterSelectRejected to exercise the post-compile
  trailing content validator instead of the keyword-as-identifier path
- Add happy-path tests for BEGIN/COMMIT/ROLLBACK TRANSACTION and
  SET TIME ZONE 'UTC'
- Add error-path tests for trailing content after BEGIN, COMMIT, ROLLBACK
- Add trailing content tests for CLOSE ALL, DISCARD ALL, UNLISTEN *
- Add tests proving CLOSE;, DISCARD;, RESET; correctly reject semicolons
  as arguments
- Add test proving content after semicolons is rejected in single-query
  mode

Co-Authored-By: Claude Opus 4.6 <[email protected]>
bluestreak01
bluestreak01 previously approved these changes Feb 11, 2026
@glasstiger
Copy link
Copy Markdown
Contributor

[PR Coverage check]

😍 pass : 33 / 33 (100.00%)

file detail

path covered line new line coverage
🔵 io/questdb/griffin/SqlCompilerImpl.java 33 33 100.00%

@bluestreak01 bluestreak01 merged commit 53c59d0 into master Feb 11, 2026
44 checks passed
@bluestreak01 bluestreak01 deleted the mt_syntax-fix branch February 11, 2026 10:24
@bluestreak01 bluestreak01 restored the mt_syntax-fix branch February 11, 2026 10:35
@bluestreak01 bluestreak01 deleted the mt_syntax-fix branch February 11, 2026 10:36
maciulis pushed a commit to maciulis/questdb that referenced this pull request Feb 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Bug Incorrect or unexpected behavior

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Query parser ignoring anything after limit

4 participants