fix(sql): reject trailing content after valid query#6751
fix(sql): reject trailing content after valid query#6751bluestreak01 merged 15 commits intomasterfrom
Conversation
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@core/src/main/java/io/questdb/griffin/SqlCompilerImpl.java`:
- Around line 392-401: The current post-compile check in SqlCompilerImpl (after
calling compileInner) only inspects one token via SqlUtil.fetchNext(lexer) and
treats a single semicolon as OK, allowing inputs like "select 1; select 2";
change the logic to consume and ignore any number of trailing semicolon tokens
(use isSemicolon(tok) in a loop) and only treat a non-null, non-semicolon token
as an unexpected-token error (throw SqlException using
lexer.lastTokenPosition()), while still skipping the entire check when
executionContext.isValidationOnly() is true; also add a unit test that verifies
"semicolon + extra token" is rejected.
- Around line 3158-3162: In compileNoOp, SqlUtil.fetchNext(lexer) currently
accepts a bare ';' as an argument; change it to validate the fetched token from
the lexer (via SqlUtil.fetchNext or by inspecting lexer state) and if the next
token is a semicolon, EOF or otherwise empty/missing argument, throw a
SqlException indicating a missing/expected argument for the PG no-op statements
instead of silently calling compiledQuery.ofSet(); ensure the validation
references the lexer token returned by SqlUtil.fetchNext(lexer) and only call
compiledQuery.ofSet() after a successful non-empty-argument check.
- Around line 3328-3345: The compileSet method currently accepts malformed SET
statements because it doesn't validate presence of name/operator/value or detect
trailing commas; update compileSet to check each SqlUtil.fetchNext(lexer) result
(name token, operator token, first value, and each value after a comma) and
throw a SqlException when any required token is null or a semicolon (i.e.,
missing name/operator/value or trailing comma). Specifically, in compileSet,
validate the initial tok for name, then validate the operator tok (must not be
null/semicolon), then validate the first value tok, and inside the while loop
ensure the token after a comma is present and valid before continuing; only call
lexer.unparseLast() when the loop was stopped by a non-null non-comma token, and
include clear error messages referencing the offending part (e.g., "missing
value after ',' in SET" or "missing operator/value in SET").
In `@core/src/test/java/io/questdb/test/griffin/engine/SqlCompilerImplTest.java`:
- Around line 7190-7200: The test testTrailingContentAfterSelectRejected
currently triggers the "keyword-as-identifier" path; change the exercised SQL to
be a syntactically complete SELECT followed by extra tokens so the
trailing-token validator fails (for example use "select x from tab where x=0
extra" instead of "select x from tab select x from tab"), and update the
assertExceptionNoLeakCheck call (the SQL string, expected error position and
expected message) to assert the trailing-token validation error from the
validator invoked after compile instead of the keyword-as-identifier message.
core/src/test/java/io/questdb/test/griffin/engine/SqlCompilerImplTest.java
Show resolved
Hide resolved
Co-Authored-By: Claude Opus 4.6 <[email protected]>
bluestreak01
left a comment
There was a problem hiding this comment.
@mtopolnik Here's a detailed review of the PR:
1. Critical: SET TIME ZONE will break
The compileSet method assumes the grammar SET [SESSION | LOCAL] name { = | TO } value [, ...]. But PostgreSQL's SET TIME ZONE 'UTC' uses a two-word compound parameter name (TIME ZONE), which doesn't fit this grammar. Tracing through:
SET TIME ZONE 'UTC'
↑
compileSet reads "TIME" as parameter name (not SESSION/LOCAL)
then reads "ZONE" expecting '=' or 'TO' → throws "'=' or 'TO' expected"
Some PG JDBC drivers send SET TIME ZONE during connection initialization. Before this PR, compileSet was a pure no-op that consumed nothing, so any SET statement was silently accepted. Now it will reject this form. A test for SET TIME ZONE 'UTC' would demonstrate the failure.
2. Use the existing SqlException.unexpectedToken() helper
At SqlCompilerImpl.java:399, the trailing content error is manually constructed:
throw SqlException.$(lexer.lastTokenPosition(), "unexpected token [").put(tok).put(']');But SqlException.java:152 already has a helper for this exact pattern:
public static SqlException unexpectedToken(int position, CharSequence token) {
return position(position).put("unexpected token [").put(token).put(']');
}Should use SqlException.unexpectedToken(lexer.lastTokenPosition(), tok) instead.
3. compileSet semicolon handling: inconsistency with compileBegin/compileCommit/compileRollback
In compileSet, when the do-while loop exits on a semicolon, tok != null is true, so unparseLast() is called — pushing the ; back. In compile() (single-query mode), the trailing content check then re-fetches it and accepts it. This is correct.
However, the BEGIN/COMMIT/ROLLBACK methods have the opposite behavior — they consume the semicolon (no unparseLast when isSemicolon(tok) is true), relying on goToQueryEnd()'s initial unparseLast() to recover in batch mode. The inconsistency between how compileSet and compileBegin handle ; is confusing and fragile.
4. Stale comment in compileBatch
SqlCompilerImpl.java:465:
// We've to move lexer because some query handlers don't consume all tokens (e.g. SET )After this PR, SET now fully consumes its tokens. This comment is stale.
5. Test coverage gaps
Missing happy-path tests:
BEGIN TRANSACTION,COMMIT TRANSACTION,ROLLBACK TRANSACTION— the new TRANSACTION-keyword consumption is untestedSET TIME ZONE 'UTC'— common PG statement (will reveal bug #1 above)
Missing error-path tests:
BEGIN extra_garbage— should be rejected by trailing content check, but no test verifies thisCOMMIT extra_garbage/ROLLBACK extra_garbage— same gapCLOSE ALL extra,DISCARD ALL extra,UNLISTEN * extra— onlyRESET ALL extrais tested for no-op trailing content
6. Latent test fixes are correct
The PR fixes genuinely broken test SQL that was previously silently accepted:
AlterTableDropPartitionTest.java:917: Stray)removed —to_timestamp('2020', 'yyyy'))→to_timestamp('2020', 'yyyy')MatViewTest.java:1871: Stray;)removed —full;)→fullLimitTest.java:1224: Semicolon misused as comma —limit :lo; :hi→limit :lo, :hi
These are valuable finds.
7. Minor: Qodana exclusion scope
The addition to qodana.yaml excludes SqlCompilerImpl.java from the unused inspection at the file level. It would be better to suppress at the specific method/field level with @SuppressWarnings("unused") to avoid masking future dead code in this large file.
8. Zero-GC compliance
The new code is zero-GC safe — all token operations use existing CharSequence references from the lexer, no allocations on the data path. Looks good.
Summary
| Severity | Issue |
|---|---|
| Critical | SET TIME ZONE 'UTC' will break — compound parameter names not supported |
| Medium | Should use SqlException.unexpectedToken() helper instead of manual construction |
| Medium | Missing tests for BEGIN/COMMIT/ROLLBACK TRANSACTION and trailing garbage |
| Low | Stale comment in compileBatch about SET not consuming tokens |
| Low | Inconsistent semicolon handling between compileSet and compileBegin |
| Low | Qodana file-level exclusion could mask future issues |
| Positive | Latent syntax errors in tests correctly fixed |
| Positive | Zero-GC compliance maintained |
| Positive | Error positions verified correct for all tested cases |
- Use SqlException.unexpectedToken() helper instead of manual string construction in compile() trailing content check - Remove stale "(e.g. SET )" from compileBatch comment since SET now consumes its tokens - Make compileBegin/compileCommit/compileRollback unparse semicolons instead of consuming them, consistent with compileSet behavior - Add lenient fallback in compileSet for non-standard forms like SET TIME ZONE 'UTC' that don't use = or TO operators - Fix testTrailingContentAfterSelectRejected to exercise the post-compile trailing content validator instead of the keyword-as-identifier path - Add happy-path tests for BEGIN/COMMIT/ROLLBACK TRANSACTION and SET TIME ZONE 'UTC' - Add error-path tests for trailing content after BEGIN, COMMIT, ROLLBACK - Add trailing content tests for CLOSE ALL, DISCARD ALL, UNLISTEN * - Add tests proving CLOSE;, DISCARD;, RESET; correctly reject semicolons as arguments - Add test proving content after semicolons is rejected in single-query mode Co-Authored-By: Claude Opus 4.6 <[email protected]>
[PR Coverage check]😍 pass : 33 / 33 (100.00%) file detail
|
Fixes #6726
Summary
compile()to reject unconsumed tokens after a valid statement, instead of silently ignoring themRESET,CLOSE,UNLISTEN,DISCARD) to require their expected argument instead of silently accepting bare keywordsBEGIN,COMMIT,ROLLBACKto consume the optionalTRANSACTIONkeywordSETstatement syntax (SET [SESSION | LOCAL] name { = | TO } value [, value]*), rejecting malformed forms like missing name, invalid operator, missing value, and dangling commasTest plan
testTrailingContentAfterDdlRejected— DDL followed by trailing token is rejectedtestTrailingContentAfterNoOpRejected—RESET ALL extrais rejectedtestTrailingContentAfterSelectRejected—select...selectis rejectedtestTrailingContentAfterSetRejected—SET x = y extra_tokenis rejectedtestTrailingSemicolonAllowed— trailing;is acceptedtestTrailingTokenAbsentIsOk— no trailing content is acceptedtestCloseMissingArgRejected— bareCLOSEwithout argument is rejectedtestDiscardMissingArgRejected— bareDISCARDwithout argument is rejectedtestResetMissingArgRejected— bareRESETwithout argument is rejectedtestUnlistenMissingArgRejected— bareUNLISTENwithout argument is rejectedtestCompileSetMissingName—SETwith no name is rejectedtestCompileSetMissingOperator—SET xwith no=/TOis rejectedtestCompileSetInvalidOperator—SET x GARBAGE yis rejectedtestCompileSetMissingValue—SET x =with no value is rejectedtestCompileSetDanglingComma—SET x = y,with trailing comma is rejectedtestCompileSetWithTo—SET x TO yis acceptedtestCompileSetMultipleValues—SET x TO y, zis acceptedtestCompileSetWithSession—SET SESSION x = yis acceptedtestCompileSetWithLocal—SET LOCAL x = yis acceptedtestCompileSetQuotedValue—SET x = 'quoted'is acceptedtestCompileSet,testCompileResetDoesNothing,testCompileCloseDoesNothing,testCompileUnlistenDoesNothingstill pass🤖 Generated with Claude Code