-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Description
This issue is created follow up #8880.
Background
Currently, Pinot SQL OPTION keyword is a REGEX match that is allow ANYWHERE in the SQL string. This causes SQLi issues.
Backward-Compatibility Issue
#8880 proposed an alternative syntax of using OPTION similar to SQL setStatement (see: https://calcite.apache.org/docs/reference.html)
However, this creates a backward-incompatible change towards the Pinot SQL syntax.
- it hasn't been design-reviewed by the community;
- it changes the syntax to be a "statement" instead of a "clause" (e.g. https://docs.microsoft.com/en-us/sql/t-sql/queries/option-clause-transact-sql?view=sql-server-ver16)
Thus we propose to hold off the syntax change until we come to a consensus on which syntax to use.
SQLi Resolution
- fixing REGEX OPTION parser #8905 proposed a temporary solution to only allow
OPTIONkeyword at end of SQL string. - also marked the REGEX syntax as deprecated (although we have no easy way to inform end-user in the query result)
Alternative Syntax Change
Note, for the syntax code segment: <, > is used to quote reserved keywords or reserved operators.
As STATEMENT
Pinot query now only allows a single STATEMENT. So there's no difference setting OPTIONS as "clause" associated with the statement, or associates the OPTION with the entire SQL statement list context.
Potentially acceptable syntax listed below:
- Standard
SET statement
setStatement:
<SET> identifier <=> literal <;>
identifier:
simpleIdentifier | <">quoted.complex.identifier<">
literal:
stringLiteral | integerLiteral | doubleLiteral | booleanLiteral
PRO: standard SET statement is default supported in Calcite, also supported by major SQL engines:
- https://dev.mysql.com/doc/refman/8.0/en/set-statement.html,
- https://www.postgresql.org/docs/current/sql-set.html
CON: standard SET statement only allows a single key-value, for multiple key-value options, multiple set statement is needed
- Special
OPTIONkeyword
optionStatement:
<OPTION> <(> optionKVPair [<,> optionKVPair ]* <)> <;>
optionKVPair:
identifier <=> literal
identifier:
simpleIdentifier | <">quoted.complex.identifier<">
literal:
stringLiteral | integerLiteral | doubleLiteral | booleanLiteral
PRO: similar usage of current OPTION clause.
CON: I couldn't find a commonly used SQL system that supports this statement type, not easy for users understand the syntax.
AS CLAUSE
We can also extend Calcite's syntax to support OPTION clause in SELEC statement
the syntax will be similar to:
select:
<SELECT> [ <ALL> | <DISTINCT> ]
{ * | projectItem [<,> projectItem ]* }
<FROM> tableExpression
[ <WHERE> booleanExpression ]
[ <GROUP BY> { groupItem [<,> groupItem ]* } ]
[ <HAVING> booleanExpression ]
[ <OPTIONS> <(> optionKVPair [<,> optionKVPair ]* <)> ]
<;>
optionKVPair:
identifier <=> literal
identifier:
simpleIdentifier | <">quoted.complex.identifier<">
literal:
stringLiteral | integerLiteral | doubleLiteral | booleanLiteral
PRO: This is almost identical to current Pinot OPTION syntax
CON: for each statement we need to extend and add OPTION syntax (e.g. INSERT, CREATE, DELETE, and other DML/DQL we add in the future); also I think requires us to alter Calcite's core syntax parsing extension template.
Closing Thoughts
- Please comment/reply on this issue to share your understanding, and if any of the descriptions I posted is incorrect or imprecise.
- if there's any alternative syntax not listed above. please kindly share in the comment as well. I will incorporate in the alternative syntax list.