Skip to content

Fixing query options #8906

@walterddr

Description

@walterddr

This issue is created follow up #8880.

Background

Currently, Pinot SQL OPTION keyword is a REGEX match that is allow ANYWHERE in the SQL string. This causes SQLi issues.

Backward-Compatibility Issue

#8880 proposed an alternative syntax of using OPTION similar to SQL setStatement (see: https://calcite.apache.org/docs/reference.html)

However, this creates a backward-incompatible change towards the Pinot SQL syntax.

Thus we propose to hold off the syntax change until we come to a consensus on which syntax to use.

SQLi Resolution

  • fixing REGEX OPTION parser #8905 proposed a temporary solution to only allow OPTION keyword at end of SQL string.
  • also marked the REGEX syntax as deprecated (although we have no easy way to inform end-user in the query result)

Alternative Syntax Change

Note, for the syntax code segment: <, > is used to quote reserved keywords or reserved operators.

As STATEMENT

Pinot query now only allows a single STATEMENT. So there's no difference setting OPTIONS as "clause" associated with the statement, or associates the OPTION with the entire SQL statement list context.

Potentially acceptable syntax listed below:

  1. Standard SET statement
setStatement:
    <SET> identifier <=> literal <;>

identifier:
    simpleIdentifier | <">quoted.complex.identifier<">

literal:
    stringLiteral | integerLiteral | doubleLiteral | booleanLiteral

PRO: standard SET statement is default supported in Calcite, also supported by major SQL engines:

  1. Special OPTION keyword
optionStatement:
  <OPTION> <(> optionKVPair [<,> optionKVPair ]* <)> <;>

optionKVPair:
  identifier <=> literal

identifier:
    simpleIdentifier | <">quoted.complex.identifier<">

literal:
    stringLiteral | integerLiteral | doubleLiteral | booleanLiteral

PRO: similar usage of current OPTION clause.
CON: I couldn't find a commonly used SQL system that supports this statement type, not easy for users understand the syntax.

AS CLAUSE

We can also extend Calcite's syntax to support OPTION clause in SELEC statement
the syntax will be similar to:

select:
      <SELECT> [ <ALL> | <DISTINCT> ]
          { * | projectItem [<,> projectItem ]* }
      <FROM> tableExpression
      [ <WHERE> booleanExpression ]
      [ <GROUP BY> { groupItem [<,> groupItem ]* } ]
      [ <HAVING> booleanExpression ]
      [ <OPTIONS> <(> optionKVPair [<,> optionKVPair ]* <)> ]
      <;>

optionKVPair:
  identifier <=> literal

identifier:
    simpleIdentifier | <">quoted.complex.identifier<">

literal:
    stringLiteral | integerLiteral | doubleLiteral | booleanLiteral

PRO: This is almost identical to current Pinot OPTION syntax
CON: for each statement we need to extend and add OPTION syntax (e.g. INSERT, CREATE, DELETE, and other DML/DQL we add in the future); also I think requires us to alter Calcite's core syntax parsing extension template.

Closing Thoughts

  • Please comment/reply on this issue to share your understanding, and if any of the descriptions I posted is incorrect or imprecise.
  • if there's any alternative syntax not listed above. please kindly share in the comment as well. I will incorporate in the alternative syntax list.

Metadata

Metadata

Assignees

Labels

PEP-RequestPinot Enhancement Proposal request to be reviewed.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions