Skip to content

Multi-stage intersect ignores all modifier #13126

@gortiz

Description

@gortiz

In SQL intersect may be modified with the all modifier. For example:

if table A contains a column a with values [1,1,2,3,4] and B contains a column b with values [1, 1, 2]:

select * 
from (select a from A)
intersect (select b from B)

Returns [1, 2] while

select * 
from (select a from A)
intersect ALL (select b from B)

returns [1,1,2].

Currently Pinot accepts the ALL modifier and it is shown in the explain plan, but the semantics are always the same. Specifically, Pinot semantics are the ones of intersect without all modifier.

You can verify that by running ColocatedJoinEngineQuickStart and executing:

select userUUID
from (select userUUID from userGroups)
intersect all
(select userUUID from userGroups)

with and without all.

intersect all should return the same number of rows than count() (which is 2494) while intersect should return the same number of rows than count(distinct(userUUID)) (which is 2470).

But the returned number of rows is 2470 with and without all modifier

As a short term solution we can fail if all modifier is supplied, as it would be better than returning incorrect results.

Metadata

Metadata

Assignees

Labels

beginner-taskSmall task for new contributors to ramp upmulti-stageRelated to the multi-stage query engine

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions