-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Description
Right now in Pinot, aggregation filters like select SUM(1) FILTER (WHERE a = 1) from Table are pushed down into something like SELECT SUM(1) FROM Table WHERE a = 1. That is fine and it is even desired given it allows us to use indexes in this kind of queries.
But the same optimization is also applied to queries with group by like: select b, SUM(1) FILTER (WHERE a = 1) from Table group by b, which are translated to SELECT b, SUM(1) FROM Table WHERE a = 1. This is incorrect if there are values of b which are never assigned to an a whose value is 1. For example, assuming Table is:
a | b
-----
1 | 0
2 | 0
2 | 1
Then the expected result for select b, SUM(1) FILTER (WHERE a = 1) from Table group by b is:
b | sum
-------
0 | 1
1 | 0
but we are actually returning
b | sum
-------
0 | 1
In order to correctly follow SQL semantics, we should either:
- Disable this optimization when there are group by.
- Push the filter in some way that the rows used as keys are not filtered out if they don't match the aggregation filter.
At shorter term what we could offer is an option or hint to disable this optimization.
PS: This problem affects both single and multi stage
PS2: See https://modern-sql.com/feature/filter