-
Notifications
You must be signed in to change notification settings - Fork 8.3k
Optimize boolean conditions using CNF / DNF #11749
Description
The problem
The table a more than hundred cols but only 3 of them are indexed, let call them PARTITION A ORDER BY B, C, D
The query use A B and D and another cols which is not indexed E.
And it's look like
select * from T where
A = x and (
(B = 1 and D = 2 and E = 3) or
(B = 1 and D = 3 and E = 4) /* .... repeated a lot of times */
)While the numbers of condition is small the query is quite fast (150ms), with 50 conditions the query start to take 10s and more.
Strangely if the wheres are re ordered like so :
select * from T where
A = x and (
B = 1 and (
D = 2 and (E = 3 or E=4 or E=5 /*.... repeated a lot of times */ ) OR
D = 3 and (E = 3 or E=4 or E=5 /*.... repeated a lot of times */ ) OR
/*.... repeated a lot of times */
)
)the query is sub seconds.
From the boolean point of view the filter is the same but it seems clickhouse like that a way more than the first one.
Proposed solution
Introduce optimizer(s) to convert logical expression to conjunctive normal form (CNF) or disjunctive normal form (DNF)
CNF is smth like
(c1 = 2 OR c1 = 5) AND (c2 < 5 OR c2 > 20) AND (c3 = 'abc' OR c3 = 'efg')
DNF is smth like
(c1 = 2 AND c2 = 5) OR (c2 > 5 AND c2 < 10) OR c3 = 'abc'
Any logical expression has CNF / DNF form.
In some cases, CNF or DNF may be more preferable and both have more chances to use index than non-normalized conditions.
Extra options
During those optimizations, it's may be possible to detect always true / always false sub-conditions and remove them.
Some references in context of DBs
- https://docs.actian.com/psql/psqlv13/index.html#page/sqlref%2Fsqlperf.htm%23
- https://docs.teradata.com/reader/i_VlYHwN0b8knh6AEWrv1Q/o6Bog0jlIbA13_g1yNycTA
- https://en.wikipedia.org/wiki/Category:Normal_forms_(logic)