Skip to content

Filter-push-down does not remove unneeded columns from JOIN #75152

@KochetovNicolai

Description

@KochetovNicolai

Let's compare

with 1e5 as cnt
select count() from (select number as k, sipHash64(arrayStringConcat(arrayMap(x -> number, range(number % 1000)), ';')) as v from numbers(cnt)) as l inner join (select number as k, sipHash64(arrayStringConcat(arrayMap(x -> number, range(number % 1000)), ';')) as v from numbers(cnt)) as r on l.k = r.k where l.v != 12094310375782415882 and r.v != 12094310375782415882 settings query_plan_use_new_logical_join_step=1;

   ┌─count()─┐
1. │   99999 │
   └─────────┘

1 row in set. Elapsed: 2.456 sec. Processed 200.00 thousand rows, 1.60 MB (81.44 thousand rows/s., 651.53 KB/s.)
Peak memory usage: 1.13 GiB.

and

with 1e5 as cnt
select count() from (select number as k, sipHash64(arrayStringConcat(arrayMap(x -> number, range(number % 1000)), ';')) as v from numbers(cnt) where v != 12094310375782415882) as l inner join (select number as k, sipHash64(arrayStringConcat(arrayMap(x -> number, range(number % 1000)), ';')) as v from numbers(cnt) where v != 12094310375782415882) as r on l.k = r.k settings query_plan_use_new_logical_join_step=1;

   ┌─count()─┐
1. │   99999 │
   └─────────┘

1 row in set. Elapsed: 2.075 sec. Processed 200.00 thousand rows, 1.60 MB (96.37 thousand rows/s., 770.94 KB/s.)
Peak memory usage: 1.13 GiB.

The second query is faster, even though the predicate-push-down optimization is applied.
The reason is that unneeded columns are not removed from JOIN.

:) explain header=1 with 1e5 as cnt select count() from (select number as k, sipHash64(arrayStringConcat(arrayMap(x -> number, range(number % 1000)), ';')) as v from numbers(cnt)) as l inner join (select number as k, sipHash64(arrayStringConcat(arrayMap(x -> number, range(number % 1000)), ';')) as v from numbers(cnt)) as r on l.k = r.k where l.v != 12094310375782415882 and r.v != 12094310375782415882 settings query_plan_use_new_logical_join_step=1;

EXPLAIN header = 1
WITH 100000. AS cnt
SELECT count()
FROM
(
    SELECT
        number AS k,
        sipHash64(arrayStringConcat(arrayMap(x -> number, range(number % 1000)), ';')) AS v
    FROM numbers(cnt)
) AS l
INNER JOIN
(
    SELECT
        number AS k,
        sipHash64(arrayStringConcat(arrayMap(x -> number, range(number % 1000)), ';')) AS v
    FROM numbers(cnt)
) AS r ON l.k = r.k
WHERE (l.v != 12094310375782415882) AND (r.v != 12094310375782415882)
SETTINGS query_plan_use_new_logical_join_step = 1

Query id: 8c8038ea-b266-47b5-91de-e5e775f7ead9

    ┌─explain────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
 1. │ Expression ((Project names + Projection))                                                                                                          │
 2. │ Header: count() UInt64                                                                                                                             │
 3. │   Aggregating                                                                                                                                      │
 4. │   Header: count() UInt64                                                                                                                           │
 5. │     Expression ((Before GROUP BY + ))                                                                                                              │
 6. │     Empty header                                                                                                                                   │
 7. │       Expression                                                                                                                                   │
 8. │       Header: __table1.v UInt64                                                                                                                    │
 9. │               __table3.v UInt64                                                                                                                    │
10. │         Join11. │         Header: __table1.v UInt64                                                                                                                  │
12. │                 __table3.v UInt64                                                                                                                  │
13. │           Expression                                                                                                                               │
14. │           Header: __table1.k UInt64                                                                                                                │
15. │                   __table1.v UInt64                                                                                                                │
16. │             Filter (( + (Change column names to column identifiers + (Project names + (Projection + Change column names to column identifiers))))) │
17. │             Header: __table1.k UInt64                                                                                                              │
18. │                     __table1.v UInt64                                                                                                              │
19. │               ReadFromSystemNumbers                                                                                                                │
20. │               Header: number UInt64                                                                                                                │
21. │           Expression                                                                                                                               │
22. │           Header: __table3.k UInt64                                                                                                                │
23. │                   __table3.v UInt64                                                                                                                │
24. │             Filter (( + (Change column names to column identifiers + (Project names + (Projection + Change column names to column identifiers))))) │
25. │             Header: __table3.k UInt64                                                                                                              │
26. │                     __table3.v UInt64                                                                                                              │
27. │               ReadFromSystemNumbers                                                                                                                │
28. │               Header: number UInt64                                                                                                                │
    └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

28 rows in set. Elapsed: 0.020 sec. 

Note that columns __table1.v and __table3.v are not needed anymore after the optimization, but we keep it (e.g. in the hash table for hash_join).

Unneeded columns do not appear for the manually-optimized query.

:) explain header=1 with 1e5 as cnt select count() from (select number as k, sipHash64(arrayStringConcat(arrayMap(x -> number, range(number % 1000)), ';')) as v from numbers(cnt) where v != 12094310375782415882) as l inner join (select number as k, sipHash64(arrayStringConcat(arrayMap(x -> number, range(number % 1000)), ';')) as v from numbers(cnt) where v != 12094310375782415882) as r on l.k = r.k

EXPLAIN header = 1
WITH 100000. AS cnt
SELECT count()
FROM
(
    SELECT
        number AS k,
        sipHash64(arrayStringConcat(arrayMap(x -> number, range(number % 1000)), ';')) AS v
    FROM numbers(cnt)
    WHERE v != 12094310375782415882
) AS l
INNER JOIN
(
    SELECT
        number AS k,
        sipHash64(arrayStringConcat(arrayMap(x -> number, range(number % 1000)), ';')) AS v
    FROM numbers(cnt)
    WHERE v != 12094310375782415882
) AS r ON l.k = r.k

Query id: cea54d57-7cd4-488f-8597-ba6d58fd6368

    ┌─explain──────────────────────────────────────────────────────────────────────────────────────────────────────────┐
 1. │ Expression ((Project names + Projection))                                                                        │
 2. │ Header: count() UInt64                                                                                           │
 3. │   Aggregating                                                                                                    │
 4. │   Header: count() UInt64                                                                                         │
 5. │     Expression ((Before GROUP BY + Drop unused columns after JOIN))                                              │
 6. │     Empty header                                                                                                 │
 7. │       Join (JOIN FillRightFirst)                                                                                 │
 8. │       Header: __table1.k UInt64                                                                                  │
 9. │         Expression ((JOIN actions + (Change column names to column identifiers + (Project names + Projection)))) │
10. │         Header: __table1.k UInt64                                                                                │
11. │           Filter ((WHERE + Change column names to column identifiers))                                           │
12. │           Header: __table2.number UInt64                                                                         │
13. │             ReadFromSystemNumbers                                                                                │
14. │             Header: number UInt64                                                                                │
15. │         Expression ((JOIN actions + (Change column names to column identifiers + (Project names + Projection)))) │
16. │         Header: __table3.k UInt64                                                                                │
17. │           Filter ((WHERE + Change column names to column identifiers))                                           │
18. │           Header: __table4.number UInt64                                                                         │
19. │             ReadFromSystemNumbers                                                                                │
20. │             Header: number UInt64                                                                                │
    └──────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Note that __table1.k is also not needed in this case. We probably keep it to avoid an empty block. We can fix it as well, but it is less important.

Metadata

Metadata

Labels

comp-joinsJOINs end-to-end (planning hooks + runtime join operators/algorithms). Single bucket to avoid pla...

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions