Skip to content

Push down predicate for Multi-Join #4731

@Slind14

Description

@Slind14

Hi there,

I gave the new flat multi-join a try. When I implemented it the query time exponentially increased e.g.

Query with both joins: 3444ms

SELECT 
    group AS "_group",
    count() as "count",
    date AS "_date", 
    count() * AVG(valJoin1)) AS "valJoin1", 
    SUM(valJoin2) AS "valJoin2" 
FROM `main` r 
ANY LEFT JOIN (
    SELECT 
        toStartOfHour(timestamp) AS "dateHour", 
        device, 
        country_name, 
        val AS "valJoin1" 
    FROM `joinOne` 
    WHERE toDate(date) BETWEEN '2019-03-08' AND '2019-03-15'
    GROUP BY dateHour, device, country_name
) j1 ON toStartOfHour(r.timestamp) = j1.dateHour AND r.device = j1.device AND r.country_name = j1.country_name 
ANY LEFT JOIN (
    SELECT 
        impressionUUID, 
        if(sum(event) > 0, 1, 0) AS "valJoin2" 
    FROM `joinTwo` 
    WHERE (toDate(date) BETWEEN '2019-03-08' AND '2019-03-15') 
    GROUP BY impressionUUID
) j2 ON j2.impressionUUID = r.impressionUUID 
GROUP BY _group, _date

Query without join (400k rows in main) : 31ms

SELECT 
    group AS "_group",
    count() as "count",
    date AS "_date"
FROM `main` r
GROUP BY _group, _date

Query with join on joinOne (8k rows in joinOne) : 73ms

SELECT 
    group AS "_group",
    count() as "count",
    date AS "_date", 
    count() * AVG(valJoin1)) AS "valJoin1",
FROM `main` r 
ANY LEFT JOIN (
    SELECT 
        toStartOfHour(timestamp) AS "dateHour", 
        device, 
        country_name, 
        val AS "valJoin1" 
    FROM `joinOne` 
    WHERE toDate(date) BETWEEN '2019-03-08' AND '2019-03-15'
    GROUP BY dateHour, device, country_name
) j1 ON toStartOfHour(r.timestamp) = j1.dateHour AND r.device = j1.device AND r.country_name = j1.country_name
GROUP BY _group, _date

Query with join on joinTwo (200k rows in joinTwo) : 178ms

SELECT 
    group AS "_group",
    count() as "count",
    date AS "_date", 
    SUM(valJoin2) AS "valJoin2" 
FROM `main` r
ANY LEFT JOIN (
    SELECT 
        impressionUUID, 
        if(sum(event) > 0, 1, 0) AS "valJoin2" 
    FROM `joinTwo` 
    WHERE (toDate(date) BETWEEN '2019-03-08' AND '2019-03-15') 
    GROUP BY impressionUUID
) j2 USING (impressionUUID)
GROUP BY _group, _date

Metadata

Metadata

Assignees

Labels

comp-joinsJOINs end-to-end (planning hooks + runtime join operators/algorithms). Single bucket to avoid pla...comp-query-optimizerQuery plan optimization: physical plan steps, plan-level rewrites and optimizations (QueryPlan pa...featureperformance

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions