-
Notifications
You must be signed in to change notification settings - Fork 8.3k
Push down predicate for Multi-Join #4731
Copy link
Copy link
Closed
Labels
comp-joinsJOINs end-to-end (planning hooks + runtime join operators/algorithms). Single bucket to avoid pla...JOINs end-to-end (planning hooks + runtime join operators/algorithms). Single bucket to avoid pla...comp-query-optimizerQuery plan optimization: physical plan steps, plan-level rewrites and optimizations (QueryPlan pa...Query plan optimization: physical plan steps, plan-level rewrites and optimizations (QueryPlan pa...featureperformance
Description
Hi there,
I gave the new flat multi-join a try. When I implemented it the query time exponentially increased e.g.
Query with both joins: 3444ms
SELECT
group AS "_group",
count() as "count",
date AS "_date",
count() * AVG(valJoin1)) AS "valJoin1",
SUM(valJoin2) AS "valJoin2"
FROM `main` r
ANY LEFT JOIN (
SELECT
toStartOfHour(timestamp) AS "dateHour",
device,
country_name,
val AS "valJoin1"
FROM `joinOne`
WHERE toDate(date) BETWEEN '2019-03-08' AND '2019-03-15'
GROUP BY dateHour, device, country_name
) j1 ON toStartOfHour(r.timestamp) = j1.dateHour AND r.device = j1.device AND r.country_name = j1.country_name
ANY LEFT JOIN (
SELECT
impressionUUID,
if(sum(event) > 0, 1, 0) AS "valJoin2"
FROM `joinTwo`
WHERE (toDate(date) BETWEEN '2019-03-08' AND '2019-03-15')
GROUP BY impressionUUID
) j2 ON j2.impressionUUID = r.impressionUUID
GROUP BY _group, _dateQuery without join (400k rows in main) : 31ms
SELECT
group AS "_group",
count() as "count",
date AS "_date"
FROM `main` r
GROUP BY _group, _dateQuery with join on joinOne (8k rows in joinOne) : 73ms
SELECT
group AS "_group",
count() as "count",
date AS "_date",
count() * AVG(valJoin1)) AS "valJoin1",
FROM `main` r
ANY LEFT JOIN (
SELECT
toStartOfHour(timestamp) AS "dateHour",
device,
country_name,
val AS "valJoin1"
FROM `joinOne`
WHERE toDate(date) BETWEEN '2019-03-08' AND '2019-03-15'
GROUP BY dateHour, device, country_name
) j1 ON toStartOfHour(r.timestamp) = j1.dateHour AND r.device = j1.device AND r.country_name = j1.country_name
GROUP BY _group, _dateQuery with join on joinTwo (200k rows in joinTwo) : 178ms
SELECT
group AS "_group",
count() as "count",
date AS "_date",
SUM(valJoin2) AS "valJoin2"
FROM `main` r
ANY LEFT JOIN (
SELECT
impressionUUID,
if(sum(event) > 0, 1, 0) AS "valJoin2"
FROM `joinTwo`
WHERE (toDate(date) BETWEEN '2019-03-08' AND '2019-03-15')
GROUP BY impressionUUID
) j2 USING (impressionUUID)
GROUP BY _group, _dateReactions are currently unavailable
Metadata
Metadata
Assignees
Labels
comp-joinsJOINs end-to-end (planning hooks + runtime join operators/algorithms). Single bucket to avoid pla...JOINs end-to-end (planning hooks + runtime join operators/algorithms). Single bucket to avoid pla...comp-query-optimizerQuery plan optimization: physical plan steps, plan-level rewrites and optimizations (QueryPlan pa...Query plan optimization: physical plan steps, plan-level rewrites and optimizations (QueryPlan pa...featureperformance