-
Notifications
You must be signed in to change notification settings - Fork 8.3k
partial_merge_join algorithm lead to duplicated rows #31009
Copy link
Copy link
Closed
Labels
bugConfirmed user-visible misbehaviour in official releaseConfirmed user-visible misbehaviour in official releasecomp-joinsJOINs end-to-end (planning hooks + runtime join operators/algorithms). Single bucket to avoid pla...JOINs end-to-end (planning hooks + runtime join operators/algorithms). Single bucket to avoid pla...
Description
How to reproduce:
CREATE TABLE left
(
key UInt32,
value String
)
ENGINE = MergeTree ORDER BY key;
CREATE TABLE right
(
key UInt32,
value String
)
ENGINE = MergeTree ORDER BY tuple();
-- super secrect amount of numbers stolen from production
INSERT INTO left SELECT number, toString(number) FROM numbers(25367182);
INSERT INTO right SELECT number, toString(number) FROM numbers(23124707);
set join_algorithm='partial_merge_join';
SELECT
key,
count(1) AS cnt
FROM
(
SELECT data.key
FROM
(
SELECT key
FROM left AS s
) AS data
LEFT JOIN
(
SELECT key
FROM right
GROUP BY key
) AS promo ON promo.key = data.key
)
GROUP BY key
HAVING count(1) > 1
ORDER BY key ASC
LIMIT 10
Query id: 7d5f5792-f9f5-4766-b24d-649ec470d2ff
┌─────key─┬─cnt─┐
│ 458752 │ 2 │
│ 1835008 │ 2 │
│ 3670016 │ 2 │
│ 4128768 │ 2 │
│ 4587520 │ 2 │
│ 5046272 │ 2 │
│ 5505024 │ 2 │
│ 6487878 │ 2 │
│ 6946630 │ 2 │
│ 7405258 │ 2 │
└─────────┴─────┘
Obviously SELECT query must return nothing because we don't have duplicated values in both tables.
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
bugConfirmed user-visible misbehaviour in official releaseConfirmed user-visible misbehaviour in official releasecomp-joinsJOINs end-to-end (planning hooks + runtime join operators/algorithms). Single bucket to avoid pla...JOINs end-to-end (planning hooks + runtime join operators/algorithms). Single bucket to avoid pla...