-
Notifications
You must be signed in to change notification settings - Fork 197
Closed
Labels
type: EnhancementNew feature or request, ideasNew feature or request, ideas
Description
Cloudberry Database version
No response
What happened
TPCH SQL16 plan shows degradation of performance . One reason: the not in part isn't parallel.
select
p_brand,
p_type,
p_size,
count(distinct ps_suppkey) as supplier_cnt
from
partsupp,
part
where
p_partkey = ps_partkey
and p_brand <> 'Brand#13'
and p_type not like 'MEDIUM POLISHED%'
and p_size in (40, 23, 12, 20, 48, 24, 21, 29)
and ps_suppkey not in (
select
s_suppkey
from
supplier
where
s_comment like '%Customer%Complaints%'
)
group by
p_brand,
p_type,
p_size
order by
supplier_cnt desc,
p_brand,
p_type,
p_size;
optimizer = off
enable_parallel = on
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Gather Motion 6:1 (slice1; segments: 6) (cost=60277.13..60277.37 rows=20 width=44)
Merge Key: (count(DISTINCT partsupp.ps_suppkey)), part.p_brand, part.p_type, part.p_size
-> Sort (cost=60277.13..60277.13 rows=3 width=44)
Sort Key: (count(DISTINCT partsupp.ps_suppkey)) DESC, part.p_brand, part.p_type, part.p_size
-> GroupAggregate (cost=60277.02..60277.10 rows=3 width=44)
Group Key: part.p_brand, part.p_type, part.p_size
-> Sort (cost=60277.02..60277.03 rows=3 width=40)
Sort Key: part.p_brand, part.p_type, part.p_size
-> Redistribute Motion 6:6 (slice2; segments: 6) (cost=48369.17..60276.99 rows=3 width=40)
Hash Key: part.p_brand, part.p_type, part.p_size
Hash Module: 3
-> Hash Join (cost=48369.17..60276.92 rows=3 width=40)
Hash Cond: (part.p_partkey = partsupp.ps_partkey)
-> Parallel Seq Scan on part (cost=0.00..11718.33 rows=50493 width=40)
Filter: ((p_brand <> 'Brand#13'::bpchar) AND ((p_type)::text !~~ 'MEDIUM POLISHED%'::text) AND (p_size = ANY ('{40,23,12,20,48,24,21,29}'::integer[])))
-> Hash (cost=48368.89..48368.89 rows=23 width=8)
-> Broadcast Motion 3:6 (slice3; segments: 3) (cost=600.93..48368.89 rows=23 width=8)
-> Hash Left Anti Semi (Not-In) Join (cost=600.93..48368.28 rows=8 width=8)
Hash Cond: (partsupp.ps_suppkey = supplier.s_suppkey)
-> Seq Scan on partsupp (cost=0.00..41100.55 rows=2666455 width=8)
-> Hash (cost=600.80..600.80 rows=10 width=4)
-> Broadcast Motion 3:3 (slice4; segments: 3) (cost=0.00..600.80 rows=10 width=4)
-> Seq Scan on supplier (cost=0.00..600.67 rows=3 width=4)
Filter: ((s_comment)::text ~~ '%Customer%Complaints%'::text)What you think should happen instead
For parallel-aware NOT IN HashJoin, it's hard to tell null values in which batches.
But the parallel-oblivious NOT IN HashJoin is possible as we will build all inner tables first.
Parallel-oblivious Hash Join should be enabled.
How to reproduce
Run TPCH16 SQL or just a NOT IN sql.
Operating System
ubuntu
Anything else
No response
Are you willing to submit PR?
- Yes, I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct.
Metadata
Metadata
Assignees
Labels
type: EnhancementNew feature or request, ideasNew feature or request, ideas