-
Notifications
You must be signed in to change notification settings - Fork 197
Closed
Labels
type: BugSomething isn't workingSomething isn't workingtype: Performancecloudberry runs slow on some particular querycloudberry runs slow on some particular query
Description
Cloudberry Database version
main
What happened
Optimizer could produce better plan to dedup based on cost.
What you think should happen instead
No response
How to reproduce
For the planner, we need to create better plan which uses HashAggregate to dedup columns.
For example,
ebi_dwh=# explain analyze WITH UniqueIDs AS (
SELECT DISTINCT gsp_id
FROM eft_com_call_statistic
)
SELECT COUNT(*)
FROM UniqueIDs;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=0.00..1419.47 rows=1 width=8) (actual time=2510.079..2510.081 rows=1 loops=1)
-> Gather Motion 4:1 (slice1; segments: 4) (cost=0.00..1419.47 rows=1 width=8) (actual time=1998.786..2510.054 rows=4 loops=1)
-> Partial Aggregate (cost=0.00..1419.47 rows=1 width=8) (actual time=1993.802..1993.803 rows=1 loops=1)
-> HashAggregate (cost=0.00..1419.47 rows=2315005 width=1) (actual time=1294.980..1855.013 rows=2339996 loops=1)
Group Key: gsp_id
-> Seq Scan on eft_com_call_statistic (cost=0.00..969.33 rows=2375680 width=37) (actual time=0.343..339.551 rows=2377251 loops=1)
Planning Time: 5.357 ms
(slice0) Executor memory: 98354K bytes.
(slice1) Executor memory: 244665K bytes avg x 4x(0) workers, 244755K bytes max (seg2). Work_mem: 286737K bytes max.
Memory used: 2097152kB
Optimizer: Pivotal Optimizer (GPORCA)
Execution Time: 2584.618 ms
(12 rows)
This plan uses HashAggregate to perform gsp_id deduplication, while
ebi_dwh=# explain analyze select count(distinct gsp_id) from eft_com_call_statistic;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=0.00..1142.67 rows=1 width=8) (actual time=10573.416..10573.418 rows=1 loops=1)
-> Gather Motion 4:1 (slice1; segments: 4) (cost=0.00..1142.67 rows=1 width=8) (actual time=10205.645..10573.387 rows=4 loops=1)
-> Partial Aggregate (cost=0.00..1142.67 rows=1 width=8) (actual time=10437.005..10437.007 rows=1 loops=1)
-> Seq Scan on eft_com_call_statistic (cost=0.00..969.33 rows=2375680 width=37) (actual time=0.269..548.673 rows=2377251 loops=1)
Planning Time: 4.608 ms
(slice0) Executor memory: 37K bytes.
(slice1) Executor memory: 148480K bytes avg x 4x(0) workers, 148579K bytes max (seg2).
Memory used: 2097152kB
Optimizer: Pivotal Optimizer (GPORCA)
Execution Time: 10573.853 ms
(10 rows)
It seems the cost estimation is no accurate.
Operating System
No specific
Anything else
No
Are you willing to submit PR?
- Yes, I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct.
Metadata
Metadata
Labels
type: BugSomething isn't workingSomething isn't workingtype: Performancecloudberry runs slow on some particular querycloudberry runs slow on some particular query