[Bug] Optimizer could choose HashAggregate to dedup based on cost.

### Cloudberry Database version

main

### What happened

Optimizer could produce better plan to dedup based on cost.

### What you think should happen instead

_No response_

### How to reproduce

For the planner, we need to create better plan which uses HashAggregate to dedup columns.
For example,
```
ebi_dwh=# explain analyze WITH UniqueIDs AS (
    SELECT DISTINCT gsp_id
    FROM eft_com_call_statistic
)
SELECT COUNT(*)
FROM UniqueIDs;
                                                                       QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=0.00..1419.47 rows=1 width=8) (actual time=2510.079..2510.081 rows=1 loops=1)
   ->  Gather Motion 4:1  (slice1; segments: 4)  (cost=0.00..1419.47 rows=1 width=8) (actual time=1998.786..2510.054 rows=4 loops=1)
         ->  Partial Aggregate  (cost=0.00..1419.47 rows=1 width=8) (actual time=1993.802..1993.803 rows=1 loops=1)
               ->  HashAggregate  (cost=0.00..1419.47 rows=2315005 width=1) (actual time=1294.980..1855.013 rows=2339996 loops=1)
                     Group Key: gsp_id
                     ->  Seq Scan on eft_com_call_statistic  (cost=0.00..969.33 rows=2375680 width=37) (actual time=0.343..339.551 rows=2377251 loops=1)
 Planning Time: 5.357 ms
   (slice0)    Executor memory: 98354K bytes.
   (slice1)    Executor memory: 244665K bytes avg x 4x(0) workers, 244755K bytes max (seg2).  Work_mem: 286737K bytes max.
 Memory used:  2097152kB
 Optimizer: Pivotal Optimizer (GPORCA)
 Execution Time: 2584.618 ms
(12 rows)

```

This plan uses HashAggregate to perform gsp_id deduplication, while

```
 ebi_dwh=# explain analyze select count(distinct gsp_id) from eft_com_call_statistic;
                                                                    QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=0.00..1142.67 rows=1 width=8) (actual time=10573.416..10573.418 rows=1 loops=1)
   ->  Gather Motion 4:1  (slice1; segments: 4)  (cost=0.00..1142.67 rows=1 width=8) (actual time=10205.645..10573.387 rows=4 loops=1)
         ->  Partial Aggregate  (cost=0.00..1142.67 rows=1 width=8) (actual time=10437.005..10437.007 rows=1 loops=1)
               ->  Seq Scan on eft_com_call_statistic  (cost=0.00..969.33 rows=2375680 width=37) (actual time=0.269..548.673 rows=2377251 loops=1)
 Planning Time: 4.608 ms
   (slice0)    Executor memory: 37K bytes.
   (slice1)    Executor memory: 148480K bytes avg x 4x(0) workers, 148579K bytes max (seg2).
 Memory used:  2097152kB
 Optimizer: Pivotal Optimizer (GPORCA)
 Execution Time: 10573.853 ms
(10 rows)
```

It seems the cost estimation is no accurate.

### Operating System

No specific

### Anything else

No

### Are you willing to submit PR?

- [ ] Yes, I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://cloudberrydb.org/community/coc).


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Bug] Optimizer could choose HashAggregate to dedup based on cost. #659

Cloudberry Database version

What happened

What you think should happen instead

How to reproduce

Operating System

Anything else

Are you willing to submit PR?

Code of Conduct

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[Bug] Optimizer could choose HashAggregate to dedup based on cost. #659

Description

Cloudberry Database version

What happened

What you think should happen instead

How to reproduce

Operating System

Anything else

Are you willing to submit PR?

Code of Conduct

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions