Cost-based choice between nested loop join and hash join

Currently, only a nested loop join is used when streams have indexed relationships. But often this is sub-optimal and causes the dependent streams to be fetched more times than actually necessary. Hash join could be a better choice in these cases. Of course, cast-based approach should be used when choosing between the possible join algorithms.

Just an example from the TPC-R test suite:

```sql
select first 10
  l_orderkey, o_orderdate, o_shippriority,
  sum(l_extendedprice * (1 - l_discount)) as revenue,
from
  customer, orders, lineitem
where
  c_mktsegment = 'BUILDING'
  and c_custkey = o_custkey
  and l_orderkey = o_orderkey
  and o_orderdate < date '1995-03-15'
  and l_shipdate > date '1995-03-15'
group by
  l_orderkey, o_orderdate, o_shippriority
order by
  2 desc, o_orderdate;

PLAN SORT (
  SORT (
    JOIN (
      CUSTOMER NATURAL,
      ORDERS INDEX (ORDERS_CUSTKEY),
      LINEITEM INDEX (LINEITEM_PK))))

Elapsed time = 1.600 sec

vs

PLAN SORT (
  SORT (
    JOIN (
      HASH (
        ORDERS INDEX (ORDERS_ORDERDATE),
        CUSTOMER NATURAL),
      LINEITEM INDEX (LINEITEM_PK))))

Elapsed time = 1.031 sec

```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Cost-based choice between nested loop join and hash join #7331

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Cost-based choice between nested loop join and hash join #7331

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions