Currently, only a nested loop join is used when streams have indexed relationships. But often this is sub-optimal and causes the dependent streams to be fetched more times than actually necessary. Hash join could be a better choice in these cases. Of course, cast-based approach should be used when choosing between the possible join algorithms.
Just an example from the TPC-R test suite:
select first 10
l_orderkey, o_orderdate, o_shippriority,
sum(l_extendedprice * (1 - l_discount)) as revenue,
from
customer, orders, lineitem
where
c_mktsegment = 'BUILDING'
and c_custkey = o_custkey
and l_orderkey = o_orderkey
and o_orderdate < date '1995-03-15'
and l_shipdate > date '1995-03-15'
group by
l_orderkey, o_orderdate, o_shippriority
order by
2 desc, o_orderdate;
PLAN SORT (
SORT (
JOIN (
CUSTOMER NATURAL,
ORDERS INDEX (ORDERS_CUSTKEY),
LINEITEM INDEX (LINEITEM_PK))))
Elapsed time = 1.600 sec
vs
PLAN SORT (
SORT (
JOIN (
HASH (
ORDERS INDEX (ORDERS_ORDERDATE),
CUSTOMER NATURAL),
LINEITEM INDEX (LINEITEM_PK))))
Elapsed time = 1.031 sec
Currently, only a nested loop join is used when streams have indexed relationships. But often this is sub-optimal and causes the dependent streams to be fetched more times than actually necessary. Hash join could be a better choice in these cases. Of course, cast-based approach should be used when choosing between the possible join algorithms.
Just an example from the TPC-R test suite: