Oracle DBA

The document provides an extensive overview of Oracle 8 Optimizer hints, detailing various hints that can be used to influence the execution path of SQL statements. It includes descriptions of specific hints such as ALL_ROWS, AND-EQUAL, APPEND, CACHE, and CHOOSE, along with examples of their usage. Additionally, it outlines SQL tuning tips and best practices for optimizing query performance.


1 Oracle 8 Optimizer Hints
1.1 Using Hints
1.2 ALL_ROWS
1.3 AND-EQUAL
1.4 APPEND and NOAPPEND
1.5 CACHE
1.6 CHOOSE
1.7 CLUSTER
1.8 FIRST_ROWS
1.9 FULL
1.10 HASH
1.11 HASH_AJ
1.12 INDEX
1.13 INDEX_ASC
1.14 INDEX_COMBINE
1.15 INDEX_DESC
1.16 INDEX_FFS
1.17 MERGE_AJ
1.18 NO_MERGE
1.19 NOCACHE
1.20 NOPARALLEL
1.21 ORDERED
1.22 PARALLEL
1.23 PUSH_SUBQ
1.24 ROWID
1.25 RULE
1.26 STAR
1.27 USE_CONCAT
1.28 USE_HASH
1.29 USE_MERGE
1.30 USE_NL

2 Top 13 SQL Tuning Tips
2.1 The Objective of the SQL Tuning Tips
2.2 The 13 Tips
2.3 Avoid Unplanned Full Table Scans
2.4 When a Full Table Scan Is Used
2.5 How to Make Sure a Query Can Use an Index
2.6 Issues Encountered When Creating Indexes
2.7 Use Only Selective Indexes
2.8 Measuring Index Selectivity
2.9 Choosing a Leading Column for a Concatenated Index
2.10 Manage Multi-Table Joins (Nested Loops, Merge Joins and Hash Joins)
2.11 Steps Involved in Merge Joins
2.12 Tuning Implications for Merge Joins
2.13 How to Designate a Temporary-Only Tablespace
2.14 Steps Involved in Nested Loops
2.15 Implications of the Driving Table in a Nested Loops Join
2.16 How to Influence the Join Path
2.17 What Happens When You Start from Nonselective Criteria
2.18 What Happens When You Add More Tables to the Join
2.19 How to Index Tables with Many-to-Many Relationships
2.20 Hash Joins
2.21 Management Issues for Hash Joins
2.22 Managing Joins
2.23 Managing SQL Statements Containing Views
2.24 Improving Integration of Views into Queries
2.25 Forcing Views to Remain Separate
2.26 Tune Subqueries
2.27 When Subqueries Are Resolved
2.28 Hints for Subqueries That Return the Maximum Value
2.29 How to Combine Subqueries
2.30 How to Perform Existence Checks
2.31 Use Composite Keys/Star Queries
2.32 How to Create a Star Schema
2.33 Querying the Star Schema
2.34 The Traditional Execution Path
2.35 How to Create a Star Query Execution Path
2.36 Management Issues for Star Queries
2.37 Properly Index CONNECT BY Operations
2.38 Limit Remote Table Accesses
2.39 Manage Very Large Table Accesses
2.40 The Problem
2.41 Manage Data Proximity
2.42 Avoid Unhelpful Index Scans
2.43 Create Fully Indexed Tables
2.44 Create Hash Clusters
2.45 If Equivalence Queries Are Used
2.46 If There Is No Way to Enforce Row Proximity
2.47 If Space Allocation Is Not a Problem
2.48 Create Partitioned Tables
2.49 Implement the Parallel Options
2.50 Use UNION ALL Rather Than UNION
2.51 Avoid Using PL/SQL Function Calls in SQL
2.52 Manage the Use of Bind Variables
2.53 Revisit the Tuning Process
1 Oracle 8 Optimizer Hints

1.1 Using Hints

You can use hints to alter the execution path chosen by the cost-based optimizer. Hints are embedded within SQL statements, and only modify the execution path for the statements in which they appear. The start of a hint is indicated by the string /*+ following a SELECT, DELETE, or UPDATE keyword. A hint’s text is terminated by */.

The syntax for hints is almost identical to that of comments within SQL – the only difference is the addition of the “+” sign to indicate the start of the hint. The “+” cannot be preceded by a space.

NOTE

If a hint is specified incorrectly, the optimizer will ignore it and will not report an error.

In the following sections, you’ll see descriptions of the available hints and examples of their usage. Many of the hints shown in the following sections are referenced in Chapter 11. The operations referenced by the hints are described in the “Operations” section earlier in this chapter.

NOTE

Many of the hints allow you to list specific table names. If you use table aliases in your
query, refer to the tables by their aliases within the hints.
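For example, if the COMPANY table is given the alias c, the alias – not the table name – must appear inside the hint. The following is a sketch reusing this chapter’s COMPANY table and its COMPANY$CITY index:

```sql
-- The alias "c" is used in the hint; writing INDEX(COMPANY ...) here
-- would cause the hint to be silently ignored.
select /*+ INDEX(c COMPANY$CITY) */
       c.Name, c.City
  from COMPANY c
 where c.City = 'Roanoke';
```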

1.2 ALL_ROWS

The ALL_ROWS hint tells Oracle to optimize the query for the best throughput – that is, to minimize the time it takes for all rows to be returned by the query. This is the default action of the Oracle optimizer, and is appropriate for batch operations. An example is shown in the following listing.

select /*+ ALL_ROWS */
       COMPANY.Name
  from COMPANY, SALES
 where COMPANY.Company_ID = SALES.Company_ID
   and SALES.Period_ID = 3
   and SALES.Sales_Total > 1000;

The example shown in the previous listing would normally use a NESTED LOOPS operation during the query’s execution (see the “NESTED LOOPS” section earlier in this chapter). ALL_ROWS will force the optimizer to consider using a MERGE JOIN instead.

NOTE

If you specify ALL_ROWS, you must have previously analyzed the tables used by the
query. If you have not done this, the optimizer estimates missing statistics and uses them
while choosing an execution path.

1.3 AND-EQUAL

The AND-EQUAL hint tells the optimizer to perform an AND-EQUAL operation on the
indexes listed within the hint. An example of this hint is shown in the following listing.

select /*+ AND-EQUAL COMPANY$CITY, COMPANY$STATE */
       Name, City, State
  from COMPANY
 where City = 'Roanoke'
   and State = 'VA';

1.4 APPEND and NOAPPEND

The APPEND and NOAPPEND hints, available as of Oracle8, can only be used in an
insert statement. The APPEND hint specifies that data is simply appended at the end of
the table. The functionality of the APPEND hint for inserts is similar to the SQL*Loader
Direct Path loader and may increase the space usage in your tables. Existing free space in
the table is not used; the loading begins after the table’s highwatermark. The default is
NOAPPEND.
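A sketch of the APPEND hint on an insert is shown below; the SALES table is reused from this chapter’s examples, and the SALES_STAGING source table is hypothetical:

```sql
-- Hypothetical bulk load into SALES: rows are written above the table's
-- highwatermark instead of reusing existing free space.
insert /*+ APPEND */ into SALES
select Company_ID, Period_ID, Sales_Total
  from SALES_STAGING;
```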

1.5 CACHE

The CACHE hint, when used for a table in a query, tells Oracle to treat the table as a
cached table. That is, CACHE tells Oracle to keep the blocks from the full table scan of a
table in the SGA’s data block buffer cache area, instead of quickly removing them from
the SGA. Caching is particularly useful for small, frequently used tables. The CACHE
hint is useful if a noncached table will be frequently scanned by separate parts of a query.
An example of this hint is shown in the following listing. In the example, the blocks read
from the full table scan of the COMPETITOR table (see the FULL hint later in this
section) will be marked as “most recently used”, so they will stay in the data block buffer
cache longer than if the CACHE hint was not used.

select /*+ FULL(competitor) CACHE(competitor) */ *
  from COMPETITOR
 where Company_ID > 5;
1.6 CHOOSE

The CHOOSE hint tells the optimizer to choose between cost-based and rule-based optimization for the query. If the tables used by the query have been analyzed, the optimizer will use cost-based optimization. If no statistics are available, rule-based optimization will be used. Applications that use a mix of the two should generally use cost-based optimization for the majority of the transactions, while using the RULE hint to specify rule-based optimization for specific queries. If you have set up the cost-based optimizer properly, the CHOOSE hint is unnecessary. The CHOOSE hint is available as of Oracle 7.2.

1.7 CLUSTER

The CLUSTER hint tells the optimizer to use a TABLE ACCESS CLUSTER operation.
The syntax for the CLUSTER hint is

/*+ CLUSTER(table) */

See the “TABLE ACCESS CLUSTER” operation description earlier in this chapter for information on cluster accesses.

1.8 FIRST_ROWS

The FIRST_ROWS hint is the logical opposite of the ALL_ROWS hint: it tells the optimizer to optimize the query with the goal of the shortest response time for the return of the first row from the query. FIRST_ROWS is ignored if set-based operations are used. If indexes or a NESTED LOOPS join is available, the optimizer will usually choose to use them. An example of this hint is shown in the following listing.

select /*+ FIRST_ROWS */
       COMPANY.Name
  from COMPANY, SALES
 where COMPANY.Company_ID = SALES.Company_ID
   and SALES.Period_ID = 3
   and SALES.Sales > 1000;

NOTE

If you specify FIRST_ROWS, you must have previously analyzed the tables used by the query. If you have not done so, the optimizer estimates missing statistics and uses them while choosing an execution path.

1.9 FULL
The FULL hint tells the optimizer to perform a TABLE ACCESS FULL operation on the specified table. You may wish to use this hint if you know that the index normally used by the query would be a poor choice given the data distribution.

For example, consider the query shown in the following listing. Under normal data
distribution, an AND-EQUAL of the City and State indexes would be appropriate for
good performance.

select Name, City, State
  from COMPANY
 where City = 'Roanoke'
   and State = 'VA';

What if 90 percent of the records in the table have a City value of ‘Roanoke’ and a State value of ‘VA’? If you use the indexes to access the ‘Roanoke’ records, you will have to read almost all of the index blocks and table blocks to satisfy the query. In that case, it would require fewer logical reads to simply scan the COMPANY table (via a TABLE ACCESS FULL operation). You can force a TABLE ACCESS FULL to occur by disabling the indexes or by using the FULL hint, as shown in the following listing.

select /*+ FULL(COMPANY) */
       Name, City, State
  from COMPANY
 where City = 'Roanoke'
   and State = 'VA';

1.10 HASH

The HASH hint tells the optimizer to use a TABLE ACCESS HASH operation. The syntax for the HASH hint is

/*+ HASH(table) */

See the “TABLE ACCESS HASH” operation description earlier in this chapter for information on hash cluster accesses.

1.11 HASH_AJ

The HASH_AJ hint specifies the type of join to use during an antijoin in Oracle 7.3.

1.12 INDEX

The INDEX hint can be used in three different ways:

 If a single index is listed, that index will be used.
 If multiple indexes are listed, the optimizer will choose which indexes to use.
 If a table is listed, but no indexes are listed, the optimizer will choose an index or indexes to use for that table.
Given the query in the following listing, the optimizer may choose to use the index on City, the index on State, or both, as appropriate.

select /*+ INDEX(COMPANY) */
       Name, City, State
  from COMPANY
 where City = 'Roanoke'
   and State = 'VA';

1.13 INDEX_ASC

The INDEX_ASC hint is the same as the INDEX hint.

1.14 INDEX_COMBINE
INE
The INDEX_COMBINE hint prompts the optimizer to use Boolean logic to combine the particular bitmap indexes in the hint. As shown earlier in this chapter, bitmap indexes can be combined using bitwise OR, AND, or AND NOT. If no indexes are given as arguments for the INDEX_COMBINE hint, the optimizer uses whatever Boolean combination of bitmap indexes has the best cost estimate. The following listing shows an example of the INDEX_COMBINE hint.

select /*+INDEX_COMBINE(COMPANY COMPANY$STATE,COMPANY$CITY) */


Name, City, State
from COMPANY
where City like ‘New%’
or State in (‘NY’, ‘NJ’, ‘CT’) ;

1.15 INDEX_DESC

The INDEX_DESC hint tells the optimizer to scan an index in descending order of the
indexed values. INDEX_DESC is most commonly used with subqueries.
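A common use is reading the highest value of an indexed column without sorting. The following sketch assumes a SALES$PERIOD_ID index on the SALES table (the index name is hypothetical, mirroring this chapter’s naming convention):

```sql
-- Hypothetical: walk SALES$PERIOD_ID from its highest value downward,
-- so the first row fetched is the most recent period.
select /*+ INDEX_DESC(s SALES$PERIOD_ID) */
       Period_ID, Sales_Total
  from SALES s
 where rownum = 1;
```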

1.16 INDEX_FFS

The INDEX_FFS hint causes a fast full index scan to be performed in place of a TABLE ACCESS FULL operation. The difference between a fast full index scan and a regular index range scan is that, just as in a full table scan, index blocks read from disk are placed at the top of the LRU (Least Recently Used) block list. This ensures that a large index scan will not overwrite the entire buffer cache. The advantage of a fast full index scan over a regular full table scan is that the data is produced in the order of the index.

The following listing shows an example of the usage of the INDEX_FFS hint. The query
has no limiting conditions, so a full table scan would normally be performed. The hint,
however, tells the optimizer to use a specific index during the execution.

select /*+ INDEX_FFS(SALES SALES$COMPANY_ID) */
       Company_ID, Sales_Total
  from SALES
 order by Company_ID;

1.17 MERGE_AJ

The MERGE_AJ hint specifies the type of join to use during an antijoin as of Oracle 7.3.

1.18 NO_MERGE
GE

The NO_MERGE hint, available as of Oracle 7.3, tells the optimizer to not merge a
view’s SQL syntax with the syntax of a query that uses the view.
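A sketch of its use follows; SALES_SUMMARY is a hypothetical view over the SALES table:

```sql
-- Hypothetical: resolve the SALES_SUMMARY view on its own instead of
-- merging its syntax into the outer query.
select /*+ NO_MERGE(ss) */
       ss.Company_ID, ss.Total
  from SALES_SUMMARY ss
 where ss.Total > 1000;
```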

1.19 NOCACHE

The NOCACHE hint specifies that blocks retrieved by a full table scan are not to be cached within the SGA’s data block buffer cache. The blocks will be marked as “least recently used”, and will be the first ones removed from the SGA when additional space is needed in the cache. In the following example, the blocks from the COMPETITOR table will be marked as “least recently used” within the cache (this is the default behavior for noncached tables).

select /*+ FULL(competitor) NOCACHE(competitor) */ *
  from COMPETITOR
 where Company_ID > 5;

1.20 NOPARALLEL

The NOPARALLEL hint allows you to disable parallelism for a query, even if the tables used by the query have been defined to have a default degree of parallelism.

1.21 ORDERED

The ORDERED hint, when used with NESTED LOOPS joins, influences the order in
which tables are joined – that is, the structure of the loops used by the join. If you use this
hint, you need to be certain that the relative distribution of values within the joined tables
will not change dramatically over time; otherwise, the specified join order may cause
performance problems in the future.
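With ORDERED, tables are joined in the order in which they appear in the from clause. A sketch, reusing this chapter’s COMPANY and SALES tables:

```sql
-- COMPANY is listed first in the FROM clause, so it drives the join;
-- SALES is probed for each COMPANY row.
select /*+ ORDERED */
       COMPANY.Name, SALES.Sales_Total
  from COMPANY, SALES
 where COMPANY.Company_ID = SALES.Company_ID;
```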

1.22 PARALLEL

The PARALLEL hint allows you to specify a degree of parallelism for a table in a query. The degree of parallelism specified by the PARALLEL hint will override any degree of parallelism already defined at the table level.

delete, insert, and update operations are only considered for parallelization if the session is in a PARALLEL DML-enabled mode (use alter session enable parallel dml to enter this mode).
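A sketch of the hint, requesting a degree of parallelism of 4 (an arbitrary value chosen for illustration) for a scan of the SALES table:

```sql
-- Request four parallel query server processes for the SALES scan,
-- overriding any degree defined on the table itself.
select /*+ PARALLEL(SALES, 4) */
       Company_ID, sum(Sales_Total)
  from SALES
 group by Company_ID;
```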
1.23 PUSH_SUBQ

The PUSH_SUBQ hint causes nonmerged subqueries and views to be evaluated at the earliest possible place in the execution plan. Nonmerged subqueries typically use a group operation (such as group by) or a group function (such as SUM). The use of a group operation forces the subquery to be performed separately from the main query (see the VIEW plan operation earlier in this chapter). See the sections on managing the use of views and subqueries for further details.

1.24 ROWID
The ROWID hint tells the optimizer to use a TABLE ACCESS BY ROWID operation.
The syntax for the ROWID hint is

/*+ ROWID (table) */

1.25 RULE

The RULE hint tells the optimizer to use rule-based optimization for the query. All other
hints within the query will be ignored. The syntax is shown in the following listing.

select /*+ RULE */
       COMPANY.Name
  from COMPANY, SALES
 where COMPANY.Company_ID = SALES.Company_ID
   and SALES.Period_ID = 3
   and Sales_Total > 1000;

1.26 STAR

The STAR hint, available as of Oracle 7.3, tells the optimizer to use a composite key/star query execution path when resolving a join. Star query execution paths are typically used by data warehousing applications that feature a master table with many small detail tables. See the “Use Composite Keys/Star Queries” section for a detailed discussion of the use of star query execution paths.

1.27 USE_CONCAT

The USE_CONCAT hint forces conditions that are combined via or clauses in the where clause to be treated as a compound query using the UNION-ALL operation. USE_CONCAT is available as of Oracle 7.2.

1.28 USE_HASH

The USE_HASH hint, available as of Oracle 7.3, tells the optimizer to perform a hash
join.
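A sketch of the hint, asking for a hash join between this chapter’s COMPANY and SALES tables:

```sql
-- Build an in-memory hash table on one input and probe it with
-- the rows of the other, instead of a nested loops or merge join.
select /*+ USE_HASH(COMPANY, SALES) */
       COMPANY.Name, SALES.Sales_Total
  from COMPANY, SALES
 where COMPANY.Company_ID = SALES.Company_ID;
```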
1.29 USE_MERGE

The USE_MERGE hint is the logical opposite of the USE_NL hint. USE_MERGE tells the optimizer to use a merge join between specific tables – and thus is a more detailed hint than ALL_ROWS. For the query in the following listing, the COMPANY table and the SALES table will be joined via a MERGE JOIN operation. USE_MERGE is useful when you are joining three or more tables and need to specify a MERGE JOIN operation for specific sets of tables within the query.

select /*+ USE_MERGE (COMPANY, SALES) */
       COMPANY.Name
  from COMPANY, SALES
 where COMPANY.Company_ID = SALES.Company_ID
   and SALES.Period_ID = 3
   and SALES.Sales_Total > 1000;

1.30 USE_NL

The USE_NL hint tells the optimizer to perform a NESTED LOOPS join, using the specified table as the driving table in the join. USE_NL is more specific than the FIRST_ROWS hint, since FIRST_ROWS may lead to a NESTED LOOPS join being used, but USE_NL lets you influence the join order. In the following listing, the COMPANY table will be used as the driving table in a NESTED LOOPS operation.

select /*+ USE_NL (COMPANY) */
COMPANY.Name
from COMPANY, SALES
where COMPANY.Company_ID = SALES.Company_ID
and SALES.Period_ID = 3
and SALES.Sales_Total > 1000;

2 Top 13 SQL Tuning Tips
Most of the time, the execution path selected by the Oracle optimizer is the best possible
path to use when resolving a query. In about 5 to 10 percent of the cases, however, a
better execution path is available. When your tables are small, the performance impact of
using a nonoptimal execution path may be minimal. As your tables grow in size, the
performance impact of nonoptimized queries can grow exponentially.

As the performance of a single query rapidly worsens, it consumes resources that should
be available to the rest of the application – and soon, the entire system’s performance
deteriorates. To avoid this scenario, you need to actively manage the performance of the
queries within your application. In this chapter you will see the tuning tips that are most
frequently used when improving the performance of SQL in Oracle 8. Improving the
poorly optimized queries will have a significant impact on the overall performance of
your application.

2.1 The Objective of the SQL Tuning Tips
The goal of SQL tuning is to improve the performance of queries through manual
intervention. The performance of a query is altered by changing its execution path – the
set of operations that the database uses to access and manipulate the desired data. The
execution path may be altered to use specific operations when processing the query. You
should be familiar with the available operations and hints to make the best use of this
chapter.

2.2 The 13 Tips
The top SQL tuning tips are, in order:

1. Avoid unplanned full table scans
2. Use only selective indexes
3. Manage multi-table joins (NESTED LOOPS, MERGE JOINs, and HASH JOINs)
4. Manage SQL statements containing views
5. Tune subqueries
6. Use composite keys/star queries
7. Properly index CONNECT BY operations
8. Limit remote table accesses
9. Manage very large table accesses
10. Use UNION ALL rather than UNION
11. Avoid using PL/SQL function calls in SQL
12. Manage the use of bind variables
13. Revisit the tuning process

In the following sections, you will see examples of queries that require manual tuning
intervention, along with the execution paths before and after tuning. Both the theoretical
basis for the tip and detailed implementation information are provided.

2.3 Avoid Unplanned Full Table Scans

Full table scans sequentially read all data from a table – whether it is relevant to the
query or not. There are two strong reasons for avoiding unnecessary full table scans:
 Full table scans are not selective. While nonselective searches may be appropriate
for large batch jobs that operate on many records, they are usually inappropriate
for online applications.
 Data read via full table scans is removed from the SGA buffers quickly. If the
table being scanned is not a “cached” table, the data read via a full table scan is
placed at the end of the Least Recently Used block list. As a result, the data
blocks read from the full table scan are the first to be removed from the SGA. For
a multiuser system, full table scans prevent users from sharing the table’s data in
the SGA; the same blocks have to be read multiple times via physical reads of the
data files.

2.4 When a Full Table Scan Is Used

Under rule-based optimization, a full table scan will be used on a table if any of the
following conditions are met in a SQL statement:

 No indexes exist on the table


 No limiting conditions are placed on the rows returned (that is, the query requests
all rows). For example, if the query does not have a where clause, all rows will be
returned.
 No limiting conditions placed on the rows correspond to the leading column of
any index on the table. For example, if a three-column concatenated index is
created on the City-State-Zip columns, then a query that had a limiting condition
on only the State column would not be able to use that index, since State is not the
leading column of that index.
 Limiting conditions placed on the rows correspond to the leading column of an
index, but the conditions are used inside expressions. For example, if an index
exists on the City column, then a limiting condition of

where City = ‘TOKYO’

could use the index. However, if the limiting condition was instead

where UPPER (City) = ‘TOKYO’

then the index on the City column would not be used because the City column is
inside the UPPER function. If you had concatenated the City column with a text
string, the index would not be used. For example, if the limiting condition was

where City || ‘X’ like ‘TOKYO%’

then the index on the City column would not be used.


 Limiting conditions placed on the rows correspond to the leading column of an

index, but the conditions are either NULL checks or inequalities. For example, if
an index exists on the City column, none of the following will be able to use the
index:
where City is null
where City is not null
where City != ‘TOKYO’
 Limiting conditions placed on the rows correspond to the leading columns of the
index, but the conditions use the like operator and the value starts with ‘%’ or the
value is a bind variable. For example, neither of the following will be able to use
the index:
where City like ‘%York%’
where City like :city_bind_variable

NOTE

The bind variable may contain trailing ‘%’ or even no ‘%’ at all. Regardless, an index
will not be used. See Tip #12, “Manage the Use of Bind Variables”, for a discussion of the
usage of bind variables.
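To see the practical effect of the expression rules above, compare two hedged variants of the same limiting condition (assuming, purely for illustration, that City values are stored consistently in uppercase):

```sql
-- The index on City cannot be used: the indexed column is inside a function
select Name from COMPANY where UPPER(City) = 'TOKYO';

-- The index on City can be used: the column is left bare
select Name from COMPANY where City = 'TOKYO';
```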

If cost-based optimization is used, Oracle will use full table scans for all of the cases
shown for rule-based optimization. Additionally, the cost-based optimizer may decide to
use full table scans if the table has not been analyzed, if the table is small, if the indexed
columns are not selective, or if the optimization goal is set to ALL_ROWS.

To avoid unplanned full table scans, make sure the query can use an index.

2.5 How to Make Sure a Query Can Use an Index

To make sure a query can use an index, you should first index all of your primary key
and foreign key columns. These are the columns most likely to be used during joins and
limiting conditions, so indexing them immediately resolves several of the potential full
table scan causes listed in the previous section. By default, Oracle will create a unique
index for each primary key constraint; foreign keys are not automatically indexed.
The order of the columns within your concatenated indexes is critical. There are two
issues that you must balance relative to the order of columns within indexes. First, the
column most frequently used in limiting conditions should be the leading column. As
shown in the previous section, a query using only the second column of a three-column
index will not be able to use the index. Second, the most selective column – the column
with the most distinct values – should be the leading column. The selectivity of the
leading column is critical if you are using the cost-based optimizer, since the optimizer
will judge the index’s selectivity based on the leading column’s selectivity. Ideally, the
selectivity. Ideally, the
column most used in limiting conditions is also the most selective column in the index. If
that is not the case, you may need to create multiple indexes on the columns (see Tip #2,
“Use Only Selective Indexes”, for tips on index creation).
Within the query, you need to verify that the indexed columns are not being used within
expressions such as concatenation (||), UPPER, and SUBSTR. Also, you need to verify
that the limiting conditions for the query are equalities, and are not NULL checks or
inequalities.

2.6 Issues Encountered When Creating Indexes

When creating indexes, always create them in a separate tablespace from the table they
index; if the tablespaces are stored in datafiles that are on separate devices, you can
reduce I/O contention that may occur during queries and transactions. The indexes should
always be created under the same owner account as the table, and should use a consistent
naming convention. The name for an index should include the name of the table it
indexes, and indicate whether it is a unique or nonunique index. If you index a partitioned
table, then the name of the index should indicate whether the index is a local index or a
global index. If you create local partition indexes, the local indexes should be stored apart
from the partition they index.

The storage space used by indexes should be properly sized.

2.7 Use Only Selective Indexes

The selectivity of an index is the ratio of the number of distinct values in the indexed
column(s) to the number of records in the table. If a table has 1000 records, and an
indexed column on that table has 950 distinct values, then the selectivity of that index is
950/1000, or 0.95. The best possible selectivity is 1.00. Unique indexes on non-nullable
columns always have a selectivity of 1.00.
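The ratio can be computed directly with a single query – a sketch against this chapter’s COMPANY table, using the City column as an illustrative candidate:

```sql
select COUNT(distinct City) / COUNT(*) Selectivity
from COMPANY;
```

A result near 1.00 marks a highly selective column; a result near 0 marks a poor index candidate.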

The selectivity of an index is a measure of the index’s usefulness in reducing the I/O
required by queries against the index’s table. For example, if an index on a 1000-record
table had only five distinct values, then the index’s poor selectivity (5/1000 = 0.005)
would make it a poor choice when applying limiting conditions on rows. If there are only
five distinct values among the 1000 records in the example table, the index would return
an average of 200 records for each distinct value. In that case, it may be more efficient
to perform a full table scan instead of multiple index scans and table accesses.
ccesses.

If you are using cost-based optimization, the optimizer should not use insufficiently
selective indexes. If you are using rule-based optimization, the optimizer will not
will not
consider the selectivity of an index when determining the execution path (unless the
index is created as a unique index), and you will have to manually tune your queries to
avoid using nonselective indexes.

2.8 Measuring Index Selectivity
Index selectivity can be measured via manual and automated methods. To manually
measure selectivity, query the base table for the number of distinct values for the given
index. For example, if you are creating a two-column concatenated index on the City and
State columns of the COMPANY table, the query shown in the following listing would
return the number of distinct combinations of City and State values in COMPANY:

select COUNT (distinct City || ‘%’ || State)
from COMPANY;

Once you know the number of distinct values from the preceding query, you can
compare it to the total number of rows in the table, as provided by the following query.

select COUNT(*)
from COMPANY;

The ratio of the number of distinct values to the total number of rows is the selectivity of
the columns. The advantage of the manual method is its ability to estimate the selectivity
of an index before the index is created.

To automate the selectivity measurement process for existing indexes, you can use the
analyze command. If you analyze a table, all of the table’s indexes will automatically be
analyzed as well. To determine the selectivity of an index, analyze the table, as shown in

the following listing. In the example, the compute statistics option of the analyze
command is used to generate the most accurate statistics possible.

analyze table COMPANY compute statistics;

NOTE

You can analyze tables when using both the rule-based and cost-based optimizers. The
optimizer will only use the statistics if cost-based optimization is used.

Once the table and its indexes have been analyzed, you can determine the selectivity of
the index. The number of distinct values in the index is shown in USER_INDEXES. The
query in the following listing will display the number of distinct keys in the
COMPANY$CITY index.

select Distinct_Keys
from USER_INDEXES
where Table_Name = ‘COMPANY’
and Index_Name = ‘COMPANY$CITY’;

The total number of rows in the table is in USER_TABLES:

select Num_Rows
from USER_TABLES
where Table_Name = ‘COMPANY’;

The Distinct_Keys value, divided by the Num_Rows value, is the index’s selectivity. You
can query USER_TAB_COLUMNS to see the selectivity of each column individually
(after the table has been analyzed). As with the manual method shown previously, these
statistics allow you to calculate the selectivity of a column before an index is created on
the column.

select Column_Name, Num_Distinct
from USER_TAB_COLUMNS
where Table_Name = ‘COMPANY’;

The Num_Distinct column in USER_TAB_COLUMNS displays the number of distinct
values in that column for all rows in the table. Divide Num_Distinct by the total number
of rows for the table (the Num_Rows column in USER_TABLES) to determine a
column’s selectivity. Null values will count as separate distinct values.

Querying USER_TAB_COLUMNS will help you measure the selectivity of individual
columns, but it will not allow you to accurately measure the selectivity of concatenated
sets of columns. To measure the selectivity of a set of columns, you need to either use the
manual method or create an index on the set of columns and re-analyze the table.
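The two dictionary queries described in this section can also be combined into one per-column selectivity report – a sketch, valid only after the table has been analyzed (Num_Rows and Num_Distinct are empty otherwise):

```sql
select c.Column_Name,
       c.Num_Distinct / t.Num_Rows Selectivity
from USER_TAB_COLUMNS c, USER_TABLES t
where c.Table_Name = t.Table_Name
and c.Table_Name = 'COMPANY';
```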

2.9 Choosing a Leading Column for a Concatenated Index

As described in the section on avoiding unplanned full table scans, concatenated indexes
present a challenge to the DBA: the leading column should be the most selective column,
and it should also be the column most often used by limiting conditions in queries. If a
single column meets both conditions, that column should be the leading column. The
second-most-used and second-most-selective column should be the second column in the
index.

If a nonselective column is the most frequently used column in limiting conditions, you
may need to evaluate your application design. Users’ queries should return the minimum
number of records they need – forcing them to use a nonselective execution path in their
queries will have a damaging effect upon their queries’ performance.

If a highly selective column is not frequently used in the limiting conditions of queries,
then why has an index been created on the column? If the column is a foreign key
column, then it should be indexed; otherwise the requirement for the index should be
reevaluated. It is appropriate to create lightly used indexes to resolve performance
problems with specific queries, but their impact on the performance and management of
the rest of the application must be considered.
Choosing Between Concatenated Indexes and Multiple Single-Column Indexes

What is more selective in the COMPANY table – an index on City, or a concatenated


index on City and State? If there are 200 unique City values in a 1000-record
COMPANY table, the selectivity of City is 0.2. How many distinct combinations of City
and State are there? Most likely, there are around 200 – the number of distinct values will
increase slightly (for example, to distinguish Portland, ME, from Portland, OR), but
usually not enough to significantly alter the selectivity of the index. The largest single
factor in determining the selectivity of a concatenated index is the selectivity of its
leading column.
Because the selectivity of a concatenated index’s leading column plays such a large role
in determining the index’s overall selectivity, there is usually little additional cost to
index selectivity when more columns are added to the index. The overall size and
complexity of the index should be considered when evaluating its column composition,
but its selectivity should remain fairly constant as columns are added.

Choosing between a single concatenated index – City and State together – versus
multiple single-column indexes – a City index and State index – requires you to evaluate
the ways in which the columns are used as limiting conditions. If both columns are used
as limiting conditions, and if the second column is used by itself in many queries, the
second column should be separately indexed. Your options are shown in Table 2.1.

As shown in Table 2.1, one index will be created on either the City column or the
concatenation of the City and State columns. As previously described, the selectivity of
these two indexes is likely to be very similar; if that is the case, then use the concatenated
index. Scenario #3 in Table 2.1 is therefore disregarded. The only remaining question is
whether an index should be created on the State column – and that determination is based
on the State column’s selectivity and the degree to which it is used as a limiting condition
column.

Scenario Index # 1 Index # 2


1 City, State (none)
2 City, State State
3 City State
Table 2-1 Concatenated Versus Single-Column Index Options
Using a single concatenated index in place of multiple single-column indexes also limits
the amount of input to the AND-EQUAL operation during query processing, and may
significantly improve performance of that operation. The AND-EQUAL operation
merges lists of data returned from multiple index scans. AND-EQUAL scans the entire
index range of all involved indexes for a given value, and then performs the merge. For
example, if there are 100,000 records in the COMPANY table, and 10,000 of the records
had a City value of ‘Portland’, the index scan for that value would require 10,000 reads.
If 5000 of the records had a State value of ‘OR’, the index scan for that value would
require 5000 reads. The AND-EQUAL operation involving the City and State indexes
thus requires 15,000 index reads (10,000+5000) to resolve a query with limiting
conditions specifying City = ‘Portland’ and State = ‘OR’. If City and State had been
concatenated into a single index, the operation may require far fewer logical reads. Since
there are 10,000 COMPANY records with a City value of ‘Portland’, you would need no
more than 10,000 logical reads if City and State were concatenated into a single index -
a reduction of at least 33 percent. Since only one index would be used, no AND-EQUAL
operation would be necessary, and the ROWIDs from the index would be passed on to
the next operation in the execution path.
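Scenario #2 from Table 2-1 could be implemented as shown below. This is a sketch: the index names and the INDEXES tablespace are illustrative assumptions, following the earlier guidance on naming conventions and on storing indexes apart from their tables:

```sql
-- Concatenated index: City leading, State second
create index COMPANY$CITY_STATE
on COMPANY (City, State)
tablespace INDEXES;

-- Separate index on State, for queries that limit on State alone
create index COMPANY$STATE
on COMPANY (State)
tablespace INDEXES;
```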
2.10 Manage Multi-Table Joins (NESTED LOOPS, MERGE JOINs and HASH JOINs)

Three join operations are available in Oracle: NESTED LOOPS, HASH JOIN, and
MERGE JOIN. As described, MERGE JOIN is a set operation; it does not return records
to the next operation until all of the rows have been processed. NESTED LOOPS and
HASH JOIN are row operations, returning the first records to the next operation quickly.
In the following section, you will see how to use the size of the tables and nature of the
query to determine which operation is most appropriate for your query.

Within each of the join options, there are a number of steps you must take to get the best
performance from the join. If you do not properly tune a join operation, then the time
required to perform the join may grow exponentially as the tables grow. In the following
sections you will see how to tune each of these operations, and how to choose which to
use.

Tuning your joins can have a significant impact on the performance of your queries, and
there are many options available. Since it is an important area of focus, and so many
choices are available, the following sections form the largest single tip in this chapter.
Examples of each major option are provided.

2.11 Steps Involved in MERGE JOINs

A MERGE JOIN operation joins the results of two data scans. The MERGE JOIN
operation usually comprises three steps:

1. TABLE ACCESS FULL of each table in the join.
2. SORT JOIN to sort the results of the data access operations (such as TABLE
ACCESS FULL)
3. MERGE JOIN to merge the results of the SORT JOIN operation

The use of MERGE JOINs usually indicates that indexes are either unavailable or
disabled by the query’s syntax. Figure 2.1 shows the execution path for a two-table
MERGE JOIN between COMPANY and SALES.

As a set operation that performs full table scans of its tables, MERGE JOIN is usually not
appropriate for use in multiuser online applications for the following reasons:

Figure 2-1 MERGE JOIN example execution path: TABLE ACCESS FULL of COMPANY (1A)
and of SALES (1B) each feed a SORT JOIN operation (2A, 2B); the sorted row sets are then
combined by the MERGE JOIN operation (3) and output to the user.

 It may be slow to return the first row from the query. Since MERGE JOIN is a set
operation, not a row operation, it does not return rows to the user until all of the
rows are processed.
 Its result set will not stay in the SGA very long. Since the blocks used to satisfy
the table reads were read via full table scans, they will be the first blocks removed
from the SGA when more space is needed in the SGA’s data block buffer cache.
 Temporary segments may need to be allocated to resolve the query, potentially
resulting in temporary segment lock contention among users.

There are, however, situations in which MERGE JOIN is the most efficient way to
perform a join. In batch operations or large reports, MERGE JOIN may result in the best
possible throughput for your query. In the following parts of this section, you will see the
tuning implications for the MERGE JOIN operation and comparisons to the NESTED
LOOPS join operation. Later sections of this chapter, including the tips on managing very
large tables and remote queries, feature the MERGE JOIN operation.

2.12 Tuning Implications for MERGE JOINs

MERGE JOIN will be effective wherever a full table scan will be effective. That is, in
situations in which a full table scan is preferable to an index range scan/table access by
RowID combination, a MERGE JOIN operation will be preferable to a NESTED LOOPS
join operation.

A full table scan is preferable when the table involved is very small or extremely large. If
the table is very small, it may be quicker to perform a full table scan than to perform an
index scan followed by a partial table scan. For example, if the table is completely stored
in the number of blocks scanned during one read of the database, the entire table can be
read in a single physical database read. If the entire table can be read during a single
physical read from the database, a full table scan will be more efficient than an index
range scan and table access by RowID combination (which would require multiple
physical reads).
If a table is extremely large, it may be more efficient to perform a full table scan than to
perform an index scan followed by a partial table scan, for several reasons. First,
depending on the degree to which the data is physically stored in an ordered format, and
the number of records selected, you may read fewer blocks to satisfy a full table scan
than for a comparable index scan and table access by RowID. Second, the blocks read
into the SGA’s data block buffer cache by a full table scan are not held long in the SGA,
so they will not hinder data sharing among users; blocks read via index scans are held
long in the SGA, so a scan of a large index can prevent data sharing among users. Lastly,
full table scans are good candidates for parallel processing via Oracle’s Parallel Query
option. The choice of data access methods for extremely large tables is discussed in detail
in “Manage Very Large Accesses”, later in this chapter.

During the SORT JOIN operation, Oracle sorts as much of the data as possible inside a
memory area called the sort area. The maximum size of the sort area is defined by the
SORT_AREA_SIZE parameter in the instance’s init.ora file; it is allocated in 8K
increments. If the sort area is not large enough to hold all the sorted data, Oracle will
allocate a temporary segment for the duration of the query. The cost of the allocation and
release of a temporary segment is roughly equivalent to the cost of about ten
insert/update/delete statements. The allocation and release of temporary segments

require locks to be allocated. When multiple users concurrently attempt to sort large
result sets, they may experience contention during the temporary segment lock allocation
process.

Data from the temporary segment is loaded into the sort area to be sorted. Every set of
“load and sort” steps is called a run. If the result set is large relative to the sort area size,
the data could be loaded and unloaded multiple times to and from the disk, generating
excessive I/O.

NOTE

As of Oracle 7.2, writes to temporary segments bypass the SGA via a technique called
Sort Direct Writes. Sort Direct Writes improve the performance of writes to temporary
segments by at least 40 percent over standard writes.

The third step of MERGE JOIN, the comparison and merge portion, is very efficient. The
performance costs involved in MERGE JOIN are almost entirely found within the first
two steps: the full table scans and the sorting operations. Tuning MERGE JOIN
operations should therefore focus on improving the performance of the first two steps.
The performance of full table scans can be improved through I/O tuning and improved
use of Oracle’s multiblock read capabilities, or by using the Parallel Query option. You
can improve sort performance by setting a high value for SORT_AREA_SIZE, by using
Sort Direct Writes, by dedicating a tablespace to temporary segment activity, or by some
combination of these three.
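Within a single session, the sort area can be resized without editing the init.ora file – a sketch; the value is illustrative only and assumes SORT_AREA_SIZE is among the parameters your release allows alter session to modify:

```sql
-- Raise the per-session sort area to 1 MB for a large batch report
alter session set SORT_AREA_SIZE = 1048576;
```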
2.13 How to Designate a Temporary-Only Tablespace

As of Oracle 7.3, you can specify a tablespace as “temporary”. If you do so, the
tablespace cannot be used to hold any permanent segments; it can only be used for
temporary segments created during queries. The first sort to use the temporary tablespace
allocates a temporary segment within the temporary tablespace; when the query
completes, the space used by the temporary segment is not dropped. Instead, the space
used by the temporary segment is available for use by the other queries; this allows the
sorting operation to avoid the costs of allocating and releasing space for temporary
segments. Maintaining a temporary segment in this fashion will reduce the temporary
segment lock allocation contention and will save the cost of the ten insert/update/delete
statements involved in temporary segment allocation.

To dedicate a tablespace to temporary segments, specify temporary in the create
tablespace or alter tablespace command. The example shown in the following listing
changes an existing tablespace named TEMP_1 to a temporary tablespace.

alter tablespace TEMP_1 temporary;

To change the TEMP_1 tablespace out of temporary state, and allow permanent objects
to be stored in it, use the permanent keyword in the alter tablespace command, as shown
in the following listing.

alter tablespace TEMP_1 permanent;

2.14 Steps Involved in NESTED LOOPS

A NESTED LOOPS operation joins two data sources. NESTED LOOPS is the most
common way that Oracle performs joins – it usually indicates that an index is available
for use during the join. As a row operation, NESTED LOOPS returns each row to the
next operation as it is processed rather than waiting for the entire set to be processed.
Because it is an index-based row operation, NESTED LOOPS is a very effective join
operation for multiuser online applications.

When performing a NESTED LOOPS join, the optimizer first selects a driving table for
the join. A full table scan may be performed on the driving table. For each row in the
driving table, an indexed access to the driven table is performed to see if a match exists
between the tables. If a matching value exists, that record is returned to the user via the
NESTED LOOPS operation.
For example, consider the SALES and COMPANY tables used throughout. In the
following query, COMPANY and SALES are joined by the Company_ID column. In the
example, Company_ID is the primary key of the COMPANY table, and is the first
column of a multicolumn primary key in SALES.

select COMPANY.Name
from COMPANY, SALES
where COMPANY.Company_ID = SALES.Company_ID
and Period_ID = 2;

When executing this query, the optimizer will be able to use an index based on the join
criteria in the where clause. The execution path for the preceding query is shown in the
following listing.

NESTED LOOPS
TABLE ACCESS FULL SALES
TABLE ACCESS BY ROWID COMPANY
INDEX UNIQUE SCAN COMPANY_PK

The execution path shows that SALES will be the driving table for the query. The first
step performed by the query will be the full table scan of SALES. For each record in
SALES, the COMPANY_PK index will be probed to see if a matching Company_ID
value exists for the SALES.Company_ID value in the current row. If a match exists, a
table access by RowID is performed to get the COMPANY.Name column requested by
the query.
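Execution paths like the listing above can be generated with the explain plan command – a sketch, assuming a PLAN_TABLE has been created (Oracle supplies the utlxplan.sql script for this), that it holds only this one statement, and using an illustrative statement ID:

```sql
explain plan
set statement_id = 'NL_TEST'
for
select COMPANY.Name
from COMPANY, SALES
where COMPANY.Company_ID = SALES.Company_ID
and Period_ID = 2;

-- Walk the plan hierarchy, indenting each operation by its depth
select LPAD(' ', 2*Level) || Operation || ' ' || Options || ' ' || Object_Name
from PLAN_TABLE
where Statement_ID = 'NL_TEST'
start with ID = 0
connect by prior ID = Parent_ID;
```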

The steps of the preceding example’s NESTED LOOPS operation are:

1. Full table scan of the driving table
2. Index range scan of the driven table.
3. If a match is found, a table access by RowID of the driven table.

These steps are illustrated in Figure 2.2. As an index-based row operation, NESTED
LOOPS will be effective for joins performed in multiuser online application
environments. For single-user batch processing, particularly if you are dealing with very
large tables, you may need to force MERGE JOINs to occur to improve your throughput.
Figure 2.2 shows the execution path.
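The three steps above can be sketched in a few lines of Python. This is a simplified illustration only: the SALES rows and the dict standing in for the COMPANY_PK unique index are hypothetical stand-ins, not Oracle internals.

```python
# Hypothetical SALES rows: (Company_ID, Period_ID, Amount)
sales = [(1, 2, 1000), (2, 2, 500), (3, 2, 250)]

# A dict plays the role of the COMPANY_PK unique index: Company_ID -> COMPANY row
company_pk = {1: ("ACME",), 2: ("Initech",)}

def nested_loops(driving_rows, unique_index, join_key):
    """Step 1: full scan of the driving table.
    Step 2: index probe of the driven table for each driving row.
    Step 3: return the driven row when a match is found."""
    for row in driving_rows:
        match = unique_index.get(join_key(row))
        if match is not None:
            yield match, row

names = [company[0] for company, sale in nested_loops(sales, company_pk, lambda r: r[0])]
# Company_ID 3 has no COMPANY match, so only two names are returned
```

Note how the driving table is always scanned in full, while the driven table is touched only through the index probe; this is why the choice of driving table matters so much in the sections that follow.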

NOTE

Since NESTED LOOPS joins, during their execution, reuse the index blocks they have
already read into the SGA’s data block buffer cache, they generate an artificially high hit
ratio for the instance.
2.15 Implications of the Driving Table in a NESTED LOOPS Join

The key to the performance of a NESTED LOOPS join is the order in which the tables are
joined. The selection of the driving table, the first table in the join, is critical. The amount
of repetition in the nested loop is the product of the previous result set and the current
accessed table.

[Figure 2.2 depicts the execution path: (1) TABLE ACCESS FULL of SALES; (2) INDEX
UNIQUE SCAN on COMPANY.Company_ID; (3) TABLE ACCESS BY ROWID of COMPANY;
(4) NESTED LOOPS, with the rows output to the user.]

Figure 2-2 NESTED LOOPS example execution path

If more tables are used in the join, the selection of the driving table becomes even more
critical, since the driving set of records is used for each successive join. As a result, the
time needed to perform a join can grow exponentially as tables are added to the join
unless the driving set – the join between the driving table and the first driven table –
starts with very selective criteria.

In the previous example, SALES was used as the driving table. There are several
important points to note about the selection of the driving table:

 Although all of the SALES table’s primary key columns were specified in the
query, the SALES_PK index was not used. The SALES_PK index was not used
because there was not a limiting condition on the leading column (the
Company_ID column) of the SALES_PK index. The only condition on
SALES.Company_ID is a join condition.
 The optimizer could have selected either table as the driving table. If COMPANY
was the driving table, it would have had a full table scan performed on it.
 In rule-based optimization, if there is an equal chance of using an index regardless
of the choice of the driving table, the driving table will be the one that is listed
last in the from clause.
 In cost-based optimization, the optimizer will consider the size of the tables and
the selectivity of the indexes while selecting a driving table.

When selecting a driving table, the optimizer ranks all the tables in the from clause based
on the limiting conditions and the join conditions. The optimizer ranks each table as a
potential driving table. For each table, it evaluates the possible access paths that could be
used during the query (unique index scans, nonunique index range scans, or table scans).
The optimizer will choose an execution path that lets it best exploit the available indexes,
considering the join conditions and the query’s limiting conditions. The cost-based
optimizer will consider the size of the tables and the selectivity of the indexes in
evaluating possible driving tables. If the rule-based optimizer is used, and two or more
tables have similar access paths available, then the optimizer will use the table mentioned
last in the query’s from clause as the driving table for the query.

2.16 How to Influence the Join Path

To modify the join path of a NESTED LOOPS join, you can either use hints (cost-based
optimization only) or manually modify the from clause and where conditions in the
query. Hints related to the join path of NESTED LOOPS joins include:

ORDERED    Join the tables based on their order in the from clause.

INDEX      List specific indexes to use.

FULL       List a specific table for a full table scan – this table may serve as a
           driving table for the query.

USE_NL     List tables to join via NESTED LOOPS joins.

You can modify the where clause of a query to nullify the use of indexes on specific
columns. By nullifying indexes, you can influence which tables will be more likely to
have full table scans during the query – and thus more likely to be driving tables for the
query.

As described in the first tip in this chapter, the optimizer will use an index if the indexed
columns are used in a limiting condition. If an indexed column is used within a function,
such as UPPER, then an index on that column would not be used by the optimizer. When
you want to force the optimizer to ignore an index, use the following conventions:

 For numeric or date columns, add zero to the value. For example, replace

where Company_ID = 12345


with

where Company_ID+0 = 12345

Adding 0 to the Company_ID value does not change the value, but it does prevent
the optimizer from using the index on the Company_ID column during the query.

 For character columns, concatenate an empty string (‘’) to the value. For example,
replace

where City = ‘ROANOKE’

with

where City || ‘’ = ‘ROANOKE’

Adding the empty string to the City value does not change the value, but it does
prevent the optimizer from using the index on the City column during the query.

NOTE
You need to establish and follow a consistent syntax for nullifying indexes in commonly
used queries. Whenever you see a query that has been modified using this syntax, you
will know that the query was modified for performance improvements, and the goal was
to nullify the index rather than to indicate a significant business rule.

In the following sections you will see detailed examples on topics related to the selection
of the driving table:

 What happens when you start from nonselective criteria


 What happens when you add more tables to the join
 How to index tables with many-to-many relationships

Following those sections, you will see tips for reducing the number of tables used in
joins.

2.17 What Happens When You Start from Nonselective Criteria

NESTED LOOPS is a directional operation; if you join two tables via a NESTED
LOOPS operation, you will get different performance depending on which of the tables is
the driving table. Consider SALES and COMPANY again: what if COMPANY has 5
records, and SALES has 100,000 records? If SALES is used as the driving table, the
execution path will require a full table scan of SALES, plus repeated accesses to the same
rows in COMPANY. If COMPANY were used as the driving table, the full table scan of
COMPANY would read only 5 rows, and would be followed by indexed accesses to the
larger SALES table. Changing the choice of the driving table can dramatically alter the
number of rows read by the query – and can in turn dramatically alter the performance of
the query. As more and more tables are added to the query, the size of the driving set
passed to each successive join has a great impact on the performance of the query, as you
will see in the following sections.

2.18 What Happens When You Add More Tables to the Join

The results from the first NESTED LOOPS operation – the join between the driving table
and the first driven table – are called the driving set. If there are four tables involved in a
NESTED LOOPS join, the steps taken by the optimizer are:

1. Select a driving table.
2. Perform a NESTED LOOPS join between the driving table and a second table.
3. Perform a NESTED LOOPS join between the driving set returned from Step #2
   and the third table.
4. Perform a NESTED LOOPS join between the driving set returned from Step #3
   and the fourth table.

If you select the wrong driving table, the first NESTED LOOPS join will perform poorly.
However, selecting the proper driving table does not guarantee good performance if you
have three or more tables in your join. If you have more than two tables in a NESTED
LOOPS join, you need to consider the size of the driving set of records involved in each
successive join. If the driving set is not selective, the performance of each successive join
will grow worse; as more tables are added, the time needed to complete the query will
grow exponentially!

If there is a limiting condition in the where clause that is based on one of the columns
used to join tables, consider which table the condition is applied to. For example, if there
are four tables (named A, B, C, and D) in the query, all of which are the same size, and
the from and where clauses are of the form

from D, C, B, A
where A.join_column = B.join_column
and B.join_column = C.join_column
and C.join_column = D.join_column
and A.join_column = 12345
and D.Name = ‘MAGELLAN’

then a NESTED LOOPS join can be used if A, B, C and D have indexes on their join
columns. Table A will be joined to table B; the driving set from that join will be joined to
table C; the driving set from that join will be joined to table D, and the limiting condition
on D.Name will be applied.
If D.Name is selective, the performance of the query could be improved by joining D as
part of the first NESTED LOOPS join in the execution path. The reason for the
improvement is that fewer records will be returned by the first NESTED LOOPS join, so
a smaller driving set will be used for each successive join. Just as large driving sets cause
significant performance problems, small driving sets can yield significant performance
gains. In this example, the where clause could be rewritten, changing the limiting
condition on A to be a limiting condition on D instead:

from D, C, B, A
where A.join_column = B.join_column
and B.join_column = C.join_column
and C.join_column = D.join_column
and D.join_column = 12345
and D.Name = ‘MAGELLAN’

Given the revised where clause, the optimizer will now first join D to C, then join the
driving set from that join to B, and finally join the driving set from that join to A.

What exactly is the significance of this change? Assume that each of the tables A, B, C,
and D had 100 records, and that there was only one record in the D table with a Name value of
‘MAGELLAN’. The number of buffer gets (logical reads) needed to resolve the original
query is shown in Table 2.2. An index access generates an average of two accesses inside
the index “branches”, plus one access to the index “leaf” blocks for each row and one
RowID access to the table for each row. Thus, to access 100 rows by an index search, you
need two buffer gets for the index branches, plus 100 index reads and 100 table reads, for
a total of 202 buffer gets required by the read of table A.

Operation              Original Query    Buffer Gets        Cumulative Buffer Gets
First Table Accessed   A                 2+2*100            202
First Joined Table     B                 100*202            20,402
Second Joined Table    C                 100*100*202        2,040,402
Third Joined Table     D                 100*100*100*202    204,040,402
Table 2-2 Cumulative Buffer Gets for Original NESTED LOOPS Join

As shown in Table 2.2, each successive join suffers if the driving set from the preceding
join is large. Table 2.3 shows the number of buffer gets needed to resolve the modified
query (in which the limiting condition on the join column was moved to table D, which
had an additional limiting condition).

As shown in Table 2.3, the change to the order of the joins made a significant difference
to the number of buffer gets the database needed to perform in order to resolve the query.
In the original query, the limiting conditions on table D did not affect the size of the
driving sets, because table D was the last table joined in the query. In the modified query,
table D is part of the first NESTED LOOPS join – and as a result, the query requires one
one-hundredth of the buffer gets in order to complete. The modified query will run
significantly faster than the original query.

Operation              Modified Query        Buffer Gets      Cumulative Buffer Gets
First Table Accessed   D (1 row returned)    2+2*100          202
First Joined Table     C                     1*202            404
Second Joined Table    B                     100*1*202        20,604
Third Joined Table     A                     100*100*1*202    2,040,604
Table 2-3 Number of Buffer Gets Needed to Resolve Modified Query
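The cumulative figures in Tables 2.2 and 2.3 can be verified with a few lines of Python, using the per-join formulas from the tables (100 rows per table, 202 buffer gets per 100-row indexed access, as derived above):

```python
# Per-join buffer gets from Table 2.2 (original query) and Table 2.3 (modified query)
original = [202, 100 * 202, 100 * 100 * 202, 100 * 100 * 100 * 202]
modified = [202, 1 * 202, 100 * 1 * 202, 100 * 100 * 1 * 202]

def cumulative(gets):
    # Running total of buffer gets after each successive join
    return [sum(gets[:i + 1]) for i in range(len(gets))]

print(cumulative(original))  # [202, 20402, 2040402, 204040402]
print(cumulative(modified))  # [202, 404, 20604, 2040604]
```

The final totals differ by a factor of roughly one hundred, which is exactly the improvement the modified query delivers.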

2.19 How to Index Tables with Many-to-Many Relationships

In the example shown in the previous section, you can see the importance of the order of
join executions for minimizing the total number of buffer gets and reducing the response
time for the query. However, it is not enough to add an ORDERED hint or manipulate the
where clause to force the join to use a certain efficient path. In many cases, the indexing
scheme does not support the optimal path.

For example, consider a many-to-many relationship between two tables, STUDENT and
CLASS. The CLASS table has one record for each class, and its primary key is Class_ID.
The STUDENT table has one record for each student, and its primary key is Student_ID.
Each student can take several classes and each class can have hundreds of students; there
is therefore a many-to-many relationship between STUDENT and CLASS. To resolve the
many-to-many relationship between these entities, a third table is created, the GRADE
table. The primary key for the GRADE table is the combination of the Student_ID and
Class_ID columns. For every student, for every class that he or she takes, the GRADE
table will contain a Student_ID, a Class_ID, and the Grade scored. What should you use as
the leading column of the primary key for GRADE? If you create the GRADE primary
key with Student_ID as its leading column, then Student_ID will be the leading column
of the GRADE table’s primary key index. If you use Class_ID as the leading column of
the primary key, then Class_ID will be the
leading column of the index. Since the leading column of an index can significantly
affect performance of NESTED LOOPS joins (as described earlier in this section of this
chapter), then you need to be sure you are choosing the proper leading column for the
primary key index on the GRADE table.

If the primary key index is the only index defined for the GRADE table, and the leading
column of the primary key index is Student_ID, then you are forcing Oracle to resolve
the many-to-many relationship between STUDENT and CLASS only in one direction:
from STUDENT to GRADE (joining on Student_ID), then from GRADE to CLASS, as
illustrated in Figure 2.3.

Oracle can only use an index if there is a condition on the leading columns of the index.
The join from CLASS to GRADE requires the use of an index on Class_ID, which is not
the leading column for the primary key index in the GRADE table. Therefore, CLASS
cannot be the driving table for a join to GRADE. Instead, Oracle must use the result of
the STUDENT-GRADE join as the driving set when joining to CLASS. Thus, the join
between STUDENT and GRADE must be performed first. Oracle will perform a full
table scan on the STUDENT table, followed by an index-based access to the GRADE
table. The result of that join will be joined to the CLASS table.

[Figure 2.3: STUDENT is joined to GRADE on Student_ID, and that result set is joined
to CLASS on Class_ID.]

Figure 2-3 Original direction of many-to-many NESTED LOOPS join example

Since no index on GRADE begins with the Class_ID column, you limit the join options
available to Oracle and the direction that a NESTED LOOPS join can take. The direction
chosen for the NESTED LOOPS join may not be the most efficient option, given the
distribution of data within the database.

Ideally, you should index the tables so that the optimizer can evaluate the paths available
and make the best decision regarding the path the join takes. In this case, it may be much
more efficient to join CLASS and GRADE first, and then join their result set to
STUDENT. To make this new path possible (while also making the original path
possible), you must change the way you index the GRADE table.

When tuning queries involving many-to-many relationships, you typically need to create
two indexes on the middle table (in this case, the GRADE table). In each index, the
leading column must be the corresponding foreign key to one of the tables. The GRADE
table has two foreign keys (Student_ID and Class_ID), so it must have two indexes: one
starting with Student_ID (the primary key index) and a second, GRADE_CLASS, on
(Class_ID, Student_ID). The two indexes contain the same columns, but in reverse order.
NOTE

If your goal is solely to influence the join direction, you only need to index the Class_ID
by itself, since Class_ID would then be the leading column of a (single column) index on
the GRADE table. However, by adding the Student_ID to the index, the join from
CLASS to GRADE to STUDENT can be done by accessing the GRADE table’s indexes
without actually accessing the GRADE table itself.
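The effect of the second index can be illustrated with a small Python sketch, using dicts as stand-ins for the two GRADE indexes (the grade rows and class names here are hypothetical):

```python
# Hypothetical GRADE rows: (Student_ID, Class_ID, Grade)
grades = [(1, "CHEM101", "A"), (1, "MATH200", "B"), (2, "CHEM101", "C")]

# Index with Student_ID leading (the primary key index): supports STUDENT -> GRADE
grade_by_student = {}
for student_id, class_id, grade in grades:
    grade_by_student.setdefault(student_id, []).append((class_id, grade))

# Index with Class_ID leading (GRADE_CLASS): makes CLASS -> GRADE possible
grade_by_class = {}
for student_id, class_id, grade in grades:
    grade_by_class.setdefault(class_id, []).append((student_id, grade))

# Driving the join from CLASS only works against the Class_ID-leading index
chem101_students = [student for student, grade in grade_by_class["CHEM101"]]
```

With only the first dict available, a lookup by class would require scanning every entry; the second dict is what lets the join run in the reversed direction.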

The availability of the new index on the GRADE table allows the optimizer to consider a
reversed direction for the join, as illustrated in Figure 2.4. Since Class_ID is the leading
column of the GRADE_CLASS index in the GRADE table, Oracle can consider joining
CLASS to GRADE as the first join processed. A query of the students that take a certain
class can use the GRADE_CLASS index without accessing the GRADE table unless a
column from the GRADE table is required, in which case the query will have to access the
GRADE table itself.

For example, the following query will list all the freshman students taking the class
named CHEM101:

[Figure 2.4: CLASS is joined to GRADE on Class_ID, and that result set is joined to
STUDENT on Student_ID.]

Figure 2-4 Modified direction of many-to-many NESTED LOOPS join example
select STUDENT.Student_Name,
       CLASS.Class_Name
from STUDENT, GRADE, CLASS
where STUDENT.Student_ID = GRADE.Student_ID
and GRADE.Class_ID = CLASS.Class_ID
and STUDENT.Year = ‘Freshman’
and CLASS.Class_Name = ‘CHEM101’;

When resolving the query, Oracle can now choose the direction for a NESTED LOOPS
join to follow. The choices are:

 Join STUDENT to GRADE, and join that result set to CLASS. The limiting condition
on the STUDENT.Year value is not very selective; a value of ‘Freshman’ will
return about 25 percent of the STUDENT table. That large set of data will be
joined with GRADE, and the result will be joined with the CLASS table.
 Join CLASS to GRADE, and join that result set to STUDENT. The
CLASS.Class_Name criterion is a much better limiting condition than the
STUDENT.Year limiting condition; only a single class should be called
‘CHEM101’. That one record is then joined to GRADE; the result set will have
one record for each student who has a grade in the CHEM101 class. That result
set is then joined to STUDENT.

The second option is much more likely to perform well than the first option, since the
result set of the first join performed is much smaller. When you create many-to-many
relationships in your databases, you should make sure you index the tables to allow the
joins to be evaluated in either direction by NESTED LOOPS joins. If possible, create
concatenated indexes to avoid table accesses to the middle join table (GRADE in this
example).

Indexing a many-to-many relationship in the manner shown in this example will provide
the greatest benefit when the tables involved in the relationship are not equal in size. In
this example, there are many more students than classes, so you should index the
GRADE table so that the table returning the fewest records is part of the first join. The
creation of the GRADE_CLASS index on the GRADE table allows the optimizer to
consider both directions for the processing of the NESTED LOOPS join.

2.20 Hash Joins

As of Oracle7.3, the optimizer may dynamically choose to perform HASH JOIN
operations instead of NESTED LOOPS or MERGE JOINs. The following query shows a
join between the COMPANY table and the SALES table. COMPANY and SALES both
have the Company_ID column as the leading column of an index.

select COMPANY.Name
from COMPANY, SALES
where COMPANY.Company_ID = SALES.Company_ID
and Period_ID = 2;

A traditional NESTED LOOPS explain plan for the indexed join between COMPANY
and SALES is shown in the following listing:
NESTED LOOPS
  TABLE ACCESS FULL SALES
  TABLE ACCESS BY ROWID COMPANY
    INDEX UNIQUE SCAN COMPANY_PK

As shown in the execution path, SALES is the driving table for the NESTED LOOPS
join. For each record in SALES, the COMPANY_PK index is checked for a matching
Company_ID value. If a match exists, the COMPANY table is accessed via a TABLE
ACCESS BY ROWID operation.

As of Oracle7.3, the optimizer may choose to ignore the index and use a hash join for this
query instead. The explain plan for the hash join version of the execution path is shown
in the following listing.

HASH JOIN
  TABLE ACCESS FULL SALES
  TABLE ACCESS FULL COMPANY

In a hash join, both of the tables are fully scanned. However, the way that the tables are
manipulated during the query execution makes the join very efficient.

NOTE

Hash joins have nothing to do with hash clusters. Hash clusters are a way of storing data.
Hash joins refer to the way in which a join is performed by Oracle.

During a hash join, two tables are joined based on a set of join conditions – columns
whose values they have in common, as defined in the where clause of the query. In the
preceding example, the join condition is

where COMPANY.Company_ID = SALES.Company_ID

During a hash join, one of the tables involved in the join (in this case, SALES) is read by
Oracle via a full table scan, and a bitmap of unique keys from the joined columns from
the table is built in memory. As the rows are read, they are separated into groups or
partitions based on a hashing function that maps the join keys to a partition. Partition-
level hashing is used to split both tables into matching sets based on the join criteria.
Partition-level hashing allows you to manage hash joins of large tables without excessive
I/O penalties. In addition to partitioning the table, the optimizer builds a hash table (in
memory) for all the partitions of the table that fit into memory; this is a second hashing
function performed on the join keys, to be used when performing the join. The bitmap of
values created is then used as a filter when the rows of the second table (COMPANY) are
read and processed.

If a row is successfully filtered through the bitmap (that is, it is present in the bitmap), the
hashing algorithm is used for data lookup and the subsequent join. The hash join detects
the joins as quickly as possible via in-memory processing of the join columns; the
returned values from that processing tell Oracle which rows should be joined.
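A much-simplified, single-pass, in-memory sketch of the core build-and-probe idea in Python (Oracle's implementation adds partitioning, the bitmap filter, and disk management; the rows below are hypothetical):

```python
# Hypothetical rows: COMPANY as (Company_ID, Name), SALES as (Company_ID, Period_ID, Amount)
company = [(1, "ACME"), (2, "Initech")]
sales = [(1, 2, 1000), (2, 2, 500), (1, 3, 250)]

def hash_join(build_rows, probe_rows, build_key, probe_key):
    # Build phase: hash one input on its join key
    table = {}
    for row in build_rows:
        table.setdefault(build_key(row), []).append(row)
    # Probe phase: look up each row of the other input in the hash table
    for row in probe_rows:
        for match in table.get(probe_key(row), []):
            yield match, row

period2_sales = [s for s in sales if s[1] == 2]
result = [(c[1], s[2]) for c, s in hash_join(company, period2_sales,
                                             lambda r: r[0], lambda r: r[0])]
# result pairs each Period 2 sale with its company name
```

No index lookups or sorts are needed: once the in-memory hash table is built, each probe row is matched in roughly constant time, which is why hash joins can beat MERGE JOINs on full-scan inputs.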
If both of the tables are large, a portion of the processing may involve writing the
partitions to disk – incurring disk I/O costs in the process. The hash tables and bitmaps
always remain in memory. When doing hash joins of large tables, the challenge is to
minimize disk I/Os while joining the tables without index lookups and sorts (the
traditional join methods, used in NESTED LOOPS and MERGE JOIN operations). When
large tables are joined via hash joins, they are first partitioned in memory; the partitions
of the tables are processed separately, and the results are merged. During its hash join
process, Oracle always performs two levels of hashing to perform the join. Two-level
hashing avoids potential problems with large I/Os for joins of large tables.

Several init.ora parameters affect the usage and performance of hash joins. They are:

HASH_JOIN_ENABLED           If TRUE (the default), the cost-based optimizer
                            will use hash joins whenever the cost of hash
                            joins is cheaper than NESTED LOOPS or
                            MERGE JOINs. To disable hash joins, set this
                            value to FALSE.

HASH_AREA_SIZE              Sets the maximum amount of memory that will be
                            used for hash joins. The default is twice the
                            SORT_AREA_SIZE setting.

HASH_MULTIBLOCK_IO_COUNT    Sets the number of blocks to be read/written at a
                            time while doing disk I/O for hash joins. The
                            default value is the setting for
                            DB_FILE_MULTIBLOCK_READ_COUNT.

If HASH_JOIN_ENABLED is set to TRUE (its default value), queries that had been
executed as NESTED LOOPS or MERGE JOINs in earlier versions of Oracle may
suddenly start using hash joins. If you have specifically tuned a query to use another join
method, you may need to force that method to be used via hints. The unit of I/O for hash
joins is determined by the following equation:

Hash join I/O size = Database block size * HASH_MULTIBLOCK_IO_COUNT

The hash join I/O size should not exceed 64KB. Thus, for a database block size of 8K,
the HASH_MULTIBLOCK_IO_COUNT value should not exceed 8.
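The 64KB cap translates directly into a maximum HASH_MULTIBLOCK_IO_COUNT for a given block size; the arithmetic can be checked in Python (the 8KB block size is the illustrative value used above):

```python
# Hash join I/O size = database block size * HASH_MULTIBLOCK_IO_COUNT, capped at 64KB
block_size = 8 * 1024            # an 8KB database block, as in the example above
max_io_size = 64 * 1024          # the stated 64KB ceiling

max_multiblock_io_count = max_io_size // block_size
print(max_multiblock_io_count)   # 8, matching the recommendation above
```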

2.21 Management Issues for Hash Joins
Has h Joins

Hash joins are an alternative to MERGE JOINs and NESTED LOOPS joins. MERGE
JOIN operations tend to be I/O-bound (especially if temporary segments are used during
sorting operations); if I/O is spread across disks, then the MERGE JOIN performance
may be acceptable if both tables are large or both are small. If you are able to use the
Parallel Query Option, you can improve the performance of MERGE JOINs, since the
full table scans typically found in MERGE JOIN operations can be parallelized.
Most joins on large tables are I/O-bound. A full table scan will not benefit from having
the table’s data striped across multiple disks unless the query is parallelized or the
distribution of I/O minimizes the table scan’s impact on other I/O operations on the
system.

Hash joins can take advantage of the Parallel Query option. However, there will be a
limit to the extent to which a hash join query can improve with parallelism. Parallelism
does not reduce the amount of CPU and memory needed by an operation; instead, it
creates multiple processes and distributes the CPU and memory costs of the query across
the processes concurrently. Since the hash join operation is already demanding a great
deal of CPU and memory resources, there may not be enough resources available to fully
exploit parallelism. If you plan to use hash joins extensively, you must have adequate
memory and CPU resources available. If inadequate memory is available, the partitioning
process used by the hash join may require a great deal of I/O to temporary segments.

The init.ora settings for the hash join parameters can be overridden at the session level
via the alter session command, as shown in the following listing. The
HASH_AREA_SIZE and HASH_MULTIBLOCK_IO_COUNT parameters can be
altered at the session level.

alter session set hash_multiblock_io_count=8;

In general, the larger the hash area size is, the better the performance of the hash join will be.
The default value for the hash area size is twice the sort area size; if your sort area size is
large enough, your hash area size should be acceptable. The minimum memory used is at
least six times the product of the database block size and the
HASH_MULTIBLOCK_IO_COUNT parameter setting. For example, if the database
block size is 8KB, and the HASH_MULTIBLOCK_IO_COUNT is set to 8, then the
minimum memory used by a hash join is 384KB (6*8KB*8).
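That minimum-memory rule is simple arithmetic, verified here in Python (values match the 8KB/8 example above):

```python
# Minimum hash join memory = 6 * database block size * HASH_MULTIBLOCK_IO_COUNT
block_size = 8 * 1024            # 8KB blocks, as in the example above
multiblock_io_count = 8

min_hash_memory = 6 * block_size * multiblock_io_count
print(min_hash_memory // 1024)   # 384 (KB), matching the 384KB figure above
```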

The HASH_MULTIBLOCK_IO_COUNT parameter has a significant impact on the
performance of hash joins. In systems where the amount of data in the smaller table of
the join is orders of magnitude larger than the size of the available memory, set the
HASH_MULTIBLOCK_IO_COUNT parameter value to 1. If you set this parameter
value too large, you will have few partitions with a large amount of data in each, resulting
in a negative impact on query performance.

If you are using the multithread server, the setting for the
HASH_MULTIBLOCK_IO_COUNT parameter is ignored and a value of 1 is used
instead. In a multithread server environment, the sort area and the hash area are both
allocated in the shared pool part of the memory. Usage of hash joins significantly
increases the demand on the shared pool, sometimes causing costly flush operations to
occur. When the shared pool is flushed, the application will suffer a short pause. If the
shared pool is large (as it is in most multithread server configurations), the pause could be
quite noticeable. If your system uses multithread server and the response time becomes
very long, you can try to measure the effectiveness of the shared pool.
The following script queries the dynamic performance views to determine the
effectiveness of the different portions of the shared SQL area.

select Namespace Library,
       Gets,
       ROUND(DECODE(GetHits,0,1,GetHits)/DECODE(Gets,0,1,Gets),3) GetHitRatio
from SYS.V_$LIBRARYCACHE;

Sample output for the preceding query is shown in the following listing:

LIBRARY                GETS   GETHITRATIO
----------------   --------   -----------
SQL AREA               8466          .939
TABLE/PROCEDURE        2782          .884
BODY                    163          .926
TRIGGER                   0             1
INDEX                   153          .627
CLUSTER                 458           .98
OBJECT                    0             1
PIPE                      0             1
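The DECODE calls in the query substitute 1 for zero counts so that an unused library does not cause a divide-by-zero. The same calculation can be expressed in Python (the counter values below are hypothetical):

```python
def get_hit_ratio(gets, get_hits):
    # Mirrors ROUND(DECODE(GetHits,0,1,GetHits)/DECODE(Gets,0,1,Gets),3)
    return round((1 if get_hits == 0 else get_hits) / (1 if gets == 0 else gets), 3)

print(get_hit_ratio(0, 0))       # 1.0 -- an unused library reports a perfect ratio
print(get_hit_ratio(1000, 939))  # 0.939
```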

The “SQL AREA” library statistics reflect the shared SQL area’s effectiveness. In the
listing, the GetHitRatio value for the SQL AREA library is 0.939, out of a maximum value
of 1.0. If you are using multithread server and have sized your shared pool large enough
based on the recommendations, and the “Get Hit Ratio” statistic (see the preceding
listing) is lower than 0.90, there probably is a negative effect caused by sorts and hash
joins. To resolve this problem, you have two alternatives. First, you can set
HASH_JOIN_ENABLED to FALSE; then no hash joins will be performed and the hash
join processing requirements on the shared pool will be eliminated. The second option is
to keep HASH_JOIN_ENABLED set to TRUE while increasing the size of the reserved
portion of the shared pool. You can control the size of the reserved portion of the shared
pool via the SHARED_POOL_RESERVED init.ora parameter. The
SHARED_POOL_RESERVED parameter controls the allocation of large segments in the
shared pool. If you encounter performance problems with your shared pool, you should
increase the SHARED_POOL_RESERVED parameter setting from the default of 5
percent of the SHARED_POOL size to about 10 percent of that size (the parameter is
expressed in bytes). If this modification does not help and you are still getting slow
response time and low hit ratio on the SQL area portion of the shared pool, try disabling
the multithread server.
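As a sketch, the shared pool settings discussed above could appear in init.ora as
follows; the 100MB pool size and the derived 10 percent reserved value are hypothetical
figures for illustration, not recommendations from the text:

```
# Hypothetical init.ora fragment -- sizes are examples only
shared_pool_size          = 104857600   # 100 MB shared pool
# Reserve about 10 percent of the shared pool for large segments
shared_pool_reserved_size = 10485760    # 10 MB, up from the 5 percent default
# Alternatively, disable hash joins entirely:
# hash_join_enabled       = false
```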

To force the use of a hash join, you can use the USE_HASH hint in your queries. The
HASH_AJ hint in a query enables the use of the hashing method for an anti-join (such as
a query containing a not in clause with a subquery).
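As an illustration of the placement of both hints, using the sample COMPANY and SALES
tables from this chapter (the queries themselves are sketches, not listings from the
text; note that Oracle applies the anti-join transformation only when the subquery
column cannot return NULLs):

```sql
-- Force a hash join between COMPANY and SALES
select /*+ USE_HASH(SALES) */ COMPANY.Name, SALES.Sales_Total
  from COMPANY, SALES
 where COMPANY.Company_ID = SALES.Company_ID;

-- Use a hash anti-join for a not in subquery;
-- the HASH_AJ hint goes inside the subquery, not the outer query
select COMPANY.Name
  from COMPANY
 where COMPANY.Company_ID not in
       (select /*+ HASH_AJ */ SALES.Company_ID
          from SALES
         where SALES.Period_ID = 4);
```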

NOTE

Since hash joins are enabled by default, be sure to check your explain plans after
upgrading Oracle versions to check which join method is used in your queries.

Performance Aspects of Hash Joins

In our testing, we have found that hash joins perform better than merge joins in almost all
circumstances. In many isolated executions of SQL statements, hash joins perform better
than NESTED LOOPS. However, the results may be deceiving. Since hash joins cause
full table scans to be executed, the first rule of tuning, “avoid full table scans”, kicks in.
As a reminder, full table scans of tables that are larger than 5 blocks and are not defined
as cached will not effectively cache the blocks read. Queries performing full table scans
of large tables always cause disk I/O. Queries using hash joins are not exempt from this
rule. The result is that although on a one-by-one basis, the queries using hash joins may
seem to perform better than queries using NESTED LOOPS, online applications using
NESTED LOOPS joins for their queries will utilize the data block buffer cache best and
perform best for the majority of the users.

However, hash joins are an important feature in the Oracle optimizer. Hash joins have an
advantage when the following conditions apply:

 The join is performed in a batch application and is not expected to be used
concurrently by many users.
 The join is performed between two large tables (large tables are tables that are
significantly larger than the buffer cache).
 The join is performed between a large table and a small table. In this case, use the
CACHE hint for caching the small table. For example:

select /*+ USE_HASH(COMPANY) CACHE(COMPANY) */
       COMPANY.Name, SUM(SALES.Dollar_Amount)
  from COMPANY, SALES   -- COMPANY is a small table and SALES is a large table
 where COMPANY.Company_ID = SALES.Company_ID
 group by COMPANY.Name;

In the preceding example, the query will perform faster with a hash join than with a
NESTED LOOPS join because the access to the COMPANY table will be done just once
rather than again and again for every row in the SALES table. The CACHE hint causes
the blocks of the small COMPANY table to remain in the data block buffer cache for
future use by other users. However, the large SALES table is not cached because chances
are that it is larger than the data block buffer cache.

How to Change the Database Design to Reduce the Number of Joins

You can reduce the number of joins performed during queries by denormalizing the data
within your application. In a normalized database, the attributes of a table relate directly
to the full primary key of the table. In a denormalized table, attributes may relate directly
to only part of the primary key for the table.

For example, the COMPANY table used in the examples throughout has a column named
Active_Flag. You may have a lookup table of valid Active_Flag values, featuring the
Flag_Code value and a description, as shown in the following listing:

select * from ACTIVE_FLAG_CODES;

ACTIVE_FLAG DESCRIPTION
----------- ----------------------
A           Active
I           Inactive
P           Pending Classification
U           Unknown

If you query the COMPANY table based on the Active_Flag code descriptions, you will
need to join COMPANY to the ACTIVE_FLAG_CODES table shown in the preceding
listing unless you specify the code value in your query. The following listing shows a
sample query that joins COMPANY to ACTIVE_FLAG_CODES:

select COMPANY.Name
  from COMPANY, ACTIVE_FLAG_CODES
 where COMPANY.Active_Flag = ACTIVE_FLAG_CODES.Active_Flag
   and ACTIVE_FLAG_CODES.Description = 'Active';

To eliminate the join from the query, you must either change the query to use the code
value (where COMPANY.Active_Flag = 'A'), or add the Description column to the COMPANY
table. If the ACTIVE_FLAG_CODES.Description column is added to COMPANY, you can
eliminate the ACTIVE_FLAG_CODES table from your join, as shown in the following listing.

select COMPANY.Name
  from COMPANY
 where Active_Flag_Description = 'Active';

Although this example only deals with two tables, the impact of denormalization and
reducing the number of tables involved in joins can be great when many tables are
involved in the initial join. Note that there are potential costs – this method requires
more storage space in the COMPANY table, and could cause problems if the Description
values for the Active_Flag codes change.

A second method of denormalization involves creating columns that store values for
specific ranges. For example, if your data is based on dates, there may be logical
divisions of date values. In the SALE table, the primary key is Company_ID and
Period_ID. For each record, a Sales_Total value is stored. In order to select the data
from four periods, you need to query four records.

If your periods are predetermined, and the number of periods you need is unchanging, you
may be able to denormalize the SALE table to contain additional columns. The new
columns could store the Sales_Total values for specific periods. Instead of selecting four
rows for a company:
select Period_ID, Sales_Total
  from SALE
 where Company_ID = 8791
   and Period_ID between 1 and 4;

PERIOD_ID SALES_TOTAL
--------- -----------
        1        1000
        2        2300
        3        1890
        4        2410

you could instead select one row from a modified SALE table:
select Period_1_Sales, Period_2_Sales, Period_3_Sales, Period_4_Sales
  from SALE
 where Company_ID = 8791;

PERIOD_1_SALES PERIOD_2_SALES PERIOD_3_SALES PERIOD_4_SALES
-------------- -------------- -------------- --------------
          1000           2300           1890           2410

By storing fewer records in the table, you reduce the size of the driving set of records
used in joins with the table. However, you need to consider the likelihood that the data or
the divisions of the data may change. If the data or the divisions change frequently,
storing values according to their ranges may not be appropriate for your application.

A third denormalization method involves keeping the most current detail record in the
same table as the master record. In many applications, the master table (for example, an
EMPLOYEE table) stores data that is constant over time, and the detail table (for
example, SALARY_HISTORY) stores data that changes over time. In many master-detail
relationships, the most important detail record is the most recent one. In that case,
the current active detail record could be stored in the master table to reduce the
number of subqueries involved in the query. Based on the application, the DBA may decide
to have the data stored redundantly in both the master table and the detail table. See
the tips regarding subqueries later in this chapter for further information on
master-detail queries.
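A minimal sketch of this third method, using the EMPLOYEE and SALARY_HISTORY tables
mentioned above (the Current_Salary column is a hypothetical denormalized column
introduced here for illustration):

```sql
-- Keep the most recent detail value redundantly in the master table.
alter table EMPLOYEE add (Current_Salary NUMBER);

-- Without denormalization, fetching the current salary requires a subquery:
select Name, Salary
  from EMPLOYEE, SALARY_HISTORY SH
 where EMPLOYEE.Employee_ID = SH.Employee_ID
   and SH.Effective_Date =
       (select MAX(Effective_Date)
          from SALARY_HISTORY S2
         where S2.Employee_ID = SH.Employee_ID);

-- With the denormalized column, the master table alone answers the question:
select Name, Current_Salary
  from EMPLOYEE;
```

The cost of this design is that the application must keep Current_Salary synchronized
whenever a new SALARY_HISTORY record is inserted.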

2.22 Managing Joins

In the preceding sections, you’ve seen descriptions of NESTED LOOPS, MERGE JOIN, and
HASH JOIN operations, and the impact and tuning methods for each. The optimal type of
join for your application depends on a number of criteria (such as the type of
application and the database design). If you use a NESTED LOOPS join, you need to make
sure that the proper driving table – and the proper driving set – is used by the query.
If you choose the proper execution paths for your queries, they will be better able to
withstand the impacts of changes within the database as tables grow over time.

2.23 Managing SQL Statements Containing Views

If a query contains a view, the optimizer has two ways of resolving the query: first
resolve the view and then resolve the query, or integrate the view text into the query. If
the view is resolved first, the entire result set of the view is first determined, and the rest
of the query conditions are then applied as a filter.

Depending on the relative sizes of the tables involved, resolving the view first can cause
performance degradation for your queries; if the view is integrated into the query, the
query’s conditions can be applied within the view, and a smaller result set can be used. In
some situations, however, you may improve your query’s performance by separating
grouping operations via views. Both of these situations will be described in the following
sections. When you use views in your joins, you need to actively manage how the views
are resolved.

If a view contains a set of operations – such as group by, SUM, COUNT, or DISTINCT
– then the view cannot be integrated into the query. For example, the following listing
creates a view called COMPANY_COUNT that counts the records in the COMPANY
table, by State.

create view COMPANY_COUNT as
select State, COUNT(*) Company_Count
  from COMPANY
 group by State;

If the COMPANY_COUNT view is joined to another table in a query, the COMPANY_COUNT view
will be resolved first, its full result set will be determined, and the results will be
used as input to the rest of the query. Because it contains a group by clause, the
view’s SQL syntax cannot be integrated into a larger query.

The SQL syntax of views that do not use grouping functions can be integrated into the
larger query. For example, the PERIOD3_NAMES view shown in the following listing
contains a join operation, with no
n o grouping or set operations.

create view PERIOD3_NAMES as
select COMPANY.Name
  from COMPANY, SALES
 where COMPANY.Company_ID = SALES.Company_ID
   and SALES.Period_ID = 3;

If the PERIOD3_NAMES view is joined to another table, the SQL syntax of PERIOD3_NAMES
can be integrated with the SQL syntax of the rest of the query.

NOTE

Using UNION in a view does not prevent the view’s SQL syntax from being integrated
with the query’s syntax. If there is a UNION, the optimizer can still integrate the query
with the view by integrating each part of the UNIONed SQL separately.

2.24 Improving Integration of Views into Queries

If a view returns a large result set, or if the view’s result set is to be filtered by additional
limiting conditions in a query, the query would likely benefit from having the view’s
SQL integrated into the query. The optimizer will automatically perform the integration if
it can.

To avoid having views that cannot be integrated into queries, do not use grouping
functions in the views – defer the grouping operations to the query that joins with the
view. Such a deferral is not always a feasible solution, but needs to be investigated on
a view-by-view and query-by-query basis.

For example, suppose you had a view called SALES_TOTAL_VW that summed the
Sales_Total column of SALES, by Company_ID. The view’s SQL, shown in the following
listing, groups by Company_ID.

create view SALES_TOTAL_VW as
select Company_ID, SUM(Sales_Total) Sum_Sales_Total
  from SALES
 group by Company_ID;

The SALES_TOTAL_VW view could be joined to the COMPANY table to get the company’s name
(COMPANY.Name) and the company’s cumulative sales total
(SALES_TOTAL_VW.Sum_Sales_Total), as shown in the following listing.

select COMPANY.Name, Sum_Sales_Total
  from COMPANY, SALES_TOTAL_VW
 where COMPANY.Company_ID = SALES_TOTAL_VW.Company_ID;

The grouping is performed within the view. The explain plan for the query, in the
following listing, shows that the view’s output is used as the driving set of records
for a NESTED LOOPS join.

NESTED LOOPS
  VIEW SALES_TOTAL_VW
    SORT GROUP BY
      TABLE ACCESS FULL SALES
  TABLE ACCESS BY ROWID COMPANY
    INDEX UNIQUE SCAN COMPANY_PK

You can improve the level of integration on the example query. In SALES_TOTAL_VW,
records are grouped by Company_ID. In the query, one record is returned for each
Company_ID (since SALES_TOTAL_VW and COMPANY are joined on Company_ID). Therefore, the
view and the query display records at the same level – the Company_ID level.

You can therefore perform the grouping operation at the query level instead of at the
view level. Join SALES to COMPANY using the Company_ID column, and perform the
grouping at the Company_ID level. In the example in the following listing, the group by
clause groups by COMPANY.Name, which is an attribute of Company_ID.

select COMPANY.Name, SUM(Sales_Total)
  from COMPANY, SALES
 where COMPANY.Company_ID = SALES.Company_ID
 group by COMPANY.Name;

The SQL in the preceding listing will generate the following execution path:

SORT GROUP BY
  NESTED LOOPS
    TABLE ACCESS FULL SALES
    TABLE ACCESS BY ROWID COMPANY
      INDEX UNIQUE SCAN COMPANY_PK

The grouping operation is now performed after the NESTED LOOPS join.
If there are conditions on the columns used in the group by clause of the view, these
conditions will be performed before the view is resolved. For example, the query in the
following listing has a limiting condition on the Company_ID column of SALES_TOTAL_VW:

select Sum_Sales_Total
  from SALES_TOTAL_VW
 where Company_ID = 3;

Because of the limiting condition on the Company_ID column, only the rows that pass that
condition will be summed by the view.

2.25 Forcing Views to Remain Separate

Sometimes, you may wish for the view’s SQL to not be integrated with the rest of the
query. For example, if you are performing a group by operation on a NESTED LOOPS
join of two tables, the grouping operation is not completed until the two tables have been
fully joined. The examples in this section will use the same objects as the previous
section, with the opposite outcomes – a view that is deliberately resolved separately
instead of being integrated into a larger query.

In the following example, COMPANY and SALES are joined, and the Sales_Total
column from the SALES table is summed by Company Name.

select COMPANY.Name, SUM(Sales_Total)
  from COMPANY, SALES
 where COMPANY.Company_ID = SALES.Company_ID
 group by COMPANY.Name;

Given the SQL in the preceding listing, and the table and operations descriptions, the
execution path would be

SORT GROUP BY
  NESTED LOOPS
    TABLE ACCESS FULL SALES
    TABLE ACCESS BY ROWID COMPANY
      INDEX UNIQUE SCAN COMPANY_PK

As the explain plan listing shows, SORT GROUP BY is the last operation performed – it
follows the NESTED LOOPS join operation. But what if the columns being grouped are all
from just one of those tables? You could perform the grouping functions on the table
prior to performing the join. If SALES is much larger than COMPANY, performing the SORT
GROUP BY after the join requires the query to perform extra work. To reduce the resource
requirements, you can rewrite the query to use a view that contains a group by clause.
The following listing shows a sales summary view that contains the SALES grouping
information from the prior query.
create view SALES_TOTAL_VW as
select Company_ID, SUM(Sales_Total) Sum_Sales_Total
  from SALES
 group by Company_ID;

The SALES_TOTAL_VW view can now be joined to the COMPANY table to get the original
query’s result set (the COMPANY.Name and the Sum_Sales_Total).
select COMPANY.Name, Sum_Sales_Total
  from COMPANY, SALES_TOTAL_VW
 where COMPANY.Company_ID = SALES_TOTAL_VW.Company_ID;

In the modified version, the query does not contain the group by clause – the grouping
operation is inside the SALES_TOTAL_VW view and cannot be integrated with the rest of
the query. The explain plan for this version of the query is shown in the following
listing.

NESTED LOOPS
  VIEW SALES_TOTAL_VW
    SORT GROUP BY
      TABLE ACCESS FULL SALES
  TABLE ACCESS BY ROWID COMPANY
    INDEX UNIQUE SCAN COMPANY_PK

In the explain plan shown in the preceding listing, the SORT GROUP BY operation is
performed only on the SALES table and the view’s result set is determined; the result of
that operation is then joined to the COMPANY table via a NESTED LOOPS join. If the SALES
table is significantly larger than the COMPANY table, performing the SORT GROUP BY
before the join may result in great performance gains.

You can embed view queries directly within the from clauses of queries, and therefore
not have to create views (such as SALES_TOTAL_VW earlier) separately. In the following
listing, the SALES_TOTAL_VW syntax is embedded in the query’s from clause, and the view
is joined to the COMPANY table.

select Name, Sum_Sales_Total
  from COMPANY,
       (select Company_ID Sales_Co_ID,
               SUM(Sales_Total) Sum_Sales_Total
          from SALES
         group by Company_ID)
 where COMPANY.Company_ID = Sales_Co_ID;

The explain plan for the query in the preceding listing is functionally identical to the
explain plan for the query that used SALES_TOTAL_VW. The view is resolved before the
join is performed. As shown in the following listing, the only difference is the name of
the view.

NESTED LOOPS
  VIEW from$_subquery$_2
    SORT GROUP BY
      TABLE ACCESS FULL SALES
  TABLE ACCESS BY ROWID COMPANY
    INDEX UNIQUE SCAN COMPANY_PK

NOTE

If you have a view that does not contain a grouping operation, and you do not want the
view’s SQL to be integrated into the rest of the query, you can use the NO_MERGE hint.
The NO_MERGE hint prevents the view’s query syntax from being merged into the rest of
the query.
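As a sketch of the hint’s placement, using the PERIOD3_NAMES view defined earlier (the
outer query and its condition are illustrative, not a listing from the text):

```sql
-- Ask the optimizer to resolve PERIOD3_NAMES separately
-- instead of merging its SQL into the outer query.
select /*+ NO_MERGE(PERIOD3_NAMES) */ Name
  from PERIOD3_NAMES
 where Name like 'A%';
```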

The technique of embedding views inside the SQL is useful if you require the view to be
executed before the SQL statement but you want to be able to pass it a starting condition.

For example, you cannot use a connect by operation and then join the result to another
table; that is a restriction of the connect by operation. If you needed to perform a
join with a connect by operation prior to Oracle7.2, you needed to create a view to
perform the connect by operation and then join the view to the other table. However, the
limitation of this solution was that the view was executed before the join could take
place (as shown in the examples earlier in this section). Thus, the entire table was
returned from the view before the join took place, having a negative effect on the
performance. If you are interested in only a small portion of the table or if the table
is large, using a view to perform the connect by will be a very costly method from a
performance perspective.

With the ability to embed an entire SQL statement into the from clause (available as of
Oracle7.2), you can join a connect by query result to another table without bearing the
performance penalty associated with the view-based method. The following SQL statement
returns all the tables used in an explain plan with the number of rows.

select Object_Name, Num_Rows, Q_Plan
  from ALL_TABLES,
       (select LPAD(' ', 2*Level)||Operation||' '||Options||' '||Object_Name Q_Plan,
               Object_Name, Object_Owner
          from PLAN_TABLE
       connect by prior ID = Parent_ID and Statement_ID = :id
         start with ID = 0 and Statement_ID = :id) PLAN_VIEW
 where ALL_TABLES.Table_Name = PLAN_VIEW.Object_Name
   and ALL_TABLES.Owner = PLAN_VIEW.Object_Owner;

In the preceding example, embedded SQL is used to pass the statement ID as the :id bind
variable to the portion of the statement that executes the connect by operation. Thus, you
can pass limiting criteria to the portion of the query that performs the connect by
operation, and join the result set to another table.
2.26 Tune Subqueries

When using subqueries, you may encounter several unique problems. Potential problems
with queries involving subqueries include the following:

 Subqueries may be resolved before the rest of the query is resolved (similar to
views that perform grouping functions – see the previous section on views).
 Subqueries may require specific hints that are not directly related to the query that
calls the subquery.
 Subqueries that could be performed as a single query may instead be written as
several distinct subqueries.
 Subqueries may not perform existence checks in the most efficient manner, either
by using a not in clause or by failing to use an exists clause.

In the following sections, you will see each of these problems addressed, and solutions
provided.

2.27 When Subqueries Are Resolved

If a query contains a subquery, the optimizer has two ways of resolving the query: first
resolve the subquery and then resolve the query (the “view” method), or integrate the
subquery into the query (the “join” method). If the subquery is resolved first, the entire
result set of the subquery is first calculated, and the rest of the query conditions are
applied as a filter. If the subquery is integrated into the query, the subquery’s conditions
and tables can be joined to the rest of the query. If you are not using the subqueries to
perform existence checks, the “join” method will usually perform better than the “view”
method.

If a subquery contains a set operation – such as group by, SUM or DISTINCT – then
the subquery cannot be integrated into the rest of the query. As described in the “Manage
SQL Statements Containing Views” section of this chapter, a nonintegrated subquery
restricts the options available to the optimizer. Since the subquery cannot be integrated
into the query that calls it, the subquery will be resolved before the rest of the query is
resolved.

The query in the following listing contains a subquery with a DISTINCT clause. Because
the subquery contains a DISTINCT clause, the subquery is treated as a view and is
resolved separately from the rest of the query. The query returns the Names of companies
that had a Sales_Total > 10000 in Period 4.

select COMPANY.Name
  from COMPANY
 where COMPANY.Company_ID in
       (select distinct SALES.Company_ID
          from SALES
         where Period_ID = 4
           and Sales_Total > 10000);

Although the query in the preceding listing does not contain a view, it is called a
“view”-type subquery because it contains a grouping function (DISTINCT). The execution
path for the preceding query is shown in the following listing.

NESTED LOOPS
  VIEW
    SORT UNIQUE
      TABLE ACCESS FULL SALES
  TABLE ACCESS BY ROWID COMPANY
    INDEX UNIQUE SCAN COMPANY_PK

As shown in the execution path, the “view” representing the subquery serves as the
driving table for a NESTED LOOPS join. However, the subquery can be rewritten to act as
a join. In the example, the DISTINCT clause in the subquery is unnecessary, since the
subquery is merely generating a list of valid values for COMPANY.Company_ID. Instead of
using a subquery, the query could be rewritten as a join of the COMPANY and SALES
tables. The revised version of the query is shown in the following listing.

select COMPANY.Name
  from COMPANY, SALES
 where COMPANY.Company_ID = SALES.Company_ID
   and SALES.Period_ID = 4
   and Sales_Total > 10000;

Since the query has been rewritten as a join, the optimizer can now evaluate potential
driving tables for the query. The revised execution path is shown in the following
listing. Although the SALES table is still chosen as the driving table (for this
particular query), the VIEW operation is no longer used. The optimizer can choose how to
drive the query instead of being forced to use the subquery “view” to drive the query.

NESTED LOOPS
  TABLE ACCESS FULL SALES
  TABLE ACCESS BY ROWID COMPANY
    INDEX UNIQUE SCAN COMPANY_PK

2.28 Hints for Subqueries That Return the Maximum Value
In some applications, both historical and current data are stored. For example, you may store
all current employee demographic information in a single table, with an employee’s
salary history data in a separate table. To determine the current salary of an employee,
you need to select the record with the maximum effective date for that employee from the
historical table. To select the most recent data along with the demographic data, you need
to use a subquery – and there is a useful hint that applies to this sort of query.

Consider an EMPLOYEE table that has two columns – Employee_ID and Name. Employee_ID is
the single-column primary key of the EMPLOYEE table. A second table, EMPLOYEE_HISTORY,
comprises three columns: Employee_ID, Salary, and Effective_Date. To select the current
salary for an employee, you would execute the query shown in the following listing.
select *
  from EMPLOYEE, EMPLOYEE_HISTORY EH
 where EMPLOYEE.Employee_ID = EH.Employee_ID
   and EMPLOYEE.Name = 'George Washington'
   and EH.Effective_Date =
       (select MAX(Effective_Date)
          from EMPLOYEE_HISTORY E2
         where E2.Employee_ID = EH.Employee_ID);

The execution path for the preceding query is shown in the following listing. It shows
that the indexes on the Employee_ID columns in both the EMPLOYEE and EMPLOYEE_HISTORY
tables are used.

FILTER
  NESTED LOOPS
    TABLE ACCESS FULL EMPLOYEE_HISTORY
    TABLE ACCESS BY ROWID EMPLOYEE
      INDEX UNIQUE SCAN EMPLOYEE$EMPLOYEE_ID
  SORT AGGREGATE
    TABLE ACCESS BY ROWID EMPLOYEE_HISTORY
      INDEX RANGE SCAN EMPLOYEE_HISTORY$EMPLOYEE_ID

Since a MAX function is used on the Effective_Date column, the index on that column
could be used during the execution of the subquery. You can specify that index to be
used via an INDEX_DESC hint – that will force the index to be read from its maximum to
its minimum value, instead of in the usual ascending direction. As a result, the
MAX(Effective_Date) will be returned to the calling portion of the query more quickly.
In the following listing, the query has been rewritten to use the INDEX_DESC hint. The
INDEX_DESC hint tells the optimizer to use the index on the
EMPLOYEE_HISTORY.Effective_Date column. Only a single row will be returned because of
the RowNum criteria in the where clause – and that single row will be the one with the
highest value in the EMPLOYEE_HISTORY$EFF_DT index.

select /*+ INDEX_DESC(eh employee_history$eff_dt) */ *
  from EMPLOYEE, EMPLOYEE_HISTORY EH
 where EMPLOYEE.Employee_ID = EH.Employee_ID
   and EMPLOYEE.Name = 'George Washington'
   and EH.Effective_Date < SysDate
   and RowNum = 1;

The execution path of the revised query is shown in the following listing. The index on
the Effective_Date column is now used, and the sorting operations have disappeared from
the explain plan.

COUNT STOPKEY
  NESTED LOOPS
    TABLE ACCESS BY ROWID EMPLOYEE_HISTORY
      INDEX RANGE SCAN EMPLOYEE_HISTORY$EFF_DT
    TABLE ACCESS BY ROWID EMPLOYEE
      INDEX UNIQUE SCAN EMPLOYEE$EMPLOYEE_ID

Rewriting the query to use the Effective_Date index may improve the performance of the
overall query, since the Effective_Date column may be more selective than the
Employee_ID column in EMPLOYEE_HISTORY.
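For the INDEX_DESC hint to take effect, an index such as the EMPLOYEE_HISTORY$EFF_DT
index named above must exist. A minimal sketch of its creation follows; the single-column
definition is an assumption based on the discussion, not DDL given in the text:

```sql
-- Hypothetical DDL for the index named in the INDEX_DESC hint.
create index employee_history$eff_dt
    on EMPLOYEE_HISTORY (Effective_Date);
```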

2.29 How to Combine Subqueries
A single query may contain multiple subqueries. The more subqueries you use, the more
difficult it is to integrate or rewrite them into larger joins. Since having multiple
subqueries makes integration difficult, you should try to combine multiple subqueries
where possible.
You can even combine subqueries that at first seem unrelated. For example, consider the
EMPLOYEE table again. In addition to the two columns it already contains – Employee_ID
and Name – add three new columns: Active_Code (indicating if the employee is an active
or retired employee), Hourly_Code (indicating full-time or part-time workers), and
Start_Date. For the Hourly_Code and Active_Code values, separate code tables will be
created, as shown in the following listing.

create table ACTIVE_CODE
(Active_Code  NUMBER,
 Description  VARCHAR2(20));

create table HOURLY_CODE
(Hourly_Code  NUMBER,
 Description  VARCHAR2(20));

Applications that query the EMPLOYEE table are more likely to reference the code
descriptions than the code values. A typical update of the EMPLOYEE table (updating a
person’s status when he or she starts) is shown in the following listing.

update EMPLOYEE
   set Active_Code = 1
 where Active_Code in
       (select Active_Code from ACTIVE_CODE
         where Description = 'HIRED')
   and Hourly_Code in
       (select Hourly_Code from HOURLY_CODE
         where Description = 'FULLTIME')
   and Start_Date <= SysDate;

The update command in the preceding listing will set the Active_Code value to 1 for any
full-time employee who has been hired before the current date.

For each record of the EMPLOYEE table, the subqueries will be performed. To avoid having
multiple subqueries performed per record, you can combine these two unrelated subqueries
into a single subquery, as shown in the following listing. The ANY operator is used to
evaluate which rows are returned from the combined subquery.

Update EMPLOYEE
Set Active_Code = 1
Where (Active_Code, Hourly_Code) = ANY
      (select Active_Code, Hourly_Code
         from ACTIVE_CODE, HOURLY_CODE
        where ACTIVE_CODE.Description = 'HIRED'
          and HOURLY_CODE.Description = 'FULLTIME')
  and Start_Date <= Sysdate;

The combined subquery is a Cartesian product of the two code tables – but since they are small code tables and each returns only one row, this should not place an undue burden on your performance. Any performance costs from combining the queries in this manner may be offset by the performance improvements gained by reducing the number of subqueries performed per row.
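As a sketch of an equivalent formulation (assuming the same EMPLOYEE, ACTIVE_CODE, and HOURLY_CODE tables as above), Oracle also accepts a multi-column in comparison in place of = ANY:

```sql
-- Equivalent form: a multi-column IN is evaluated like = ANY.
Update EMPLOYEE
Set Active_Code = 1
Where (Active_Code, Hourly_Code) in
      (select Active_Code, Hourly_Code
         from ACTIVE_CODE, HOURLY_CODE
        where ACTIVE_CODE.Description = 'HIRED'
          and HOURLY_CODE.Description = 'FULLTIME')
  and Start_Date <= Sysdate;
```

Which form you choose is largely a matter of readability; both perform the single combined subquery per row.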

2.30 How to Perform Existence Checks

Often, subqueries do not return rows, but instead serve a data validation purpose. For example, you may check that a table with a primary key does not have any "children" records in a foreign key table for specific values of the primary key column. Such logical checks – that records in related tables either exist or do not exist – are called existence checks. You can use the exists and not exists clauses to improve the performance of existence checks.

Using exists  Whenever a join does not return any columns from one of the tables, or if you only care that a row exists, the join can use the exists clause. Using the exists clause may eliminate unnecessary table accesses. The two queries in the following listing are functionally equivalent; they return a row from the COMPANY table if it has a matching record that meets the limiting conditions placed on the SALES table.

Select COMPANY.Name
From COMPANY, SALES
Where COMPANY.Company_ID = SALES.Company_ID
And SALES.Period_ID = 4
And Sales_Total > 10000;

Select COMPANY.Name
From COMPANY
Where COMPANY.Company_ID in
      (select SALES.Company_ID
         from SALES
        where SALES.Period_ID = 4
          and Sales_Total > 10000);

Both of the queries in the preceding listing could be rewritten using the exists operator, as shown in the following listing.

Select COMPANY.Name
From COMPANY
Where EXISTS
      (select 1 from SALES
        where COMPANY.Company_ID = SALES.Company_ID
          and SALES.Period_ID = 4
          and Sales_Total > 10000);

The value of the exists operator is that it only needs to check for a single row returned by the subquery. An in clause selects all rows from the subquery. The not exists operator can also help you realize great performance gains for subqueries, for the types of existence checks described in the next section.

Using not exists  In the examples used throughout this chapter, the SALES table's Company_ID column refers back to the Company_ID column of the COMPANY table. To select the Company_IDs from COMPANY that do not have records in the SALES table, the query shown in the following listing uses a not in clause:

Select Company_ID
From COMPANY
Where Company_ID NOT IN
      (select Company_ID from SALES);

The main query in the preceding listing uses a not in clause – so there is no limiting condition in the main query (see the section on avoiding full table scans in this chapter for a discussion of limiting conditions). Since there is no limiting condition on the query, Oracle will perform a full table scan of the COMPANY table. For each record in COMPANY, the subquery will be executed. Since the subquery has no where clause, it has no limiting condition. Thus, this query will perform a full table scan of SALES for every record in the full table scan of COMPANY. You need to rewrite the query to eliminate the full table scans.

To improve the performance of the query, you need to take a different approach. The query does not return any columns from the SALES table; you are only using SALES as a logical check for the COMPANY records – an existence check. Since you are checking to see which records do not meet a given criterion, a not exists clause should be used. The following listing shows the syntax for the preceding query after it has been rewritten to use the not exists clause.

Select Company_ID
From COMPANY
Where NOT EXISTS
      (select 1 from SALES
        where SALES.Company_ID = COMPANY.Company_ID);

When the query in the preceding listing is executed, the main query will perform a full table scan of the COMPANY table, since there are no limiting conditions on the query. For each record in COMPANY, the not exists clause will be evaluated.

When the not exists clause is evaluated, the value of COMPANY.Company_ID from the main query is compared to values of the Company_ID column in the SALES table. If there is a match – the Company_ID exists in both COMPANY and SALES – then a '1' is returned by the subquery. The logic of the not exists clause tells Oracle to not return the row from the main query if the subquery returns a '1'. The only records that will be returned from COMPANY are those that return no rows from the subquery.

The performance improvements for this type of existence check can be enormous. When using not in, the query performs nested full table scans. When using not exists, the query can use an index within the subquery. Since

where SALES.Company_ID = COMPANY.Company_ID

can be used as a limiting condition in the subquery, the index on the SALES.Company_ID column can be used in resolving the query. For each record in the COMPANY table, an index probe of the SALES_PK index (which has Company_ID as its leading column) will be performed. Instead of iteratively performing full table scans, you can use an index scan. Because a single character is selected (the '1' in the subquery) instead of a column, there is no need for the database to do a subsequent table access by RowID after searching the SALES_PK index.
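The index probe described above depends on an index with Company_ID as its leading column. If SALES were missing its primary key, a minimal sketch of the required definition might look like the following (the assumption that Company_ID and Period_ID together identify a row may not match your data):

```sql
-- Hypothetical sketch: ensure an index exists with Company_ID as its
-- leading column, so the NOT EXISTS subquery can probe it rather than
-- perform repeated full table scans.
Alter table SALES
  Add constraint SALES_PK primary key (Company_ID, Period_ID);
```

The primary key constraint implicitly creates the unique SALES_PK index that the subquery's limiting condition can use.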

2.31 Use Composite Keys/Star Queries

Queries using tables created with a data warehousing style of application design can realize significant performance improvements by using "star" query execution paths. These performance benefits are also available to some queries based on traditional relational table layouts. In this section you will see a simple data warehouse design, the traditional execution path for a typical query, and the paradigm shift that occurs when tuning a "star" query.

2.32 How to Create a Star Schema

A "star" query selects data from a star schema. Star schemas are common to data warehouse table designs. Each type of data that is used in grouping columns (such as Product, if you sum data by Product, or Period, if you sum data by Period) is represented by a separate table. For example, the PRODUCT table, shown in the following listing, has a Product_ID and a Product_Name column. A primary key is created on the Product_ID column (and a unique index is implicitly created on the Product_ID column).

Create table PRODUCT
(Product_ID    NUMBER,
 Product_Name  VARCHAR2(20),
 Constraint PRODUCT_PK primary key (Product_ID) );

Within the context of data warehousing design, PRODUCT is called a dimension, and the PRODUCT table is called a dimension table. In the following listing, two additional dimension tables are created: one for periods, and one for customers.

Create table PERIOD
(Period_ID    NUMBER,
 Period_Name  VARCHAR2(20),
 Constraint PERIOD_PK primary key (Period_ID) );

Create table CUSTOMER
(Customer_ID    NUMBER,
 Customer_Name  VARCHAR2(20),
 Constraint CUSTOMER_PK primary key (Customer_ID) );

Three dimension tables – PRODUCT, PERIOD, and CUSTOMER – have been created. Each of the dimension tables has an ID column and a Name column, and each has a unique index on its ID column.

To store data about the dimensions, create a fact table. A fact table has a primary key that is created by combining the primary keys of its related dimension tables. For each record in the fact table, record the attributes being measured. For example, you could choose to track orders by product, period, and customer. In the following listing, an ORDERS table is created to track sales order totals by the three dimensions that have been created. The primary key of the ORDERS table is a three-column concatenation of Product_ID, Period_ID, and Customer_ID. For each record, the Order_Amount attribute records the total sales for the product, period, and customer.
Create table ORDERS
(Product_ID    NUMBER,
 Period_ID     NUMBER,
 Customer_ID   NUMBER,
 Order_Amount  NUMBER,
 Constraint ORDERS_PK primary key
   (Product_ID, Period_ID, Customer_ID),
 Constraint ORDERS_PRODUCT_FK foreign key (Product_ID)
   References PRODUCT(Product_ID),
 Constraint ORDERS_PERIOD_FK foreign key (Period_ID)
   References PERIOD(Period_ID),
 Constraint ORDERS_CUSTOMER_FK foreign key (Customer_ID)
   References CUSTOMER(Customer_ID) );

As shown in the preceding listing, the ORDERS table has a primary key on Product_ID, Period_ID, and Customer_ID, and therefore has a unique index implicitly created on those columns. ORDERS also has foreign keys back to each of the dimension tables it references. The graphical representation of the relationships between the ORDERS table and its dimension tables is shown in Figure 2-5. The "star" nature of the graphical layout of the schema gives it its name.

[Figure: the ORDERS fact table (Product_ID, Period_ID, Customer_ID, Order_Amount) at the center, linked to the PRODUCT (Product_ID, Product_Name), PERIOD (Period_ID, Period_Name), and CUSTOMER (Customer_ID, Customer_Name) dimension tables]

Figure 2-5. Star schema featuring the ORDERS table
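To make the schema concrete, the following hedged sketch populates one row per table, using the sample values that appear later in Figure 2-6 (the ID numbers and the order amount are illustrative assumptions, not values from the text):

```sql
-- Illustrative sample data only; the specific values are assumptions.
Insert into PRODUCT  values (17,  'WIDGET');
Insert into PERIOD   values (3,   'Last 3 Months');
Insert into CUSTOMER values (427, 'MAGELLAN');

-- The fact row's primary key combines the three dimension keys.
Insert into ORDERS   values (17, 3, 427, 10000);
```

Each ORDERS row is valid only if its three ID values already exist in the dimension tables, since the foreign keys enforce that relationship.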

2.33 Querying the Star Schema

When a star schema is queried, dimension tables are usually involved in the query. Since the dimension tables contain the Name or Description columns commonly used by queries, dimension tables will always be used by star queries unless the queries specifically refer to the ID values. A star query therefore typically has two parts to its where clause: a set of joins between the fact table and the dimension tables, and a set of criteria for the Name columns of the dimension tables.

The following listing shows a star query using the ORDERS table and its PRODUCT, PERIOD, and CUSTOMER dimension tables.

Select PRODUCT.Product_Name,
       PERIOD.Period_Name,
       CUSTOMER.Customer_Name,
       ORDERS.Order_Amount
From ORDERS, PERIOD, CUSTOMER, PRODUCT
Where PRODUCT.Product_Name = 'WIDGET'
And PERIOD.Period_Name = 'Last 3 Months'
And CUSTOMER.Customer_Name = 'MAGELLAN'
And ORDERS.Period_ID = PERIOD.Period_ID
And ORDERS.Customer_ID = CUSTOMER.Customer_ID
And ORDERS.Product_ID = PRODUCT.Product_ID;

The where clause of the preceding query contains a section that specifies the criteria for the dimension tables:

Where PRODUCT.Product_Name = 'WIDGET'
And PERIOD.Period_Name = 'Last 3 Months'
And CUSTOMER.Customer_Name = 'MAGELLAN'

And a section that joins the fact table to the dimension tables:

And ORDERS.Period_ID = PERIOD.Period_ID
And ORDERS.Customer_ID = CUSTOMER.Customer_ID
And ORDERS.Product_ID = PRODUCT.Product_ID;

The select list of the query retrieves the Order_Amount from the ORDERS table for the product, period, and customer specified.

2.34 The Traditional Execution Path

Given the example star schema, and the example star query, the execution path shown in the following listing will be used to resolve the query. The execution path uses a series of NESTED LOOPS joins. First, the PRODUCT table is used as the driving table for the query. Each row of PRODUCT is joined to ORDERS via the index on ORDERS' primary key (since Product_ID is the first column of the ORDERS_PK index). The output of that join is then joined to CUSTOMER, and then to PERIOD.

NESTED LOOPS
  NESTED LOOPS
    NESTED LOOPS
      TABLE ACCESS FULL PRODUCT
      TABLE ACCESS BY ROWID ORDERS
        INDEX RANGE SCAN ORDERS_PK
    TABLE ACCESS BY ROWID CUSTOMER
      INDEX UNIQUE SCAN CUSTOMER_PK
  TABLE ACCESS BY ROWID PERIOD
    INDEX UNIQUE SCAN PERIOD_PK

If ORDERS is very large relative to its dimension tables – and in a data warehousing application, this is typically the case – then the execution path shown in the preceding listing may perform poorly.
Despite its usage of all the available indexes, and a small driving table, the first join –
between PRODUCT and ORDERS – may doom the performance of the query. If a large
number of records are returned from that join, that large driving set of records will be
used during the subsequent joins. If ORDERS is very large relative to its dimension
tables, you should consider using a star query or composite key execution path instead.

2.35 How to Create a Star Query Execution Path

The first step toward using a star query is to properly index the dimension tables. Since star queries will typically refer to the Name columns of the dimension tables in their where clauses, you need to create indexes on those columns. The commands in the following listing create indexes on the Name columns of the PRODUCT, PERIOD, and CUSTOMER dimension tables.

Create index PRODUCT$PRODUCT_NAME
  On PRODUCT(Product_Name);

Create index PERIOD$PERIOD_NAME
  On PERIOD(Period_Name);

Create index CUSTOMER$CUSTOMER_NAME
  On CUSTOMER(Customer_Name);

With these three indexes available, the optimizer will now be able to create a different execution path. It will create Cartesian joins of the three dimension tables, using the Name columns to eliminate records as it does. The three dimension tables, when joined together in this manner, will form a virtual table that Oracle will hold in memory during the query. The virtual table will contain all of the columns from all of the dimension tables involved. Its primary key will be the composite of the primary keys of all of the dimension tables involved – which is the same primary key as the fact (ORDERS) table!

The composite table of the three dimension tables will have a primary key of (Product_ID, Period_ID, Customer_ID). It will contain three attribute columns (Product_Name, Period_Name, and Customer_Name). The where clause conditions of the star query will eliminate rows from the dimension tables, so the Cartesian join of the three dimension tables may contain very few records. Figure 2-6 shows the composite key table for this example, with sample data for the row returned. The Name columns show values specified by the where clause, and sample ID values associated with those Name values. The three ID columns, taken together, form the primary key for the composite key table. For this example, each of the dimension tables has a limiting condition on its Name column, so only one row is returned from each table and the composite of the three tables has only one row.
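The virtual table Oracle builds can be visualized with a query of your own (a hedged sketch; Oracle performs this Cartesian join internally, so you would not normally run it yourself):

```sql
-- Sketch of the Cartesian join of the dimension tables that the
-- optimizer builds in memory for the star query. With the limiting
-- conditions shown, each table contributes a single row.
Select PRODUCT.Product_ID, PERIOD.Period_ID, CUSTOMER.Customer_ID,
       PRODUCT.Product_Name, PERIOD.Period_Name, CUSTOMER.Customer_Name
From PRODUCT, PERIOD, CUSTOMER
Where PRODUCT.Product_Name = 'WIDGET'
And PERIOD.Period_Name = 'Last 3 Months'
And CUSTOMER.Customer_Name = 'MAGELLAN';
```

The result mirrors the one-row composite key table shown in Figure 2-6.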

Once the virtual table has been created in memory, it is joined to the ORDERS table. The execution path is shown in the following listing.

NESTED LOOPS
  NESTED LOOPS
    NESTED LOOPS
      TABLE ACCESS BY ROWID PRODUCT
        INDEX RANGE SCAN PRODUCT$PRODUCT_NAME
      TABLE ACCESS BY ROWID CUSTOMER
        INDEX RANGE SCAN CUSTOMER$CUSTOMER_NAME
    TABLE ACCESS BY ROWID PERIOD
      INDEX RANGE SCAN PERIOD$PERIOD_NAME
  TABLE ACCESS BY ROWID ORDERS
    INDEX UNIQUE SCAN ORDERS_PK

Product_ID  Period_ID  Customer_ID  Product_Name  Period_Name    Customer_Name
17          3          427          WIDGET        Last 3 Months  MAGELLAN

(Product_ID, Period_ID, and Customer_ID together form the primary key.)

Figure 2-6. Composite key table used for star query
The accesses to the ORDERS table and the ORDERS_PK index shown in the preceding listing can result in significant performance impact for star queries. In the traditional execution path, a range scan was performed on ORDERS_PK, since only the first column of that index was used while joining ORDERS to PRODUCT; in the star query execution path, all three columns of the ORDERS_PK index are used via a unique index scan. The placement of the access to the ORDERS table, as the last join performed, eliminates the potential of a large driving set of records being used during each join. ORDERS is joined last, in the most efficient manner possible. If ORDERS is large compared to the composite of its dimension tables, the performance of the star query can improve by orders of magnitude when using the revised execution path.

2.36 Management Issues for Star Queries

The cost-based optimizer explicitly identifies star schemas as of Oracle7.2. The optimizer may evaluate a traditional application design – such as a master table with many detail tables – as a star schema. If the optimizer evaluates the table layout as a star schema, it may use a star schema execution path for the query. You need to be aware of the circumstances in which a star schema is being used; otherwise, your performance may suffer. If a master table is not significantly larger than its detail tables, or if the limiting conditions on the detail tables return many rows, you may obtain better performance from a traditional execution path.
To see if the optimizer is using star query execution paths on your queries, check the execution paths for the queries via the explain plan command. The telltale sign of a star query execution path is the use of indexes that are not part of the join criteria. In the preceding example, the indexes on the Name columns of the dimension tables were used; the indexes on the ID columns of the dimension tables (the join columns for the query) were not used.


NOTE

In the star query execution path the majority of the "cost" of the query execution comes from accesses of the dimension tables. Since the dimension tables control the rest of the query – and the number of records created in the composite key table – you should use highly selective criteria in the where clauses that apply to the dimension tables.

The performance gains from star query execution paths can be dramatic. Using the tables described in this section, a test system was created with 4 periods, 25 products, and 500 customers in the dimension tables. 50,000 orders were created in ORDERS (which is fairly small for a fact table). The criteria on the Name columns returned 1 row from each dimension table. Using the traditional path, Oracle required over 10,000 logical reads to perform the query; using the star query execution path, only 14 logical reads were required. When the number of periods and products were doubled (with associated increases in the number of orders), the number of logical reads required to resolve the star query execution path remained unchanged at 14.

In versions of Oracle prior to Oracle7.3, a star query execution path will only be used if five or fewer tables are used in the star query. In Oracle7.3 and beyond, there is no limit to the number of tables used in the star query. You can force a query to use a star query execution path via the STAR hint. In versions prior to Oracle7.3, you can force a query to use a star query execution path by giving the optimizer hints about which indexes to use – but if the right indexes are in place, the optimizer may choose a star query execution path without being given a hint. Depending on the sizes of the tables involved and the query's limiting conditions, the star query execution path may far outperform the traditional path.
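As a sketch, the STAR hint could be applied to the earlier example query as follows (Oracle7.3 and beyond; the hint is only honored when the schema and statistics qualify):

```sql
-- Request the star query execution path explicitly via the STAR hint.
Select /*+ STAR */
       PRODUCT.Product_Name, PERIOD.Period_Name,
       CUSTOMER.Customer_Name, ORDERS.Order_Amount
From ORDERS, PERIOD, CUSTOMER, PRODUCT
Where PRODUCT.Product_Name = 'WIDGET'
And PERIOD.Period_Name = 'Last 3 Months'
And CUSTOMER.Customer_Name = 'MAGELLAN'
And ORDERS.Period_ID = PERIOD.Period_ID
And ORDERS.Customer_ID = CUSTOMER.Customer_ID
And ORDERS.Product_ID = PRODUCT.Product_ID;
```

Verify the resulting plan with explain plan; if the optimizer does not judge the schema to be a star, the hint may be ignored.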

For example, if you have an EMPLOYEE table referenced by three foreign key tables – such as DEPT, JOB, and COUNTRY – you would normally index each of the three foreign key columns in the EMPLOYEE table (EMPLOYEE.Dept_Code, EMPLOYEE.Job_Code, and EMPLOYEE.Country_Code) separately. However, if you simply create a concatenated index containing all three columns, then the optimizer may dynamically interpret the tables as being part of a star schema – with great potential benefit to your query performance. In testing, the following set of hints forced the star query execution path to be chosen (forcing an index access to the fact table and NESTED LOOPS joins for the joins to the dimension tables):

/*+ use_nl (dept) full(job) use_nl (country) index (employees) */


In the test, the number of logical reads required to return 4 rows from the query of
EMPLOYEE, DEPT, COUNTRY, and JOB went from 3300 (traditional path) down to
19 (star query execution path). If you choose to enable star query execution paths for
such queries, you must be sure that the “fact” table is significantly larger than the
“dimension” tables, and that limiting conditions are used against the records returned
from the dimension tables.
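The concatenated index described above might be sketched as follows (the index name is an assumption for illustration, and the column list mirrors the three foreign keys named in the text):

```sql
-- Hypothetical concatenated index on the three foreign key columns,
-- created in place of three single-column indexes so the optimizer
-- can treat EMPLOYEE as the fact table of a star schema.
Create index EMPLOYEE$DEPT$JOB$COUNTRY
  On EMPLOYEE (Dept_Code, Job_Code, Country_Code);
```

The column order determines which partial lookups the index can serve, so lead with the most selective dimension where possible.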

2.37 Properly Index CONNECT BY Operations

You can use the connect by clause to query data that is related in a hierarchical fashion. For example, an organization chart for a corporation has a hierarchy of levels; you can use connect by to traverse the hierarchy in a single query. Since connect by is an iterative join, you must properly index the table used by the query; otherwise, the query will perform nested full table scans on the same table. In the following examples, you will see the proper indexing strategy as applied to a common connect by query. A hierarchical query has three clauses:

 Start with identifies the root of the hierarchy, that is, the point from which the
query will begin to traverse up or down the hierarchy.
 Connect by identifies the manner in which the hierarchy is to be traversed (from
the root downward, or upward from a lower level). The direction in which the
hierarchy is traversed is determined by the placement of the prior keyword within
the connect by clause.
 Where filters rows before they are returned to the user.

The start with and connect by clauses should be the focus of your tuning efforts when
tuning queries that use the connect by operation.

Throughout the listings in this section, you will see the Explain Plan query as the example query. The query syntax is shown in the following listing. PLAN_TABLE is used to store the output of the explain plan command.

Select LPAD(' ',2*Level)||Operation||' '||Options||' '||Object_Name Q_Plan
From PLAN_TABLE
Where Statement_ID = 'TEST'
Connect by prior ID = Parent_ID and Statement_ID = 'TEST'
Start with ID=0 and Statement_ID = 'TEST';

The query shown in the preceding listing queries PLAN_TABLE and returns a listing of the operations involved in resolving the query. The operations in the resulting listing are shown in hierarchical order by the use of the connect by clause. The LPAD function is used to indent the "child" operations of "parent" operations.

By default, PLAN_TABLE does not have any indexes created on it. What is the execution path of the preceding query? You can use the explain plan command on the preceding query, as shown in the following listing.

Explain plan
Set Statement_ID = 'TEST' for
Select LPAD(' ',2*Level)||Operation||' '||Options||' '||Object_Name Q_Plan
From PLAN_TABLE
Where Statement_ID = 'TEST'
Connect by prior ID = Parent_ID and Statement_ID = 'TEST'
Start with ID=0 and Statement_ID = 'TEST';

The execution path for the Explain Plan query is shown in the following listing.
FILTER
  CONNECT BY
    TABLE ACCESS FULL PLAN_TABLE
    TABLE ACCESS BY ROWID PLAN_TABLE
    TABLE ACCESS FULL PLAN_TABLE

As shown in the plan, PLAN_TABLE is read via a full table scan (to get the starting point for the query). For each record read, Oracle performs a table access by RowID. For each table access by RowID, Oracle then performs another full table scan of PLAN_TABLE. The FILTER operation applies the

where Statement_ID = 'TEST'

condition after the CONNECT BY operation has completed.

You can avoid performing nested full table scans of the same table during connect by queries. First, the columns used in the start with clause should be indexed, thereby preventing the optimizer from performing a full table scan to find the root for the query. For the Explain Plan query, the start with clause is

start with ID=0 and Statement_ID = 'TEST'

so the ID and Statement_ID columns should be indexed, as shown in the following listing.

create index PLAN_TABLE$ID_STATEMENT on PLAN_TABLE (ID, Statement_ID);

The PLAN_TABLE$ID_STATEMENT index created in the preceding listing will be used during the evaluation of the start with clause. The new execution path for the query is shown in the following listing.

FILTER
  CONNECT BY
    INDEX RANGE SCAN PLAN_TABLE$ID_STATEMENT
    TABLE ACCESS BY ROWID PLAN_TABLE
    TABLE ACCESS FULL PLAN_TABLE

As shown in the preceding listing, the first full table scan of PLAN_TABLE has been replaced with an index scan using the index on the ID and Statement_ID columns of PLAN_TABLE. Two additional indexes are needed for the connect by queries: one for each set of columns in the connect by clause. In the case of the Explain Plan query the connect by clause is

connect by prior ID = Parent_ID and Statement_ID = 'TEST'

The ID column has already been indexed, so the next step is to create an index on the Parent_ID column. The index on Parent_ID is created via the command shown in the following listing.

Create index PLAN_TABLE$PARENT_ID on PLAN_TABLE(Parent_ID);

The PLAN_TABLE$PARENT_ID index created in the preceding listing will be used during the evaluation of the connect by clause. The new execution path for the query is shown in the following listing.

FILTER
  CONNECT BY
    INDEX RANGE SCAN PLAN_TABLE$ID_STATEMENT
    TABLE ACCESS BY ROWID PLAN_TABLE
    TABLE ACCESS BY ROWID PLAN_TABLE
      INDEX RANGE SCAN PLAN_TABLE$PARENT_ID

As shown in the preceding listing, the remaining full table scan of PLAN_TABLE has been replaced by an index scan using the PLAN_TABLE$PARENT_ID index, followed by a table access by RowID.

The outermost FILTER operation cannot be avoided, since there is a where clause in the query. However, since the Statement_ID column is also used in the connect by clause, an index on Statement_ID could also be used during the resolution of the query. The index on Statement_ID is created by the command in the following listing.

Create index PLAN_TABLE$STATEMENT_ID
  On PLAN_TABLE(Statement_ID);

The PLAN_TABLE$STATEMENT_ID index created in the preceding listing will be used during the evaluation of the connect by clause, but not when applying the limiting condition in the where clause. The new execution path for the query is shown in the following listing.

FILTER
  CONNECT BY
    INDEX RANGE SCAN PLAN_TABLE$ID_STATEMENT
    TABLE ACCESS BY ROWID PLAN_TABLE
    TABLE ACCESS BY ROWID PLAN_TABLE
      AND-EQUAL
        INDEX RANGE SCAN PLAN_TABLE$PARENT_ID
        INDEX RANGE SCAN PLAN_TABLE$STATEMENT_ID

As shown in the preceding listing, the index scan of the Parent_ID index has been replaced by an AND-EQUAL of the separate Parent_ID and Statement_ID indexes. The usefulness of the additional Statement_ID index is therefore dependent on its selectivity; if Statement_ID is more selective than the Parent_ID index, then indexing Statement_ID may improve your query performance. If Statement_ID is not selective, do not index it.

To improve your indexing scheme further, you can create two indexes on the columns used in the connect by clause. Consider the ID and Parent_ID columns used in the examples in this section. You can create an index on (ID, Parent_ID), and a second index on (Parent_ID, ID), alternating the leading column of the index. The optimizer would then be able to use an index in resolving the query regardless of the direction in which you traverse the hierarchy. That is, the index on (ID, Parent_ID) will be useful when going down the hierarchy (finding all the children of a given parent record). The index on (Parent_ID, ID) will be useful when going up the hierarchy (finding the parentage of a given child record). All of the examples in this section focus on traveling down the hierarchy from a parent record.
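For contrast, an upward traversal of the same PLAN_TABLE hierarchy might be sketched as follows; the placement of the prior keyword is reversed, and :child_id is an assumed bind variable identifying the starting operation:

```sql
-- Walk up the hierarchy from a given child operation toward the root.
-- Each step matches the current row's ID to the prior row's Parent_ID,
-- so the (Parent_ID, ID) index direction becomes useful.
Select LPAD(' ',2*Level)||Operation||' '||Options||' '||Object_Name Q_Plan
From PLAN_TABLE
Where Statement_ID = 'TEST'
Connect by ID = prior Parent_ID and Statement_ID = 'TEST'
Start with ID = :child_id and Statement_ID = 'TEST';
```

The downward examples in this section use the mirror-image form, connect by prior ID = Parent_ID.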

If both the (ID, Parent_ID) and (Parent_ID, ID) indexes are created, the execution path would be similar to the execution path shown earlier in this section when the PLAN_TABLE.ID and PLAN_TABLE.Parent_ID columns were indexed separately. The following listing shows the commands needed to index the ID and Parent_ID columns together, along with the subsequent execution path. In this listing, the old indexes are dropped prior to the creation of the new indexes.

Drop index PLAN_TABLE$ID_STATEMENT;
Drop index PLAN_TABLE$PARENT_ID;
Drop index PLAN_TABLE$STATEMENT_ID;

Create index PLAN_TABLE$ID$STATE$PARENT
  On PLAN_TABLE (ID, Statement_ID, Parent_ID);

Create index PLAN_TABLE$PARENT$STATE$ID
  On PLAN_TABLE (Parent_ID, Statement_ID, ID);

The new execution path for the Explain Plan query is shown in the following listing.

FILTER
  CONNECT BY
    INDEX RANGE SCAN PLAN_TABLE$ID$STATE$PARENT
    TABLE ACCESS BY ROWID PLAN_TABLE
    TABLE ACCESS BY ROWID PLAN_TABLE
      INDEX RANGE SCAN PLAN_TABLE$PARENT$STATE$ID
The examples in this section show the significant impact of incremental changes in the indexing strategy for connect by queries. At the start, the query used nested full table scans. By properly indexing the columns used in the start with and connect by clauses, you can eliminate full table scans from your connect by queries.

2.38 Limit Remote Table Accesses

Whenever a SQL statement uses a database link or a synonym for a remote object, a
portion of the SQL statement is extracted to be executed on the remote node. A query
involving a remote table can have a significant impact on your tuning efforts: instead of
just tuning the main query, you now need to also tune the SQL on the remote node and
the method by which the remote tables are accessed. NESTED LOOPS joins are a
common cause of problems during remote table accesses, as you will see in the examples
in this section.

The total cost of a query that uses remote objects is the sum of the costs of the local and
remote portions of the query and of the cost of the manner in which the portions are
executed. You need to first tune the local portion of the query, followed by the remote
portion. The SQL that is sent to be executed on the remote node is stored in the “Other”
column of PLAN_TABLE during explain plan commands.

Once the two separate portions of the query have been tuned, you need to consider the
method by which they are joined. In the following example, the COMPANY and SALES
tables are joined.

Select COMPANY.Name
From COMPANY, SALES@REMOTE1
Where COMPANY.Company_ID = SALES.Company_ID
And SALES.Period_ID = 3
And SALES.Sales_Total > 1000;

The execution path for the join query in the preceding example is shown in the
following listing.

MERGE JOIN
SORT JOIN
TABLE ACCESS FULL SALES
SORT JOIN
REMOTE (TABLE ACCESS FULL COMPANY@REMOTE1)

As shown in the preceding listing, the MERGE JOIN method prevents the iterative
remote accesses that the NESTED LOOPS method causes; however, the cost of
performing a full table scan on both tables may be great.

The query to be executed in the remote database is stored in the PLAN_TABLE.Other
column for the PLAN_TABLE record whose Operation value is equal to ‘REMOTE’.

If you have multiple remote table accesses in the same query, and no remote table can be
easily chosen as the driving table, you have two options:

 Use a view on one of the remote databases, to force the execution to be driven
from there. See Tip #4 on managing SQL statements containing views, earlier in
this chapter.
 Force every join to execute as a MERGE JOIN via USE_MERGE hints.
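The trade-off behind these options can be sketched with a toy cost model. The Python
sketch below is illustrative only – the round-trip and bulk-transfer weights are assumed
numbers, not Oracle’s actual costing – but it shows why per-row remote probes degrade
as the driving row count grows:

```python
# Assumed, illustrative costs: a NESTED LOOPS join against a remote table
# pays one network round trip per driving row; a MERGE JOIN pays two full
# scans plus one bulk transfer of the remote result set.

def nested_loops_cost(driving_rows, round_trip_cost=1.0):
    # one remote probe per driving row
    return driving_rows * round_trip_cost

def merge_join_cost(local_blocks, remote_blocks, bulk_transfer_cost=50.0):
    # full scan of both tables plus a single bulk transfer
    return local_blocks + remote_blocks + bulk_transfer_cost

# with few driving rows, the per-row probes are cheap...
small = nested_loops_cost(100) < merge_join_cost(1000, 1000)
# ...but with many driving rows, the accumulated round trips dominate
large = nested_loops_cost(1_000_000) > merge_join_cost(1000, 1000)
```

With a handful of driving rows the per-row probes are cheap; at millions of rows the
one-time cost of the full scans and sorts is far smaller than the accumulated network
round trips.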

Using remote tables in queries forces you to choose between MERGE JOINs and
NESTED LOOPS joins that may have a nonoptimal driving table. If the tables are large,
MERGE JOINs may not be an acceptable choice. The more remote table accesses you
perform, the more tuning issues you will encounter with them – and as described here,
there is no guarantee that you will be able to develop a successful solution. Often, your
remote table accesses will prevent the database from using its best NESTED LOOPS
combination. Once you prevent the database from using the best driving table in a
NESTED LOOPS join, the second-best option available to you may not be acceptable.

For frequently accessed remote tables, consider using Oracle’s replication options. These
include read-only snapshots and multisite ownership of data.

2.39 Manage Very Large Table Accesses

As a table grows to be significantly larger than the size of the SGA’s data block buffer
cache, you need to approach tuning queries against that table from a different perspective.
Whereas multiple users can benefit from sharing data from small tables in the SGA, that
benefit disappears when very large tables are accessed. In the following sections you will
see how to address the tuning of queries against very large tables, using the following
methods:

 Managing data proximity


 Avoiding unhelpful index scans
 Creating fully indexed tables
 Creating hash clusters
 Creating partitioned tables and views
 Implementing parallel options

The tips within this section differ from those earlier in this chapter, since an application
with a very large table is fundamentally different from the transaction-processing
environment typical of relational database applications. The tips provided in this section
apply only if the table being queried is significantly larger than the size of the SGA’s data
block buffer cache.

2.40 The Problem

When a table and its indexes are small, there can be a high degree of data sharing within
the SGA. Multiple users performing table reads or index range scans can use the same
blocks over and over.
ov er. As a result of the reuse of blocks within the SGA, the hit ratio – a
measure of the reuse of blocks within the SGA – increases.

As a table grows, the table’s indexes grow too. As the table and its indexes grow larger
than the available space in the SGA, it becomes less likely that the next row needed by a
range scan will be found within the SGA. The reusability of data within the SGA’s data
block buffer cache will diminish. The hit ratio for the database will decrease. Eventually,
each logical read will require a separate physical read.

The SGA is designed to maximize the reuse (among multiple users) of the blocks read
from the datafiles. To do this, the SGA maintains a list of the blocks that have been read;
if the blocks were read via index accesses or via table access by RowID, those blocks are
kept in the SGA the longest. If a block is read into the SGA via a full table scan, that
block is the first to be removed from the SGA when more space is needed in the data
block buffer cache.
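The aging behavior described above can be modeled with a small Python sketch. It is a
simplification (the real SGA uses a modified LRU algorithm, and this toy class is not
Oracle’s implementation), but it shows why full-scan blocks are the first to leave the
cache while index-read blocks survive:

```python
from collections import OrderedDict

class BufferCache:
    """Toy model of the data block buffer cache (simple LRU, an assumption)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()   # most-recently-used end is the right

    def read(self, block, full_scan=False):
        hit = block in self.blocks
        if hit:
            self.blocks.move_to_end(block)
        else:
            if len(self.blocks) >= self.capacity:
                self.blocks.popitem(last=False)   # evict least-recently-used
            self.blocks[block] = True
            if full_scan:
                # full-scan blocks are placed at the LRU end: first to go
                self.blocks.move_to_end(block, last=False)
        return hit

cache = BufferCache(capacity=4)
for b in ("ix1", "ix2"):          # blocks read via index access: kept longest
    cache.read(b)
for b in ("t1", "t2", "t3"):      # blocks read via full table scan
    cache.read(b, full_scan=True)
# the index blocks survive; a full-scan block was the one evicted
```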

For applications with small tables, the data block buffer cache management in the SGA
maximizes the reuse of blocks and increases the hit ratio. What happens, though, if an
index range scan is performed on a very large table? The index’s blocks will be kept for a
long time in the SGA, even though it is likely that no other users will be able to use the
values in the index’s blocks. Since the index is large, many of its blocks may be read,
consuming a substantial portion of the available space in the SGA’s data block buffer
cache. A greater amount of space will be consumed by the table blocks accessed by
RowID, and those blocks will be even less likely to be reused. The hit ratio will begin to
drop – ironically, because index scans are being performed. The tuning methods used for
very large tables therefore focus on special indexing techniques and on alternatives to
indexes.

2.41 Manage Data Proximity

If you intend to continue using indexes during accesses of very large tables, you must
concern yourself with data proximity – the physical relationship between logically related
records.
To maximize data proximity, insert records into the table sequentially, ordered by
columns commonly used in range scans of the table. For example, the primary key of the
large COMPANY table is Company_ID. Accesses that use Company_ID as a limiting
condition will be able to use the unique index on the primary key. If range scans are
commonly performed on the COMPANY.Name column, the data should be stored in
order of Name values.

If the data is stored in an ordered format, then during range searches, such as

where Name like ‘AA%’

you will more likely be able to reuse the table and index blocks read into the SGA,
because all of the Name values beginning with ‘AA’ would be stored together. Fewer
index and table blocks will be read into the SGA’s data block buffer cache, minimizing
the impact of the index range scan in the SGA. Storing data in an ordered fashion helps
range scans regardless of the size of the table, but it is particularly critical for large tables
due to the negative implications of large range scans.
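A small Python simulation, using an assumed figure of 10 rows per table block,
illustrates the effect of proximity on a range scan:

```python
import random

ROWS_PER_BLOCK = 10   # assumed rows per table block
N_ROWS = 10_000

def blocks_for_range(position_of, keys):
    # distinct table blocks read when fetching each key's row by RowID
    return len({position_of[k] // ROWS_PER_BLOCK for k in keys})

# good proximity: physical position follows the key order
ordered = {k: k for k in range(N_ROWS)}

# poor proximity: rows were inserted wherever free space happened to be
random.seed(42)
positions = random.sample(range(N_ROWS), N_ROWS)
scattered = {k: positions[k] for k in range(N_ROWS)}

keys = range(100)                         # a range scan over 100 key values
good = blocks_for_range(ordered, keys)    # 100 rows / 10 per block = 10 blocks
poor = blocks_for_range(scattered, keys)  # close to one block per row
```

With ordered storage the 100 rows share 10 blocks; with scattered storage nearly every
row costs its own block read, and far fewer of the blocks brought into the cache are
reusable.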

2.42 Avoid Unhelpful Index Scans

If you are going to use index scans against a large table, you cannot assume that the index
scan will perform better than a full table scan. Unique scans or range scans of indexes
that are not followed by table accesses may perform well, but a range scan of an index
followed by a table access by RowID may perform poorly. As the table grows
significantly
larger than the data block buffer cache, the break-even point between index scans and full
table scans decreases – eventually, if you read more than 1 percent of the rows in a
10,000,000 row table, you are better off performing a full table scan rather than an index
range scan and table access by RowID combination.
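The block arithmetic behind this break-even claim can be checked with a short Python
calculation, using the same assumed figures as the COMPANY example that follows (10
rows per table block, 100 entries per index block):

```python
ROWS = 10_000_000
ROWS_PER_TABLE_BLOCK = 10            # -> 1,000,000 table blocks in all
ENTRIES_PER_INDEX_BLOCK = 100

def full_table_scan_blocks():
    return ROWS // ROWS_PER_TABLE_BLOCK

def index_path_blocks(rows_selected, good_proximity):
    index_blocks = rows_selected // ENTRIES_PER_INDEX_BLOCK
    # best case: related rows share blocks; worst case: one block per row
    if good_proximity:
        table_blocks = rows_selected // ROWS_PER_TABLE_BLOCK
    else:
        table_blocks = rows_selected
    return index_blocks + table_blocks

fts = full_table_scan_blocks()                       # 1,000,000 blocks
ten_pct_worst = index_path_blocks(1_000_000, False)  # 1,010,000 blocks
one_pct_worst = index_path_blocks(100_000, False)    # 101,000 blocks
```

With poor proximity, the 10 percent index path reads more blocks than the full scan
does – and, unlike the full scan, it marks those blocks for long-term retention in the
buffer cache.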

For example, the COMPANY table has a primary key of Company_ID. If COMPANY
has 10,000,000 rows, the Company_ID values range sequentially from 1 to 10,000,000. In
a multiuser environment, should the following query use an index to retrieve the first
1,000,000 records from COMPANY?

select *
from COMPANY
where Company_ID between 1 and 1000000;

The query in the preceding listing selects 10 percent of the COMPANY table. The
rule-based optimizer’s execution path for the query is shown in the following listing.

TABLE ACCESS BY ROWID COMPANY
INDEX RANGE SCAN COMPANY_PK

Because the select clause selects all columns, Oracle has to access the table after
performing the index range scan. If each index block contains entries for 100 records,
you will need to scan 10,000 index blocks (1,000,000 rows selected/100 entries per index
block). If each table block contains 10 records, you will need to read 100,000 table
blocks (1,000,000 rows selected/10 records per table block) during the table access by
RowID – and that’s the best-case scenario. This scenario assumes that the data is stored
in an ordered fashion, so that no data is stored in the table blocks read except for the data
required to resolve the query.

If the data is not stored in an ordered fashion, you may need to access 1,000,000 table
blocks (1 block read per row for 1,000,000 rows selected) – for a total of 1,010,000 block
accesses. Depending on data proximity, the total number of blocks read will therefore
range between 110,000 and 1,010,000. Because these blocks are read via index range
scans, they will be kept in the SGA for as long as possible. For a 2K block size, that
means that this query would try to use and retain at least 220MB (2K per block * 110,000
blocks) of its blocks in the data block buffer cache. Unless you have an extremely large
SGA, this query’s block reads would effectively remove all other data from the data
block buffer cache.

Conversely, how many blocks will be read during a full table scan? If there are 10
records per block, a full table scan will read 1,000,000 blocks. Not only is that fewer than
the worst-case index range scan/table access by RowID combination, but also the blocks
read by the full table scan are not marked for long-term storage in the SGA. The blocks
read by the full table scan into the data block buffer cache are removed from the SGA as
quickly as possible – thus reducing the query’s impact on other users in the database. The
only blocks held in the SGA because of the full table scan are those blocks read in via the
last read; the number of blocks read is determined by the setting of the
DB_FILE_MULTIBLOCK_READ_COUNT INIT.ORA parameter. Table 2-4
summarizes the block usage and data block buffer cache usage for the COMPANY “10
percent query” example.

What if only 1 percent (100,000 rows) of the records had been read? The full table scan
would still require 1,000,000 block reads. The index range scan would scan 1,000 blocks.
If each table block contained 10 records, you would need between 10,000 and 100,000
table block reads, depending on data proximity. The total for the index range scan/table
access by RowID combination is therefore between 11,000 and 101,000 blocks – between
22MB and 202MB of space used in the data block buffer cache (if each block is 2K in
size). Table 2.5 summarizes the block usage and data block buffer cache usage for the
COMPANY “1 percent query” example.

Method                  Best-Case     Worst-Case    Marked as        Blocks Held
                        Blocks Read   Blocks Read   Most-Recently    in SGA
                                                    Used
Index Range Scan        10,000        10,000        Y                10,000
Table Access by RowID   100,000       1,000,000     Y                <=1,000,000
                        =======       =======                        =======
                        110,000       1,010,000                      <=1,010,000
Full Table Scan         1,000,000     1,000,000     N                (multiblock
                                                                     read count)

Table 2-4 Block Reads and Memory Usage for Large Table Access Methods for 10
Percent Scan of a Very Large Table
When comparing the results of the two methods in Tables 2-4 and 2-5, you must
consider the rightmost column: the number of blocks that will be held long-term in the
SGA by the data read by the query. If the query is run in batch mode with no other users
in the database, it may be acceptable to completely remove all the other data from the
SGA. If
there are multiple users of the database, the performance cost of keeping the data in the
SGA could be so great for the other users of the database (due to the size of the table and
indexes) that you may be better off using the full table scan method even when querying
just 1 percent of the table.

Method                  Best-Case     Worst-Case    Marked as        Blocks Held
                        Blocks Read   Blocks Read   Most-Recently    in SGA
                                                    Used
Index Range Scan        1,000         1,000         Y                1,000
Table Access by RowID   10,000        100,000       Y                <=100,000
                        =======       =======                        =======
                        11,000        101,000                        <=101,000
Full Table Scan         1,000,000     1,000,000     N                (multiblock
                                                                     read count)

Table 2-5 Block Reads and Memory Usage for Large Table Access Methods for 1
Percent Scan of a Very Large Table

In addition to the cache management issues, the full table scan method also benefits from
Oracle’s multiblock read mechanism. As described earlier, multiblock reads allow for
quick data access during full table scans. The one-consistent-get-per-block feature used
during full table scans also helps improve the performance of full table scans.
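The effect of multiblock reads is simple arithmetic; the Python sketch below uses an
assumed read count of 8 (the actual value comes from the database’s
DB_FILE_MULTIBLOCK_READ_COUNT setting):

```python
import math

def read_calls(blocks, multiblock_read_count):
    # one I/O call fetches up to multiblock_read_count adjacent blocks
    return math.ceil(blocks / multiblock_read_count)

# a 1,000,000-block full scan with a read count of 8
full_scan_calls = read_calls(1_000_000, 8)       # 125,000 calls
single_block_calls = read_calls(1_000_000, 1)    # 1,000,000 calls
```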

As noted earlier, favoring full table scans is not a typical tuning method. However, if a
table is so large that its index-based accesses will not overwhelm the SGA, you should
consider using a full table scan for queries of very large tables in multiuser environments.

2.43 Create Fully Indexed Tables

To improve the use of indexes during large table accesses, you need to eliminate two
operations: range scans and subsequent table accesses.
The cost of index-only accesses
to the SGA can still be high – as illustrated in Table 2.5, an index-only access of 1
percent of a very large table could still require 1,000 index blocks to be read into the
SGA (at a 2K block size, that equates to 2MB worth of index blocks). To achieve these
index-related goals, you can fully index the table.

Fully indexing a table is useful if the table’s data is fairly static. Create a concatenated
index that contains all of the columns commonly selected during queries. During a query,
all of the data requested by the query can be provided via index accesses, and no table
accesses will be necessary.

If you have an associative table, which is used to maintain many-to-many relationships


between major tables, the associative table’s primary key is usually the composite of the
primary keys of the tables it relates. For example, if there is a many-to-many relationship
between COMPANY and PRODUCT, an associative table for those two tables (named
COMPANY_PRODUCT) will have the primary key (Company_ID, Product_ID). If
associative tables exist, create two concatenated indexes, one with each table’s primary
key as the leading set of columns. In the case of the COMPANY_PRODUCT, you should
create one index with Company_ID as the leading column (Company_ID, Product_ID)

and a second index with Product_ID as the leading column (Product_ID, Company_ID),
to enable the optimizer to use either table as the driving table in the join. For further
information on indexing for associative tables, see the section on indexing many-to-many
relationships in Tip #3, “Manage Multi-Table Joins (NESTED LOOPS, MERGE JOINs,
and HASH JOINs).”

2.44 Create Hash Clusters

In a hash cluster, the order in which records are inserted into a table does not matter; the
physical location of a record is determined by its values in key columns. A hashing
function is applied to the key values for the row, and Oracle uses the result to determine
in which block to store the record. The rows within the hash cluster are quickly spread
across all of the allocated blocks.

The use of a hash cluster eliminates the need for an index during a simple primary key
value lookup. Hash clusters are useful under certain conditions, as described in the
following sections.

2.45 If Equivalence Queries Are Used

An equivalence query contains a where clause that sets the primary key equal to a
specific value, as shown in the following listing.

where Company_ID = 1234

When the query is resolved, the optimizer will apply the hashing function to the value
specified in the where clause (in this case 1234). The hashing function will provide the
physical location of the row that matches the value in the where clause. The row can then
quickly be located and returned to the user.

2.46 If There Is No Way to Enforce Row Proximity

If the rows are stored in order, an index on the column will perform almost as well as a
hash cluster for equivalence queries, and will use less storage space. If there are multiple
successive equivalence queries of a range of values, and the rows are not in sorted order,
then the index blocks read into the SGA’s data block buffer cache will be less likely to be
reused. If data proximity is poor, there is little chance of index block reusability. If you
cannot reuse the index blocks read into the SGA, a hash cluster may be appropriate.
2.47 If Space Allocation Is Not a Problem

A hash cluster will typically require about 50 percent more space than a comparably
indexed table.

To create a hash cluster, you must first create a cluster, as shown in the following listing.
The size parameter specifies the amount of space allocated to each “bucket” in the table.
Note that the addressing of blocks in a cluster is done by addressing the bucket – that is,
each bucket gets a RowID. Each bucket in turn can be smaller or larger than a database
block. Optimally, each bucket houses one row. If more than one row is designated to fit in
the same bucket and there is not enough space in that bucket (the size of the two rows is
greater than the value of the size parameter), Oracle calls this a collision and will allocate
an overflow bucket (usually in a different database block).

Since there is no separate addressing to individual rows in the bucket regardless of how
many database blocks are included, Oracle is forced to read into memory the entire
bucket when searching for any of the rows in the bucket. Therefore, one of the biggest
challenges is how to set the size parameter correctly. You should set the size to be a
multiple (or division) of your database block size. For example, if your row size is about
850 bytes in length and your database block size is 4KB, set your size parameter to 1,000
bytes. This means that four buckets will fit in one data block. If you set your bucket size
to 850, still only four buckets would fit into a database block. However, if occasionally
you have a row that is larger than 850 bytes, two buckets will be required to hold that
row, often in a different database block, causing two reads when retrieving. To avoid
collisions – multiple hashed rows stored in the same bucket, creating overflow blocks –
you should allocate more space in the hash cluster and use a hash function that will
spread the data homogenously throughout the cluster. The hashkeys parameter specifies
the expected number of rows within the hashed table. See the Oracle Server Concepts
Manual for information on the hashkeys and size parameters.
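The bucket arithmetic described above can be verified with a short Python sketch (the
4KB block size is the same assumption used in the example):

```python
BLOCK_SIZE = 4096        # assumed 4KB database block

def buckets_per_block(size_param):
    # how many hash buckets fit in one database block
    return BLOCK_SIZE // size_param

def needs_overflow(row_bytes, size_param):
    # a row that outgrows its bucket spills into an overflow bucket,
    # usually in another block -- an extra read on every retrieval
    return row_bytes > size_param

four_at_1000 = buckets_per_block(1000)   # 4 buckets per block
four_at_850 = buckets_per_block(850)     # also 4 buckets per block
spills = needs_overflow(900, 850)        # True: a 900-byte row overflows
fits = needs_overflow(900, 1000)         # False: size 1000 absorbs it
```

Both settings pack four buckets per block, so the larger size parameter costs nothing in
density but avoids the overflow reads for the occasional oversized row.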

create cluster COMPANY_CLUSTER (Company_ID NUMBER(12))
storage (initial 50M next 50M)
hash is Company_ID
size 60 hashkeys 1000000;

The storage parameters for the hashed table are specified at the cluster level, as shown in
the preceding listing.

Once the cluster has been created, create the table within the cluster, as shown in the
following listing.

create table COMPANY
(Company_ID NUMBER(12),
Name VARCHAR2(20),
Address VARCHAR2(20))
cluster COMPANY_CLUSTER (Company_ID);

The table will be created using the storage parameters of the cluster, and the
Company_ID column will be used as the hashed column for the COMPANY table.

As of Oracle 7.2, you can specify a hashing formula for the cluster. Specifying your own
hashing formula is particularly useful for applications that use system-generated sequence
numbers for their tables’ primary keys. The hash formula in that case would be the
sequence column itself. Specifying a hash formula on a sequence column may eliminate
many of the space and collision concerns associated with hash clusters. The more records
that have the same hash value, the less efficient the hash cluster is. If the number of such
“collisions” is unpredictable, use a fully indexed table instead.

The challenge in creating a hash cluster is twofold: setting the correct size and using a
hash formula that will cause homogenous distribution of data. If the table uses a
sequence-generated number for its primary key, this key can be used as the hash formula
and you only need to worry about sizing the bucket correctly. Note that hash clusters can
hold multiple tables and you can set the hash formula intentionally to have many rows fit
in the same bucket. For example, in a master-detail table configuration, you can have
both tables reside in the same hash cluster. If you use the master portion of the keys as
p ortion of the keys as
the hash formula, the master and all its detail rows will reside in the same bucket. This
technique should only be employed if the number of detail records per master record is
predetermined and the total size of the master and all its details fit in one database block.
If this is not the case (as in most business tables), then the performance degradation of
manipulating overflow buckets offsets the gains for data proximity.

2.48 Create Partitioned Tables

If a large table has several logical partitions – groups of records that share a common
value for a key column – you may be able to break it into several smaller tables. These could
be queried together via a union all involving separate queries against each of the smaller
tables. Partitioning data based on its column values is called horizontal partitioning.

NOTE

You can use Oracle’s partitioning capability, described previously, to partition tables and
indexes when they are created. In this section, you will see a different method for
implementing partitioning.

For example, consider the SALES table, which has three columns: Company_ID,
Period_ID, and Sales_Total. There is a logical division of records among Period_ID
values. If users frequently request data for just one Period_ID value, it may be
appropriate to create multiple SALES tables, one for each Period_ID or set of Period_ID
values.

If there are only four valid Period_ID values (for example, one for each quarter of the
year), you may choose to create four smaller tables from the SALES table. Creating
smaller tables simplifies the administration of the SALES data, since you can now
truncate or replace one quarter’s worth of the SALES data without directly affecting the
rest of the data. You could also dynamically modify the union view to include or exclude
a particular quarter’s table.

When creating the quarter’s tables, you need to create a CHECK constraint on the
column used as the basis for partitioning. The create table command for Period 1’s data is
shown in the following listing.

Create table SALES_PERIOD_1
(Company_ID, Period_ID, Sales_Total) as
select Company_ID, Period_ID, Sales_Total
from SALES
where Period_ID = 1;

alter table SALES_PERIOD_1
add constraint CHECK_SALES_PERIOD_1
check (Period_ID = 1);

alter table SALES_PERIOD_1
add constraint CHECK_SALES_PERIOD_1_PK
primary key (Company_ID, Period_ID);

create index SALES_PERIOD_1$PERIOD_ID
on SALES_PERIOD_1 (Period_ID);

As shown in the preceding listing, there are several steps involved in properly creating
the SALES_PERIOD_1 table. First, the table is created. Next, a CHECK constraint is
created on the column used for partitioning. Indexes are then created for the primary key
and the partition column of the SALES_PERIOD_1 table. You must create the CHECK
constraint on the newly created table. If you create four SALES_PERIOD tables, and
then create a
view that performs a union across the four tables, then queries against the view can be
optimized via a technique called partition elimination. In partition elimination, the
optimizer uses the CHECK constraints on the SALES_PERIOD tables to determine
which tables could not have records that meet the query’s where criteria. For example,
the following listing shows the create view command for a view named SALES_ALL
that performs a union across four SALES_PERIOD tables.

create view SALES_ALL as
select * from SALES_PERIOD_1
union all
select * from SALES_PERIOD_2
union all
select * from SALES_PERIOD_3
union all
select * from SALES_PERIOD_4;

If you query SALES_ALL using a Period_ID value as a limiting condition, most of the
tables will not be involved in returning rows to the query. The query in the following
listing selects only rows that have a Period_ID value of ‘1’.

select * from SALES_ALL
where Period_ID = 1
and Company_ID > 10000;

The query in the preceding listing will only return records from the SALES_PERIOD_1

table; there is no reason for the query to read from any other SALES_PERIOD table.
The optimizer can use the CHECK clauses to determine which tables should not be used
during the query resolution. Partitioned tables whose CHECK conditions prevent them
from being used to return records to the query will not be scanned.
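Partition elimination itself is a simple pruning step. The following Python sketch models
it outside the database – the dictionary of CHECK values is a stand-in for the real
constraint definitions, not Oracle’s internal mechanism:

```python
# Each partition carries its CHECK predicate value; the optimizer skips
# any partition whose predicate contradicts the query's where criteria.
PARTITION_CHECKS = {
    "SALES_PERIOD_1": 1,
    "SALES_PERIOD_2": 2,
    "SALES_PERIOD_3": 3,
    "SALES_PERIOD_4": 4,
}

def partitions_to_scan(period_id):
    """Partitions that can possibly hold rows for the given Period_ID."""
    return [name for name, check in PARTITION_CHECKS.items()
            if check == period_id]

scanned = partitions_to_scan(1)   # only SALES_PERIOD_1 survives elimination
```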

If your queries are based on strict partitions of data (whether they are related to exact
values or exact ranges of values), partition elimination can significantly reduce the
number of data blocks that must be read to resolve a query. If you cannot establish the
partition boundaries, or if queries frequently cross many partitions, you may not see
performance benefits from this design. The greatest performance benefits from a
partitioned table design come from partition elimination.

Other very important management benefits of a partitioned design include:

 In the event of disaster, recovery could be much faster than if the table were not
partitioned.
 It is easier to back up the table’s partitions than to back up the entire table.
 You can improve I/O balancing by placing different portions of the table on
different disk drives.

As noted earlier in this section, you can use the partitioning features introduced in
Oracle8 to manage the process of horizontally partitioning your tables.

NOTE

When partitioning a table or an index, you should always keep in mind the business rules
that govern the distribution of values within the table or index. Your partitions should
reflect the true distribution of data within the table.

2.49 Implement the Parallel Options

As noted earlier in this section, there are a number of conditions under which full table
scans are preferable to index range scans. To improve the performance of index and table
scans, you can use the Parallel Query option. The performance benefit gained from the
Parallel Query option is particularly significant for full table scans. For full information
on the administration and implementation of the available parallel options, see Chapter
12.

2.50 Use UNION ALL Rather than UNION

One of the most commonly used set operations is the UNION operation. As shown, the
UNION operation concatenates multiple sets of rows into a single set. The mathematical
definition of the UNION operation calls for returning a single set of rows with no
duplicates, so Oracle returns only distinct rows in the merged result set.
The uniqueness requirement forces Oracle to remove the duplicate rows when a UNION
operation is used as part of a SQL statement. The only Oracle function that removes
duplicates is the SORT UNIQUE operation, which is similar to the operation executed
when you use the DISTINCT clause.

The UNION ALL operation allows for duplicates. The fact that UNION ALL does not
require a SORT UNIQUE operation saves significant overhead. UNION ALL is a row
operation, so rows are returned to the user as soon as they are available. UNION, by
contrast, includes the SORT UNIQUE set operation and therefore does not return any
rows to the user until the entire set of rows has been sorted.
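The difference between the two operations can be mimicked in a few lines of Python –
union_all is plain concatenation, while union must materialize and de-duplicate the
whole set before any row can be returned (the table contents are made-up sample data):

```python
def union_all(a, b):
    # row operation: concatenate; duplicates survive, rows can stream out
    return a + b

def union(a, b):
    # set operation: the whole result must be de-duplicated (and here
    # sorted, mirroring SORT UNIQUE) before any row is returned
    return sorted(set(a + b))

shipped = [("C1", 100), ("C2", 200)]
in_warehouse = [("C2", 200), ("C3", 300)]

all_rows = union_all(shipped, in_warehouse)   # 4 rows, duplicate kept
distinct_rows = union(shipped, in_warehouse)  # 3 rows after de-duplication
```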

When the UNION ALL operation generates large result sets, the fact that no sort is
needed to return the rows to the application means faster response time for retrieval of the
first row and in many cases the operation can complete without using a temporary
segment. A notable advantage for UNION ALL over UNION is the behavior of views
containing UNION ALL. When trying to join a view containing UNION, the sort
that apply to the joined table will not be taken into account before the entire UNION is
resolved. In a UNION ALL view there is no such issue.

The following listing creates a view based on the UNION of tables named
SALES_PERIOD_1 and SALES_PERIOD_2.

create view SALES_UNION as
select * from SALES_PERIOD_1
union
select * from SALES_PERIOD_2;

You can now join the SALES_UNION view to a different table, as shown in the
following listing.

select *
from SALES_UNION, PRODUCT
where PRODUCT.Product_ID = SALES_UNION.Product_ID
and PRODUCT.Product_Name = ‘YO-YO’ ;

The explain plan for the preceding query is shown in the following listing.

SELECT STATEMENT
NESTED LOOPS
SORT UNIQUE
UNION ALL
TABLE ACCESS FULL SALES_PERIOD_1
TABLE ACCESS FULL SALES_PERIOD_2
TABLE ACCESS BY ROWID PRODUCT
INDEX UNIQUE SCAN PRODUCT_PK

As shown in the preceding code segment, the join between the SALES_UNION view and
the PRODUCT table is performed after both sales period tables have been scanned. The
fact that we are only looking for product name ‘YO-YO’ does not improve the selectivity
of the previous query.
If the same view is created with a UNION ALL operation in place of UNION, the result
would be quite different. The following listing shows the create view command and the
test query.

create view SALES_UNION_ALL as
select * from SALES_PERIOD_1
union all
select * from SALES_PERIOD_2;

select *
from SALES_UNION_ALL, PRODUCT
where PRODUCT.Product_ID = SALES_UNION_ALL.Product_ID
and PRODUCT.Product_Name = 'YO-YO';

The explain plan for the query is shown in the following listing.

SELECT STATEMENT
  UNION ALL
    NESTED LOOPS
      TABLE ACCESS BY ROWID PRODUCT
        INDEX UNIQUE SCAN PRODUCT_PK
      TABLE ACCESS BY ROWID SALES_PERIOD_1
        INDEX RANGE SCAN SALES_PERIOD_1$PRODID
    NESTED LOOPS
      TABLE ACCESS BY ROWID PRODUCT
        INDEX UNIQUE SCAN PRODUCT_PK
      TABLE ACCESS BY ROWID SALES_PERIOD_2
        INDEX RANGE SCAN SALES_PERIOD_2$PRODID

As you can see in the explain plan, the join is performed on each of the UNIONed tables
before the UNION ALL is resolved. Performing the join in this manner speeds up the
operation significantly, as only a handful of rows are returned from each UNIONed part
(rather than the entire tables).

NOTE

UNION ALL and UNION do not return the same results in all situations. You can only
convert UNION to UNION ALL if the UNIONed sets do not have any rows in common.
In many business environments, the result sets do not contain any duplicates (such as in
the case of UNIONing goods shipped and goods still stored in the warehouse). In those
cases, UNION ALL will be more efficient than UNION.
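The semantic difference can be illustrated with a small sketch, using Python's sqlite3 as an illustrative stand-in for Oracle; the table names and the overlapping Product_ID values are made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table sales_period_1 (product_id integer)")
conn.execute("create table sales_period_2 (product_id integer)")
# Product 2 appears in both periods, so the two sets share a row.
conn.executemany("insert into sales_period_1 values (?)", [(1,), (2,)])
conn.executemany("insert into sales_period_2 values (?)", [(2,), (3,)])

# UNION implies a sort/distinct step and removes the duplicate row.
union_rows = conn.execute(
    "select product_id from sales_period_1"
    " union "
    "select product_id from sales_period_2").fetchall()

# UNION ALL simply concatenates the two sets, duplicates included.
union_all_rows = conn.execute(
    "select product_id from sales_period_1"
    " union all "
    "select product_id from sales_period_2").fetchall()

print(len(union_rows), len(union_all_rows))  # prints "3 4"
```

When the two sets are known to be disjoint, both forms return the same rows, and the UNION ALL form skips the distinct step entirely.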

2.51 Avoid Using PL/SQL Function Calls in SQL

With the proliferation of PL/SQL usage, many users try to take advantage of PL/SQL
functions to generate reusable code. One of the most compelling ways to reuse PL/SQL
functions is inside of SQL statements. For example, you can create a function that
converts international currency to US$. The function can be named US$, with input
variables of amount and currency. You can use this function in a select statement to query
a table with mixed currencies to find all the transactions that are larger than US$1,000, as
shown in the following listing.

select Transaction_Type, US$(Amount, Currency)
from INTERNATIONAL_TRANSACTION
where US$(Amount, Currency) > 1000;

The preceding SQL statement did not perform as well as expected. In tests, its
performance was about 44 times slower than the following SQL statement, which yields
the same result:

select Transaction_Type,
       Amount*Exchange_Rate US$
from EXCHANGE_RATE ER, INTERNATIONAL_TRANSACTION IT
where ER.Currency = IT.Currency
and Amount*Exchange_Rate > 1000;

The reason for the difference in response time is the mechanism Oracle uses when
mixing PL/SQL with SQL. When embedding a PL/SQL function inside a SQL query,
Oracle separates the calls at execution time into two components: a SQL statement with
bind variables in place of the function calls, and an anonymous PL/SQL block for each
function call:

select Transaction_Type, :a1
from INTERNATIONAL_TRANSACTION
where :a2 > 1000;

BEGIN
  :a1 := US$(:Amount, :Currency);
END;

The anonymous block shown in the previous code sample will be executed twice for
every row in the INTERNATIONAL_TRANSACTION table (once for the US$ function
call in the select clause and once for the call in the where clause). The invocation of the
anonymous block causes the sharp increase in response time for the query. You should
avoid the use of PL/SQL function calls within your SQL statements unless you can bear
the performance costs.
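The per-row cost is easy to observe outside Oracle as well. The following sketch uses Python's sqlite3 as an illustrative stand-in; the usd function and the intl_txn table are hypothetical analogs of the US$ example. It registers a user-defined function, references it in both the select list and the where clause, and counts how often it is invoked:

```python
import sqlite3

calls = {"n": 0}

def usd(amount, rate):
    # Hypothetical stand-in for the US$ conversion function.
    calls["n"] += 1
    return amount * rate

conn = sqlite3.connect(":memory:")
conn.create_function("usd", 2, usd)
conn.execute("create table intl_txn (txn_type text, amount real, rate real)")
conn.executemany("insert into intl_txn values (?, ?, ?)",
                 [("WIRE", 900.0, 1.2), ("CHECK", 500.0, 1.1),
                  ("CASH", 2000.0, 0.9)])

# The function appears twice in the query, so it is dispatched per row,
# per occurrence, just as the anonymous PL/SQL block is in Oracle.
rows = conn.execute(
    "select txn_type, usd(amount, rate) from intl_txn"
    " where usd(amount, rate) > 1000").fetchall()
per_row_calls = calls["n"]

# The equivalent inline arithmetic needs no per-row function dispatch.
rows2 = conn.execute(
    "select txn_type, amount*rate from intl_txn"
    " where amount*rate > 1000").fetchall()

print(per_row_calls, rows == rows2)
```

The invocation count grows with the number of rows scanned, while the inline-arithmetic form returns the same result with no user-function dispatch at all.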

2.52 Manage the Use of Bind Variables

Oracle’s shared SQL area caches SQL statements based on the full syntax of the
statement. Similar (but not identical) SQL statements will consume a lot of space in the
shared SQL area, since each must be cached separately. The caching of multiple similar
statements will cause the effectiveness of the cache to diminish. If many SQL statements
are submitted from batch jobs, the volume of SQL statements may erode the performance
of the shared SQL area.
In many batch jobs, transactions are read in from a flat file and applied to the database
one by one. As a developer who is programmatically loading these values, you can use
one of two techniques to apply the transactions: the execute immediate technique or the
execute using technique. With the execute immediate technique, you must create a
character string that contains the entire SQL statement as if it was entered in SQL*Plus,
as shown in the following listing.

get(work_string);
exec sql execute immediate :work_string;

Although the execute immediate technique is easy to implement, each SQL statement
submitted to the database will be stored in the shared SQL area separately. If a batch job
processes 10,000 rows, then 10,000 different SQL statements will be sent to the shared
SQL area, probably overriding any other SQL statement already there. As a result, the
effectiveness of the shared SQL area will diminish.

To implement the execute using technique, the programmer has to do more work. In the
execute using technique, a cursor is created and the content of the columns is bound into
the SQL. However, if the execute using technique is used, only one syntax is used and
the shared SQL area is not overwhelmed by many different SQL statements. The
following listing shows an example of the use of the execute using technique.

work_string : string(1..200);
worker_name : string(1..25);
get(work_string);
exec sql prepare ADAH from :work_string;
exec sql execute ADAH using :worker_name;

Because of their effect on the shared SQL area, you should use bind variables and the
execute using technique in place of the execute immediate technique. However, there are
cases in which the execute immediate technique is preferable to the execute using
technique. In the “When a Full Table Scan Is Used” section of Tip #1, you can see that
when you use a bind variable, the execution plan can be different than if the value was
explicitly specified. The impact on the explain plan is important if the value used can
determine the final plan. For example, if a like clause is used, the execution plan can
become an index range scan or a full table scan, depending on the condition value:

 Leading ‘%’ causes a full table scan (Column like ‘%Value%’)
 Trailing ‘%’ can use an index range scan (Column like ‘Value%’)
 No ‘%’ can use a unique scan if the index is unique (Column like ‘Value’)

Since the SQL execution plan is determined at the parse phase, the value of any bind
variable is not yet available. The result is that the Oracle optimizer will not know if an
index is available to resolve the condition, and a full table scan will usually be chosen.
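The cache-sharing effect of binding can be sketched quickly, again with Python's sqlite3 as an illustrative stand-in; the workers table is hypothetical, and the point is simply that a text-keyed statement cache (like Oracle's shared SQL area) holds one entry per distinct SQL text:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table workers (name text)")
names = ["KING", "BLAKE", "CLARK"]

# "execute immediate" style: every row produces a distinct literal SQL
# text, so a text-keyed statement cache holds one entry per transaction.
immediate_texts = ["insert into workers values ('%s')" % n for n in names]
for stmt in immediate_texts:
    conn.execute(stmt)

# "execute using" style: one SQL text, values bound at execution time,
# so a single cached statement serves every row.
using_text = "insert into workers values (?)"
for n in names:
    conn.execute(using_text, (n,))

print(len(set(immediate_texts)), len({using_text}))  # 3 distinct texts vs 1
```

Scaled to a 10,000-row batch job, the first style generates 10,000 distinct statements while the second still generates one, which is the behavior described above for the shared SQL area.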

The Oracle optimizer’s choice of a full table scan is especially problematic in OLTP
applications when the user has the option to use the ‘%’ wild card in a query screen. The
like operation is typically coded into the application regardless of the value the end user
will eventually use. The result is always a full table scan.
A partial solution for client applications using the like clause is to use the between clause
in its place. However, this solution can be used only if the user cannot use the ‘%’ wild
card as the first character. Using the between clause is most suitable for applications in
which the user only types in the first few characters of the value to query. For example,
the following query using the like operation will be translated to between:

Select *  -- Will NOT use the index on Company_Name
From COMPANY
Where Company_Name like :name||'%';

If Company_Name were compared to a literal value, then the optimizer could use an
index on Company_Name to resolve the preceding query. Since Company_Name is
compared to a bind variable, the optimizer cannot use an index on the Company_Name
column.

You can modify the query to a form that can use an index on Company_Name, as shown
in the following listing.

Select *  -- Will use the index on Company_Name
From COMPANY
Where Company_Name between :name and :name||CHAR(255);

NOTE

This example assumes that the value for the bind variable name does not contain the ‘%’
wild card.

The earlier example shows that the like clause will cause a full table scan to be used. To
avoid using the like clause, the modified query uses a between clause that simulates the
like clause. To construct the between clause, use the value given by the user as the low
value of the range and the same value concatenated with the character maxvalue
(CHAR(255)) as the high value of the range.
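The equivalence of the two predicates for prefix searches can be checked with a quick sketch (Python's sqlite3 for illustration; chr(255) plays the role of CHAR(255), and the COMPANY data is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table company (company_name text primary key)")
conn.executemany("insert into company values (?)",
                 [("ACE HARDWARE",), ("ACME",), ("BOLT CO",)])

prefix = "AC"              # value the user typed, no '%' wild card
high = prefix + chr(255)   # analog of :name||CHAR(255)

# The like form: prefix match, which the optimizer may resolve as a scan.
like_rows = conn.execute(
    "select company_name from company"
    " where company_name like ? order by 1", (prefix + "%",)).fetchall()

# The between form: a plain range predicate over the same prefix.
between_rows = conn.execute(
    "select company_name from company"
    " where company_name between ? and ? order by 1",
    (prefix, high)).fetchall()

print(like_rows == between_rows, len(between_rows))
```

Both predicates return the same prefix matches, but the between form is an ordinary range condition, which an index on the column can resolve even when the bounds are bind variables.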

The problem with bind variables exists in all situations where the actual value given in
the statement determines the plan. For example, the histogram feature of the cost-based
optimizer uses the data values to determine the usefulness of an index over a range of
values. When using bind variables, the histogram is not taken into account.

2.53 Revisit the Tuning Process

As stated in this book, change is constant. Tables get larger, data may be moved, and new
options may be available within the database. A well-tuned NESTED LOOPS join may
work fine now, but if one of the tables is moved to a remote server, you may need to
force the use of a MERGE JOIN or a HASH JOIN instead. As your tables grow in size,
the rate at which they grow may not be uniform; as a result, you may need to reevaluate
the order of tables within a NESTED LOOPS join.

To make matters worse, the data is changing within the tables – and the statistics that the
cost-based optimizer used to determine the optimal execution path during the creation of
the query may change over time. As the statistics change, the query’s execution path may
change. Suddenly, a query that was once acceptable without manual tuning intervention
may require tuning.

The changes to the table’s volume and data contents are largely in the hands of
application users, so it is difficult to predict when the optimizer’s chosen path will
change. To manage the tuning changes as your application grows, schedule a routine
audit of your application’s queries. Determine which queries have nonoptimal execution
paths. When queries use these paths, check the appropriate section of this chapter for
information on how to manually intervene in the selection of the queries’ execution paths.
After correcting the query’s tuning problem, schedule a time to check the query again –
for unless its tables are static, the query’s performance is constantly subject to change.

Tuning is not a simple process, but if you use the proper approach, it can quickly yield
great benefits. The first chapter in this section provided five steps to follow during the
tuning process, in the following order:

1. Ensure a consistent production environment.
2. Categorize the application’s transactions and queries.
3. Focus on offensive SQL.
4. Tune specific long-running processes and batch jobs.
5. Tune the access to individual tables.

Given that five-step method for approaching tuning, you can use the information in this
chapter. If your production environment is consistent, you should be able to identify the
few SQL statements that cause most of the burden on your system. By focusing on those
statements and the applications that call them, you can acquire great performance
benefits.

Because of changes within the systems, though, you will need to revisit the tuning
process every time you make a change to an application’s database structures. For
example, adding an index may help one query, but may change the execution path of an
unrelated query. If a query has been tuned, be sure to use hints or comments to record the
proper execution path for your query. If you properly comment your work, future tuning
efforts will be less likely to undo the performance benefits you gained.

Within this chapter, the first focus was on the proper use of indexes and joins. If you use
indexes and manage your joins correctly, you will resolve most of the tuning problems
you are likely to encounter. As your system grows and moves over time, you can refer to
the sections on managing large tables and remote tables for tuning advice specific to
them. To avoid tuning queries while in crisis mode, schedule regular audits; if a major
change is about to occur in the system, proactively evaluate its impact on your prior
tuning efforts. Once the system has been reevaluated and provides a consistent
production environment, you can again focus on tuning the most offensive SQL.
