Chap 12
Chap 12
• Overview
• Catalog Information for Cost Estimation
• Measures of Query Cost
• Selection Operation
• Sorting
• Join Operation
• Other Operations
• Evaluation of Expressions
• Transformation of Relational Expressions
Database
%
12.1
Silberschatz, Korth and Sudarshan ◯c
' $
Basic Steps in Query Processing
2. Optimization
3. Evaluation
Optimizer
& %
Data Statistics
About Data
Database
12.2
Silberschatz, Korth and Sudarshan ◯c
' $
Basic Steps in Query Processing (Cont.)
Evaluation
& Database
%
12.3
Silberschatz, Korth and Sudarshan ◯c
' $
Basic Steps in Query Processing
& %
<
one2500,
with orpossible
cheapest can perform complete relation
evaluation-plan. scan and
Cost estimate of
discard accounts
a plan on
based statisticalwith balance
information in≥the
2500
DBMS
catalog.
Database Systems 12.
Silberschatz, Korth and Sudarshan ◯c
Concepts 4
' $
Catalog Information for Cost Estimation
& %
nr
br
• If tuples of r are stored together physically in a
= f r
file, then:
Database
12.5
Silberschatz, Korth and Sudarshan ◯c
' $
Catalog Information about Indices
Database
%
12.6
Silberschatz, Korth and Sudarshan ◯c
' $
Measures of Query Cost
& %
• We refer to the cost estimate of algorithm A as EA. We
case estimates.
do not cost of writing output to
include
disk.
Database Systems 12.
Silberschatz, Korth and Sudarshan ◯c
Concepts 7
' $
Selection Operation
& 12.
8
' $
Selection Operation (Cont.)
& %
– Equality condition on a key attribute: SC(A, r ) = 1;
estimate
reduces to E A2 = [log
2 r
(b )|
Database Systems 12.
Silberschatz, Korth and Sudarshan ◯c
Concepts 9
' $
Statistical Information for Examples
Database
%
12.10
Silberschatz, Korth and Sudarshan ◯c
' $
Selection Cost Estimate Example
σbranch-name=“Perryridge”(account)
&
2. Therefore, 2 index blocks must be read.
%
• This strategy requires 12 total block
reads.
Database
12.13
Silberschatz, Korth and Sudarshan ◯c
' $
Selections Involving Comparisons
& %
where c is defined as before. (Linear
c file scan may be
cheaper
if c is
large!)
Database Systems 12.14
Silberschatz, Korth and Sudarshan ◯c
Concepts
' $
Implementation of Complex Selections
& %
tuples:
nr − size(σθ
(r ))
%
have appropriate
Applicable if haveindices,
all conditions available apply test in
indices. memory.
Otherwise
use linear
& scan.
Database Systems
Concepts
12.16
Silberschatz, Korth and Sudarshan ◯c
' $
Example of Cost Estimate for Complex Selection
& %
reads.
– Thus using branch-name index is preferable, even
use the balance
though
index. its condition is less selective.
– If both indices were12.17
Database Systems
Concepts
non-clustering, Silberschatz,
it would Korthbe
and Sudarshan ◯c
' $
Example (cont.)
& %
– Estimate that one tuple in 50 ∗ 500 meets both
– conditions. Since naccount
The total estimated cost=of10000, conservatively
this strategy is five
block reads.
overestimate that S1 ∩ S2 contains one
pointer.
Database Systems 12.18
Silberschatz, Korth and Sudarshan ◯c
Concepts
' $
Sorting
& 12.1
9
' $
External Sort–Merge
%
(c) Delete the record from the buffer page; if the buffer
(b)
page Write
is the record to the output
&
Concepts
empty, read the next block (if any) of the run into the
Database Systems 12.20
Silberschatz, Korth and Sudarshan ◯c
' $
Example: External Sorting Using Sort–Merge
a 19
a 19
g 24 d 31 a 14
b 14
a 19 g 24 a 19
c 33
d 31 b 14
b 14 d 31
c 33 c 33
c 33 e 16
b 14 d 7
e 16 g 24
e 16 d 21
r 16 d 21 d 31
a 14
d 21 m 3 e 16
d 7
m 3 r 16 g 24
d 21
p 2 m 3
m 3
d 7 a 14 p 2
p 2
a 14 d 7 r 16
r 16
p 2
Initial Sorted
& %
Relation Runs Runs Output
Create Merge Merge
Runs Pass1 Pass2
& %
write out results)
b (2[log ( b r /M )| +
– Total number of rmerge M−1 passes required:
1)
[log M − 1 ( b r /M)|.
Database Systems 12.22
Concepts Thus total number of disk accesses forSilberschatz,
external sorting:
Korth and Sudarshan ◯c
' $
Join Operation
& 12.2
3
' $
Join Operation: Running Example
Running example:
depositor 1
customer
= 10, 000.
• n
Catalog information for join examples:
customer
& %
Also assume that customer-name in depositor is a foreign
key on
customer
.
Database Systems 12.2
Silberschatz, Korth and Sudarshan ◯c
Concepts 4
' $
Estimation of the Size of Joins
• The Cartesian product r × s contains nr ns tuples; each
tuple
occupies sr + ss bytes.
• If R ∩ S = ∅, then r 1 s
is the same as r × s.
• If R ∩ S is a key for R, then a tuple of s will join with
at most one tuple from r ; therefore, the number of
tuples in r 1 s is no greater than the number of tuples
in s.
If R ∩ S in S is a foreign key in S referencing R, then
the number of tuples in r 1 s is exactly the same as the
number of tuples in s.
The case for R ∩ S being a foreign key
referencing S is symmetric.
& %
• In the example query depositor 1 customer , customer-
name in depositor is a foreign key of customer; hence,
the result has exactly ndepositor tuples, which is 5000.
Database
12.25
Silberschatz, Korth and Sudarshan ◯c
' $
Estimation of the Size of Joins (Cont.)
&
(A, r )
The lower of these two estimates is probably
the more accurate one.
Database
12.26
Silberschatz, Korth and Sudarshan ◯c
%
' $
Estimation of the Size of Joins (Cont.)
& Database
%
12.27
Silberschatz, Korth and Sudarshan ◯c
' $
Nested-Loop Join
• r is
called
the
outer
relatio
& n and
s the
inner
Database
%
relatio 12.28
Silberschatz, Korth and Sudarshan ◯c
' $
Nested-Loop Join (Cont.)
& •
• If the nested-loops
Block smaller relation (depositor
algorithm (next )slide)
memory, the cost estimate will be 500 disk
preferable.
accesses.
Database
fits is
entirely in
%
12.29
Silberschatz, Korth and Sudarshan ◯c
' $
Block Nested-Loop Join
& %
only onceblock in the outer relation (instead of once
for each
end
for each
tuple
end in the outer
relation)
Database Systems 12.3
Silberschatz, Korth and Sudarshan ◯c
Concepts 0
' $
Block Nested-Loop Join (Cont.)
& 12.3
1
' $
Indexed Nested-Loop Join
& %
• If– indices
Cost ofare
theavailable
join: br +on ∗ c, where
nr both r and sc, is
usethe cost
the oneof
with a single
tuples fewer selection
as the outer on s using the join condition.
relation.
Database Systems 12.3
Silberschatz, Korth and Sudarshan ◯c
Concepts 2
' $
Example of Index Nested-Loop Join
& Database
%
12.33
Silberschatz, Korth and Sudarshan ◯c
' $
Merge–Join
a1 a2 a1 a3
pr ps
a 3 a A
b 1 b G
d 8 c L
d 13 d N
f 7 m B
m 5
& %
q 6
r s
Database
12.34
Silberschatz, Korth and Sudarshan ◯c
' $
Merge–Join (Cont.)
& Database
%
12.35
Silberschatz, Korth and Sudarshan ◯c
' $
Hash–Join
& %
empty. Each tuple ts ∈ s is put in partition Hsi ,
initially
= h(st
iwhere
[JoinAttrs]).
Database Systems 12.3
Silberschatz, Korth and Sudarshan ◯c
Concepts 6
' $
Hash–Join (Cont.)
& Database
%
12.37
Silberschatz, Korth and Sudarshan ◯c
' $
Hash–Join (Cont.)
0 0
. 1 1
. .
. .
2 2 .
.
.
3 3
4 4
r s
Partitions Partitions
& Database
of r of s
%
12.38
Silberschatz, Korth and Sudarshan ◯c
' $
Hash–Join algorithm
The hash-join of r and s is computed as follows.
1. Partition the relations s using hashing function h.
When partitioning a relation, one block of memory is
reserved as the output buffer for each partition.
2. Partition r similarly.
3. For each i:
(a) Load Hsi into memory and build an in-memory
hash index on it using the join attribute. This hash
index uses a different hash function than the
earlier one h.
(b) Read the tuples in Hri from disk one by one.
For each tuple tr locate each matching tuple ts in Hsi
& %
memory.
fit in Can resolve by further partitioning Hsi using
hash function. rH
different i
must be similarly
partitioned.
Database Systems 12.4
Silberschatz, Korth and Sudarshan ◯c
Concepts 0
' $
Cost of Hash–Join
• If recursive partitioning is not required: 3(br + bs ) + 2
∗ max
• If recursive partitioning is required, number of passes
required for partitioning s is [ log M − 1 ( b s ) − 1|. This is
because each final partition of s should fit in memory.
• The number of partitions of probe relation r is the same
as that for build relation s; the number of passes for
partitioning of r is also the same as for s. Therefore it is
best to choose the smaller relation as the build
relation.
• Total cost estimate is:
customer 1 depositor
& %
Hybrid hash-join √most useful if M >>
•hash-join. bs .
& r 1θi s:
Database Systems
(r 1θ1 s) ∪ (r 1θ2 s) ∪ . . . ∪ (r 1θn s)
12.44
Silberschatz, Korth and Sudarshan ◯c
%
Concepts
' $
Complex Joins (Cont.)
& %
purpose operation that is more efficient than
implementing
two two joins of
relations.
Database Systems 12.45
Silberschatz, Korth and Sudarshan ◯c
Concepts
' $
Other Operations
& Database
%
12.46
Silberschatz, Korth and Sudarshan ◯c
' $
Other Operations (Cont.)
& Database
%
12.47
Silberschatz, Korth and Sudarshan ◯c
' $
Other Operations (Cont.)
& %
delete it from the index. Add remaining tuples in
index,
the hash index to the result.
12.4
9
' $
Evaluation of Expressions
customer-name
&
Database Systems
account
12.50
Silberschatz, Korth and Sudarshan ◯c
%
Concepts
' $
Evaluation of Expressions (Cont.)
%
algorithms that
• Pipelines can be executed in two ways: demand
driven and
& Database
%
12.52
Silberschatz, Korth and Sudarshan ◯c
' $
Equivalence of Expressions
Relations generated by two equivalent expressions have
the same set of attributes and contain the same set of
tuples, although their attributes may be ordered
differently.
customer-name
customer-name
branch-city=Brooklyn
branch branch-city=Brooklyn
& Database
Equivalent
expressions %
12.53
Silberschatz, Korth and Sudarshan ◯c
' $
Equivalence Rules
& %
Selections 1 1θ with Cartesian products
Eand
2
(b) σθtheta
1 (E1 1
joins.
E2) = E1 1θ1 ∧ θ 2 E2
θ2
E1 1θ E2 =
E2 1θ E1
( E1 1 E 2) 1 E3 = E1 1
( E2 1 E 3)
(E2 1θ2 E3)
(E1 1θ1 E2) 1θ2 ∧θ 3 E3 = E1 1θ1 ∧θ 3
(b) Theta joins are associative in the following
where θ2 involves attributes from only E2
manner:
and E3.
& Database
%
12.55
Silberschatz, Korth and Sudarshan ◯c
' $
Equivalence Rules (Cont.)
& Database
%
12.56
Silberschatz, Korth and Sudarshan ◯c
' $
Equivalence Rules (Cont.)
& Database
L1 ∪ L 2 (E1 1θ E2) = L1 ∪ L 2 ((L1 ∪ L 3 (E1)) 1θ (L2 ∪ L 4 (E2)))
%
12.57
Silberschatz, Korth and Sudarshan ◯c
' $
Equivalence Rules (Cont.)
E1 ∩ E2 = E2 ∩ E1
10. Set union and intersection are associative.
11. The selection operation distributes over ∪, ∩ and −.
E.g.:
σP (E1 − E2) = σP (E1)− σP (E2)
customer -name
((σbranch-city = “Brooklyn” (branch))
1 (account 1 depositor ))
• Performing the selection as early as possible reduces
Database
%
12.59
Silberschatz, Korth and Sudarshan ◯c
' $
Selection Operation Example (Cont.)
& Database
σbranch-city = “Brooklyn” (branch)
(account)
1 σbalance>1000
%
12.60
Silberschatz, Korth and Sudarshan ◯c
' $
Projection Operation Example
& (σbranch-city
customer
(branch))
Database
name = (( account -number (
“Brooklyn”
1 account)) 1
depositor )
%
12.61
Silberschatz, Korth and Sudarshan ◯c
' $
Join Ordering Example
& Database
%
12.62
Silberschatz, Korth and Sudarshan ◯c
' $
Join Ordering Example (Cont.)
& %
σbranch-city = “Brooklyn” (branch) 1
bank’s customers have accounts in branches located
first
in Brooklyn, it is better to compute
.
Database Systems 12.6
Silberschatz, Korth and Sudarshan ◯c
Concepts 3
' $
Evaluation Plan
(hash-join)
(merge-join) depositor
Pipeline Pipeline
branch account
& Database
An evaluation
plan %
12.64
Silberschatz, Korth and Sudarshan ◯c
' $
Choice of Evaluation Plans
Database
%
12.65
Silberschatz, Korth and Sudarshan ◯c
' $
Cost-Based Optimization
& Database
%
12.66
Silberschatz, Korth and Sudarshan ◯c
' $
Cost-Based Optimization (Cont.)
r5
r4 r3 r4 r5
r3 r1 r2
& %
r1 r2
s.
& 12.6
9
' $
Heuristic Optimization
%
• Some systemsbefore
operations use only heuristics,
other others combine
similar operations.
heuristics
&
projections where needed (Equiv. rules 3, 8a, 8b, 12).
6. Identify those subtrees whose operations can be
and execute them using
pipelined,
Database Systems 12.71
Silberschatz, Korth and Sudarshan ◯c
%
Concepts
' $
Structure of Query Optimizers
& Database
%
12.72
Silberschatz, Korth and Sudarshan ◯c
' $
Structure of Query Optimizers (Cont.)
& Database
%
12.73
Silberschatz, Korth and Sudarshan ◯c