0% found this document useful (0 votes)
10 views26 pages

Vu Lec 34

Virtual university

Uploaded by

zahid
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
10 views26 pages

Vu Lec 34

Virtual university

Uploaded by

zahid
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd

Distributed Database

Management Systems

Lecture 34
In the previous lecture
• Concluded Data Localization
• Query Optimization
– Components: Search space, cost
model, search strategy
– SS consists of eq. query trees
– SSts could be static, dynamic or
randomized
– Cost model sees response and total
times…
• Query Optimization
–Transmission cost is the most
important
–Another major factor is size of
interm tables
–Database statistics are used to
evaluate size of iterm. Tables
–Selectivity factor, card, size are
some major figures
In today’s Lecture
• Query Optimization
• Centralized QO
–Best access path
–Join Processing
• QO in Distributed Environment.
• Input to Optimizer is a
query tree generated
through QD
• Single relation queries:
executed according to
the best access path
• Queries involving Joins
in three steps
[Link] the possible
ordering of joins-
[Link] the cost of
each ordering
[Link] the join ordering
with minimal cost
• Cost model assigns
(estimates) costs to
operations based on
cardinality of operands
• Obtained from
Database statistics-
• Two major steps in
Optimization Algorithm
–Best access path for
individual relation with pred
–The best join ordering; two
possibilities.
1- Nested loops
• for each tuple of external
relation (card(n1))
• for each tuple of internal
relation (card(n2))
–join two tuples if the join predicate
is true
• Complexity: n1∗n2 (no index)
• Merge join
• Sorted relations
• Merge relations
• Complexity: n1 + n2 if
relations are
previously sorted
• Example: Select eName
From EMP, ASG, PROJ
Where
[Link] = [Link] &
[Link] = [Link] &
pName = ‘CAD/CAM’
• EMP has an index on eNo
• ASG has an index on pNo
• PROJ has an index on
pNo and an index on
pName
ASG

eNo pNo

EMP PROJ
1- Choose the best access
paths to each relation
• EMP: sequential scan (no
selection on EMP)
• ASG: sequential scan (no
selection on ASG)
• PROJ: index on pName
(there is a selection on
PROJ based on pName)
2- Determine the best join
ordering
–Total 3! orderings are
possible
–Rather than computing for
all, some of them are pruned
–Shown in the tree, next page
EMP ASG PROJ

EMP⋈ A ASG ⋈ E ⋈
SG EMP x PROJMP ASG ⋈ PR
PROJ A
SG PROJ x EMP
OJ
(ASG⋈EMP )⋈PROJ (PROJ ⋈ASG)⋈
EMP
Join Ordering in
Fragmented Queries
• It is important
• Assumptions-
• Two alternatives
–Join Ordering
–Replaced by Semi-Joins
• Former is more difficult.
Join Ordering
• Two relations: move the
smaller relation to the
other site
If size(R) < size(S)
R S

If size(S) < size(R)


• More than 2 relations
–Calculate all possible
costs
–Requires to compute size
of intermediate tables

–EMP ⋈ ASG ⋈ PROJ


–Difficult! Lets see why
Site 2
ASG
eNo pNo

EMP PROJ

Site 1 Site 3
• Strategy 1:

EMP’= EMP ⋈ ASGsite3


EMPsite2, site2 computes

computes EMP’ ⋈ PROJ


Site 2
ASG
eNo pNo

EMP PROJ

Site 1 Site 3
• Strategy 2:

EMP’= EMP ⋈ ASGsite3


ASGsite1, site1 computes

computes EMP’ ⋈ PROJ


Site 2
ASG
eNo pNo

EMP PROJ

Site 1 Site 3
• Strategy 3:

ASG’= PROJ ⋈ ASGsite1


ASGsite3, site3 computes

computes EMP ⋈ ASG’


Site 2
ASG
eNo pNo

EMP PROJ

Site 1 Site 3
• Strategy 4:

PROJ’= PROJ ⋈ ASGsite1


PROJsite2, site2 computes

computes EMP ⋈ PROJ’


Site 2
ASG
eNo pNo

EMP PROJ

Site 1 Site 3
• Strategy 5:

computes PROJ ⋈ ASG ⋈


EMP, PROJsite2, site2

EMP
Which one to Choose
• We need to know
–Size of operand tables
–Estimate interm tables’ size
• Computing all possibilities
could be lengthy
• Heuristic: Consider only
the size of tables-
Thanks

You might also like