Query Optimization in Database Systems

The document discusses query optimization in database management systems. It covers estimating the cost of query plans by selecting optimal access paths, estimating cardinalities of intermediate relations, and computing sizes of joined relations based on statistics of base relations. The goal of query optimization is to select the lowest cost physical query execution plan from alternative logical plans by estimating costs using relation statistics and size estimates of intermediate results.

Uploaded by

Miranda

CSE 544

Principles of Database Management Systems

Fall 2016
Lecture 8 - Query optimization
Announcements

•  HW2 (SimpleDB) is due next Friday!

•  Midterm in two weeks, Nov. 10, in class

•  Project Milestone due on Nov. 16

CSE 544 - Fall 2016 2

Access path selection in a relational database management system.
Selinger et al., SIGMOD 1979

Additional resources:
•  Chaudhuri, "An Overview of Query Optimization in
Relational Systems," Proceedings of ACM PODS, 1998

•  Database management systems. Ramakrishnan and Gehrke. Third Ed. Chapter 15.

Query Optimization Motivation

SQL query (declarative query)
→ Parse & Rewrite Query
→ Select Logical Plan (produces the logical plan)
→ Select Physical Plan (produces the physical plan)
→ Query Execution
→ Disk

Query optimization covers the "Select Logical Plan" and "Select Physical Plan" steps.
Recall physical and logical data independence.

What We Already Know…
Supplier(sno,sname,scity,sstate)
Part(pno,pname,psize,pcolor)
Supply(sno,pno,price)
For each SQL query….
SELECT S.sname
FROM Supplier S, Supply U
WHERE S.scity='Seattle' AND S.sstate='WA'
AND S.sno = U.sno
AND U.pno = 2

There exist many logical query plans…

Example Query: Logical Plan 1

π sname
  σ scity=‘Seattle’ ∧ sstate=‘WA’ ∧ pno=2
    ⨝ sno = sno
      Supplier    Supply

Example Query: Logical Plan 2

π sname
  ⨝ sno = sno
    σ scity=‘Seattle’ ∧ sstate=‘WA’ (Supplier)
    σ pno=2 (Supply)

What We Also Know

•  For each logical plan…

•  There exist many physical plans


Example Query: Physical Plan 1

(On the fly) π sname
  (On the fly) σ scity=‘Seattle’ ∧ sstate=‘WA’ ∧ pno=2
    (Nested loop) ⨝ sno = sno
      Supplier (File scan)    Supply (File scan)

Example Query: Physical Plan 2

(On the fly) π sname
  (On the fly) σ scity=‘Seattle’ ∧ sstate=‘WA’ ∧ pno=2
    (Index nested loop) ⨝ sno = sno
      Supplier (File scan)    Supply (Index scan)

Query Optimization

Three major components:

1.  Cardinality and cost estimation

2.  Search space

3.  Plan enumeration algorithms


Estimating Cost of a Query Plan
Goal: compute the cost of an entire physical query plan

•  We already know how to

–  Compute the cost of different operations in terms of number of I/Os, given the T(R)’s and the B(R)’s

•  We still need to do
–  Access path selection: compute cost of retrieving tuples from disk with different access paths
–  Size estimation: compute the T(R)’s and the B(R)’s for intermediate relations R

Access Path

Access path: a way to retrieve tuples from a table

•  A file scan

•  An index plus a matching selection condition


Access Path Selection

•  Supplier(sid,sname,scity,sstate)

•  Selection condition: sid > 300 ∧ scity=‘Seattle’

•  Indexes: B+-tree on sid and B+-tree on scity

•  Which access path should we use?

•  We should pick the most selective access path


Access Path Selectivity

•  Access path selectivity is the number of pages retrieved if we use this access path
–  Most selective retrieves fewest pages

•  As we saw earlier, for equality predicates

–  Selection on equality: σa=v(R)
–  V(R, a) = # of distinct values of attribute a
–  1/V(R,a) is thus the reduction factor
–  Clustered index on a: cost B(R)/V(R,a)
–  Unclustered index on a: cost T(R)/V(R,a)
–  (we are ignoring I/O cost of index pages for simplicity)


Selectivity for Range Predicates

Selection on range: σa>v(R)

•  How to compute the selectivity?

•  Assume values are uniformly distributed
•  Reduction factor X
•  X = (Max(R,a) - v) / (Max(R,a) - Min(R,a))

•  Clustered index on a: cost B(R)*X

•  Unclustered index on a: cost T(R)*X


Back to Our Example

•  Selection condition: sid > 300 ∧ scity=‘Seattle’

–  Index I1: B+-tree on sid clustered
–  Index I2: B+-tree on scity unclustered

•  Let’s assume
–  V(Supplier,scity) = 20
–  Max(Supplier, sid) = 1000, Min(Supplier,sid)=1
–  B(Supplier) = 100, T(Supplier) = 1000

•  Cost I1: B(R) * (Max-v)/(Max-Min) = 100*700/999 ≈ 70

•  Cost I2: T(R) * 1/V(Supplier,scity) = 1000/20 = 50
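
The comparison above can be checked with a few lines of Python. This is only a sketch: the function names `equality_cost` and `range_cost` are made up for illustration, and the I/O cost of index pages is ignored, as in the slides.

```python
def equality_cost(pages, tuples, distinct, clustered):
    """Estimated page reads for sigma_{a=v}(R) through an index on a."""
    base = pages if clustered else tuples  # B(R) if clustered, else T(R)
    return base / distinct                 # reduction factor 1/V(R,a)


def range_cost(pages, tuples, v, lo, hi, clustered):
    """Estimated page reads for sigma_{a>v}(R), assuming uniform values."""
    x = (hi - v) / (hi - lo)               # reduction factor X
    base = pages if clustered else tuples
    return base * x


# Statistics from the example: B(Supplier) = 100, T(Supplier) = 1000
cost_i1 = range_cost(100, 1000, v=300, lo=1, hi=1000, clustered=True)
cost_i2 = equality_cost(100, 1000, distinct=20, clustered=False)

# Pick the access path with the smaller estimate
chosen = "I1" if cost_i1 < cost_i2 else "I2"
print(round(cost_i1, 2), cost_i2, chosen)
```

With these statistics the unclustered index I2 is the cheaper access path: 50 page reads against roughly 70 for I1.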


Selectivity with Multiple Conditions
What if we have an index on multiple attributes?
•  Example selection σa=v1 ∧ b= v2(R) and index on <a,b>

How to compute the selectivity?

•  Assume attributes are independent
•  X = 1 / (V(R,a) * V(R,b))

•  Clustered index on <a,b>: cost B(R)*X

•  Unclustered index on <a,b>: cost T(R)*X


Estimating Cost of a Query Plan
Goal: compute the cost of an entire physical query plan

•  We already know how to

–  Compute the cost of different operations in terms of number of I/Os, given the T(R)’s and the B(R)’s

•  Collected information for each relation

–  Number of tuples (cardinality) T(R)
–  Number of physical pages B(R), clustering info
–  Indexes, number of keys in the index V(R,a)
–  Statistical information on attributes
•  Min value, max value, number distinct values
•  Histograms
–  Correlations between columns (hard)

•  Collection approach: periodic, using sampling


Size Estimation

Projection: output size same as input size

Selection: multiply input size by reduction factor

•  Similar to what we did for estimating access path selectivity
•  Assume independence between conditions in the predicate
•  Examples:
T(σA=...(R)) = T(R) / V(R,A)
T(σA=...∧ B=... (R)) = T(R) / (V(R,A) * V(R,B))
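
As a small illustration of the two formulas above, the following sketch multiplies the input cardinality by one reduction factor per equality condition. The helper name `selection_size` is hypothetical; the numbers are the Supplier statistics used in this lecture's running example.

```python
def selection_size(tuples, distinct_counts):
    """Estimate T(sigma(R)) for a conjunction of equality conditions.

    Each condition A = ... contributes a reduction factor 1/V(R,A);
    the conditions are assumed to be independent.
    """
    size = float(tuples)
    for v in distinct_counts:
        size /= v
    return size


# T(Supplier) = 1000, V(Supplier,scity) = 20, V(Supplier,sstate) = 10
one_condition = selection_size(1000, [20])
two_conditions = selection_size(1000, [20, 10])
print(one_condition, two_conditions)
```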


Estimating Result Sizes

Join R ⋈ S

•  Take product of cardinalities of relations R and S

•  Apply reduction factors for each term in join condition
•  Terms are of the form: column1 = column2
•  Reduction: 1 / MAX(V(R,column1), V(S,column2))
•  Why? Will explain next...


Assumptions

•  Containment of values: if V(R,A) <= V(S,B), then the set of A values of R is included in the set of B values of S
–  Note: this indeed holds when A is a foreign key in R, and B is a key in S

•  Preservation of values: for any other attribute C, V(R ⨝A=B S, C) = V(R, C) (or V(S, C))


Selectivity of R ⨝A=B S

Assume V(R,A) <= V(S,B)

•  Each tuple t in R joins with T(S)/V(S,B) tuple(s) in S

•  Hence T(R ⨝A=B S) = T(R) T(S) / V(S,B)

In general: T(R ⨝A=B S) = T(R) T(S) / max(V(R,A),V(S,B))
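
The general formula is easy to turn into a helper. The sketch below uses a hypothetical function name `join_size`. In the example call, T(Supplier) and T(Supply) come from this lecture's running example, while the two distinct-value counts for sid are assumptions made only for illustration (sid is treated as a key of Supplier, and every supplier is assumed to appear in Supply).

```python
def join_size(t_r, t_s, v_r_a, v_s_b):
    """Estimate T(R join_{A=B} S) = T(R) * T(S) / max(V(R,A), V(S,B))."""
    return t_r * t_s / max(v_r_a, v_s_b)


# T(Supplier) = 1000, T(Supply) = 10,000
# Assumed for illustration: V(Supplier,sid) = 1000 and V(Supply,sid) = 1000
estimate = join_size(1000, 10_000, 1000, 1000)
print(estimate)
```

Under these assumptions each Supply tuple matches exactly one Supplier tuple, so the estimate equals T(Supply).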


Complete Example

Supplier(sid, sname, scity, sstate)
Supply(sid, pno, quantity)

SELECT sname
FROM Supplier x, Supply y
WHERE x.sid = y.sid
and y.pno = 2
and x.scity = ‘Seattle’
and x.sstate = ‘WA’

•  Some statistics
–  T(Supplier) = 1000 records
–  T(Supply) = 10,000 records
–  B(Supplier) = 100 pages
–  B(Supply) = 100 pages
–  V(Supplier,scity) = 20, V(Supplier,sstate) = 10
–  V(Supply,pno) = 2,500
–  Both relations are clustered
•  M = 11

Computing the Cost of a Plan

•  Estimate cardinality in a bottom-up fashion

–  Cardinality is the size of a relation (number of tuples)
–  Compute size of all intermediate relations in plan

•  Estimate cost by using the estimated cardinalities


T(Supplier) = 1000, B(Supplier) = 100, V(Supplier,scity) = 20, V(Supplier,sstate) = 10
T(Supply) = 10,000, B(Supply) = 100, V(Supply,pno) = 2,500
M = 11

Physical Query Plan 1

(On the fly) π sname
  (On the fly) σ scity=‘Seattle’ ∧ sstate=‘WA’ ∧ pno=2
    (Nested loop) ⨝ sno = sno
      Supplier (File scan)    Supply (File scan)

Selection and projection are done on the fly -> no additional cost.

Total cost of plan is thus cost of join:
= B(Supplier) + B(Supplier) * B(Supply)
= 100 + 100 * 100
= 10,100 I/Os

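
The arithmetic for Plan 1 can be reproduced with the page-at-a-time nested loop formula. This is a minimal sketch; `nested_loop_join_cost` is an illustrative name, not something defined in the lecture.

```python
def nested_loop_join_cost(b_outer, b_inner):
    """I/O cost of a page-at-a-time nested loop join.

    The outer relation is scanned once, and the inner relation is
    scanned once for every page of the outer relation.
    """
    return b_outer + b_outer * b_inner


# B(Supplier) = 100 (outer), B(Supply) = 100 (inner)
plan1_cost = nested_loop_join_cost(100, 100)
print(plan1_cost)
```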
T(Supplier) = 1000, B(Supplier) = 100, V(Supplier,scity) = 20, V(Supplier,sstate) = 10
T(Supply) = 10,000, B(Supply) = 100, V(Supply,pno) = 2,500
M = 11

Physical Query Plan 2

(On the fly) (4) π sname
  (Sort-merge join) (3) ⨝ sno = sno
    (Scan, write to T1) (1) σ scity=‘Seattle’ ∧ sstate=‘WA’
      Supplier (File scan)
    (Scan, write to T2) (2) σ pno=2
      Supply (File scan)

Total cost
= 100 + 100 * 1/20 * 1/10 (1)
+ 100 + 100 * 1/2500 (2)
+ 2 (3)
+ 0 (4)
Total cost ≈ 204 I/Os

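
The total for Plan 2 can be recomputed step by step. One assumption in this sketch goes beyond the slide: each temporary file is rounded up to a whole page, which is what makes the fractional terms add up to 204. The names `temp_pages`, `t1`, and `t2` are illustrative.

```python
import math


def temp_pages(pages, *distinct_counts):
    """Pages written after a selection, rounded up to whole pages (assumption)."""
    fraction = pages
    for v in distinct_counts:
        fraction /= v
    return math.ceil(fraction)


t1 = temp_pages(100, 20, 10)   # sigma on scity and sstate over Supplier
t2 = temp_pages(100, 2500)     # sigma on pno over Supply

step1 = 100 + t1               # scan Supplier, write T1
step2 = 100 + t2               # scan Supply, write T2
step3 = t1 + t2                # sort-merge join reads both small temp files
step4 = 0                      # projection on the fly

plan2_cost = step1 + step2 + step3 + step4
print(plan2_cost)
```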
Plan 2 with Different Numbers

What if we had:
10K pages of Supplier
10K pages of Supply

(On the fly) (4) π sname
  (Sort-merge join) (3) ⨝ sno = sno
    (Scan, write to T1) (1) σ scity=‘Seattle’ ∧ sstate=‘WA’
      Supplier (File scan)
    (Scan, write to T2) (2) σ pno=2
      Supply (File scan)

Assuming naive two-pass sort algorithm

Total cost
= 10000 + 50 (1)
+ 10000 + 4 (2)
+ 4*50 + 2*4 + 4 + 50 (3)
+ 0 (4)
Total cost ≈ 20,316 I/Os

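
The join term (3) is the least obvious part of this total. The sketch below reads it as follows, which is an interpretation of the slide's numbers rather than something it states: a temporary file that fits in the M = 11 buffer pages is sorted in one read-and-write pass, a larger one needs two such passes, and the merge then reads both sorted files once. The name `naive_sort_io` is illustrative.

```python
M = 11  # buffer pages


def naive_sort_io(pages, buffer_pages):
    """I/Os to sort a file: one read+write pass if it fits in memory, else two."""
    passes = 1 if pages <= buffer_pages else 2
    return 2 * passes * pages


t1 = 10_000 // (20 * 10)   # 50 pages after the Supplier selection
t2 = 10_000 // 2500        # 4 pages after the Supply selection

step1 = 10_000 + t1                                           # scan Supplier, write T1
step2 = 10_000 + t2                                           # scan Supply, write T2
step3 = naive_sort_io(t1, M) + naive_sort_io(t2, M) + t1 + t2  # sort both, then merge
step4 = 0                                                     # projection on the fly

plan2_large_cost = step1 + step2 + step3 + step4
print(step3, plan2_large_cost)
```

The two table scans dominate the total, so the choice of join algorithm matters much less here than avoiding the full scans.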
T(Supplier) = 1000, B(Supplier) = 100, V(Supplier,scity) = 20, V(Supplier,sstate) = 10
T(Supply) = 10,000, B(Supply) = 100, V(Supply,pno) = 2,500
M = 11

Physical Query Plan 3

(On the fly) (4) π sname
  (On the fly) (3) σ scity=‘Seattle’ ∧ sstate=‘WA’
    (Index nested loop) (2) ⨝ sno = sno
      (Use hash index) (1) σ pno=2, producing 4 tuples
        Supply (Hash index on pno; assume: clustered)
      Supplier (Hash index on sno; clustering does not matter)

Total cost
= 1 (1)
+ 4 (2)
+ 0 (3)
+ 0 (4)
Total cost ≈ 5 I/Os

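
The numbers for Plan 3 follow from the same statistics. This sketch assumes, as the slides do, that index pages are already in memory, and additionally assumes one page read per Supplier lookup because sno identifies a single supplier. Variable names are illustrative.

```python
import math

T_SUPPLY, B_SUPPLY, V_SUPPLY_PNO = 10_000, 100, 2500

# (1) Clustered hash index on pno: read B(Supply)/V(Supply,pno) pages, at least one
step1 = math.ceil(B_SUPPLY / V_SUPPLY_PNO)

# Tuples that survive the selection pno = 2
matching_tuples = T_SUPPLY // V_SUPPLY_PNO

# (2) Index nested loop: one Supplier page per matching Supply tuple (assumption)
step2 = matching_tuples * 1

# (3) and (4): selection and projection on the fly
plan3_cost = step1 + step2 + 0 + 0
print(matching_tuples, plan3_cost)
```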
Simplifications

•  In the previous examples, we assumed that all index pages were in memory

•  When this is not the case, we need to add the cost of fetching index pages from disk


Different Cost Models

•  In previous examples, we considered IO costs

•  Typically, want IO+CPU

•  For parallel/distributed queries, add network bandwidth

•  If need to compare logical plans

–  Compute the cardinality of each intermediate relation
–  Sum up all the cardinalities


Summary
Goal: compute the cost of an entire physical query plan

•  We already know how to

–  Compute the cost of different operations in terms of number of I/Os, given the T(R)’s and the B(R)’s

Three major components:

1.  Cardinality and cost estimation

2.  Search space

3.  Plan enumeration algorithms


Relational Algebra Laws

•  Selections
–  Commutative: σc1(σc2(R)) same as σc2(σc1(R))
–  Cascading: σc1∧c2(R) same as σc2(σc1(R))

•  Projections
–  Cascading

•  Joins
–  Commutative : R ⋈ S same as S ⋈ R
–  Associative: R ⋈ (S ⋈ T) same as (R ⋈ S) ⋈ T


Left-Deep Plans and Bushy Plans

[Figure: two join trees side by side, labeled "Left-deep plan" and "Bushy plan", over relations including R1, R2, R3, R4]


Relational Algebra Laws

•  Selects, projects, and joins

–  We can commute and combine all three types of operators
–  We just have to be careful that the fields we need are available
when we apply the operator
–  Relatively straightforward. See book 15.3.

•  More info in optional paper (by Chaudhuri), Section 4.


Group-by and Join

R(A, B), S(C,D)

γA, sum(D)(R(A,B) ⨝ B=C S(C,D)) = ?


Group-by and Join

R(A, B), S(C,D)

γA, sum(D)(R(A,B) ⨝ B=C S(C,D)) =

γA, sum(D)(R(A,B) ⨝ B=C (γC, sum(D)S(C,D)))

These are very powerful laws.

They were introduced only in the 90’s.
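
The law can be sanity-checked on a tiny instance. The sketch below is not a proof, and the relations `R` and `S` are made-up data. It evaluates both sides with plain Python lists: the left side joins first and then groups, while the right side first aggregates S by C and feeds the result into the join.

```python
from collections import defaultdict


def join(r, s):
    """R(A,B) joined with S(C,D) on B = C, keeping only (A, D)."""
    return [(a, d) for (a, b) in r for (c, d) in s if b == c]


def group_sum(pairs):
    """Group (key, value) pairs by key and sum the values."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


R = [(1, 10), (1, 20), (2, 10)]   # R(A, B)
S = [(10, 5), (10, 7), (20, 1)]   # S(C, D)

# Left side: group after the join
late_grouping = group_sum(join(R, S))

# Right side: pre-aggregate S on C, then join, then group again
pre_aggregated_s = list(group_sum(S).items())
early_grouping = group_sum(join(R, pre_aggregated_s))

print(late_grouping, early_grouping)
```

Pre-aggregating shrinks S from three tuples to two before the join, which is where the savings come from on larger inputs.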


Search Space Challenges

•  Search space is huge!

–  Many possible equivalent trees (logical)
–  Many implementations for each operator (physical)
–  Many access paths for each relation (physical)

•  Cannot consider ALL plans

•  Want a search space that includes low-cost plans

•  Typical compromises:
–  Only left-deep plans
–  Only plans without cartesian products
–  Always push selections down to the leaves
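
To get a feel for how large the space is, the sketch below counts join trees only, ignoring the choice of operators and access paths. It relies on two standard counting facts rather than on anything stated in the slides: n relations admit n! left-deep join orders, and (2(n-1))!/(n-1)! bushy join trees when cross products are allowed and left and right children are distinguished.

```python
from math import factorial


def left_deep_plans(n):
    """Number of left-deep join orders over n relations."""
    return factorial(n)


def bushy_plans(n):
    """Number of ordered binary join trees over n relations."""
    return factorial(2 * (n - 1)) // factorial(n - 1)


for n in (2, 4, 10):
    print(n, left_deep_plans(n), bushy_plans(n))
```

Restricting the search to left-deep plans already removes most of the trees, and the remaining n! orders are what the dynamic programming algorithm avoids enumerating one by one.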
Query Optimization

Three major components:

1.  Cardinality and cost estimation

2.  Search space

3.  Plan enumeration algorithms


Two Types of Optimizers

•  Heuristic-based optimizers:
–  Greedily apply rules that always improve the plan
•  Typically: push selections down
–  Very limited: no longer used today

•  Cost-based optimizers:
–  Use a cost model to estimate the cost of each plan
–  Select the “cheapest” plan
–  We focus on cost-based optimizers


Three Approaches to Search Space Enumeration
•  Complete plans

•  Bottom-up plans

•  Top-down plans


Complete Plans

R(A,B)
S(B,C)
T(C,D)

SELECT *
FROM R, S, T
WHERE R.B=S.B and S.C=T.C and R.A<40

[Figure: two complete plan trees for this query, (σA<40(R) ⨝ S) ⨝ T and σA<40(R) ⨝ (S ⨝ T)]

Why is this search space inefficient?

Bottom-up Partial Plans

R(A,B)
S(B,C)
T(C,D)

SELECT *
FROM R, S, T
WHERE R.B=S.B and S.C=T.C and R.A<40

[Figure: partial plans built bottom-up from small subplans, such as σA<40(R), S ⨝ T, σA<40(R) ⨝ S, and (σA<40(R) ⨝ S) ⨝ T, …]

Why is this better?

Top-down Partial Plans

R(A,B)
S(B,C)
T(C,D)

SELECT *
FROM R, S, T
WHERE R.B=S.B and S.C=T.C and R.A<40

[Figure: partial plans refined top-down. Each tree has operators at the top (a join with T, or σA<40) and not-yet-optimized SQL subqueries at the leaves, such as SELECT * FROM R, S WHERE R.B=S.B and R.A < 40, or SELECT * FROM R WHERE R.A < 40, …]


Two Types of Plan Enumeration Algorithms
•  Dynamic programming (in class)
–  Based on System R (aka Selinger) style optimizer [1979]
–  Limited to joins: join reordering algorithm
–  Bottom-up

•  Rule-based algorithm (will not discuss)

–  Database of rules (=algebraic laws)
–  Usually: dynamic programming
–  Usually: top-down


System R Search Space

•  Only left-deep plans

–  Enable dynamic programming for enumeration
–  Facilitate tuple pipelining from outer relation
•  Consider plans with all “interesting orders”
•  Perform cross-products after all other joins (heuristic)
•  Only consider nested loop & sort-merge joins
•  Consider both file scan and indexes
•  Try to evaluate predicates early


Plan Enumeration Algorithm

•  Idea: use dynamic programming

•  For each subset of {R1, …, Rn}, compute the best plan
for that subset
•  In increasing order of set cardinality:
–  Step 1: for {R1}, {R2}, …, {Rn}
–  Step 2: for {R1,R2}, {R1,R3}, …, {Rn-1, Rn}
–  …
–  Step n: for {R1, …, Rn}
•  It is a bottom-up strategy
•  A subset of {R1, …, Rn} is also called a subquery
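
The enumeration order described above (all subsets of size 1, then size 2, and so on) can be sketched with `itertools.combinations`. The generator name `subqueries_by_size` is hypothetical, and the sketch only shows the order in which subqueries are visited, not how a plan is chosen for each one.

```python
from itertools import combinations


def subqueries_by_size(relations):
    """Yield every non-empty subset of `relations`, smallest subsets first."""
    for size in range(1, len(relations) + 1):
        for subset in combinations(relations, size):
            yield frozenset(subset)


order = list(subqueries_by_size(["R1", "R2", "R3"]))
for subquery in order:
    print(sorted(subquery))
```

Because smaller subsets come first, the best plan for every sub-subquery is already known when a larger subquery is processed.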


Dynamic Programming Algo.

•  For each subquery Q ⊆ {R1, …, Rn} compute the following:
–  Size(Q)
–  A best plan for Q: Plan(Q)
–  The cost of that plan: Cost(Q)


Dynamic Programming Algo.

•  Step 1: Enumerate all single-relation plans

–  Consider selections on attributes of relation

–  Consider all possible access paths
–  Consider attributes that are not needed

–  Compute cost for each plan

–  Keep cheapest plan per “interesting” output order


Dynamic Programming Algo.

•  Step 2: Generate all two-relation plans

–  For each single-relation plan from step 1

–  Consider that plan as outer relation
–  Consider every other relation as inner relation

–  Compute cost for each plan

–  Keep cheapest plan per “interesting” output order


Dynamic Programming Algo.

•  Step 3: Generate all three-relation plans

–  For each two-relation plan from step 2

–  Consider that plan as outer relation
–  Consider every other relation as inner relation
–  Compute cost for each plan
–  Keep cheapest plan per “interesting” output order

•  Steps 4 through n: repeat until plan contains all the relations in the query
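
The steps above can be sketched as a small left-deep dynamic program. This is a simplified illustration, not the System R algorithm itself: it keeps a single best plan per subset (no interesting orders), ignores access paths and join algorithms, and uses the sum of intermediate result sizes as the cost, which is an assumption made to keep the example short. The function name, the cardinalities, and the reduction factors are all hypothetical.

```python
from itertools import combinations


def best_left_deep_plan(card, join_sel):
    """Return (cost, size, join order) of the cheapest left-deep plan.

    card:     relation name -> estimated number of tuples
    join_sel: frozenset({a, b}) -> reduction factor of the join predicate
              between a and b (a missing pair means a cross product)
    """
    rels = sorted(card)
    # Step 1: single-relation plans cost nothing in this toy model
    best = {frozenset([r]): (0.0, float(card[r]), (r,)) for r in rels}
    # Steps 2..n: extend the best plan of each smaller subset by one inner relation
    for k in range(2, len(rels) + 1):
        for combo in combinations(rels, k):
            subset = frozenset(combo)
            for inner in combo:
                cost, size, order = best[subset - {inner}]
                sel = 1.0
                for r in subset - {inner}:
                    sel *= join_sel.get(frozenset([r, inner]), 1.0)
                new_size = size * card[inner] * sel
                candidate = (cost + new_size, new_size, order + (inner,))
                if subset not in best or candidate[0] < best[subset][0]:
                    best[subset] = candidate
    return best[frozenset(rels)]


# Query R join S join T with predicates R.B = S.B and S.C = T.C (made-up statistics)
card = {"R": 1000, "S": 100, "T": 10}
join_sel = {frozenset({"R", "S"}): 0.001, frozenset({"S", "T"}): 0.1}

cost, size, order = best_left_deep_plan(card, join_sel)
print(cost, size, order)
```

Joining R with T first would be a cross product of 10,000 tuples, so the chosen order never starts with that pair.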


Commercial Query Optimizers

DB2, Informix, Microsoft SQL Server, Oracle 8

•  Inspired by System R
–  Left-deep plans and dynamic programming
–  Cost-based optimization (CPU and IO)

•  Go beyond System R style of optimization

–  Also consider right-deep and bushy plans (e.g., Oracle and DB2)
–  Variety of additional strategies for generating plans (e.g., DB2
and SQL Server)


Other Query Optimizers

•  Randomized plan generation

–  Genetic algorithm
–  PostgreSQL uses it for queries with many joins

•  Rule-based
–  Extensible collection of rules
–  Rule = Algebraic law with a direction
–  Algorithm for firing these rules
•  Generate many alternative plans, in some order
•  Prune by cost
–  Starburst (later DB2) and Volcano (later SQL Server)


