QUERY
PROCESSING AND
Book Referred: Korth
6th Edition, Chapter
12 & 13
OPTIMIZATION
DATABASE
MANAGEMEN
T SYSTEMS
INTERNALS
BASIC STEPS IN QUERY
PROCESSING
1. Parsing and translation
2. Optimization
3. Evaluation
BASIC STEPS IN QUERY
PROCESSING (CONT.)
BASIC STEPS IN QUERY
PROCESSING : OPTIMIZATION
A relational algebra expression may have many equivalent
expressions
E.g.,
salary75000(salary(instructor)) is equivalent to
salary(salary75000(instructor))
Each relational algebra operation can be evaluated using
one of several different algorithms
Correspondingly, a relational-algebra expression can be
evaluated in many ways.
Annotated expression specifying detailed evaluation strategy
is called an evaluation-plan.
E.g., can use an index on salary to find instructors with
salary < 75000,
or can perform complete relation scan and discard
instructors with salary 75000
INTRODUCTION TO
QUERY OPTIMIZATION
Alternative ways of evaluating a given query
Equivalent expressions
Different algorithms for each operation
Find the names of all instructors in the Music department together with the course title
of all the courses that the instructors teach.
π name, title (σdept name =“Music” (instructor ⋈ (teaches ⋈ πcourse_id, title (course))))
INTRODUCTION (CONT.)
An evaluation plan defines exactly what algorithm is used
for each operation, and how the execution of the operations is
coordinated.
Find out how to view query execution plans on your favorite database
INTRODUCTION (CONT.)
Cost difference between evaluation plans for a query can be
enormous
E.g. seconds vs. days in some cases
Steps in cost-based query optimization
1. Generate logically equivalent expressions using
equivalence rules
2. Annotate resultant expressions to get alternative query
plans
3. Choose the cheapest plan based on estimated cost
Estimation of plan cost based on:
Statistical information about relations. Examples:
number of tuples, number of distinct values for an
attribute
Statistics estimation for intermediate results
to compute cost of complex expressions
Cost formulae for algorithms, computed using statistics
GENERATING
EQUIVALENT
EXPRESSIONS
TRANSFORMATION OF
RELATIONAL EXPRESSIONS
Two relational algebra expressions are said to be equivalent
if the two expressions generate the same set of tuples on
every legal database instance
Note: order of tuples is irrelevant
In SQL, inputs and outputs are multisets of tuples
Two expressions in the multiset version of the relational
algebra are said to be equivalent if the two expressions
generate the same multiset of tuples on every legal
database instance.
An equivalence rule says that expressions of two forms are
equivalent
It is statement that enables us to find an equivalent
expression from the given expression.
If on all legal instances of the database, both the expressions
provide the same result.
Can replace expression of first form by second, or vice versa.
EQUIVALENCE RULES
1. Conjunctive selection operations can be deconstructed
into a sequence of individual selections.
Cascading of conditions
2. Selection operations are commutative.
3. Only the last in a sequence of projection operations is
needed, the others can be omitted.
L1 ( L2 (...( Ln ( E ))...)) L1 ( E )
4. Selections can be combined with Cartesian products
and theta joins.
a. (E1 X E2) = E1 E2
b. 1(E1 2 E 2) = E 1 1 2 E2
EQUIVALENCE RULES
(CONT.)
5.Theta-join operations (and natural joins) are
commutative.
E1 E2 = E2 E1
6. (a) Natural join operations are associative:
(E1 E 2) E3 = E1 (E2 E 3)
(b) Theta joins are associative in the following
manner:
(E1 1 E2) 2 3 E3 = E1 1 3 (E2 2 E 3)
where 2 involves attributes from only E2 and E3.
PICTORIAL DEPICTION OF
EQUIVALENCE RULES
EQUIVALENCE RULES
(CONT.)
7. The selection operation distributes over the theta join
operation under the following two conditions:
(a) When all the attributes in involve only the attributes
of one of the expressions (E1) being joined.
E1 E2) = ((E1)) E2
(b) When 1 involves only the attributes of E1 and 2 involves
only the attributes of E2.
1 E1 E2) = (1(E1)) ( (E2))
EQUIVALENCE RULES
(CONT.)
8.The projection operation distributes over the theta join
operation as follows:
(a) If involves only attributes from L1 L2:
L1 L2 ( E1 E2 ) ( L1 ( E1 )) ( L2 ( E2 ))
EQUIVALENCE RULES
(CONT.)
9. The set operations union and intersection are commutative
E1 E2 = E2 E1
E1 E2 = E2 E1
(set difference is not commutative).
10.Set union and intersection are associative.
(E1 E2) E3 = E1 (E2 E3)
(E1 E2) E3 = E1 (E2 E3)
11.The selection operation distributes over , and –.
(E1 – E2) = (E1) – (E2)
and similarly for and in place of –
Also: (E1 – E2) = (E1) – E2
and similarly for in place of –, but not for
12. The projection operation distributes over
union
L(E1 E2) = (L(E1)) (L(E2))
TRANSFORMATION EXAMPLE: PUSHING
SELECTIONS
Query: Find the names of all instructors in the Music
department, along with the titles of the courses that
they teach
name, title(dept_name= “Music”
(instructor (teaches course_id, title
(course))))
Transformation using rule 7a.
name, title((dept_name= “Music”(instructor))
(teaches course_id, title (course)))
instructor(ID, name, dept name, salary)
teaches(ID, course id, sec id, semester, year)
Performing the selection as early as possible reduces
course(course id, title, dept name, credits)
the size of the relation to be joined.
EXAMPLE WITH
MULTIPLE
TRANSFORMATIONS
Query: Find the names of all instructors in the Music
department who have taught a course in 2009, along with
the titles of the courses that they taught.
name, title(dept_name= “Music”year = 2009
(instructor (teaches course_id, title (course))))
Transformation using join associatively (Rule 6a):
name, title(dept_name= “Music”gear = 2009
((instructor teaches) course_id, title (course)))
Second form provides an opportunity to apply the
“perform selections early” rule, resulting in the
subexpression
dept_name = “Music” (instructor) year = 2009 (teaches)
MULTIPLE
TRANSFORMATIONS
(CONT.)
TRANSFORMATION EXAMPLE: PUSHING
PROJECTIONS
Consider: name, title(dept_name= “Music” (instructor) teaches)
course_id, title (course))))
When we compute
(dept_name = “Music” (instructor teaches)
we obtain a relation whose schema is:
(ID, name, dept_name, salary, course_id, sec_id, semester,
year)
Push projections using equivalence rules 8a; eliminate
unneeded attributes from intermediate results to get:
name, title(name, course_id (
dept_name= “Music” (instructor) teaches))
course_id, title (course))))
Performing the projection as early as possible reduces the
size of the relation to be joined.
JOIN ORDERING
EXAMPLE
For all relations r1, r2, and r3,
(r1 r2) r3 = r1 (r2 r3 )
(Join Associativity)
If r2 r3 is quite large and r1 r2 is small, we choose
(r1 r2) r3
so that we compute and store a smaller temporary
relation.
JOIN ORDERING EXAMPLE
(CONT.)
Consider the expression
name, title(dept_name= “Music” (instructor) teaches)
course_id, title (course))))
Could compute teaches course_id, title (course) first, and
join result with
dept_name= “Music” (instructor)
but the result of the first join is likely to be a large relation.
Only a small fraction of the university’s instructors are likely
to be from the Music department
It is better to compute
dept_name= “Music” (instructor) teaches
first.
END OF CHAPTER