0% found this document useful (0 votes)

44 views37 pages

Unit IV Part II

The document outlines the course objectives and outcomes for CS312 Database Management Systems, focusing on logical database design, programming languages, transaction processing, and database architectures. It discusses query processing, including parsing, optimization, and evaluation, as well as measures of query cost and various selection and join operations. Additionally, it covers evaluation strategies such as materialization and pipelining, and introduces query optimization techniques to improve execution plans based on cost estimation.

Uploaded by

heramb.churi

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

44 views37 pages

Unit IV Part II

Uploaded by

heramb.churi

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 37

CS312

Database Management Systems

School of Computer Engineering and Technology

CS312 Database Management Systems
Course Objectives:
1) Understand and successfully apply logical database design principles, including E-R
diagrams and database normalization.
2) Learn Database Programming languages and apply in DBMS application
3) Understand transaction processing and concurrency control in DBMS
4) Learn database architectures, DBMS advancements and its usage in advance
application

Course Outcomes:
5) Design ER-models to represent simple database application scenarios and Improve the
database design by normalization.
6) Design Database Relational Model and apply SQL , PLSQL concepts for database
programming
7) Describe Transaction Processing and Concurrency Control techniques for databases
8) Identify appropriate database architecture for the real world database application
Query Processing

Overview
Measures of Query Cost
Selection Operation
Sorting
Join Operation
Evaluation of Expressions
Basic Steps in Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation
Basic Steps in Query Processing
(Cont.)
Parsing and translation
translate the query into its internal form. This
is then translated into relational algebra.
Parser checks syntax, verifies relations
Evaluation
The query-execution engine takes a query-
evaluation plan, executes that plan, and
returns the answers to the query.
Basic Steps in Query Processing :
Optimization
A relational algebra expression may have many equivalent expressions
E.g., salary75000(salary(instructor)) is equivalent to
salary(salary75000(instructor))
Each relational algebra operation can be evaluated using one of several
different algorithms
Correspondingly, a relational-algebra expression can be evaluated
in many ways.
Annotated expression specifying detailed evaluation strategy is called
an evaluation-plan.
E.g., can use an index on salary to find instructors with salary <
75000,
or can perform complete relation scan and discard instructors with
salary  75000
Basic Steps: Optimization (Cont.)
Query Optimization: Amongst all equivalent evaluation plans choose the one with lowest cost
Cost is estimated using statistical information from the
database catalog
e.g. number of tuples in each relation, size of tuples, etc.
In this chapter we study
How to measure query costs
Algorithms for evaluating relational algebra operations
How to combine algorithms for individual operations in order to evaluate a complete expre
In Chapter 14
We study how to optimize queries, that is, how to find an evaluation plan with lowest estim
cost
Measures of Query Cost
Cost is generally measured as total elapsed time for answering query
Many factors contribute to time cost
disk accesses, CPU, or even network communication
Typically disk access is the predominant cost, and is also relatively easy
to estimate. Measured by taking into account
Number of seeks * average-seek-cost
Number of blocks read * average-block-read-cost
Number of blocks written * average-block-write-cost
Cost to write a block is greater than cost to read a block
data is read back after being written to ensure that the write
was successful
Measures of Query Cost (Cont.)

For simplicity we just use the number of block

transfers from disk and the number of seeks as the
cost measures
tT – time to transfer one block
tS – time for one seek
Cost for b block transfers plus S seeks
b * tT + S * tS
We ignore CPU costs for simplicity
Real systems do take CPU cost into account
We do not include cost to writing output to disk in our
cost formulae
Measures of Query Cost (Cont.)

Several algorithms can reduce disk IO by using extra

buffer space
Amount of real memory available to buffer
depends on other concurrent queries and OS
processes, known only during execution
We often use worst case estimates, assuming
only the minimum amount of memory needed
for the operation is available
Required data may be buffer resident already,
avoiding disk I/O
But hard to take into account for cost estimation
Selection Operation
File scan
Algorithm A1 (linear search). Scan each file block and test all records to see
whether they satisfy the selection condition.
Cost estimate = br block transfers + 1 seek
br denotes number of blocks containing records from relation r
If selection is on a key attribute, can stop on finding record
cost = (br /2) block transfers + 1 seek
Linear search can be applied regardless of
selection condition or
ordering of records in the file, or
availability of indices
Note: binary search generally does not make sense since data is not stored
consecutively
except when there is an index available,
and binary search requires more seeks than index search
Selections Using Indices

Index scan – search algorithms that use an index

selection condition must be on search-key of index.
A2 (primary index, equality on key). Retrieve a single record
that satisfies the corresponding equality condition
Cost = (hi + 1) * (tT + tS)
A3 (primary index, equality on nonkey) Retrieve multiple
records.
Records will be on consecutive blocks
Let b = number of blocks containing matching records
Cost = hi * (tT + tS) + tS + tT * b
Selections Using Indices

A4 (secondary index, equality on nonkey).

Retrieve a single record if the search-key is a candidate key
Cost = (hi + 1) * (tT + tS)
Retrieve multiple records if search-key is not a candidate
key
each of n matching records may be on a different block
Cost = (hi + n) * (tT + tS)
Can be very expensive!
Selections Involving Comparisons
Can implement selections of the form AV (r) or A  V(r) by using
a linear file scan,
or by using indices in the following ways:
A5 (primary index, comparison). (Relation is sorted on A)
For A  V(r) use index to find first tuple  v and scan relation sequentially from there
For AV (r) just scan relation sequentially till first tuple > v; do not use index
A6 (secondary index, comparison).
For A  V(r) use index to find first index entry  v and scan index sequentially from
there, to find pointers to records.
For AV (r) just scan leaf pages of index finding pointers to records, till first entry > v
In either case, retrieve records that are pointed to
requires an I/O for each record
Linear file scan may be cheaper
Implementation of Complex Selections
Conjunction: 1 2. . . n(r)
A7 (conjunctive selection using one index).
Select a combination of i and algorithms A1 through A7 that results in
the least cost for i (r).
Test other conditions on tuple after fetching it into memory buffer.
A8 (conjunctive selection using composite index).
Use appropriate composite (multiple-key) index if available.
A9 (conjunctive selection by intersection of identifiers).
Requires indices with record pointers.
Use corresponding index for each condition, and take intersection of all
the obtained sets of record pointers.
Then fetch records from file
If some conditions do not have appropriate indices, apply test in memory.
Algorithms for Complex Selections
Disjunction:1 2 . . . n (r).
A10 (disjunctive selection by union of identifiers).
Applicable if all conditions have available indices.
Otherwise use linear scan.
Use corresponding index for each condition, and take union of all
the obtained sets of record pointers.
Then fetch records from file
Negation: (r)
Use linear scan on file
If very few records satisfy , and an index is applicable to 
Find satisfying records using index and fetch from file
Join Operation

Several different algorithms to implement joins

Nested-loop join
Block nested-loop join
Indexed nested-loop join
Merge-join
Hash-join
Choice based on cost estimate
Examples use the following information
Number of records of student: 5,000 takes: 10,000
Number of blocks of student: 100 takes: 400
Nested-Loop Join

To compute the theta join r s

for each tuple tr in r do begin
for each tuple ts in s do begin
test pair (tr,ts) to see if they satisfy the join condition 
if they do, add tr • ts to the result.
end
end
r is called the outer relation and s the inner relation of the join.
Requires no indices and can be used with any kind of join condition.
Expensive since it examines every pair of tuples in the two relations.
Nested-Loop Join (Cont.)
In the worst case, if there is enough memory only to hold one block of each relation, the
estimated cost is
nr  bs + br block transfers, plus
nr + b r seeks
If the smaller relation fits entirely in memory, use that as the inner relation.
Reduces cost to br + bs block transfers and 2 seeks
Assuming worst case memory availability cost estimate is
with student as outer relation:
5000  400 + 100 = 2,000,100 block transfers,
5000 + 100 = 5100 seeks
with takes as the outer relation
10000  100 + 400 = 1,000,400 block transfers and 10,400 seeks
If smaller relation (student) fits entirely in memory, the cost estimate will be 500 block
transfers.
Block nested-loops algorithm (next slide) is preferable.
Example of Nested-Loop Join Costs
Compute student takes, with student as the outer relation.
Let takes have a primary B+-tree index on the attribute ID, which contains 20
entries in each index node.
Since takes has 10,000 tuples, the height of the tree is 4, and one more access is
needed to find the actual data
student has 5000 tuples
Cost of block nested loops join
400*100 + 100 = 40,100 block transfers + 2 * 100 = 200 seeks
assuming worst case memory
may be significantly less with more memory
Cost of indexed nested loops join
100 + 5000 * 5 = 25,100 block transfers and seeks.
CPU cost likely to be less than that for block nested loops join
Evaluation of Expressions
So far: we have seen algorithms for individual operations
Alternatives for evaluating an entire expression tree
Materialization: generate results of an expression whose
inputs are relations or are already computed, materialize
(store) it on disk. Repeat.
Pipelining: pass on tuples to parent operations even as an
operation is being executed
We study above alternatives in more detail
Materialization
Materialized evaluation: evaluate one operation at a time, starting at the
lowest-level. Use intermediate results materialized into temporary relations
to evaluate next-level operations.
E.g., in figure below, compute and store
 building"Watson" (department)

then compute the store its join with instructor, and finally compute the
projection on name.
Materialization (Cont.)

Materialized evaluation is always applicable

Cost of writing results to disk and reading them back can be quite high
Our cost formulas for operations ignore cost of writing results to
disk, so
Overall cost = Sum of costs of individual operations +
cost of writing intermediate results to disk
Double buffering: use two output buffers for each operation, when
one is full write it to disk while the other is getting filled
Allows overlap of disk writes with computation and reduces
Pipelining

Pipelined evaluation : evaluate several operations simultaneously, passing the results of one
operation on to the next.
E.g., in previous expression tree, don’t store result of
 building"Watson" (department)
instead, pass tuples directly to the join.. Similarly, don’t store result of join, pass tuples directly
to projection.
Much cheaper than materialization: no need to store a temporary relation to disk.
Pipelining may not always be possible – e.g., sort, hash-join.
For pipelining to be effective, use evaluation algorithms that generate output tuples even as tuples
are received for inputs to the operation.
Pipelines can be executed in two ways: demand driven and producer driven
Pipelining (Cont.)

In demand driven or lazy evaluation

system repeatedly requests next tuple from top level operation
Each operation requests next tuple from children operations as required, in order
to output its next tuple
In between calls, operation has to maintain “state” so it knows what to return next
In producer-driven or eager pipelining
Operators produce tuples eagerly and pass them up to their parents
Buffer maintained between operators, child puts tuples in buffer, parent
removes tuples from buffer
if buffer is full, child waits till there is space in the buffer, and then generates
more tuples
System schedules operations that have space in output buffer and can process
more input tuples
Alternative name: pull and push models of pipelining
Pipelining (Cont.)

Implementation of demand-driven pipelining

Each operation is implemented as an iterator implementing the following
operations
open()
E.g. file scan: initialize file scan
state: pointer to beginning of file
E.g.merge join: sort relations;
state: pointers to beginning of sorted relations
next()
E.g. for file scan: Output next tuple, and advance and store file
pointer
E.g. for merge join: continue with merge from earlier state till
next output tuple is found. Save pointers as iterator state.
close()
Evaluation Algorithms for Pipelining

Some algorithms are not able to output results even as they get input tuples
E.g. merge join, or hash join
intermediate results written to disk and then read back
Algorithm variants to generate (at least some) results on the fly, as input
tuples are read in
E.g. hybrid hash join generates output tuples even as probe relation tuples in the
in-memory partition (partition 0) are read in
Double-pipelined join technique: Hybrid hash join, modified to buffer partition 0
tuples of both relations in-memory, reading them as they become available, and
output results of any matches between partition 0 tuples
When a new r0 tuple is found, match it with existing s0 tuples, output
matches, and save it in r0
Symmetrically for s0 tuples
Introduction to Query optimization
Introduction
Alternative ways of evaluating a given query
Equivalent expressions
Different algorithms for each operation
Introduction (Cont.)
An evaluation plan defines exactly what algorithm is used for each operation,
and how the execution of the operations is coordinated.

 Find out how to view query execution plans on your favorite database
Introduction (Cont.)

Cost difference between evaluation plans for a query can be enormous

E.g. seconds vs. days in some cases
Steps in cost-based query optimization
1. Generate logically equivalent expressions using equivalence rules
2. Annotate resultant expressions to get alternative query plans
3. Choose the cheapest plan based on estimated cost
Estimation of plan cost based on:
Statistical information about relations. Examples:
number of tuples, number of distinct values for an attribute
Statistics estimation for intermediate results
to compute cost of complex expressions
Cost formulae for algorithms, computed using statistics
Transformation of Relational Expressions

Two relational algebra expressions are said to be equivalent if the two

expressions generate the same set of tuples on every legal database
instance
Note: order of tuples is irrelevant
we don’t care if they generate different results on databases that
violate integrity constraints
In SQL, inputs and outputs are multisets of tuples
Two expressions in the multiset version of the relational algebra are
said to be equivalent if the two expressions generate the same
multiset of tuples on every legal database instance.
An equivalence rule says that expressions of two forms are equivalent
Can replace expression of first form by second, or vice versa
Equivalence Rules
1. Conjunctive selection operations can be deconstructed into a
sequence of individual selections.
sq1 Ùq 2 ( E ) = s (s ( E ))
q1 q2
2. Selection operations are commutative.

s q (s q ( E )) = s q (s q ( E ))
1 2 2 1
3. Only the last in a sequence of projection operations is needed, the
others can be omitted.

 L1 ( L2 ( ( Ln ( E )) ))   L1 ( E )
Equivalence Rules
5. Theta-join operations (and natural joins) are commutative.
E1  E2 = E2  E1
6. (a) Natural join operations are associative:
(E1 E2) E3 = E1 (E2 E3)

(b) Theta joins are associative in the following manner:

(E1 1 E2) 2 3 E3 = E1 1 3 (E2 2 E3)

where 2 involves attributes from only E2 and E3.

Equivalence Rules (Cont.)

7. The selection operation distributes over the theta join operation

under the following two conditions:
(a) When all the attributes in 0 involve only the attributes of one
of the expressions (E1) being joined.

0E1  E2) = (0(E1))  E2

(b) When  1 involves only the attributes of E1 and 2 involves

only the attributes of E2.
1 E1  E2) = (1(E1))  ( (E2))
Pictorial Depiction of Equivalence Rules
End

Ch12-Query Processing
No ratings yet
Ch12-Query Processing
34 pages
06 Query Processing (2) - NDN
No ratings yet
06 Query Processing (2) - NDN
31 pages
Query Processing for CS Students
No ratings yet
Query Processing for CS Students
47 pages
Query Optimization Techniques by Warih Maharani
No ratings yet
Query Optimization Techniques by Warih Maharani
39 pages
DBMS Unit5 Lecture1
No ratings yet
DBMS Unit5 Lecture1
22 pages
DBMS R19 Unit Iv
No ratings yet
DBMS R19 Unit Iv
25 pages
Database Query Processing & Security
No ratings yet
Database Query Processing & Security
39 pages
Query Processing: Steps and Optimization
No ratings yet
Query Processing: Steps and Optimization
30 pages
QueryProcess Optim
No ratings yet
QueryProcess Optim
60 pages
Session - 10 Querying
No ratings yet
Session - 10 Querying
36 pages
Database Technology Query Processing: Heiko Paulheim
No ratings yet
Database Technology Query Processing: Heiko Paulheim
60 pages
Advanced Database Systems Lecture Notes
No ratings yet
Advanced Database Systems Lecture Notes
79 pages
DBMS
No ratings yet
DBMS
24 pages
1.3 PPT - Measure of Query Cost
100% (1)
1.3 PPT - Measure of Query Cost
42 pages
Unit-2 Query Processing and Optimization, Query Equivalence, Join Strategies
No ratings yet
Unit-2 Query Processing and Optimization, Query Equivalence, Join Strategies
38 pages
Query Processing and Optimization Techniques
No ratings yet
Query Processing and Optimization Techniques
15 pages
Unit-2 Query Processing and Optimization, Query Equivalence, Join Strategies
No ratings yet
Unit-2 Query Processing and Optimization, Query Equivalence, Join Strategies
37 pages
Dbms Chapter 5
No ratings yet
Dbms Chapter 5
54 pages
05 Optimization
No ratings yet
05 Optimization
58 pages
Database Query Optimization Guide
No ratings yet
Database Query Optimization Guide
38 pages
Lecture11 Query Processing
No ratings yet
Lecture11 Query Processing
37 pages
Query Evaluation Techniques and Costs
No ratings yet
Query Evaluation Techniques and Costs
17 pages
05 QueryProcessing LecW4 Feb7 22
No ratings yet
05 QueryProcessing LecW4 Feb7 22
55 pages
13 QP1
No ratings yet
13 QP1
33 pages
Unit 3 - DBMS
No ratings yet
Unit 3 - DBMS
15 pages
QEII
No ratings yet
QEII
44 pages
Query Processing & Evaluation Guide
No ratings yet
Query Processing & Evaluation Guide
23 pages
Basic Steps in Query Processing
No ratings yet
Basic Steps in Query Processing
49 pages
Database Query Optimization Guide
No ratings yet
Database Query Optimization Guide
127 pages
Lecture Notes
No ratings yet
Lecture Notes
96 pages
Introduction To Database Management Systems CS470
No ratings yet
Introduction To Database Management Systems CS470
11 pages
Chapter 13: Query Processing
No ratings yet
Chapter 13: Query Processing
55 pages
3 Query Processing and Optimization-1
No ratings yet
3 Query Processing and Optimization-1
18 pages
Query Processing and Optimization
No ratings yet
Query Processing and Optimization
45 pages
Lesson 05
No ratings yet
Lesson 05
29 pages
Overview of Query Evaluation: R&G Chapter 12
No ratings yet
Overview of Query Evaluation: R&G Chapter 12
30 pages
Chapter 13: Query Processing
No ratings yet
Chapter 13: Query Processing
49 pages
Chapter 13: Query Processing: Database System Concepts, 5th Ed
No ratings yet
Chapter 13: Query Processing: Database System Concepts, 5th Ed
55 pages
Unit 4 - Query Processing
No ratings yet
Unit 4 - Query Processing
49 pages
Chap 12
No ratings yet
Chap 12
73 pages
BCS Topic
No ratings yet
BCS Topic
66 pages
Unit 4
No ratings yet
Unit 4
24 pages
Query Processing: Cost and Optimization
No ratings yet
Query Processing: Cost and Optimization
25 pages
L10-Query Evaluaion
No ratings yet
L10-Query Evaluaion
50 pages
Lesson 06
No ratings yet
Lesson 06
44 pages
Query Processing and Optimization Guide
No ratings yet
Query Processing and Optimization Guide
28 pages
Query Processing and Optimisation - Intr
No ratings yet
Query Processing and Optimisation - Intr
41 pages
ch15 Queryprocessing
No ratings yet
ch15 Queryprocessing
63 pages
Chapter 12: Query Processing
No ratings yet
Chapter 12: Query Processing
57 pages
Query Processing and Optimization Guide
No ratings yet
Query Processing and Optimization Guide
53 pages
Measures of Query Cost
No ratings yet
Measures of Query Cost
15 pages
Measures of Query Cost
No ratings yet
Measures of Query Cost
15 pages
Chapter 12 - 2
No ratings yet
Chapter 12 - 2
38 pages
Query Processing and Optimization
No ratings yet
Query Processing and Optimization
33 pages
Query Processing Concepts
No ratings yet
Query Processing Concepts
99 pages
Query Processing for DBMS Students
No ratings yet
Query Processing for DBMS Students
13 pages
Cost Estimation For Query Optimization
No ratings yet
Cost Estimation For Query Optimization
14 pages
Query Processing
No ratings yet
Query Processing
77 pages
Database System Final Project
No ratings yet
Database System Final Project
3 pages
Week 7 Practice Questions - Normalization
No ratings yet
Week 7 Practice Questions - Normalization
4 pages
TAFJ-Oracle Install 12c
100% (1)
TAFJ-Oracle Install 12c
17 pages
GCP Data Storage & BigQuery Guide
No ratings yet
GCP Data Storage & BigQuery Guide
15 pages
Database Management System K Scheme Expected Paper
No ratings yet
Database Management System K Scheme Expected Paper
3 pages
Free SQL Book Material PDF Download
No ratings yet
Free SQL Book Material PDF Download
198 pages
Road Map To Final Project Viva Important by NASIR ABBAS
No ratings yet
Road Map To Final Project Viva Important by NASIR ABBAS
7 pages
Class XI Python Exam Prep
0% (1)
Class XI Python Exam Prep
6 pages
SQL Server Cheat Sheet: by Via
No ratings yet
SQL Server Cheat Sheet: by Via
1 page
SAP Note 1598594 BRTools With Oracle User
No ratings yet
SAP Note 1598594 BRTools With Oracle User
8 pages
IT Project (CBSE)
No ratings yet
IT Project (CBSE)
16 pages
The Evolution of Database Management System
No ratings yet
The Evolution of Database Management System
2 pages
P2 - ER Diagram
No ratings yet
P2 - ER Diagram
2 pages
Advantages of DDBMs
No ratings yet
Advantages of DDBMs
11 pages
Computer Science B.SC
No ratings yet
Computer Science B.SC
109 pages
Basic CREATE TABLE Mysql Statement
No ratings yet
Basic CREATE TABLE Mysql Statement
3 pages
Enable Quick Reboot
No ratings yet
Enable Quick Reboot
1 page
Database Normalization
No ratings yet
Database Normalization
18 pages
Linq Revision
No ratings yet
Linq Revision
22 pages
Mongodb Dba Course Content
No ratings yet
Mongodb Dba Course Content
2 pages
Symantec Data Loss Prevention Upgrade Guide For Windows: February 4, 2021
No ratings yet
Symantec Data Loss Prevention Upgrade Guide For Windows: February 4, 2021
74 pages
ACLScript 16: Comprehensive Guide
No ratings yet
ACLScript 16: Comprehensive Guide
2,765 pages
IMC 451: Organization of Information: Topic 6: Bibliographic Control Activities
No ratings yet
IMC 451: Organization of Information: Topic 6: Bibliographic Control Activities
35 pages
Learning Apache Spark With Python
No ratings yet
Learning Apache Spark With Python
10 pages
084 Liza Bda File
No ratings yet
084 Liza Bda File
23 pages
Durgesh Sr. Data Architect / Modeler/Bigdata
100% (1)
Durgesh Sr. Data Architect / Modeler/Bigdata
5 pages
18csl58 Dbms Lab Manual 2022-23
No ratings yet
18csl58 Dbms Lab Manual 2022-23
72 pages
SAP System Refresh Steps Guide
No ratings yet
SAP System Refresh Steps Guide
7 pages
PHP DRCP Fan Resultcaches 131994
No ratings yet
PHP DRCP Fan Resultcaches 131994
33 pages
Database Learning for Students
No ratings yet
Database Learning for Students
161 pages

Unit IV Part II

Uploaded by

Unit IV Part II

Uploaded by

CS312

Database Management Systems

School of Computer Engineering and Technology

For simplicity we just use the number of block

Several algorithms can reduce disk IO by using extra

Index scan – search algorithms that use an index

A4 (secondary index, equality on nonkey).

Several different algorithms to implement joins

To compute the theta join r s

Materialized evaluation is always applicable

In demand driven or lazy evaluation

Implementation of demand-driven pipelining

Cost difference between evaluation plans for a query can be enormous

Two relational algebra expressions are said to be equivalent if the two

(b) Theta joins are associative in the following manner:

(E1 1 E2) 2 3 E3 = E1 1 3 (E2 2 E3)

where 2 involves attributes from only E2 and E3.

7. The selection operation distributes over the theta join operation

0E1  E2) = (0(E1))  E2

(b) When  1 involves only the attributes of E1 and 2 involves

You might also like