0% found this document useful (0 votes)
31 views8 pages

Query Processing

Query processing involves translating SQL queries into low-level instructions, optimizing them for efficiency, and executing them to retrieve data from databases. The goal is to minimize costs associated with disk accesses and read/write operations while ensuring accurate results. Key steps include parsing, translation, optimization, execution planning, and evaluation of query plans to determine the most efficient method for data retrieval.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
31 views8 pages

Query Processing

Query processing involves translating SQL queries into low-level instructions, optimizing them for efficiency, and executing them to retrieve data from databases. The goal is to minimize costs associated with disk accesses and read/write operations while ensuring accurate results. Key steps include parsing, translation, optimization, execution planning, and evaluation of query plans to determine the most efficient method for data retrieval.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 8

\Query Processing

Query Processing would mean the entire process or activity which involves query
translation into low level instructions, query optimization to save resources, cost
estimation or evaluation of query, and extraction of data from the database.
Goal: To find an efficient Query Execution Plan for a given SQL query which would
minimize the cost considerably, especially time.
Cost Factors: Disk accesses [which typically consumes time], read/write
operations [which typically needs resources such as memory/RAM].
The major steps involved in query processing are depicted in the figure below;

Figure 1 - Steps in Database Query Processing


Let us discuss the whole process with an example. Let us consider the following
two relations as the example tables for our discussion;

Employee(Eno, Ename, Phone)


Proj_Assigned(Eno, Proj_No, Role, DOP)
where,
Eno is Employee number,
Ename is Employee name,
Proj_No is Project Number in which an employee is assigned,
Role is the role of an employee in a project,
DOP is duration of the project in months.
With this information, let us write a query to find the list of all employees who are
working in a project which is more than 10 months old.
SELECT Ename
FROM Employee, Proj_Assigned
WHERE Employee.Eno = Proj_Assigned.Eno AND DOP > 10;
Input:
A query written in SQL is given as input to the query processor. For our case, let
us consider the SQL query written above.
Step 1: Parsing
In this step, the parser of the query processor module checks the syntax of the
query, the user’s privileges to execute the query, the table names and attribute
names, etc. The correct table names, attribute names and the privilege of the
users can be taken from the system catalog (data dictionary).
Step 2: Translation
If we have written a valid query, then it is converted from high level language
SQL to low level instruction in Relational Algebra.
For example, our SQL query can be converted into a Relational Algebra
equivalent as follows;
πEname(σDOP>10 Λ Employee.Eno=Proj_Assigned.Eno(Employee X Prof_Assigned))
Step 3: Optimizer
Optimizer uses the statistical data stored as part of data dictionary. The
statistical data are information about the size of the table, the length of records,
the indexes created on the table, etc. Optimizer also checks for the conditions
and conditional attributes which are parts of the query.
Step 4: Execution Plan
A query can be expressed in many ways. The query processor module, at this
stage, using the information collected in step 3 to find different relational algebra
expressions that are equivalent and return the result of the one which we have
written already.
For our example, the query written in Relational algebra can also be written as
the one given below;

πEname(Employee ⋈ Eno (σDOP>10 (Prof_Assigned)))


So far, we have got two execution plans. Only condition is that both plans should
give the same result.
Step 5: Evaluation
Though we got many execution plans constructed through statistical data,
though they return same result (obvious), they differ in terms of Time
consumption to execute the query, or the Space required executing the query.
Hence, it is mandatory choose one plan which obviously consumes less cost.
At this stage, we choose one execution plan of the several we have developed.
This Execution plan accesses data from the database to give the final result.
In our example, the second plan may be good. In the first plan, we join two
relations (costly operation) then apply the condition (conditions are considered as
filters) on the joined relation. This consumes more time as well as space.
In the second plan, we filter one of the tables (Proj_Assigned) and the result is
joined with the Employee table. This join may need to compare less number of
records. Hence, the second plan is the best (with the information known, not
always).
Output:
The final result is shown to the user.

The overall information discussed above are depicted in Figure 2 below;


Figure 2 - Query Processing [Note: in Step 4, NJ means Natural Join]

○ Indexing is used to optimize the performance of a database by minimizing the number


of disk accesses required when a query is processed.

○ The index is a type of data structure. It is used to locate and access the data in a
database table quickly.

Index structure:
Indexes can be created using some database columns.

○ The first column of the database is the search key that contains a copy of the primary
key or candidate key of the table. The values of the primary key are stored in sorted
order so that the corresponding data can be accessed easily.

○ The second column of the database is the data reference. It contains a set of pointers
holding the address of the disk block where the value of the particular key can be
found.

8. Query Processing
Goals: Understand the basic concepts underlying the
steps in
query processing and optimization and estimating query
processing
cost; apply query optimization techniques;
Contents:
_ Overview
_ Catalog Information for Cost Estimation
_ Measures of Query Cost
_ Selection
_ Join Operations
_ Other Operations
_ Evaluation and Transformation of Expressions
Query Processing & Optimization
Task: Find an efficient physical query plan (aka execution
plan)
for an SQL query
Goal: Minimize the evaluation time for the query, i.e.,
compute
query result as fast as possible
Cost Factors: Disk accesses, read/write operations, [I/O,
page
transfer] (CPU time is typically ignored)
Catalog Information for Cost Estimation
Information about relations and attributes:
_ NR: number of tuples in the relation R.
_ BR: number of blocks that contain tuples of the relation
R.
_ SR: size of a tuple of R.
_ FR: blocking factor; number of tuples from R that _t into
one
block (FR = dNR=BRe)
_ V(A; Measures of Query Cost
_ There are many possible ways to estimate cost, e.g.,
based on
disk accesses, CPU time, or communication overhead.
_ Disk access is the predominant cost (in terms of time);
relatively easy to estimate; therefore, number of block
transfers
from/to disk is typically used as measure.
{ Simplifying assumption: each block transfer has the
same
cost.
_ Cost of algorithm (e.g., for join or selection) depends on
database buffer size; more memory for DB buffer reduces
disk
accesses. Thus DB buffer size is a parameter for
estimating
cost.
_ We refer to the cost estimate of algorithm S as cost(S).
We
do not consider cost of writing output to disk.
Selection Operation
_A=a(R) where a is a constant value, A an attribute of R
_ File Scan { search algorithms that locate and retrieve
records
that satisfy a selection condition
_ S1 { Linear search
cost(S1)= BRR): number of distinct values for attribute A
in R.
_ SC(A; R): selectivity of attribute A
_ average number of tuples of R that satisfy an
equality condition on A.
SC(A; R) = NR=V(A; R).
Information about indexes:
_ HTI: number of levels in index I (B+-tree).
_ LBI: number of blocks occupied by leaf nodes in index I
(first-level blocks).
_ Vali: number of distinct values for the search key.

You might also like