Basic concepts of query processing;
Query processing is the set of activities a database management system
(DBMS) performs to translate a user's high-level query (like a SQL statement)
into a series of low-level, executable operations. The main goal is to execute
the query as efficiently as possible to minimize the time and resources needed
to retrieve the data.
This process is generally broken down into three main phases:
1. Parsing and Translation
This is the first step where the DBMS analyzes the query. It's essentially a
"compilation" phase.
• Lexical Analysis: The query string is broken down into meaningful
components or "tokens," such as keywords (e.g., SELECT, FROM,
WHERE), attribute names, and values.
• Syntactic Analysis: The system checks the query's syntax to ensure it
follows the rules of the language (e.g., SQL). If there's a typo, like
"SELCT," an error is reported.
• Semantic Analysis: The DBMS verifies that the query is logically sound. It
checks the database's catalog to ensure that all tables and columns
mentioned actually exist and that the user has the necessary
permissions.
• Translation: The validated query is then converted into an internal
representation, often a relational algebra expression or a query tree,
which is a more structured format the system can work with.
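To make the lexical-analysis step concrete, here is a minimal sketch of a toy tokenizer for a SQL-like string. The token categories and the regular expression are simplifications for illustration, not how any production parser is organized.
Python
import re

# Minimal lexical analysis: split a SQL-like string into keyword,
# identifier, literal, and symbol tokens. Real scanners handle far more.
KEYWORDS = {"SELECT", "FROM", "WHERE"}
TOKEN_RE = re.compile(r"\s*(?:(\d+)|(\w+)|(>=|<=|<>|[=<>*,;()]))")

def tokenize(query):
    tokens, pos = [], 0
    while pos < len(query):
        match = TOKEN_RE.match(query, pos)
        if not match:
            raise SyntaxError(f"unexpected character at position {pos}")
        number, word, symbol = match.groups()
        if number:
            tokens.append(("literal", int(number)))
        elif word:
            kind = "keyword" if word.upper() in KEYWORDS else "identifier"
            tokens.append((kind, word))
        else:
            tokens.append(("symbol", symbol))
        pos = match.end()
    return tokens

print(tokenize("SELECT Fname FROM Employee WHERE Salary > 50000;"))
# [('keyword', 'SELECT'), ('identifier', 'Fname'), ('keyword', 'FROM'), ...]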
2. Optimization
This is the most critical phase for performance. The optimizer takes the internal
representation of the query and generates multiple possible execution plans to
retrieve the same data. It then estimates the cost of each plan and chooses the
one with the lowest cost.
• Generating Plans: The optimizer considers different ways to execute the
query. For example, it might decide to use an index to find specific rows,
or perform a full table scan. If the query involves multiple tables, it will
consider different join orders and join algorithms (e.g., nested-loop join,
hash join).
• Cost Estimation: The cost of each plan is calculated based on factors like
the number of disk I/O operations, CPU usage, and estimated number of
rows to be processed. This is where the database's statistics on tables
and indexes are crucial.
• Selecting the Best Plan: The optimizer chooses the most efficient plan,
which is often referred to as the query execution plan.
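The sketch below illustrates the kind of back-of-the-envelope comparison a cost-based optimizer makes between a full table scan and an index-based plan. The statistics (page count, index height, selectivity) are invented, and the cost formulas are the simplified I/O-only versions used in textbooks.
Python
# A simplified, I/O-only cost comparison using invented statistics.
def full_scan_cost(table_pages):
    # Read every page of the table once.
    return table_pages

def index_scan_cost(index_height, matching_rows):
    # Traverse the B+-tree, then fetch (at worst) one page per matching row
    # for a non-clustering index.
    return index_height + matching_rows

table_pages = 10_000      # pages in the Employee table (assumed)
index_height = 3          # levels in the B+-tree index (assumed)
total_rows = 500_000      # rows in the table (assumed)
selectivity = 0.001       # estimated fraction of rows with Salary > 50000

matching = int(total_rows * selectivity)                        # 500 rows
print("full scan :", full_scan_cost(table_pages))               # 10000 page reads
print("index scan:", index_scan_cost(index_height, matching))   # 503 page reads
With a much less selective predicate, the estimated index cost would exceed the full-scan cost, which is why up-to-date statistics matter so much to the optimizer.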
3. Evaluation
In this final phase, the chosen execution plan is executed. The query execution
engine takes the plan and runs the necessary operations to produce the final
result.
• Execution: The engine carries out the primitive operations defined in the
plan, such as reading data from disk, applying filters, performing joins,
sorting, and aggregating.
• Result Retrieval: The final result set is assembled and returned to the
user or application that submitted the query.
Converting SQL queries into Relational Algebra;
Converting a SQL query into a Relational Algebra expression is a crucial step in
database query processing. Relational Algebra is a procedural query language,
which means it describes how to get a result, while SQL is a declarative
language, which describes what to get. The database management system's
(DBMS) query optimizer translates the declarative SQL into a procedural
Relational Algebra expression to generate an efficient execution plan.
The core idea is to map the clauses of a SQL query to the fundamental
operators of relational algebra.
Key Mappings Between SQL and Relational Algebra
• FROM Clause → Cartesian Product (×) or Join (⋈): The FROM clause,
which specifies the tables, is typically translated into a Cartesian Product
of those tables. This combines every row from the first table with every
row from the second table. However, since a Cartesian product can be
inefficient, the optimizer often converts this into a Join operation (⋈),
which is a more direct way of combining related tuples from different
tables.
• WHERE Clause → Selection (σ): The WHERE clause's filtering condition
is represented by the Selection operator (σ). This unary operator is
applied to a relation and filters out the rows (tuples) that do not satisfy
the specified condition. For example, WHERE Salary > 50000 becomes
σSalary>50000.
• SELECT Clause → Projection (π): The SELECT clause, which lists the
columns to be returned, is translated into the Projection operator (π).
This unary operator is used to select specific columns (attributes) from a
relation, effectively creating a new relation with a vertical subset of the
original table. For example, SELECT Name, Age becomes
πName, Age.
• GROUP BY and Aggregate Functions (e.g., COUNT, SUM) →
Aggregation (γ): Queries with GROUP BY and aggregate functions are
translated using the Aggregation operator (γ). This operator
partitions the tuples into groups based on the GROUP BY clause and then
applies aggregate functions to each group.
A Simple Example
Consider the following SQL query:
SQL
SELECT Fname, Lname
FROM Employee
WHERE Salary > 50000;
This query can be converted into the following Relational Algebra expression:
πFname, Lname(σSalary>50000(Employee))
This expression shows the order of operations:
1. Selection (σ): Filter the Employee table to keep only the rows where
Salary is greater than 50,000.
2. Projection (π): Take the result of the selection and project it onto the
Fname and Lname columns.
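The same expression can be read as two nested function applications. The sketch below models relations as Python lists of dictionaries (purely for illustration) and applies selection and then projection in that order.
Python
# Relations modeled as lists of dicts; selection and projection as functions.
employee = [
    {"Fname": "Ana", "Lname": "Diaz",  "Salary": 60000},
    {"Fname": "Raj", "Lname": "Patel", "Salary": 45000},
]

def select(relation, predicate):          # sigma
    return [t for t in relation if predicate(t)]

def project(relation, attributes):        # pi
    # (duplicate elimination, required by pure relational algebra, is omitted)
    return [{a: t[a] for a in attributes} for t in relation]

# pi_{Fname, Lname}(sigma_{Salary > 50000}(Employee))
result = project(select(employee, lambda t: t["Salary"] > 50000),
                 ["Fname", "Lname"])
print(result)   # [{'Fname': 'Ana', 'Lname': 'Diaz'}]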
Basic Algorithms for executing query operations;
Selection (WHERE Clause)
For a SELECT statement with a WHERE clause, the database system must find
all tuples that satisfy a given condition.
• File Scan: The most basic approach is a full table scan or file scan. The
system reads every single tuple from the table and checks if it meets the
condition. This is simple but can be very slow for large tables.
• Index-based Selection: A much more efficient method is to use an index.
If a WHERE clause condition involves an indexed column, the system can
use the index (like a book's index) to quickly locate the relevant tuples
without scanning the entire table.
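A small sketch of the two approaches, using an in-memory list as the table and a plain Python dictionary as a stand-in for a real disk-based index:
Python
# Full scan: examine every tuple. Index-based: jump straight to candidates.
employees = [
    {"id": 1, "dept": "Sales", "salary": 60000},
    {"id": 2, "dept": "HR",    "salary": 45000},
    {"id": 3, "dept": "Sales", "salary": 52000},
]

def file_scan(table, predicate):
    return [row for row in table if predicate(row)]      # touches every row

# Build a hash index on dept (stands in for a real disk-based index).
dept_index = {}
for row in employees:
    dept_index.setdefault(row["dept"], []).append(row)

def index_lookup(index, key):
    return index.get(key, [])                            # touches only matches

print(file_scan(employees, lambda r: r["dept"] == "Sales"))
print(index_lookup(dept_index, "Sales"))   # same result, fewer rows examined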
Projection (SELECT Clause)
The projection operation (the SELECT clause) extracts specific columns from a
relation.
• Tuple-by-tuple: This is the simplest algorithm. The system reads each
tuple, creates a new tuple containing only the desired columns, and
writes it to the output.
• Columnar storage: In some modern databases, data is stored in columns
instead of rows. This makes projection very efficient, as the system only
needs to read the relevant columns from the disk, ignoring the others
entirely.
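The following sketch contrasts the two layouts on a toy data set; in the columnar layout, projecting two attributes touches only those two lists.
Python
# Row store: one record per row. Column store: one list per attribute.
row_store = [
    ("Ana", 34, "Sales"),
    ("Raj", 29, "HR"),
]
column_store = {
    "name": ["Ana", "Raj"],
    "age":  [34, 29],
    "dept": ["Sales", "HR"],
}

# Projection on (name, age):
# the row store must read every full record and strip the unwanted field ...
row_projection = [(name, age) for (name, age, _dept) in row_store]
# ... while the column store reads only the two requested columns.
col_projection = list(zip(column_store["name"], column_store["age"]))

print(row_projection)   # [('Ana', 34), ('Raj', 29)]
print(col_projection)   # [('Ana', 34), ('Raj', 29)]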
Join Operation
The join operation combines data from two or more tables based on a join
condition. This is often the most expensive operation and is why a query
optimizer is so important.
• Nested Loop Join: This is the most basic join algorithm. For each tuple in
the outer relation (the first table), the algorithm iterates through every
tuple in the inner relation (the second table) and checks if the join
condition is met.
o Cost: High, especially for large tables, because it requires a full
scan of the inner relation for every tuple of the outer one.
• Sort-Merge Join: This algorithm works well for large inputs, especially
when they are already sorted on the join attributes or when sorted
output is useful later in the plan.
1. Sort Phase: The algorithm first sorts both relations on their
respective join attributes.
2. Merge Phase: It then scans both sorted relations simultaneously.
Pointers are used to find matching tuples, and the results are
combined.
• Hash Join: This is often the fastest join algorithm for large, unsorted
relations with an equi-join condition.
1. Build Phase: The algorithm builds a hash table on the smaller of
the two relations, using the join attribute as the hash key.
2. Probe Phase: It then scans the larger relation, and for each tuple,
it uses the hash function to look for matching tuples in the hash
table.
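Below are minimal in-memory sketches of the nested-loop and hash join algorithms over lists of dictionaries. Disk access, partitioning, and memory spills, which dominate real implementations, are ignored.
Python
# In-memory sketches of two join algorithms on lists of dicts.
employees = [{"emp": "Ana", "dept_id": 1}, {"emp": "Raj", "dept_id": 2}]
departments = [{"dept_id": 1, "dept": "Sales"}, {"dept_id": 2, "dept": "HR"}]

def nested_loop_join(outer, inner, key):
    # For every outer tuple, scan the whole inner relation.
    return [{**o, **i} for o in outer for i in inner if o[key] == i[key]]

def hash_join(build, probe, key):
    # Build phase: hash the smaller relation on the join key.
    table = {}
    for row in build:
        table.setdefault(row[key], []).append(row)
    # Probe phase: one pass over the larger relation.
    return [{**p, **b} for p in probe for b in table.get(p[key], [])]

print(nested_loop_join(employees, departments, "dept_id"))
print(hash_join(departments, employees, "dept_id"))
# Both print: [{'emp': 'Ana', 'dept_id': 1, 'dept': 'Sales'},
#              {'emp': 'Raj', 'dept_id': 2, 'dept': 'HR'}]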
Aggregation (GROUP BY)
Aggregation functions like COUNT, SUM, AVG, MIN, and MAX summarize data.
When used with a GROUP BY clause, the system must first group the data
before calculating the aggregate value for each group.
• Sorting-based Aggregation: The relation is first sorted on the grouping
attributes. Then, a single scan is performed, and aggregate values are
calculated for each group of consecutive, identical values.
• Hashing-based Aggregation: A hash table is built in memory, with the
grouping attributes as keys and the aggregate values as values. The
system scans the relation and updates the aggregate value for each
group found in the hash table. This is often faster than sorting if the hash
table fits in memory.
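A sketch of hashing-based aggregation for a query like SELECT dept, AVG(salary) ... GROUP BY dept, keeping one running sum and count per group (and assuming, as above, that the hash table fits in memory):
Python
# Hashing-based aggregation: one scan, one accumulator per group.
rows = [
    {"dept": "Sales", "salary": 60000},
    {"dept": "HR",    "salary": 45000},
    {"dept": "Sales", "salary": 52000},
]

groups = {}                              # dept -> [running_sum, running_count]
for row in rows:                         # single scan of the relation
    acc = groups.setdefault(row["dept"], [0, 0])
    acc[0] += row["salary"]
    acc[1] += 1

averages = {dept: s / c for dept, (s, c) in groups.items()}
print(averages)                          # {'Sales': 56000.0, 'HR': 45000.0}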
Query tree and query graph;
Both a query tree and a query graph are internal representations of a query
used by a database system during the optimization and execution phases.
However, they differ in how they model the query and the information they
convey about the execution plan.
Query Tree
A query tree is a tree data structure that represents a specific Relational
Algebra expression. It shows a hierarchical, procedural plan for executing a
query.
• Structure:
o Leaf nodes: Represent the input relations (tables) from the FROM
clause.
o Internal nodes: Represent the relational algebra operators (like σ
for Selection, π for Projection, and ⋈ for Join) that will be applied
to the relations.
o Edges: Show the flow of data and the order of operations, with
the root node representing the final result.
• Key Characteristic: A query tree explicitly defines a specific order of
operations. This makes it a direct representation of a query execution
plan. For a single SQL query, a database optimizer may generate many
different, equivalent query trees to find the most efficient one.
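One way to picture a query tree in code is as nested (operator, argument, children) nodes; the sketch below builds the tree for the earlier πFname, Lname(σSalary>50000(Employee)) example. This representation is purely illustrative, not any particular system's internal format.
Python
# A query tree for pi_{Fname,Lname}(sigma_{Salary>50000}(Employee)),
# represented as nested (operator, argument, children) tuples.
query_tree = ("PROJECT", ["Fname", "Lname"], [
    ("SELECT", "Salary > 50000", [
        ("TABLE", "Employee", []),
    ]),
])

def show(node, depth=0):
    op, arg, children = node
    print("  " * depth + f"{op} {arg}")
    for child in children:
        show(child, depth + 1)

show(query_tree)
# PROJECT ['Fname', 'Lname']
#   SELECT Salary > 50000
#     TABLE Employee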
Query Graph
A query graph is a graph data structure (which can have cycles) that represents
a query's Relational Calculus or declarative form. It represents the
relationships and conditions in a query without specifying an execution order.
• Structure:
o Nodes: Represent the relations (tables) involved in the query.
o Edges: Represent the join conditions and selection predicates
(from the WHERE clause) that connect the relations. Attributes to
be retrieved are sometimes shown within the nodes.
• Key Characteristic: A query graph is a non-procedural representation. It
shows all the constraints and relationships but leaves the ordering of
operations up to the query optimizer. There is usually only one canonical
query graph for a given query, from which multiple query trees can be
derived.
Heuristic optimization of query tree, functional dependencies, normal forms;
Heuristic Optimization of Query Tree
Heuristic optimization is a rule-based approach to improving a query's
execution plan. The primary goal is to transform an initial, unoptimized query
tree into a more efficient, equivalent one. This is done by applying a set of rules
that, while not guaranteeing the absolute best plan, significantly reduce the
search space and lead to a good plan. The most important heuristic rule is to
perform selections and projections as early as possible to minimize the size of
intermediate results.
Key Heuristic Rules:
• Push Selections Down: Move the selection (σ) operation as close to the
leaf nodes (tables) as possible. This filters out a large number of rows
early, reducing the data that subsequent operations (like joins) have to
process.
• Push Projections Down: Similarly, move the projection (π) operation
down the tree. This reduces the number of attributes in intermediate
results, saving memory and disk I/O.
• Combine Operations: Replace a Cartesian product followed by a
selection with a more efficient join operation (⋈), as joins are generally
faster than performing a full Cartesian product.
• Reorder Joins: Reorder the join operations to process the most
restrictive joins first. A join that produces a smaller intermediate result is
more efficient, as it reduces the input size for the next join.
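As an illustration of the first rule, the sketch below rewrites a tree in the (operator, argument, children) format used earlier, moving a selection that refers only to one join input below the join. A real optimizer would inspect which attributes the predicate mentions; here, for simplicity, the caller states which side it applies to.
Python
# Heuristic rewrite: push a selection that refers only to one join input
# below the join. Trees use the (operator, argument, children) format above.
before = ("SELECT", "E.Salary > 50000", [
    ("JOIN", "E.DeptId = D.DeptId", [
        ("TABLE", "Employee E", []),
        ("TABLE", "Department D", []),
    ]),
])

def push_selection_down(tree, applies_to_left):
    # Simplified: the caller says which join input the predicate refers to.
    _op, pred, (join,) = tree
    join_op, join_cond, (left, right) = join
    if applies_to_left:
        left = ("SELECT", pred, [left])
    else:
        right = ("SELECT", pred, [right])
    return (join_op, join_cond, [left, right])

after = push_selection_down(before, applies_to_left=True)
# after: JOIN( SELECT_{E.Salary>50000}(Employee E), Department D )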
Functional Dependencies
A functional dependency (FD) is a constraint between two sets of attributes in
a relational database. It states that the value of one set of attributes (the
determinant) uniquely determines the value of another set of attributes (the
dependent). It's written as X→Y, which means that for any two tuples, if they
have the same value for attribute set X, they must also have the same value for
attribute set Y.
• Trivial Functional Dependency: A dependency X→Y is trivial if Y is a
subset of X. For example, {SSN, EmployeeID} → {SSN} is trivial.
• Transitive Dependency: A dependency X→Z is transitive if there is an
intermediate attribute set Y such that X→Y and Y→Z, and Y does not
determine X.
• Partial Dependency: A dependency X→Y is partial if X is a composite key
and Y is determined by a proper subset of X. For example, if {StudentID,
CourseID} is a key and {StudentID} → {StudentName} holds, this is a
partial dependency.
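The sketch below checks whether a dependency X→Y holds in a concrete table instance (a list of dictionaries with invented data): any two rows that agree on X must also agree on Y. Note that this only tests one instance; a declared FD is a constraint on all possible instances.
Python
# Check whether X -> Y holds in a table given as a list of dicts:
# tuples that agree on X must also agree on Y.
def fd_holds(rows, X, Y):
    seen = {}
    for row in rows:
        x_val = tuple(row[a] for a in X)
        y_val = tuple(row[a] for a in Y)
        if x_val in seen and seen[x_val] != y_val:
            return False
        seen[x_val] = y_val
    return True

employees = [
    {"SSN": "111", "EmployeeID": "E1", "Dept": "Sales"},
    {"SSN": "222", "EmployeeID": "E2", "Dept": "Sales"},
]
print(fd_holds(employees, ["SSN"], ["Dept"]))   # True in this instance
print(fd_holds(employees, ["Dept"], ["SSN"]))   # False: same Dept, different SSN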
Functional dependencies are the theoretical basis for database normalization.
Normalization is a process for organizing data in a database to reduce
redundancy and improve data integrity. It involves a series of rules, called
normal forms, that help you design a logical schema. The main goal is to break
down a large table into smaller, well-structured tables and establish
relationships between them.
The process helps prevent three types of data anomalies:
• Insertion Anomaly: Certain facts cannot be recorded without also
supplying unrelated data (for example, a new department cannot be
added until it has at least one employee).
• Deletion Anomaly: Deleting a record unintentionally removes other,
unrelated facts stored in the same row.
• Update Anomaly: The same fact is stored in multiple places, so an
update must change every copy or the data becomes inconsistent.
The Most Common Normal Forms
First Normal Form (1NF)
A table is in 1NF if all its attribute values are atomic, meaning each cell contains
only a single value. It also eliminates repeating groups of data.
• Rule: Eliminate repeating groups and ensure each column has a single,
non-divisible value.
Second Normal Form (2NF)
A table is in 2NF if it's in 1NF and all non-key attributes are fully functionally
dependent on the primary key. This eliminates partial dependencies, where a
non-key attribute is dependent on only part of a composite primary key.
• Rule: Be in 1NF and remove partial dependencies.
Third Normal Form (3NF)
A table is in 3NF if it's in 2NF and has no transitive dependencies. A transitive
dependency occurs when a non-key attribute is determined by another non-
key attribute.
• Rule: Be in 2NF and remove transitive dependencies.
Denormalization
While normalization is crucial for data integrity, it can sometimes reduce query
performance because it requires complex joins to combine data from multiple
tables. Denormalization is a technique used in performance-critical
applications (like data warehousing) to intentionally add some controlled
redundancy to a previously normalized database. The goal is to speed up read
operations by reducing the need for costly joins, at the expense of potentially
slower write operations and increased data redundancy. It's a trade-off that
database designers make to optimize for specific use cases.
Boyce-Codd Normal Form (BCNF)
Boyce-Codd Normal Form (BCNF) is a stricter version of Third Normal Form
(3NF) that addresses a specific type of anomaly not handled by 3NF. A table is
in BCNF if and only if, for every non-trivial functional dependency X→Y, X is a
superkey.
Key Differences from 3NF
While 3NF ensures that non-key attributes are not transitively dependent on
the primary key, it does not rule out a subtler case: a dependency whose
determinant is not a superkey but whose dependent is a prime attribute (part
of a candidate key). This is where BCNF comes in. A table can be in 3NF but
not in BCNF if:
• The table has two or more overlapping candidate keys.
• There is a functional dependency where a determinant is not a superkey,
and the dependent is part of another candidate key.
In simple terms, 3NF focuses on dependencies involving non-key attributes,
while BCNF applies to all functional dependencies.
Example
Consider a table Student_Course_Advisor with attributes {Student, Course,
Advisor}. The functional dependencies are:
1. {Student, Course} → Advisor (a student enrolled in a course has a single
advisor)
2. Advisor → Student (each advisor advises only one student)
From these dependencies, the candidate keys are {Student, Course} and
{Advisor, Course}.
This table is in 3NF because every attribute is prime (each belongs to some
candidate key), so no 3NF violation is possible. However, it is not in BCNF
because the dependency Advisor → Student has Advisor as the determinant,
and Advisor is not a superkey on its own; it is only part of a candidate key. To
normalize the table into BCNF, we decompose it into
two:
• Student_Advisor ({Student, Advisor})
• Advisor_Course ({Advisor, Course})
In Student_Advisor, the determinant of Advisor → Student (namely Advisor) is
a key of that table, so both resulting tables are in BCNF and the anomaly is
eliminated. The trade-off is that the dependency {Student, Course} → Advisor
can no longer be enforced within a single table, a known side effect of BCNF
decomposition.
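A standard way to verify claims like these is the attribute-closure algorithm: X is a superkey exactly when its closure X+ contains every attribute of the relation. The sketch below applies it to the example's dependencies.
Python
# Attribute closure X+ under a set of FDs; X is a superkey iff X+ covers
# all attributes of the relation.
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

R = {"Student", "Course", "Advisor"}
fds = [({"Student", "Course"}, {"Advisor"}),
       ({"Advisor"}, {"Student"})]

print(sorted(closure({"Advisor"}, fds)))
# ['Advisor', 'Student'] -- not all of R, so Advisor is not a superkey:
# the BCNF condition is violated
print(sorted(closure({"Student", "Course"}, fds)))
# ['Advisor', 'Course', 'Student'] -- covers R, so {Student, Course} is a key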