University of Salahaddin
College of Engineering
Software & Informatics Engineering Department
DISK STORAGE AND
INDEXING
2ND STAGE
Lecturer:
Dr. Hanan Kamal
2
Content
• Introduction
• Storage of Database
• Files
• Types of Single-Level Ordered Indexes
• Primary Indexes
• Clustering Indexes
• Secondary Indexes
3
Introduction
• Computerized database → (physically) computer storage
medium.
• Computer storage media form a storage hierarchy that
includes two main categories:
1. Primary storage
2. Secondary and tertiary storage
• Usually Primary :
• Provides fast access to data but is of limited storage capacity.
• More expensive and have less storage capacity than secondary
and tertiary storage devices.
4
Storage of Databases
• Databases typically store large amounts of data that must
persist over long periods of time, and hence is often
referred to as persistent data.
• Transient data Parts of data are accessed and processed
repeatedly during this period.
• Databases are stored permanently (or persistently) on
magnetic disk, for the following reasons:
• Size
• losses
• The cost
5
Storage of Databases (cont.)
• The data stored on disk is organized as files of records.
• Each record is a collection of data values that can be
interpreted as facts about entities, their attributes, and
their relationships.
• There are several primary file organizations
• heap file (or unordered file)
• sorted file (or sequential file) (sort key).
• hashed file (hash key)
• B-trees
6
Files
Unordered (heap) files Ordered (Sorted) Files
1. New records are 1. File records are kept
inserted at the end of sorted .
the file. 2. A binary search can be
2. To search for a record, used
a linear search . 3. Insertion is expensive:
3. Record insertion is records must be
quite efficient. inserted in the correct
4. Reading the records in order.
order of a particular 4. Reading the records in
field requires sorting order of the ordering
the file records. field is quite efficient.
7
• Some blocks of
an ordered
(sequential) file
of EMPLOYEE
records with
Name as the
ordering key
field.
What is Indexing?
• A data structure to quickly locate records without scanning
the entire table.
• Acts like a book index for faster access.
• Improves read/query performance significantly.
Why Indexing Matters
• Reduces I/O cost of query execution.
• Improves performance for SELECT, JOIN, and ORDER
BY.
• Supports constraints like UNIQUE and PRIMARY KEY.
10
Types of Single-Level Ordered Indexes
• Indexing idea (book index)
• An index access structure is usually defined on a single field of
a file, called an indexing field (or indexing attribute).
• Indexes can also be characterized as dense or
sparse.
• A dense index has an index entry for every search
key value (and hence every record) in the data file.
• A sparse (or nondense) index, has index entries
for only some of the search values.
11
Primary Indexes
• A primary index is an ordered file whose records are of
fixed length with two fields
• Index entry (or index record) in the index file, will refer to
the two field values of index entry i as <K(i), P(i)>.
• K(i) is of t he same data type as the ordering
key Field called the primary key of the data file
• P(i) is a pointer to a disk block (a block
address).
12
13
Clustering Indexes
• If file records are physically ordered on a nonkey field
which does not have a distinct value for each record that
field is called the clustering field and the data file is
called a clustered file.
• A clustering index is also an ordered file with two
fields:
1. field is of the same type as the clustering field of the
data file
2. disk block pointer.
14
15
Secondary Indexes
• A secondary index provides a secondary means of
accessing a data file for which some primary access
already exists.
• The data file records could be ordered, unordered.
• The secondary index may be created on:
1. Candidate key and has a unique value in every record
2. Nonkey field with duplicate values.
16
Secondary index on a key, nonordering field of a
file.
17
Secondary index on a nonkey, nonordering field
of a file.
18
Summary of Indexes
19
Thank you
20
1. What is indexing?
2. Talk about
[Link] Indexes
[Link] Indexes
[Link] Indexes
University of Salahaddin
College of Engineering
Software & Informatics Engineering Department
QUERY OPTIMIZATION
ND
2 STAGE
Lecturer:
Dr. Hanan Kamal
22
Introduction
• In network and hierarchical DBMSs, low-level
procedural query language is generally
embedded in high-level programming language.
• Programmer’s responsibility to select most
appropriate execution strategy.
• With declarative languages such as SQL, user
specifies what data is required rather than how it
is to be retrieved.
• Relieves user of knowing what constitutes good
execution strategy.
23
Introduction
• Also gives DBMS more control over system
performance.
• Two main techniques for query
optimization:
• Heuristic rules that order operations in a query;
• Comparing different strategies based on relative
costs, and selecting one that minimizes
resource usage.
• Disk access tends to be dominant cost in
query processing for centralized DBMS.
24
Query Processing
Activities involved in retrieving data from
the database.
• Aims of QP:
• Transform query written in high-level language
(e.g. SQL), into correct and efficient execution
strategy expressed in low-level language
(implementing RA);
• Execute strategy to retrieve required data.
Introduction to Query Optimization
• Query optimization (QO) is the process of
choosing the most efficient way to execute a SQL
query.
• The importance of QO:
- Reduces response time
- Minimizes resource usage
- Essential for large-scale databases
26
Query Optimization
• A query optimizer evaluates multiple query execution
plans and chooses the best one based on cost
estimates.
• As there are many equivalent transformations of same
high-level query, aim of QO is to choose one that
minimizes resource usage.
• Generally, reduce total execution time of query.
• May also reduce response time of query.
• Problem computationally intractable with large number of
relations, so strategy adopted is reduced to finding near
optimum solution.
Goals of Query Optimization
- Minimize I/O cost
- Reduce CPU usage
- Ensure fastest response time
- Avoid unnecessary computations
Cost-Based Optimization
This method evaluates alternative plans using a
cost model based on:
- Disk I/O
- CPU cost
- Network latency
Rule-Based Optimization
Uses heuristics (rules) to transform queries:
- Push selections early
- Combine projections
- Join order optimization
30
Example 21.1 - Different Strategies
Find all Managers who work at a London
branch.
SELECT *
FROM Staff s, Branch b
WHERE [Link] = [Link] AND
([Link] = ‘Manager’ AND [Link] = ‘London’);
31
Example 21.1 - Different Strategies
• Three equivalent RA queries are:
(1) (position='Manager') (city='London')
([Link]=[Link]) (Staff X Branch)
(2) (position='Manager') (city='London')(
Staff [Link]=[Link] Branch)
(3) (position='Manager'(Staff))
[Link]=[Link]
(city='London' (Branch))
Join Optimization
Join operations can be expensive. Types of join
algorithms:
- Nested Loop Join
- Hash Join
- Merge Join
Subquery Optimization
• Rewrite subqueries as joins or EXISTS to
improve performance.
View Materialization
• Views can be materialized (stored) or
virtual (computed per query).
Materialization improves performance at
the cost of storage.
Statistics for Optimization
Database optimizers use table statistics:
- Row counts
- Value distributions
- Index usage
Query Rewriting Techniques
- Remove redundant conditions
- Simplify expressions
- Use EXISTS instead of IN
37
Phases of Query Processing
Summary
✔ Query optimization is crucial for efficient
database systems
✔ Use indexes, rewrite queries, and analyze
execution plans
✔ Know your data and schema
39
Thank you