Chapter 12: Indexing and Hashing
Organization of Records in Files
Several of the possible ways of organizing records in files are:
Heap file organization.
Any record can be placed anywhere in the file where there is space for the record. There is no ordering of records. Typically, there is a single file for each relation.
Organization of Records in Files (Cont.)
Sequential file organization.
Records are stored in sequential order, according to the value of a search key of each record.
Hashing file organization.
A hash function is computed on some attribute of each record. The result of the hash function specifies in which block of the file the record should be placed.
Sequential File Organization
A sequential file is designed for efficient processing of records in sorted order based on some search key.
A search key is any attribute or set of attributes; it need not be the primary key, or even a superkey.
Clustering File Organization
Relational-database systems store each relation in a separate file.
A clustering file organization is a file organization that stores related records of two or more relations in each block.
Such a file organization allows us to read records that satisfy the join condition using one block read.
Data-Dictionary Storage
A relational-database system needs to maintain data about the
relations, such as the schema of the relations. This information is called the data dictionary, or system catalog.
The data dictionary contains:
Names of the relations
Names of the attributes of each relation
Domains and lengths of attributes
Integrity constraints (for example, key constraints)
In addition, many systems keep the following data on users of the system:
Names of authorized users
Accounting information about users
Passwords or other information used to authenticate users
Chapter 12: Indexing and Hashing
Basic Concepts
Ordered Indices
Multiple-Key Access
Static Hashing
Dynamic Hashing
Comparison of Ordered Indexing and Hashing
Basic Concepts
Indexing mechanisms are used to speed up access to desired data.
E.g., author catalog in a library
Search key: attribute or set of attributes used to look up records in a file.
An index file consists of records (called index entries) of the form
search-key | pointer
Index files are typically much smaller than the original file.
Two basic kinds of indices:
Ordered indices: search keys are stored in sorted order.
Hash indices: search keys are distributed uniformly across buckets; a hash function maps each search-key value to its bucket.
Index Evaluation Metrics
Each technique must be evaluated on the basis of these factors:
Access type: finding records with a specified attribute value, or finding records whose attribute values fall in a specified range.
Access time: the time it takes to find a particular data item.
Insertion time: the time it takes to insert a new data item, including the time to find the place to insert and the time to update the index structure.
Deletion time: the time it takes to delete a data item, including the time to find the item and the time to update the index structure.
Space overhead: the additional space occupied by an index structure.
Ordered Indices
In an ordered index, index entries are stored sorted on the
search key value. E.g., author catalog in library.
Primary index: in a sequentially ordered file, the index whose
search key specifies the sequential order of the file.
Also called clustering index
The search key of a primary index is usually but not necessarily the primary key.
Secondary index: an index whose search key specifies an order
different from the sequential order of the file. Also called non-clustering index.
Index-sequential file: ordered sequential file with a primary index.
Dense and Sparse Index
Dense index:
An index record appears for every search-key value in the file. The index record contains the search-key value and a pointer to the first data record with that search-key value.
Sparse index:
An index record is created only for some search-key values. Each index record contains a search-key value and a pointer to the first data record that contains that value.
Dense Index Files
Dense index: an index record appears for every search-key value in the file.
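As a minimal sketch (with hypothetical data, not any particular DBMS), a dense index on a sequential file can be modeled as a mapping from each search-key value to the position of the first record with that value:

```python
# Sequential file of (branch_name, account_no) records, sorted on branch_name.
records = [("Brighton", 217), ("Downtown", 101), ("Downtown", 110),
           ("Mianus", 215), ("Perryridge", 102)]

# Dense index: one entry per distinct search-key value,
# pointing to the FIRST record with that value.
dense_index = {}
for pos, (branch, _) in enumerate(records):
    dense_index.setdefault(branch, pos)

def lookup(key):
    """Return all records with the given search-key value."""
    start = dense_index.get(key)
    if start is None:
        return []
    out = []
    for rec in records[start:]:     # records are stored sequentially,
        if rec[0] != key:           # so matching records are contiguous
            break
        out.append(rec)
    return out
```

Following the pointer and scanning forward retrieves every record with the search-key value, since equal keys are contiguous in the sequential file.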
Sparse Index Files
Sparse index: contains index records for only some search-key values.
Applicable when records are sequentially ordered on the search key.
To locate a record with search-key value K we:
Find the index record with the largest search-key value ≤ K
Search the file sequentially starting at the record to which the index record points
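The two lookup steps can be sketched as follows, assuming the data file is sorted on the search key and the index holds one entry per block (the block contents here are hypothetical):

```python
import bisect

# Blocks of a sequential file, sorted on the search key.
BLOCKS = [["Brighton", "Clearview"], ["Downtown", "Mianus"],
          ["Perryridge", "Redwood"]]
# Sparse index: the least search-key value in each block.
sparse_keys = [blk[0] for blk in BLOCKS]

def find_block(k):
    """Index entry with the largest search-key value <= k."""
    i = bisect.bisect_right(sparse_keys, k) - 1
    return max(i, 0)

def lookup(k):
    """Scan the file sequentially from the block the index points to."""
    for blk in BLOCKS[find_block(k):]:
        for key in blk:
            if key == k:
                return True
            if key > k:     # passed where k would be: not present
                return False
    return False
```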
Sparse Index Files (Cont.)
Compared to dense indices:
Less space and less maintenance overhead for insertions and deletions.
Generally slower than dense index for locating records.
Good tradeoff: sparse index with an index entry for every block in the file, corresponding to the least search-key value in the block.
Multilevel Index
If primary index does not fit in memory, access becomes
expensive.
Solution: treat primary index kept on disk as a sequential file
and construct a sparse index on it.
outer index a sparse index of primary index inner index the primary index file
If even outer index is too large to fit in main memory, yet
another level of index can be created, and so on.
Indices at all levels must be updated on insertion or deletion
from the file.
Indices themselves may become too large for efficient
processing.
Example:
Consider a file with 100,000 records and 10 records per block. With a sparse index holding one index entry per block, we have about 10,000 index entries.
Assuming 100 index entries fit into a block, we need about 100 index blocks.
It is desirable to keep the index file in main memory. Problem: searching a large index file becomes expensive.
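The example's arithmetic can be checked directly (a sketch using the numbers above):

```python
import math

n_records = 100_000
records_per_block = 10
entries_per_index_block = 100

# One sparse index entry per data block.
data_blocks = n_records // records_per_block                      # 10,000
inner_entries = data_blocks
inner_blocks = math.ceil(inner_entries / entries_per_index_block) # 100
# Outer level: a sparse index on the inner index.
outer_entries = inner_blocks
outer_blocks = math.ceil(outer_entries / entries_per_index_block) # 1
```

With one more level, the outer index fits in a single block, which is why a two-level index is enough for this file.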
Multilevel Index (Cont.)
Index Update: Record Deletion
If the deleted record was the only record in the file with its particular search-key value, the search key is deleted from the index also.
Single-level index deletion:
Dense indices: deletion of the search key is similar to file record deletion.
Sparse indices:
If the deleted key value exists in the index, the value is replaced by the next search-key value in the file (in search-key order).
If the next search-key value already has an index entry, the entry is deleted instead of being replaced.
Index Update: Record Insertion
Single-level index insertion:
Perform a lookup using the search-key value of the inserted record.
Dense indices: if the search-key value does not appear in the index, insert it.
Sparse indices: if the index stores an entry for each block of the file, no change needs to be made to the index unless a new block is created.
If a new block is created, the first search-key value appearing in the new block is inserted into the index.
Multilevel insertion (as well as deletion) algorithms are simple
extensions of the single-level algorithms
Secondary Indices Example
Secondary index on balance field of account Index record points to a bucket that contains pointers to all the
actual records with that particular search-key value.
Secondary indices have to be dense
Hashing
Static Hashing
In a hash file organization, we obtain the address of the disk block containing a desired record directly by computing a function on the search-key value of the record.
A bucket is a unit of storage containing one or more records (a
bucket is typically a disk block).
In a hash file organization we obtain the bucket of a record directly
from its search-key value using a hash function.
Hash function h is a function from the set of all search-key values K
to the set of all bucket addresses B.
Hash function is used to locate records for access, insertion as well
as deletion.
Example of Hash File Organization
Hash file organization of account file, using branch_name as key.
There are 10 buckets.
The binary representation of the ith character is assumed to be the
integer i.
The hash function returns the sum of the binary representations of
the characters modulo 10
E.g., h(Perryridge) = 5, h(Round Hill) = 3, h(Brighton) = 3
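A sketch of this hash function, reading "the binary representation of the ith character is the integer i" as: each letter contributes its position in the alphabet, and non-letters (such as the space in "Round Hill") contribute nothing:

```python
def h(branch_name, n_buckets=10):
    """Sum of alphabet positions of the letters, modulo the bucket count."""
    total = 0
    for ch in branch_name.lower():
        if ch.isalpha():
            total += ord(ch) - ord('a') + 1   # 'a' -> 1, ..., 'z' -> 26
    return total % n_buckets
```

Under this reading, h("Perryridge") = 5, h("Round Hill") = 3, and h("Brighton") = 3, matching the slide; note that Round Hill and Brighton collide in bucket 3.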
Hash Functions
Worst hash function maps all search-key values to the same bucket;
this makes access time proportional to the number of search-key values in the file.
An ideal hash function is uniform, i.e., each bucket is assigned the
same number of search-key values from the set of all possible values.
Ideal hash function is random, so each bucket will have the same
number of records assigned to it irrespective of the actual distribution of search-key values in the file.
Handling of Bucket Overflows
If the bucket does not have enough space, a bucket overflow is said
to occur.
Bucket overflow can occur because of
Insufficient buckets
Skew in the distribution of records. Some buckets are assigned more records than others, so a bucket may overflow even when other buckets still have space.
Although the probability of bucket overflow can be reduced, it
cannot be eliminated; it is handled by using overflow buckets.
Handling of Bucket Overflows (Cont.)
Overflow chaining: the overflow buckets of a given bucket are chained together in a linked list.
Above scheme is called closed hashing.
An alternative, called open hashing, which does not use overflow buckets, is not suitable for database applications.
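Overflow chaining can be sketched as follows (a toy model: each bucket is a chain of fixed-capacity pages, with the bucket capacity and count chosen small to force overflow):

```python
CAPACITY = 2    # records per bucket (tiny, to demonstrate overflow)
N_BUCKETS = 4
# Each bucket is a chain of pages; extra pages are overflow buckets.
buckets = [[[]] for _ in range(N_BUCKETS)]

def insert(key):
    chain = buckets[hash(key) % N_BUCKETS]
    if len(chain[-1]) == CAPACITY:   # last page full:
        chain.append([])             # chain a new overflow bucket
    chain[-1].append(key)

def lookup(key):
    """Search the primary bucket and all its overflow buckets."""
    chain = buckets[hash(key) % N_BUCKETS]
    return any(key in page for page in chain)
```

Inserting the integer keys 0, 4, 8, 12 (which all hash to bucket 0 here) fills the primary bucket and chains one overflow bucket behind it; lookups must follow the whole chain.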
Hash Indices
Hashing can be used not only for file organization, but also for index-
structure creation.
A hash index organizes the search keys, with their associated record
pointers, into a hash file structure.
Strictly speaking, hash indices are always secondary indices:
If the file itself is organized using hashing, a separate primary hash index on it using the same search key is unnecessary.
However, we use the term hash index to refer to both secondary index structures and hash-organized files.
Deficiencies of Static Hashing
In static hashing, function h maps search-key values to a fixed set B of bucket addresses, but databases grow or shrink with time.
If the initial number of buckets is too small and the file grows, performance will degrade due to too many overflows.
If space is allocated for anticipated growth, a significant amount of space will be wasted initially (and buckets will be underfull).
If the database shrinks, again space will be wasted.
One solution: periodic re-organization of the file with a new hash
function
Expensive, disrupts normal operations
Better solution: allow the number of buckets to be modified dynamically.
Dynamic Hashing
Good for a database that grows and shrinks in size.
Allows the hash function to be modified dynamically.
1. Choose a hash function based on the current file size. This option will result in performance degradation as the database grows.
2. Choose a hash function based on the anticipated size of the file at
some point in the future. Although performance degradation is avoided, a significant amount of space may be wasted initially.
3. Periodically reorganize the hash structure in response to file growth.
Such a reorganization involves choosing a new hash function, recomputing the hash function on every record in the file, and generating new bucket assignments.
This reorganization is a massive, time-consuming operation.