File Organization in DBMS
+
Index & Hashing in DBMS
GROUP NO 5
Introduction to File Organization
Definition: File organization refers to the way data is
stored in a database.
Importance: Efficient file organization is crucial for
quick data retrieval, efficient storage utilization, and
overall system performance.
Objectives
● To understand the different types of file organization.
● To learn the advantages and disadvantages of each type.
● To know how file organization impacts database
performance.
Types of File Organization
1. Heap (Unordered) File Organization
2. Sequential File Organization
3. Hashing File Organization
4. Clustered File Organization
5. Indexing File Organization
Heap (Unordered) File Organization
Description: Data is stored in the order it is inserted.
Advantages:
● Simple and easy to implement.
● Efficient for bulk loading of data.
Disadvantages:
● Slow retrieval as a linear search is required.
● Inefficient use of storage if many deletions occur.
Sequential File Organization
Description: Data is stored in a sequential order based on a key field.
Advantages:
● Efficient for range queries.
● Easier to implement binary search for fast retrieval.
Disadvantages:
● Insertion, deletion, and updates can be costly.
● Requires reorganization to maintain order.
Hashing File Organization
Description: Uses a hash function to determine the location of data.
Advantages:
● Very fast access for exact match queries.
● Efficient for large databases.
Disadvantages:
● Not suitable for range queries.
● Collisions can affect performance and require additional handling (e.g., chaining, open
addressing).
Clustered File Organization
● Description: Related records are grouped and stored together based
on a clustering field.
● Advantages:
○ Improves performance for related data retrieval.
○ Efficient use of I/O operations.
● Disadvantages:
○ Complexity in managing and maintaining clusters.
○ Can lead to wasted space if clustering is not properly managed.
Indexing File Organization
Description: Uses an index to quickly locate data records without searching the
entire file.
Types of Indexes:
● Single-level Index: Simple index where each entry points to a data block.
● Multi-level Index: Indexes of indexes, useful for very large datasets.
● B-tree Index: Balanced tree structure, widely used in databases.
Indexing File Organization
Advantages:
● Significant speedup in data retrieval.
● Supports both exact match and range queries.
Disadvantages:
● Additional storage for indexes.
● Overhead of maintaining indexes during insertions, deletions, and updates.
Introduction to Indexing and Hashing
Definition: Techniques used to optimize the speed of
data retrieval in a database.
Importance: Critical for enhancing the performance and
efficiency of database queries.
Objectives
● Understand the concepts of indexing and hashing.
● Learn the types and techniques of indexing and hashing.
● Identify the advantages and disadvantages of each
method.
● Explore practical use cases.
What is Indexing?
Description: A data structure that improves the
speed of data retrieval operations on a database
table.
Purpose: To quickly locate and access the data
without searching every row in a database table.
Types of Indexes
1. Primary Index
2. Secondary Index
3. Clustered Index
4. Non-Clustered Index
5. Unique Index
6. Composite Index
Primary Index
● Description: An index on a set of fields that
includes the primary key for the table.
● Advantages: Ensures uniqueness of data.
● Disadvantages: Only one primary index per table.
Secondary Index
Description: An index that is not a primary index and can be
created on non-primary key fields.
Advantages: Allows for efficient access to data based on non-
key attributes.
Disadvantages: Requires additional storage and maintenance.
Clustered Index
Description: Sorts the data rows in the table on their
key values. Only one clustered index per table.
Advantages: Improves performance of range queries.
Disadvantages: Expensive to maintain for insertions
and deletions.
Non-Clustered Index
Description: Contains a sorted list of references to the table
data, separate from the actual table.
Advantages: Multiple non-clustered indexes can exist per
table.
Disadvantages: Slower than clustered index for range
queries.
Advantages of Indexing
● Faster data retrieval.
● Efficient for range queries.
● Improves overall database performance.
Disadvantages of Indexing
● Additional storage space required.
● Overhead for index maintenance during data
modifications.
● Can slow down write operations (insertions, updates,
deletions).
What is Hashing?
Description: A technique to directly map a key to its
location in the storage, using a hash function.
Purpose: To provide constant-time access to data for
exact match queries.
Types of Hashing
1. Static Hashing
2. Dynamic Hashing
Static Hashing
Description: Fixed number of primary pages. Hash
function maps search-key values to the set of pages.
Advantages: Simple and easy to implement.
Disadvantages: Performance degrades as the dataset
grows (overflow pages).
Dynamic Hashing
● Description: The hash function is dynamically
modified to accommodate the growth of the database.
● Advantages: Scalable and handles growing datasets
efficiently.
● Disadvantages: More complex to implement and
manage.
Hash Functions
Definition: Function that converts input into a fixed-
size string of bytes.
Properties: Deterministic, uniform distribution, fast
computation, and minimal collisions.
Handling Collisions
Chaining: Each bucket in the hash table points to a
linked list of records.
Open Addressing: Searches for the next free slot
within the hash table using techniques like linear
probing, quadratic probing, or double hashing.
Advantages of Hashing
● Very fast data retrieval for exact match queries.
● Efficient use of storage space.
● Simple implementation for static hashing.
Disadvantages of Hashing
● Not suitable for range queries.
● Potential for collisions, requiring collision
resolution techniques.
● Dynamic hashing can be complex to implement.