File Organization
File Organization refers to the logical relationships among various records that
constitute the file, particularly with respect to the means of identification and access to
any specific record. In simple terms, Storing the files in a certain order is called File
Organization.
Objective of File Organization
It helps in the faster selection of records i.e. it makes the process faster.
Different Operations like inserting, deleting, and updating different records are
faster and easier.
It helps in storing the records or the data very efficiently at a minimal cost.
Types of File Organization
1. Sequential file organization: The easiest method for file organization is
sequential method. In this method files are stored one after the other in a
sequential manner. This method is fast & efficient for huge amount of data.
There are two ways to implement this method:
a) Pile File Method
In this method, records are stored in sequential order, one after another,
and they are inserted at the end of the file in the same order in which we
insert them in the table using the SQL query. In case of modification or
deletion of any record, it is first searched throughout the file, and when it
is found the updation or deletion operation is performed.
b) Sorted file method:
As the name suggests, the file in this method has to be kept in sorted order
all the time. In this method, the file is sorted after every delete, insert,
and update operation on the basis of some primary key. Insertion of the
new record is done by adding the new record at the end of the file, after
which the file is sorted in ascending or descending order based on the
requirements.
Advantages:
Sequential file organization is the simplest of all file organization
methods.
It contains a fast and efficient method for the huge amount of data.
It is simple in design. It requires no much effort to store the data.
Disadvantages:
In the case of the sorted file method, it is costlier because it has to sort the
file after every delete, update, and insert operation.
The data redundancy is high in the sequential file organization in DBMS.
2. Heap File Organization: This method is also one of the simplest methods of
file organization in DBMS. In this method, records are inserted at the end of file
into the data blocks. There is no requirement of sorting data.
Advantages:
It is a very good method of file organization for bulk insertion.
In case of a small database, fetching and retrieving of records is faster
than the sequential record.
Disadvantages:
This method is inefficient for large databases.
The problem of unused memory blocks.
3. Hash File Organization:
In hash file organization, a hash function is computed on some other attribute of
each record. The result of the hash function specifies in which block of the file
the record should be placed. Basically, a hashing function generates the address
of those data blocks using the primary key as input, and those memory locations
which are generated by these hash functions are called data blocks.
Advantages:
This method is very efficient in terms of speed and efficiency.
It is used in large databases
Disadvantages:
Memory is not efficiently used in this method.
There is no order in the arrangement of the memory addresses.
4. B+ Tree File Organization:
B+ tree file organization is the advanced method of an indexed sequential access
method. It uses a tree-like structure to store records in File. This method uses the
key-index concept, where it uses the primary key for the sorting of the records,
and the index value represents the bucket address of that particular key.
Advantages:
In this method, searching of data is more efficient and faster.
Any insert/update/delete does not affect the performance of tree.
Disadvantages:
This method is inefficient for static tables.
Indexing:
Indexing in DBMS is the process of creating a data structure, known as an index, which
allows for quick access to specific data records within a database table. Just like an
index in a book helps to find relevant information quickly, database indexes enhance
search and retrieval operations.
The index is a type of data structure. It is used to locate and access the data in a database
table quickly.
Index Structure:
Indexes can be created using some database columns.
Search Key Data Reference
The first column of the database is the search key that contains a copy of the
primary key or candidate key of the table. The values of the primary key are
stored in sorted order so that the corresponding data can be accessed easily.
The second column of the database is the data reference. It contains a set of
pointers holding the address of the disk block where the value of the particular
key can be found.
Benefits of Indexing:
1. Improved Query Performance:
Indexing accelerates query execution by reducing the time required to locate
specific data within a table, resulting in faster response times.
2. Efficient Data Retrieval:
With indexes, the DBMS can quickly narrow down the search space, leading to
more efficient data retrieval operations.
3. Enhanced Data Integrity:
Indexes can enforce uniqueness and primary key constraints, ensuring data
integrity and preventing duplicate entries.
4. Optimized Sorting and Grouping:
Indexes facilitate efficient sorting and grouping operations, enabling faster
processing of ordered and aggregated data.
5. Flexibility in Data Access:
Indexes allow for different access paths to the data, enabling the DBMS to
choose the most appropriate one based on the query.
Types of Indexing:
Indexing Methods
Primary Index Secondary Index Clustering Index
Dense Index Sparse Index
Indexing can be of the following types: -
1. Primary Index: If the index is created on the basis of the primary key of the
table, then it is known as primary indexing. These primary keys are unique to
each record and contain 1:1 relation between the records. As primary keys are
stored in sorted order, the performance of the searching operation is quite
efficient.
The primary index can be classified into two types: Dense index and Sparse
index.
a) Dense Index: The dense index contains an index record for every search key
value in the data file. It makes searching faster. In this, the number of records
in the index table is same as the number of records in the main table.
It needs more space to store index record itself. The index records have the
search key and a pointer to the actual record on the disk.
b) Sparse Index: The index record appears only for a few items in the data file.
Each item points to a block. In this method, instead of pointing to each record
in the main table, the index points to the records in the main table in a gap.
2. Secondary Index: Secondary index may be generated from a field which is a
candidate key and has a unique value in every record, or a non-key with duplicate
values.
3. Clustering Index: Clustering index is defined on an ordered data file. The data
file is ordered on a non-key field. When more than two records are stored in the
same file this type of storing is known as cluster indexing. Essentially, records
with similar properties are grouped together, and indexes for these groupings are
formed.
B Tree: B Tree is a self-balancing tree, and it is a m-way tree where m defines the order
of the tree. B Tree is a generalization of the Binary Search Tree in which a node can
have more than one key and more than two children depending upon the value of m. In
the B tree, the data is specified in a sorted order having lower values on the left subtree
and higher values in the right subtree.
Properties of B Tree: -
All the leaf nodes of the B Tree must be at the same level.
Above the leaf nodes of the B-tree, there should be no empty sub-trees.
B- tree’s height should lie as low as possible.
B+ Tree: The B+ tree is a balanced binary search tree. It follows a multi-level index
format.
Properties of B+ Tree: -
In the B+ tree, leaf nodes denote actual data pointers. B+ tree ensures that all leaf
nodes remain at the same height.
In the B+ tree, the leaf nodes are linked using a link list. Therefore, a B+ tree can
support random access as well as sequential access.
Basis of Comparison B Tree B+ Tree
Pointers All internal and leaf nodes have data Only leaf nodes have data
pointers. pointers.
Search Since all keys are not available at All keys are at leaf nodes, hence
leaf, search often takes more time. search is faster and more
accurate.
Insertion Insertion takes more time. Insertion is easier.
Deletion Deletion of the internal node is very Deletion of any node is easy.
complex.
Leaf Nodes Leaf nodes are not stored as Leaf nodes are stored as
structural linked list. structural linked list.
Access Sequential access to nodes is not Sequential access to nodes is
possible. possible.