RELATIONAL DATABASE MANAGEMENT SYSTEM
UNIT-3
Overview of Physical Database Design:
Physical database design plays a crucial role in the overall performance and efficiency of a database
system, as it involves making decisions about how the data is stored, organized, and accessed at the
physical level. The database design includes the process of logical design with the use of E-R diagram,
normalization, etc., followed by the physical design.
The Key Issues in The Physical Database Design Are:
• Insufficient storage or poor allocation of disk space for tables, indexes, and logs can lead to
performance degradation.
• Poorly designed indexes can cause slow query performance or excessive storage use.
• Uneven data distribution in partitions can lead to disruption in processing.
• Excessive normalization can slow down queries, while excessive denormalization can increase
storage requirements and data redundancy.
• Inefficient backup and recovery strategies can lead to data loss.
• Insufficient memory, CPU, or I/O bandwidth can cause performance issues.
• Poorly secured databases are vulnerable to unauthorized access and data breaches.
• Lack of proactive monitoring can lead to undetected issues like slow queries or resource
exhaustion.
• Poorly tuned queries, excessive I/O operations, and slow response times can affect database
performance.
Storage System in DBMS: - Storing a database on hard disks involves understanding how data is
physically managed on storage devices to optimize performance, ensure reliability, and manage
capacity effectively. Databases are stored in file formats, which contain records. At physical level, the
actual data is stored in electromagnetic format on some device.
These storage devices can be broadly categorized into three types –
• Primary Storage
The memory storage that is directly accessible to the CPU comes under this category.
Example: - RAM, Registers, Cache
• Secondary Storage
Secondary storage devices are used to store data for future use or as backup
Example: - Magnetic disks, Optical disks (DVD, CD), Hard disks and Magnetic tapes etc.
• Tertiary Storage:
Tertiary storage is used to store huge volumes of data. They are the slowest in speed and
mostly used to take the back up of an entire system.
Example: Optical disks and Magnetic tapes.
This Image showcases the arranging of the storage media in a hierarchy
according to its speed and cost.
• Topmost of the hierarchy is fastest in speed and more costly.
• Last one is the slowest in the speed and less costly.
Magnetic Disks: - Hard disk drives are the most common secondary storage devices in present
computer systems. These are called magnetic disks because they use the concept of magnetization to
store information.
Hard disks consist of metal disks coated with magnetizable material. These disks are placed vertically
on a spindle.
Internal Structure of HDD Internal Structure of Disk
Platters usually spin at around 15000 rpm. The arm assembly moves in or out to position a head on a
desired track which is under heads and makes a “cylinder”. Only one head reads/writes at any one time
The block/page size is a multiple of (fixed) sector size.
• Times to access (read/write) a disk block are listed as below:
• Seek time (moving arms to position disk head on track); 2-3 ms on average
• Rotational delay (waiting for block to rotate under head); 0-4 ms (15000 RPM)
• Transfer time (actually moving data to/from disk surface); 0.25 ms per 64KB page
What is a File?
A File is a collection of records. Using the primary key, we can access the records. A file is a physical
structure used to store data on disk.
What is a File Organization?
File organization in a Database Management System (DBMS) refers to how data is stored in files on
physical storage devices such as hard disks or SSDs. Proper file organization is crucial for optimizing
database performance and ensuring fast retrieval of data.
Types of File Organization
File organization contains various methods. These particular methods have pros and cons on the basis
of access or selection. Some types of File Organizations are:
i) Sequential file organization
ii) Heap file organization
iii) Hash file organization
iv) B+ Tree file organization
v) Indexed sequential access method (ISAM)
vi) Cluster file organization
i) Sequential File Organization: -
The easiest method for file Organization is the Sequential method. Records are stored in a sorted one
after another in a sequential manner. There are two ways to implement this method:
• Pile-File Method: This method is quite simple, in which we store the records in a sequence one
after the other in the order in which they are inserted into the tables.
• Sorted-File Method: In this method, As the name itself suggests whenever a new record has to
be inserted, it is always inserted in a sorted (ascending or descending) manner. The sorting of
records may be based on any primary key or any other key.
Advantages of Sequential File Organization:
• Fast and efficient method for huge amounts of data.
• It is simple in design. It requires no much effort to store the data.
• In this method, files can be easily stored in cheaper storage mechanism like magnetic tapes.
Disadvantages of Sequential File Organization:
• The sorted file method is inefficient as it takes time and space for sorting records.
• Time wastage as we have to move in a sequential manner to retrieve the particular data.
ii) Heap File Organization: -
Heap File Organization works with data blocks. The records are inserted at the end of the file into the
data blocks in the heap file organization. The ordering and sorting of records are not required when the
entries are added. The new record is put in a different block when the data block is full.
Advantages of Heap File Organization:
• If there is a large number of data which needs to load into the database at a time, then this
method is best suited.
• Fetching and retrieving records is faster than sequential records in the case of small databases.
Disadvantages of Heap File Organization:
• The problem of unused memory blocks.
• This method is inefficient for large databases.
iii) Hash File Organization: -
Hash File Organization uses the computation of hash function on some fields of the records. The output
of the hash function defines the position of the disc block where the records will be stored. When a
record is requested using the hash key columns, an address is generated, and the entire record is fetched
using that address. When a new record needs to be inserted, the hash key is used to generate the
address, and the record is then directly placed. In the case of removing and updating, the same
procedure is followed.
Advantages of Hash File Organization:
• Very fast for equality searches using this method.
• Efficient use of storage space.
Disadvantages of Hash File Organization:
• Not suitable for range queries.
• Collisions (multiple records hashing to the same location) need to be handled.
iv) B+ Tree File Organization: -
B+ tree file organization is the advanced method of an indexed sequential access method. It uses a tree-
like structure to store records in File. It is an advanced way of an indexed sequential access mechanism
is the B+ tree file organization.
Unlike a binary search tree (BST), the B+ tree can contain more than two children. All records are
stored solely at the leaf node in this method. The leaf nodes are pointed to by intermediate nodes. There
are no records in them.
Advantages of B+ Tree File Organization:
• It’s easier and faster to navigate the tree structure.
• The size of the B+ tree is unrestricted; therefore, the number of records and the structure of the
B+ tree can both expand and shrink.
• It is a very balanced tree structure. Here, each insert, update, or deletion has no effect on the
tree’s performance.
• Because all records are stored solely in the leaf nodes and ordered in a sequential linked list,
searching becomes very simple using this method.
Disadvantages of B+ Tree File Organization:
• The B+ tree file organization method is very inefficient for the static method.
v) Indexed sequential access method (ISAM): -
A combination of sequential and indexed methods. Data is stored sequentially, but an index is
maintained for faster access. In this case, records are stored in the file with the help of the primary key.
For each primary key, an index value is created and mapped to the record. This index contains the
address of the record in the file.
Advantages of ISAM:
• Because each record consists of the address of its data block in this manner, finding a record in a
large database is rapid and simple.
• Faster retrieval compared to pure sequential methods.
• Suitable for applications with a mix of sequential and random access.
Disadvantages of ISAM:
• This method requires extra space in the disk to store the index value.
• Not as efficient as fully indexed methods for random access.
• When new records are added, these files must be reconstructed in order to keep the sequence.
• When a record is erased, the space it occupied did not freed up.
vi) Cluster File Organization: -
When the two or more records are stored in the same file, it is known as clusters. These files will have
two or more tables in the same data block and the key attributes which are used to map these tables
together are stored only once. This method reduces the cost of searching for various records in different
files.
Advantages of Cluster File Organization:
• It gives the best output when the cardinality is 1 : M.
• It is basically used when multiple tables have to be joined with the same joining condition.
Disadvantages of Cluster File Organization:
• For a very large database, this approach has a low performance.
• In the case of a 1:1 cardinality, it becomes ineffective.
• If the joining condition changes, this method will no longer work.
What is Indexing in DBMS?
Indexing is a technique for improving database performance by reducing the number of disk accesses
necessary when a query is run. An index is a form of data structure. It’s used to swiftly identify and
access data and information present in a database table.
Every search key value in the data file has an index record in the single level index. It speeds up the
search process. The total number of records present in the index table and the main table are the same
in this case. It requires extra space to hold the index record. A pointer to the actual record on the disk
and the search key is both included in the index records.
Multi-Level Indexing
Another level of indexing is introduced in indexing to reduce the size of the mapping. The massive
range for the columns is chosen first in this method, resulting in a small mapping size at the first level.
Each range is then subdivided into smaller groups. Because the first level’s mapping is kept in primary
memory, fetching the addresses is faster. The second-level mapping, as well as the actual data, are kept
in secondary memory (or hard disk).