Advanced Database Management System
Course Code: CSE 467
Course Teacher: Dr. Mohammed Mahmudur Rahman
Associate Professor, Department of CSE, USTC
Mapping Strategies for File Records into Blocks
● A database is a large collection of related data. In an RDBMS (Relational Database Management System), the data is stored in the form of relations, or tables. As a normal user we see the data stored in tables, but this huge amount of data is actually stored in the form of files on physical storage.
● A file is a collection of related records stored in binary format on secondary storage such as magnetic disks. There are several strategies for mapping file records into disk blocks:
1. Spanned Mapping
● In spanned mapping, a record is stored in a block even if only part of it fits there; the remainder continues in the next block, so a single record may span two (or more) blocks, which gives the strategy its name.
2. Unspanned Mapping
● In unspanned mapping, unlike the spanned strategy, a record is stored in a block only if it fits completely inside that block; leftover space at the end of a block that cannot hold a whole record is left unused. The sketch below contrasts the two strategies.
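To make the difference concrete, here is a small Python sketch (not from the slides; the block and record sizes are arbitrary assumptions) that counts how many blocks a file needs under each strategy:

    # Illustrative comparison of unspanned vs. spanned mapping (sizes are assumed).
    import math

    BLOCK_SIZE = 512        # bytes per block (assumption)
    RECORD_SIZE = 200       # bytes per fixed-length record (assumption)
    NUM_RECORDS = 1000

    # Unspanned: a block holds only whole records, so leftover bytes per block are wasted.
    records_per_block = BLOCK_SIZE // RECORD_SIZE                    # blocking factor = 2
    unspanned_blocks = math.ceil(NUM_RECORDS / records_per_block)    # 500 blocks
    wasted_per_block = BLOCK_SIZE - records_per_block * RECORD_SIZE  # 112 bytes wasted

    # Spanned: a record may continue into the next block, so almost no space is wasted.
    spanned_blocks = math.ceil(NUM_RECORDS * RECORD_SIZE / BLOCK_SIZE)  # 391 blocks

    print(unspanned_blocks, wasted_per_block, spanned_blocks)

With these assumed sizes, spanned mapping stores the same 1,000 records in 391 blocks instead of 500, at the cost of some records straddling a block boundary.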
Do we specify the mapping strategy?
● Yes, in most systems the mapping strategy is specified explicitly, based on requirements. How it is defined depends on the technology being used (databases, ORMs, memory management, etc.). Below are some key areas where we specify spanned vs. unspanned mapping.
In databases like SQL and NoSQL
Explicitly Specifying Spanned Mapping:
• Partitioning in SQL databases (PostgreSQL, MySQL, Oracle)
CREATE TABLE user_data (
    id    SERIAL PRIMARY KEY,
    name  TEXT,
    email TEXT
) PARTITION BY HASH (id);
-- Hash-partitioned tables also need their partitions created before rows can be inserted:
CREATE TABLE user_data_p0 PARTITION OF user_data FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE user_data_p1 PARTITION OF user_data FOR VALUES WITH (MODULUS 2, REMAINDER 1);
• Sharding in NoSQL (MongoDB, Cassandra, DynamoDB)
sh.shardCollection("mydb.users", { "userId": "hashed" });
If you do not specify partitioning/sharding, most databases use unspanned mapping by default.
Implicit mapping in databases:
Most modern databases automatically decide mapping strategies based on data size, indexing, and queries.
Example: PostgreSQL automatically uses TOAST (The Oversized-Attribute Storage Technique)
• If a row is too large to fit in a single database page, PostgreSQL implicitly compresses the oversized attribute values and/or moves them to a separate TOAST table.
• No need to manually specify partitioning.
Storage Hierarchy
● Primary storage: fastest media but volatile (cache, main memory).
● Secondary storage: next level in the hierarchy; non-volatile, moderately fast access time; also called on-line storage.
  E.g. flash memory, magnetic disks.
● Tertiary storage: lowest level in the hierarchy; non-volatile, slow access time; also called off-line storage.
  E.g. magnetic tape, optical storage, external HDD.
Magnetic Disk in Computer Architecture
● In computer architecture,
  • A magnetic disk is a storage device used to write, rewrite and access data.
  • It uses a magnetization process.
  • The disk is made up of one or more platters.
  • Each platter consists of concentric circles called tracks.
  • These tracks are further divided into sectors, which are the smallest divisions of the disk.
Cont.
● A cylinder is formed by combining the tracks at a given radius of a disk pack.
• There is a mechanical arm that carries the read/write head.
• It is used to read from and write to the disk.
• The head has to reach a particular track and then wait for the platter to rotate.
• The rotation brings the required sector of the track under the head.
• Each platter has two surfaces, top and bottom, and both surfaces are used to store data.
• Each surface has its own read/write head.
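As a quick illustration of how this geometry adds up (all numbers below are assumptions, not taken from the slides), the raw capacity of a disk pack is surfaces × tracks × sectors × bytes per sector:

    # Illustrative capacity calculation from disk geometry (all numbers assumed).
    platters = 4
    surfaces = platters * 2            # each platter has a top and a bottom surface
    tracks_per_surface = 50_000
    sectors_per_track = 500            # simplification: same sector count on every track
    bytes_per_sector = 512

    capacity = surfaces * tracks_per_surface * sectors_per_track * bytes_per_sector
    print(capacity / 10**9, "GB")      # 102.4 GB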
Video: https://www.youtube.com/watch?v=bMSXQhra6hY
How the Disk Controller Interacts with the Platters
1. Disk Controller's Role
   • The disk controller is an electronic circuit that acts as an interface between the computer system and the hard drive.
   • It receives high-level commands from the computer (such as read/write requests) and translates them into low-level actions for the hard drive.
2. Connection to Read/Write Heads, Not Platters Directly
   • Data is stored on the platters, but the disk controller does not interact with them directly.
   • Instead, it controls the movement of the read/write heads, which are responsible for accessing data on the platters.
   • The actuator arm positions the heads over the correct track on the platter.
3. Data Transfer Process (sketched in code below)
   When a read/write request is made, the disk controller:
   1. moves the read/write heads to the correct position over the spinning platters;
   2. verifies that the data was read correctly using checksums;
   3. manages error detection and bad-sector remapping if necessary.
4. Modern Hard Drive Integration
   • In modern HDDs, many traditional controller functions (such as checksum verification and bad-sector remapping) are performed by embedded firmware inside the drive itself.
   • This reduces the load on the main disk controller and improves efficiency.
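The data-transfer steps above can be summarised as a conceptual Python sketch. This is not a real driver: the class, its methods, and the toy checksum are all hypothetical, and real controllers and drive firmware are far more involved.

    # Conceptual sketch of a controller servicing a read request (all names hypothetical).
    class DiskController:
        def __init__(self, disk, spare_sectors):
            self.disk = disk                  # physical sector -> (data bytes, stored checksum)
            self.remap_table = {}             # bad sector -> spare sector
            self.spares = list(spare_sectors)

        def read(self, sector):
            sector = self.remap_table.get(sector, sector)   # follow any earlier remapping
            data, stored_checksum = self.disk[sector]       # heads positioned, platter rotates
            if sum(data) % 256 != stored_checksum:          # toy checksum verification
                spare = self.spares.pop(0)                  # remap the bad sector to a spare
                self.remap_table[sector] = spare
                raise IOError(f"sector {sector} failed checksum; remapped to spare {spare}")
            return data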
Disk subsystem
● Multiple disks connected to a computer system through a controller
  ○ Controller functionality (checksum, bad-sector remapping) is often carried out by the individual disks, which reduces the load on the controller
● Families of disk interface standards
  ○ ATA (AT Attachment, also known as Advanced Technology Attachment) range of standards
  ○ SATA (Serial ATA)
  ○ SCSI (Small Computer System Interface) range of standards
  ○ SAS (Serial Attached SCSI)
  ○ Several variants of each standard (different speeds and capabilities)
Communication protocols of storage devices
● A. Parallel ATA (PATA / ATA)
• An older communication standard for hard drives.
• Uses a parallel connection with a wide ribbon cable (40 or 80 wires).
• Limited to two devices per cable (Master and Slave).
  • Master (primary device)
    • The Master drive is the primary device on the IDE cable and is typically the first drive the computer attempts to access during boot.
    • In most configurations it is the drive that contains the operating system.
    • It is usually selected by setting a jumper (a small plastic connector) on the drive to the "Master" position.
  • Slave (secondary device)
    • The Slave drive is the secondary device on the same IDE cable and is recognized after the Master.
    • It is typically used for additional storage, e.g. for storing files or running programs.
    • It is selected by setting the drive's jumper to the "Slave" position.
• Maximum speed of 133 MB/s (ATA-133).
• Replaced by SATA due to speed and cable-size limitations.
Communication protocols of storage devices (Cont.)
● B. Serial ATA (SATA)
• A modern replacement for ATA that uses serial communication (a single data lane); each device connects to its own dedicated data port.
• The number of devices you can connect depends on the number of SATA ports on the motherboard or SATA controller; common motherboards have anywhere from 2 to 8 SATA ports.
• Improved cable management with thinner cables (up to 1 meter long).
• Supports hot swapping (plugging and unplugging drives without shutting down the system).
• Variants:
  • SATA I – 1.5 Gbps (150 MB/s)
  • SATA II – 3.0 Gbps (300 MB/s)
  • SATA III – 6.0 Gbps (600 MB/s)
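The MB/s figures follow from the line rates because SATA I–III encode every data byte as 10 bits on the wire (8b/10b encoding); that encoding overhead is the only fact added beyond the slide:

    # SATA uses 8b/10b encoding: 10 line bits per data byte,
    # so usable MB/s = (line rate in Gbps / 10) * 1000.
    for name, gbps in [("SATA I", 1.5), ("SATA II", 3.0), ("SATA III", 6.0)]:
        print(f"{name}: {gbps} Gbps -> {gbps / 10 * 1000:.0f} MB/s")
    # SATA I: 150 MB/s, SATA II: 300 MB/s, SATA III: 600 MB/s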
Communication protocols of storage devices (Cont.)
● C. Small Computer System Interface (SCSI)
• A high-speed protocol designed for enterprise storage and
servers.
• Allows multiple devices (up to 16) to connect to a single
controller.
• Offers better multitasking and error correction than ATA/SATA.
• Used in high-performance workstations, RAID systems, and
servers.
Seek time specifications
Seek time specifications include:
1. Full stroke
2. Average
3. Track to track

1. Full stroke
• The time taken by the read/write head to move across the entire width of the disk, from the innermost track to the outermost track.
2. Average
• The average time taken by the read/write head to move from one random track to another.
• Average seek time = 1/3 × Full stroke
3. Track to track
• The time taken by the read/write head to move between adjacent tracks.
2. Rotational Latency
• The time taken by the desired sector to come under the read/write head is called rotational latency.
• It depends on the rotation speed of the spindle.
● Average rotational latency = 1/2 × Time taken for one full rotation
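A quick worked example (the 7200 RPM spindle speed is an assumption, chosen as a typical value):

    # Average rotational latency = half the time of one full rotation.
    rpm = 7200                                # assumed spindle speed
    full_rotation_ms = 60_000 / rpm           # 60,000 ms per minute / rotations per minute
    print(full_rotation_ms / 2)               # ≈ 4.17 ms average rotational latency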
3. Data Transfer Rate
• The amount of data that passes under the read/write head in a given amount of time is called the data transfer rate.
• The time taken to transfer the data is called the transfer time.
● It depends on the following factors:
  1. Number of bytes to be transferred
  2. Rotation speed of the disk
  3. Density of the track
  4. Speed of the electronics connecting the disk to the computer
Storage Density
• Storage density decreases as we move from track to track away from the center, because on a traditional disk every track stores the same number of bits while the outer tracks are physically longer.
● Thus,
  • The innermost track has the maximum storage density.
  • The outermost track has the minimum storage density.
Important formula
● Disk access time = Seek time + Rotational delay + Transfer time + Controller overhead + Queuing delay
● Average disk access time = Average seek time + Average rotational delay + Transfer time + Controller overhead + Queuing delay
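A worked example that puts these terms together (every number below is an illustrative assumption):

    # Illustrative average access time for reading one 4 KB block.
    full_stroke_ms = 15                               # assumed full-stroke seek time
    avg_seek_ms = full_stroke_ms / 3                  # average seek ≈ 1/3 of full stroke
    avg_rotational_delay_ms = (60_000 / 7200) / 2     # half a rotation at 7200 RPM ≈ 4.17 ms
    transfer_time_ms = 4 / 1024 / 200 * 1000          # 4 KB at an assumed 200 MB/s ≈ 0.02 ms
    controller_overhead_ms = 0.2                      # assumed
    queuing_delay_ms = 0.0                            # assume an idle queue

    total = (avg_seek_ms + avg_rotational_delay_ms + transfer_time_ms
             + controller_overhead_ms + queuing_delay_ms)
    print(f"Average disk access time ≈ {total:.2f} ms")   # ≈ 9.39 ms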
RAID (Redundant Array of Independent Disks)
● RAID is a technique that uses a combination of multiple disks, instead of a single disk, for increased performance, data redundancy, or both. The term was coined by David Patterson, Garth A. Gibson, and Randy Katz at the University of California, Berkeley in 1987.
● Key evaluation points for a RAID system
• Reliability: How many disk faults can the system tolerate?
• Availability: What fraction of the total session time is the system in uptime mode, i.e. how available is the system for actual use?
• Performance: How good is the response time? How high is the throughput (rate of processing work)? Note that performance involves many parameters, not just these two.
• Capacity: Given a set of N disks, each with B blocks, how much useful capacity is available to the user? (See the sketch below.)
● Disk fault tolerance means that a disk system is able to recover from some kind of error condition. Recovery is made possible by mirroring, striping with parity, and duplexing drives, which provide some level of data protection.
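For the capacity question, here is a minimal sketch using the standard textbook formulas for a few common RAID levels (not part of the slide; the values of N and B below are arbitrary):

    # Useful capacity (in blocks) of N disks with B blocks each, per RAID level.
    def raid_capacity(level, n, b):
        if level == 0:      # striping only: all space usable, no fault tolerance
            return n * b
        if level == 1:      # mirroring: every block is stored twice
            return n * b // 2
        if level == 5:      # distributed single parity: one disk's worth of parity
            return (n - 1) * b
        if level == 6:      # dual parity: two disks' worth of parity
            return (n - 2) * b
        raise ValueError("RAID level not covered in this sketch")

    n, b = 8, 1_000_000
    for level in (0, 1, 5, 6):
        print(f"RAID {level}: {raid_capacity(level, n, b):,} useful blocks")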