0% found this document useful (0 votes)
33 views29 pages

Lec 4

Uploaded by

p20232002567
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
33 views29 pages

Lec 4

Uploaded by

p20232002567
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Advanced Database Systems

Spring 2025

Lecture #04:
HW & Disk Space Management

R&G: Chapters 1, 9.1, 9.3


2

DBMS: B IG P ICTURE
SQL clients interact with a DBMS SQ L C lie nt

You know how to write a SQL query

How is a SQL query executed?


Da taba se
Mana ge me nt Syste m

Da taba se
3

DBMS: Q UERY P LANNING


Parse, check, and verify the SQL query SQ L C lie nt

SELECT [Link] Q uer y Pl anning


FROM Student S JOIN Enrolled E
ON [Link] = [Link]
WHERE [Link] = ‘INF-11199’
Da taba se
Mana ge me nt Syste m
Translate into an efficient relational
query plan that can be executed

Da taba se
4

DBMS: O PERATOR E XECUTION


Execute a dataflow by operating on SQ L C lie nt
records and files
Q uer y Pl anning
sorting
π [Link] O pera tor Exe cuti on
sort-merge join
⋈ [Link] = [Link] Da taba se
Mana ge me nt Syste m
B+ tree
σ [Link] = ‘INF-11199’
scan
Student Enrolled

Da taba se
5

DBMS: F ILES & I NDEX M ANAGEMENT


Organise tables and records as SQ L C lie nt
groups of pages in a logical file
Q uer y Pl anning
sid name dept age
12344 Jones CS 18 O pera tor Exe cuti on
12355 Smith Physics 23
12366 Gold CS 21 F ile s & Index Mana ge me nt

Directory Header Header

Page1 Page2
Da taba se
Disk

Database File
6

DBMS: B UFFER M ANAGEMENT


Transfer data between disk and memory SQ L C lie nt
Buffer Pool
Q uer y Pl anning
Directory Header
O pera tor Exe cuti on
Page2
Memory F ile s & Index Mana ge me nt

Buf fe r Mana ge me nt

Directory Header Header

Page1 Page2
Da taba se
Disk
Database File
7

DBMS: D ISK S PACE M ANAGEMENT


Translate page requests into reading & SQ L C lie nt
writing physical bytes on devices
Q uer y Pl anning

01010 10100 O pera tor Exe cuti on


11101 11010
10010 10101 01111
01010 F ile s & Index Mana ge me nt
10110

Buf fe r Mana ge me nt

Di sk S pace Ma nag em e nt

Da taba se

Di sk
8

A RCHITECTURE OF A DBMS
Organised in layers SQ L C lie nt

Each layer abstracts the layer below Q uer y Pl anning


Manage complexity
O pera tor Exe cuti on
Performance assumptions
F ile s & Index Mana ge me nt
Example of good systems design
Buf fe r Mana ge me nt

Di sk S pace Ma nag em e nt

Da taba se
9

DBMS: C ONCURRENCY & R ECOVERY


Two cross-cutting modules related to SQ L C lie nt
storage and memory management
Q uer y Pl anning

O pera tor Exe cuti on

F ile s & Index Mana ge me nt


C onc urre ncy Control
Buf fe r Mana ge me nt
Re cove r y
Di sk S pace Ma nag em e nt

Da taba se
10

O UTLINE
Storage Media SQ L C lie nt

Disk Space Management Q uer y Pl anning

Buffer Management O pera tor Exe cuti on

F ile s & Index Mana ge me nt


File Layout
Buf fe r Mana ge me nt
Page Layout
Di sk S pace Ma nag em e nt
Record Layout
Da taba se
11

D ISK -O RIENTED A RCHITECTURE


Most database systems are designed for non-volatile disk storage*
The primary location of the database is on disks (HDD and/or SSD)
Data processing happens in volatile main memory
The DBMS responsible for moving data between disk and main memory

Major implications
Data stored on disk is not byte addressable. Instead, an API:
READ: transfer “page” of data from disk to RAM
WRITE: transfer “page” of data from RAM to disk
Disk reads & writes are very, very slow! ⇒ Must plan carefully!

* Volatile storage only maintains its data while the device is powered
12

W HY N OT S TORE A LL IN M AIN M EMORY ?


Costs too much
Cost of 1TB storage (2020): 50$ for HDD, 200$ for SSD, 6000$ for RAM
High-end databases today in the petabyte range!
Roughly 60% of the cost of a production system is in the disks

Main memory is volatile


Obviously important if DB stops/crashes. We want data to be saved!

Some specialised systems do store entire databases in main memory


Faster than disk-oriented but with much higher cost/GB
Suitable for small databases
13

S TORAGE H IERARCHY
Faster
CPU Registers Smaller
Volatile
CPU Caches
Byte-Addressable
DRAM

SSD
Non-Volatile
Block-Addressable HDD

Network Storage Slower


Larger
14

S TORAGE H IERARCHY Latency Capacity

CPU Registers < 1ns B

CPU Caches < 10ns KB/MB


Memory for active data
(primary storage) DRAM 100ns GB

SSD 0.1ms GB/TB


Disk for main database
(secondary storage) HDD 10ms TB

Network Storage 30ms PB


15

A NATOMY OF A D ISK
Platters rotate (say 15000 rpm)

Disk arm moves in or out to position


disk heads on a desired track
Tracks under heads make a “cylinder”

Only one head reads/writes at any one time

Block size is a multiple of (fixed) sector size


Sector = minimum storage unit (512 B or 4 KB)

Video on how disk drives work


16

A CCESSING A D ISK PAGE


Data is stored and retrieved in units called disk blocks
Block size is determined by the filesystem (usually 4 KB, sometimes up to 64 KB)

Unlike RAM, time to retrieve a block depends on its location


Time to access (read/write) a disk block:
Seek time: moving disk arm to position disk heads on track

Rotational delay: waiting for target block to rotate under a head

Transfer time: actually moving data to/from disk surface


Seagate Cheetah 15K.7 17

4 disks, 8 heads, avg. 512 KB/track, 600GB capacity


rotational speed: 15 000 rpm
average seek time: 3.4 ms
transfer rate ≈ 163 MB/s

Access time to read one block of size 8KB

Average seek time 3.40 ms

Average rotational delay 1/2 · 1/15000 min 2.00 ms

Transfer time 8KB / 163 MB/s 0.05 ms

Total access time 5.45 ms

Seek time and rotational delay dominate!


18

S EQUENTIAL VS . R ANDOM A CCESS


What about accessing 1000 blocks of size 8 KB
Random: 1000 · 5.45 ms = 5.45 s
Sequential: 3.4 ms + 2 ms + 1000 · 0.05 ms ≈ 55 ms
tracks store only 512KB ⟹ some additional (< 5 ms) track-to-track seek time

Sequential I/O orders of magnitude faster than random I/O

avoid random I/O at all cost


19

A RRANGING B LOCKS ON D ISK


‘Next’ block concept:
sequential blocks on same track, followed by
blocks on same cylinder, followed by
blocks on adjacent cylinder

Arrange file pages sequentially by ‘next’ on disk


Minimize seek and rotational delay

For a sequential scan, pre-fetch several blocks at a time!

Reading large consecutive blocks


“Amortises” seek time and rotational delay
20

S OLID S TATE D RIVES


Alternative to conventional hard disks
Data accessed in pages, internally pages are organised into blocks

Fine-grain reads (4-8 KB pages), coarse-grain writes (1-2 MB blocks)

Issues in current generation (NAND)


Write amplification: Writing data in small pages causes erasing big blocks
Limited endurance: Only 2K-3K erasures before cell failure
Wear levelling: SSD controller needs to keep moving hot write units around
Price: SSD is 2-5x more expensive than HDD
21

S OLID S TATE D RIVES


Read is fast and predictable
Single read access time: 30 µs
4KB random reads: ~500 MB/sec
Sequential reads: ~525 MB/sec

But write is not! Slower for random


Single write access time: 30 µs
4KB random writes: ~120 MB/sec
Sequential writes: ~480 MB/sec

Random access still slower than sequential access


22

SSD VS . HDD
SSD can achieve 1-10x the bandwidth (bytes/sec) of ideal HDD
Note: Ideal HDD spec numbers are hard to achieve
Expect 10-100x bandwidth for non-sequential reads

Locality matters for both


Reading/writing to “far away” blocks on HDD requires slow seek/rotation delay
Writing 2 “far away” blocks on SSD can require writing multiple much larger units
High-end flash drives are getting much better at this

And don’t forget


SSD is 2-5x more expensive than HDD
24

B OTTOM L INE
Very large DBs: relatively traditional
Disk still offers the best cost/GB by a lot
SSDs improve performance and performance variance

Smaller DB story is changing quickly


SSDs win at the low end (modest DB sizes)
Many interesting databases fit in RAM

Lots of change brewing on the HW storage tech side


Non-volatile memory likely to affect the design of future systems

We will focus on traditional RAM and disk


25

D ATABASE S TORAGE
Most DBMSs store data as one or more files on disk
Files consist of pages (loaded in memory), pages contain records

Data on disk is read & written in large chunks of sequential bytes


Block = Unit of transfer for disk read/write
Page = A common synonym for “block”
In some textbooks, “page” = a block-sized chunk of RAM

We will treat “block” and “page” as synonyms

I/O operation = read/write disk operation

Sequential pages: reading “next” page is fastest


26

S YSTEM D ESIGN G OALS


Goal: allow the DBMS to manage databases > available main memory

Disk reads/writes are expensive ⟹ must be managed carefully


Minimise disk I/O, maximise usage of data per I/O

Spatial control
Where to write pages on disk
Goal: keep pages often used together as physically close as possible on disk

Temporal control
When to read pages into memory and when to write them to disk
Goal: minimise the number of CPU stalls from having to read data from disk
27

D ISK S PACE M ANAGEMENT


Lowest layer of DBMS, manages space on disk
Map pages to locations on disk S QL C l ie nt

Load pages from disk to memory Qu e ry P l a nn in g


Save pages back to disk Op e ra t o r E xe c ut i o n

Introduces the concept of a page F i le s & Ind ex M a na g e m e nt

Typical page size: 4 – 64KB (a multiple of 4KB) B u ff e r Ma na g e m e nt


Each page has a unique identifier: page ID D i s k S pa c e Ma na g e m e n t

Higher levels call upon this layer to:


Datab ase
Allocate/de-allocate a page
Read/write a page
28

D ISK S PACE M ANAGEMENT : PAGE R EQUESTS


Disk space manager can get requests for a sequence of pages
E.g., when higher levels execute a scan operator on a relation

Such requests are best satisfied by pages stored sequentially on disk


Physical details hidden from higher levels of system
Higher levels may “safely” assume Next Page is fast, so they will
simply expect sequential runs of pages to be quick to scan

Disk space manager aims to intelligently lay out data on disk


to meet the performance expectation of higher levels as best as possible
30

D ISK S PACE M ANAGEMENT : I MPLEMENTATION


Using local filesystem (FS)
Allocate one large “contiguous” file on an empty disk
Rely on OS and FS that sequential pages in this file are physically contiguous on disk
A logical database “file” may span multiple FS files on multiple disks/machines
Disk space manager maintains a mapping from page IDs to physical locations
physical location = filename + offset within that file

The OS and other apps know nothing about the contents of these files
Only the DBMS knows how to decipher their contents
Early DBMSs in the 1980s used custom ‘filesystems’ on raw storage
31

S UMMARY
Magnetic disk and flash storage
Random access vs. sequential access (10x) S QL C l ie nt
Physical data placement is important
Qu e ry P l a nn in g

Disk space management Op e ra t o r E xe c ut i o n


Exposes data as a collection of pages F i le s & Ind ex M a na g e m e nt
Pages: block-level organisation of bytes on disk
B u ff e r Ma na g e m e nt
API to read/write pages to disk
D i s k S pa c e Ma na g e m e n t
Provides “next” locality
Abstracts device and file system details Datab ase

You might also like