Database Storage
04 Part II
Intro to Database Systems Andy Pavlo
15-445/15-645
Fall 2019 AP Computer Science
Carnegie Mellon University
2
ADMINISTRIVIA
Homework #1 is due September 11th @ 11:59pm
Project #1 will be released on September 11th
CMU 15-445/645 (Fall 2019)
3
U P C O M I N G D ATA B A S E E V E N T S
SalesForce Talk
→ Friday Sep 13th @ 12:00pm
→ CIC 4th Floor
Impira Talk
→ Monday Sep 16th @ 4:30pm
→ GHC 8102
Vertica Talk
→ Monday Sep 23rd @ 4:30pm
→ GHC 8102
CMU 15-445/645 (Fall 2019)
4
D I S K- O R I E N T E D A R C H I T E C T U R E
The DBMS assumes that the primary storage
location of the database is on non-volatile disk.
The DBMS's components manage the movement
of data between non-volatile and volatile storage.
CMU 15-445/645 (Fall 2019)
5
S L O T T E D PA G E S
Slot Array
The most common layout scheme is
called slotted pages. Header
The slot array maps "slots" to the
tuples' starting position offsets.
Tuple #4 Tuple #3
The header keeps track of:
Tuple #2 Tuple #1
→ The # of used slots
→ The offset of the starting location of the
last slot used. Fixed/Var-length
Tuple Data
CMU 15-445/645 (Fall 2019)
5
S L O T T E D PA G E S
Slot Array
The most common layout scheme is
called slotted pages. Header
The slot array maps "slots" to the
tuples' starting position offsets.
Tuple #4 Tuple #3
The header keeps track of:
Tuple #2 Tuple #1
→ The # of used slots
→ The offset of the starting location of the
last slot used. Fixed/Var-length
Tuple Data
CMU 15-445/645 (Fall 2019)
6
LO G -S T R U C T U R E D F I L E O R G A N I Z AT I O N
Page
Instead of storing tuples in pages, the
DBMS only stores log records. INSERT id=1,val=a
New Entries
INSERT id=2,val=b
The system appends log records to the DELETE id=4
file of how the database was modified: INSERT id=3,val=c
→ Inserts store the entire tuple. UPDATE val=X (id=3)
→ Deletes mark the tuple as deleted. UPDATE val=Y (id=4)
→ Updates contain the delta of just the
…
attributes that were modified.
CMU 15-445/645 (Fall 2019)
7
LO G -S T R U C T U R E D F I L E O R G A N I Z AT I O N
Page
To read a record, the DBMS scans the
log backwards and "recreates" the INSERT id=1,val=a
tuple to find what it needs. INSERT id=2,val=b
Reads
DELETE id=4
INSERT id=3,val=c
UPDATE val=X (id=3)
UPDATE val=Y (id=4)
…
CMU 15-445/645 (Fall 2019)
7
LO G -S T R U C T U R E D F I L E O R G A N I Z AT I O N
Page
To read a record, the DBMS scans the
log backwards and "recreates" the INSERT id=1,val=a
tuple to find what it needs. id=1
INSERT id=2,val=b
DELETE id=4
id=2
Build indexes to allow it to jump to id=3
INSERT id=3,val=c
locations in the log. UPDATE val=X (id=3)
id=4
UPDATE val=Y (id=4)
…
CMU 15-445/645 (Fall 2019)
7
LO G -S T R U C T U R E D F I L E O R G A N I Z AT I O N
Page
To read a record, the DBMS scans the
log backwards and "recreates" the id=1,val=a
id=2,val=b
tuple to find what it needs. id=3,val=X
id=4,val=Y
Build indexes to allow it to jump to
locations in the log.
Periodically compact the log.
CMU 15-445/645 (Fall 2019)
7
LO G -S T R U C T U R E D F I L E O R G A N I Z AT I O N
Page
To read a record, the DBMS scans the
log backwards and "recreates" the id=1,val=a
id=2,val=b
tuple to find what it needs. id=3,val=X
id=4,val=Y
Build indexes to allow it to jump to
locations in the log.
Periodically compact the log.
CMU 15-445/645 (Fall 2019)
8
T O D AY ' S A G E N D A
Data Representation
System Catalogs
Storage Models
CMU 15-445/645 (Fall 2019)
9
TUPLE STORAGE
A tuple is essentially a sequence of bytes.
It's the job of the DBMS to interpret those bytes
into attribute types and values.
The DBMS's catalogs contain the schema
information about tables that the system uses to
figure out the tuple's layout.
CMU 15-445/645 (Fall 2019)
10
D ATA R E P R E S E N TAT I O N
INTEGER/BIGINT/SMALLINT/TINYINT
→ C/C++ Representation
FLOAT/REAL vs. NUMERIC/DECIMAL
→ IEEE-754 Standard / Fixed-point Decimals
VARCHAR/VARBINARY/TEXT/BLOB
→ Header with length, followed by data bytes.
TIME/DATE/TIMESTAMP
→ 32/64-bit integer of (micro)seconds since Unix epoch
CMU 15-445/645 (Fall 2019)
11
VA R I A B L E P R E C I S I O N N U M B E R S
Inexact, variable-precision numeric type that uses
the "native" C/C++ types.
→ Examples: FLOAT, REAL/DOUBLE
Store directly as specified by IEEE-754.
Typically faster than arbitrary precision numbers
but can have rounding errors…
CMU 15-445/645 (Fall 2019)
12
VA R I A B L E P R E C I S I O N N U M B E R S
Rounding Example Output
#include <stdio.h> x+y = 0.300000
0.3 = 0.300000
int main(int argc, char* argv[]) {
float x = 0.1;
float y = 0.2;
printf("x+y = %f\n", x+y);
printf("0.3 = %f\n", 0.3);
}
CMU 15-445/645 (Fall 2019)
12
VA R I A B L E P R E C I S I O N N U M B E R S
Rounding Example Output
#include <stdio.h> x+y = 0.300000
0.3 = 0.300000
int#include
main(int<stdio.h>
argc, char* argv[]) {
float x = 0.1; x+y = 0.30000001192092895508
int main(int
float argc, char* argv[]) {
y = 0.2; 0.3 = 0.29999999999999998890
float x = =0.1;
printf("x+y %f\n", x+y);
float y = 0.2;
printf("0.3 = %f\n", 0.3);
} printf("x+y = %.20f\n", x+y);
printf("0.3 = %.20f\n", 0.3);
}
CMU 15-445/645 (Fall 2019)
13
FIXED PRECISION NUMBERS
Numeric data types with arbitrary precision and
scale. Used when round errors are unacceptable.
→ Example: NUMERIC, DECIMAL
Typically stored in a exact, variable-length binary
representation with additional meta-data.
→ Like a VARCHAR but not stored as a string
Demo: Postgres, SQL Server, Oracle
CMU 15-445/645 (Fall 2019)
14
POSTGRES: NUMERIC
# of Digits
typedef unsigned char NumericDigit;
Weight of 1st Digit typedef struct {
int ndigits;
Scale Factor int weight;
int scale;
Positive/Negative/NaN int sign;
NumericDigit *digits;
Digit Storage } numeric;
CMU 15-445/645 (Fall 2019)
14
POSTGRES: NUMERIC
# of Digits
typedef unsigned char NumericDigit;
Weight of 1st Digit typedef struct {
int ndigits;
Scale Factor int weight;
int scale;
Positive/Negative/NaN int sign;
NumericDigit *digits;
Digit Storage } numeric;
CMU 15-445/645 (Fall 2019)
15
L A R G E VA L U E S
Tuple
Most DBMSs don't allow a tuple to Header a b c d e
exceed the size of a single page.
To store values that are larger than a
page, the DBMS uses separate Overflow Page
overflow storage pages. VARCHAR DATA
→ Postgres: TOAST (>2KB)
→ MySQL: Overflow (>½ size of page)
→ SQL Server: Overflow (>size of page)
CMU 15-445/645 (Fall 2019)
16
E X T E R N A L VA L U E S T O R A G E
Tuple
Some systems allow you to store a Header a b c d e
really large value in an external file.
Treated as a BLOB type.
→ Oracle: BFILE data type
→ Microsoft: FILESTREAM data type External File
The DBMS cannot manipulate the
contents of an external file. Data
→ No durability protections.
→ No transaction protections.
CMU 15-445/645 (Fall 2019)
16
E X T E R N A L VA L U E S T O R A G E
Tuple
Some systems allow you to store a Header a b c d e
really large value in an external file.
Treated as a BLOB type.
→ Oracle: BFILE data type
→ Microsoft: FILESTREAM data type External File
The DBMS cannot manipulate the
contents of an external file. Data
→ No durability protections.
→ No transaction protections.
CMU 15-445/645 (Fall 2019)
17
S Y S T E M C ATA L O G S
A DBMS stores meta-data about databases in its
internal catalogs.
→ Tables, columns, indexes, views
→ Users, permissions
→ Internal statistics
Almost every DBMS stores their a database's
catalog in itself.
→ Wrap object abstraction around tuples.
→ Specialized code for "bootstrapping" catalog tables.
CMU 15-445/645 (Fall 2019)
18
S Y S T E M C ATA L O G S
You can query the DBMS’s internal
INFORMATION_SCHEMA catalog to get info about
the database.
→ ANSI standard set of read-only views that provide info
about all of the tables, views, columns, and procedures in
a database
DBMSs also have non-standard shortcuts to
retrieve this information.
CMU 15-445/645 (Fall 2019)
19
A C C E S S I N G TA B L E S C H E M A
List all the tables in the current database:
SELECT * SQL-92
FROM INFORMATION_SCHEMA.TABLES
WHERE table_catalog = '<db name>';
\d; Postgres
SHOW TABLES; MySQL
.tables; SQLite
CMU 15-445/645 (Fall 2019)
20
A C C E S S I N G TA B L E S C H E M A
List all the tables in the student table:
SELECT * SQL-92
FROM INFORMATION_SCHEMA.TABLES
WHERE table_name = 'student'
\d student; Postgres
DESCRIBE student; MySQL
.schema student; SQLite
CMU 15-445/645 (Fall 2019)
23
O B S E R VAT I O N
The relational model does not specify that we
have to store all of a tuple's attributes together in a
single page.
This may not actually be the best layout for some
workloads…
CMU 15-445/645 (Fall 2019)
24
WIKIPEDIA EXAMPLE
CREATE TABLE useracct ( CREATE TABLE pages (
userID INT PRIMARY KEY, pageID INT PRIMARY KEY,
userName VARCHAR UNIQUE, title VARCHAR UNIQUE,
⋮ latest INT
); ⮱REFERENCES revisions (revID),
);
CREATE TABLE revisions (
revID INT PRIMARY KEY,
userID INT REFERENCES useracct (userID),
pageID INT REFERENCES pages (pageID),
content TEXT,
updated DATETIME
);
CMU 15-445/645 (Fall 2019)
25
O LT P
SELECT P.*, R.*
On-line Transaction Processing: FROM pages AS P
→ Simple queries that read/update a small INNER JOIN revisions AS R
amount of data that is related to a single ON P.latest = R.revID
entity in the database. WHERE P.pageID = ?
This is usually the kind of application UPDATE useracct
that people build first. SET lastLogin = NOW(),
hostname = ?
WHERE userID = ?
INSERT INTO revisions
VALUES (?,?…,?)
CMU 15-445/645 (Fall 2019)
26
OLAP
SELECT COUNT(U.lastLogin),
On-line Analytical Processing: EXTRACT(month FROM
→ Complex queries that read large portions U.lastLogin) AS month
FROM useracct AS U
of the database spanning multiple entities. WHERE U.hostname LIKE '%.gov'
GROUP BY
EXTRACT(month FROM U.lastLogin)
You execute these workloads on the
data you have collected from your
OLTP application(s).
CMU 15-445/645 (Fall 2019)
Operation Complexity W O R K L O A D C H A R A C T E R I Z AT I O N
Complex
OLAP
HTAP
OLTP
Simple
Writes Reads
Workload Focus [SOURCE]
CMU 15-445/645 (Fall 2019)
28
D ATA S T O R A G E M O D E L S
The DBMS can store tuples in different ways that
are better for either OLTP or OLAP workloads.
We have been assuming the n-ary storage model
(aka "row storage") so far this semester.
CMU 15-445/645 (Fall 2019)
29
N-ARY STORAGE MODEL (NSM)
The DBMS stores all attributes for a single tuple
contiguously in a page.
Ideal for OLTP workloads where queries tend to
operate only on an individual entity and insert-
heavy workloads.
CMU 15-445/645 (Fall 2019)
30
N-ARY STORAGE MODEL (NSM)
The DBMS stores all attributes for a single tuple
contiguously in a page.
Header userID userName userPass hostname lastLogin ←Tuple #1
Header userID userName userPass hostname lastLogin ←Tuple #2
Header userID userName userPass hostname lastLogin ←Tuple #3
Header - - - - - ←Tuple #4
CMU 15-445/645 (Fall 2019)
30
N-ARY STORAGE MODEL (NSM)
The DBMS stores all attributes for a single tuple
contiguously in a page.
NSM Disk Page
Header userID userName userPass hostname lastLogin
Header userID userName userPass hostname lastLogin
Header userID userName userPass hostname lastLogin
Header - - - - -
CMU 15-445/645 (Fall 2019)
31
N-ARY STORAGE MODEL (NSM)
SELECT * FROM useracct
WHERE userName = ? Lecture 7
AND userPass = ?
Index
NSM Disk Page
Header userID userName userPass hostname lastLogin
Header userID userName userPass hostname lastLogin
Header userID userName userPass hostname lastLogin
Header - - - - -
CMU 15-445/645 (Fall 2019)
31
N-ARY STORAGE MODEL (NSM)
SELECT * FROM useracct
WHERE userName = ? Lecture 7
AND userPass = ?
Index
INSERT INTO useracct
VALUES (?,?,…?)
NSM Disk Page
Header userID userName userPass hostname lastLogin
Header userID userName userPass hostname lastLogin
Header userID userName userPass hostname lastLogin
Header userID
- userName
- userPass
- hostname
- lastLogin
-
CMU 15-445/645 (Fall 2019)
32
N-ARY STORAGE MODEL (NSM)
SELECT COUNT(U.lastLogin),
EXTRACT(month FROM U.lastLogin) AS month
FROM useracct AS U
WHERE U.hostname LIKE '%.gov'
GROUP BY EXTRACT(month FROM U.lastLogin)
CMU 15-445/645 (Fall 2019)
32
N-ARY STORAGE MODEL (NSM)
SELECT COUNT(U.lastLogin),
EXTRACT(month FROM U.lastLogin) AS month
FROM useracct AS U
WHERE U.hostname LIKE '%.gov'
GROUP BY EXTRACT(month FROM U.lastLogin)
NSM Disk Page
Header userID userName userPass hostname lastLogin
Header userID userName userPass hostname lastLogin
Header userID userName userPass hostname lastLogin
Header userID userName userPass hostname lastLogin
CMU 15-445/645 (Fall 2019)
32
N-ARY STORAGE MODEL (NSM)
SELECT COUNT(U.lastLogin),
EXTRACT(month FROM U.lastLogin) AS month
FROM useracct AS U
WHERE U.hostname LIKE '%.gov'
GROUP BY EXTRACT(month FROM U.lastLogin)
NSM Disk Page
Header userID userName userPass hostname lastLogin
Header userID userName userPass hostname lastLogin
Header userID userName userPass hostname lastLogin
Header userID userName userPass hostname lastLogin
CMU 15-445/645 (Fall 2019)
32
N-ARY STORAGE MODEL (NSM)
SELECT COUNT(U.lastLogin),
EXTRACT(month FROM U.lastLogin) AS month
FROM useracct AS U
WHERE U.hostname LIKE '%.gov'
GROUP BY EXTRACT(month FROM U.lastLogin)
NSM Disk Page
Header userID userName userPass hostname lastLogin
Header userID userName userPass hostname lastLogin
Header userID userName userPass hostname lastLogin
Header userID userName userPass hostname lastLogin
CMU 15-445/645 (Fall 2019)
32
N-ARY STORAGE MODEL (NSM)
SELECT COUNT(U.lastLogin),
EXTRACT(month FROM U.lastLogin) AS month
FROM useracct AS U
WHERE U.hostname LIKE '%.gov'
GROUP BY EXTRACT(month FROM U.lastLogin)
NSM Disk Page
Header userID userName userPass hostname lastLogin
Header userID userName userPass hostname lastLogin
Header userID userName userPass hostname lastLogin
Header userID userName userPass hostname lastLogin
Useless Data
CMU 15-445/645 (Fall 2019)
33
N-ARY STORAGE MODEL
Advantages
→ Fast inserts, updates, and deletes.
→ Good for queries that need the entire tuple.
Disadvantages
→ Not good for scanning large portions of the table and/or
a subset of the attributes.
CMU 15-445/645 (Fall 2019)
34
DECOMPOSITION STORAGE MODEL (DSM)
The DBMS stores the values of a single attribute
for all tuples contiguously in a page.
→ Also known as a "column store".
Ideal for OLAP workloads where read-only
queries perform large scans over a subset of the
table’s attributes.
CMU 15-445/645 (Fall 2019)
35
DECOMPOSITION STORAGE MODEL (DSM)
The DBMS stores the values of a single attribute
for all tuples contiguously in a page.
→ Also known as a "column store".
Header userID userName userPass hostname lastLogin
Header userID userName userPass hostname lastLogin
Header userID userName userPass hostname lastLogin
Header userID userName userPass hostname lastLogin
CMU 15-445/645 (Fall 2019)
35
DECOMPOSITION STORAGE MODEL (DSM)
The DBMS stores the values of a single attribute
for all tuples contiguously in a page.
→ Also known as a "column store".
DSM Disk Page
hostname hostname hostname hostname hostname hostname
userID lastLogin hostname hostname hostname hostname hostname hostname
hostname hostname hostname hostname hostname hostname
hostname hostname hostname hostname hostname hostname
userName
userPass
CMU 15-445/645 (Fall 2019)
36
DECOMPOSITION STORAGE MODEL
(DSM)
SELECT COUNT(U.lastLogin),
EXTRACT(month FROM U.lastLogin) AS month
FROM useracct AS U
WHERE U.hostname LIKE '%.gov'
GROUP BY EXTRACT(month FROM U.lastLogin)
CMU 15-445/645 (Fall 2019)
36
DECOMPOSITION STORAGE MODEL
(DSM)
SELECT COUNT(U.lastLogin),
EXTRACT(month FROM U.lastLogin) AS month
FROM useracct AS U
WHERE U.hostname LIKE '%.gov'
GROUP BY EXTRACT(month FROM U.lastLogin)
DSM Disk Page
hostname hostname hostname hostname hostname hostname
hostname hostname hostname hostname hostname hostname
hostname hostname hostname hostname hostname hostname
hostname hostname hostname hostname hostname hostname
CMU 15-445/645 (Fall 2019)
37
T U P L E I D E N T I F I C AT I O N
Choice #1: Fixed-length Offsets
→ Each value is the same length for an attribute.
Choice #2: Embedded Tuple Ids
→ Each value is stored with its tuple id in a column.
Offsets Embedded Ids
A B C D A B C D
0 0 0 0 0
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
CMU 15-445/645 (Fall 2019)
38
DECOMPOSITION STORAGE MODEL (DSM)
Advantages
→ Reduces the amount wasted I/O because the DBMS only
reads the data that it needs.
→ Better query processing and data compression (more on
this later).
Disadvantages
→ Slow for point queries, inserts, updates, and deletes
because of tuple splitting/stitching.
CMU 15-445/645 (Fall 2019)
39
DSM SYSTEM HISTORY
1970s: Cantor DBMS
1980s: DSM Proposal
1990s: SybaseIQ (in-memory only)
2000s: Vertica, VectorWise, MonetDB
2010s: Everyone
CMU 15-445/645 (Fall 2019)
40
CONCLUSION
The storage manager is not entirely independent
from the rest of the DBMS.
It is important to choose the right storage model
for the target workload:
→ OLTP = Row Store
→ OLAP = Column Store
CMU 15-445/645 (Fall 2019)
41
D ATA B A S E S T O R A G E
Problem #1: How the DBMS represents the
database in files on disk.
Problem #2: How the DBMS manages its memory
and move data back-and-forth from disk. ← Next
CMU 15-445/645 (Fall 2019)