ADA University
Lecture #01
DATABASE
SYSTEMS
Course Intro &
Relational Model
CSCI 3615 / Database Systems Spring 2024 Mr. Nuraddin SADILI
Overview
Course Logistics
Relational Model
Relational Algebra
CO U RSE O VERVI EW
This course is on the design andimplementation
of disk-oriented database management systems.
This is not a course on how to use a database to
buildapplications or how to administer a database.
→ See, for example, Oracle database
administration path.
CO U RSE TOPICS
Relational Databases
Relational Languages
SQL
ER Model
Designing Databases
Application development
Procedures and functions in SQL
CO U RSE LO GI STI CS
Course Policies + Schedule:
→ Refer to course syllabus on the Blackboard.
Academic Honesty:
→ Refer to ADA University honor code.
→ If you’re not sure, ask the professors.
All discussion + announcements will be on Blackboard.
TEXTBO O K
Database System Concepts, 7th Edition
Silberschatz, Korth, &Sudarshan
TEXTBO O K
Database Systems, Design,
Implementation, & Management, 13th
Edition
Coronel, Morris
GRADING RUBRIC
Attendance (5%)
Participation (10%)
Quizzes (10%)
Homework (30%)
Midterm Exam (20%)
Final Exam (25%)
PL AGIARI SM WARN I N G
The homeworks must be your own work. They are
not group assignments.
You may not copy solutions from other people or
the web.
Plagiarism will not be tolerated.
DATABASE IMAGE
Some homeworks will require a pre-installed
database. Follow the instructions in the syllabus to
download and install it on your laptop.
L ATE PO LI C Y
No late submissions will be accepted.
M I D - T ERM EXAM
Computer based (in Blackboard) or paper-based
exam on the lecture notes, homeworks and in-class
exercises. The exam will be closed notes.
Exam will be given during week 08 or 09.
FI NAL EXAM
Computer based (in Blackboard) or paper-based
exam on the lecture notes, homeworks and in-class
exercises(if any). Harder than the mid-term.
Will be at during the University's final examination
period at the end of the semester.
Dat abases
D ATABASE
Organized collection of inter-related data that
models some aspect of the real-world.
Databases are core component of the most
computer applications.
D ATABASE EXAM PLE
Create a database that models a digital music store
to keep track of artists and albums.
Things we need store:
→ Information about Artists
→ What Albums those Artists released
FL AT FI LE STRAW MAN
Store our database as comma-separatedvalue
(CSV) files that we manage in our own code.
→ Use a separate file per entity.
→ The application has to parse the files each time they want
to read/update records.
FL AT FI LE STRAW MAN
Create a database that models a digital music store.
Ar t i st (name, year, country) Al bum(name, artist, year)
"Wu Tang Clan",1992,"USA" "Enter the Wu Tang","Wu Tang Clan",1993
"Notorious BIG",1992,"USA" "St.Ides Mix Tape","Wu Tang Clan",1994
"AmeriKKKa's Most Wanted","Ice Cube",1990
"Ice Cube",1989,"USA"
FL AT FI LE STRAW MAN
Example: Get the year that Ice Cube went solo.
Ar t i st (name, year, country)
for line in file:
"Wu Tang Clan",1992,"USA" record = par se(line)
"Notorious BIG",1992,"USA" i f “Ice Cube”== record[0]:
print i nt (record[1])
"Ice Cube",1989,"USA"
FL AT FI LES: DATA I N TEGRI TY
How do we ensure that the artist is the same for
each album entry?
What if somebody overwrites the album yearwith
an invalid string?
FL AT FI LES: I M PLEM EN TAT I O N
How do you find a particular record?
What if we now want to create a new
application that uses the same database?
What if two threads try to write to the same file at
the same time?
FL AT FI LES: D U RABI LI TY
What if the machine crashes while our program is
updating a record?
What if we want to replicate the database on
multiple machines for high availability?
DATABASE MANAGEM EN T SYSTEM
A DBMSis software that allows applications to
store and analyze information in a database.
A general-purpose DBMS is designedto allow the
definition, creation, querying, update, and
administration of databases.
EARLY D BM Ss
Database applications were difficult to
buildand maintain.
Tight coupling between logical and
physical layers.
You have to (roughly) know what
queries your app wouldexecute
before you deployed the database.
Edgar F. Codd
EARLY D BM Ss
Database applications were difficult to
buildand maintain.
Tight coupling between logical and
physical layers.
You have to (roughly) know what
queries your app wouldexecute
before you deployed the database.
Edgar F. Codd
REL ATI O NAL M O D EL
Proposed in 1970 by Ted Codd.
Database abstraction to avoidthis
maintenance:
→ Store database in simple data structures.
→ Access data through high-level language.
→ Physical storage left up to
implementation.
Edgar F. Codd
DATA M O D ELS
A data model is collection of concepts for
describing the data in a database.
A schema is a description of a particular collection
of data, using a given data model.
DATA M O D EL
Relational · Most DBMSs
Key/Value
Graph
Document
Column-family
Array/ Matrix
Hierarchical
Network
DATA M O D EL
Relational
Key/Value
Graph
Document · NoSQL
Column-family
Array/ Matrix
Hierarchical
Network
DATA M O D EL
Relational
Key/Value
Graph
Document
Column-family
Array/ Matrix · Machine Learning
Hierarchical
Network
DATA M O D EL
Relational
Key/Value
Graph
Document
Column-family
Array/ Matrix
Hierarchical
· Obsolete / Rare
Network
DATA M O D EL
Relational · This Course
Key/Value
Graph
Document
Column-family
Array/ Matrix
Hierarchical
Network
REL ATI O NAL M O D EL
The relational model defines a database abstraction
based on relations to avoid maintenance overhead.
Key tenets:
→ Store database in simple data structures (relations).
→ Physical storage left up to the DBMS implementation.
→ Access data through high-level language, DBMS figures
out best execution strategy.
LEVELS OF ABSTRACTION
type instructor = record
• Physical level ID : string;
name : string;
o Describes how a record (e.g., instructor) is stored
dept_name : string;
• Logical level salary : integer;
end;
o Describes the data stored in database, and the
relationships among the data
• View level
o Application programs hide details of data types.
Views can also hide information (such as an
employee’s salary) for security purposes.
REL ATI O NAL M O D EL
An architecture for a database system:
REL ATI O NAL M O D EL
Structure: The definition of relations and their
contents.
Integrity: Ensure the database’s contents satisfy
constraints.
Manipulation:How to access andmodify a
database’s contents.
REL ATI O NAL M O D EL attributes (
or columns)
A relation is unordered set that Ar t i st (name, year, country)
contain the relationship of name year country
attributes that represent entities. Wu Tang Clan 1992 USA
tuples
Notorious BIG 1992 USA (or rows)
A tuple is a set of attribute values Ice Cube 1989 USA
(also known as its domain) in the
relation. n- ary Relation
→ Values are (normally) atomic/scalar. =
→ The special value NULLis a member Table with n columns
of every domain.
REL ATI O NAL M O D EL: KEYs
Let R = (A1, A2, …, An ) be a relation schema
Let K R
K is a SUPERKEY of R if values for K are sufficient to identify a
unique tuple of each possible relation r(R)
• Example: {ID} and {ID,name} are both superkeys of instructor.
Superkey K is a CANDIDATE KEY if K is minimal
• Example: {ID} is a candidate key for Instructor
One of the candidate keys is selected to be the PRIMARY KEY.
• Which one?
REL ATI O N AL M O D EL: PRI M ARY KEYS
Arelation’s primary key uniquely Ar t i st (name, year, country)
name year country
identifies a single tuple.
Wu Tang Clan 1992 USA
Some DBMSs automaticallycreate an Notorious BIG 1992 USA
internal primary key if you don't Ice Cube 1989 USA
define one.
Auto-generation of unique integer
primary keys:
→ SEQUENCE (SQL:2003)
→ AUTO_INCREMENT (MySQL)
→ S E R I A L (PostgreSQL)
REL ATI O N AL M O D EL: PRI M ARY KEYS
Arelation’s primary key uniquely Ar t i st (id, name, year, country)
id name year country
identifies a single tuple.
123 Wu Tang Clan 1992 USA
Some DBMSs automaticallycreate an 456 Notorious BIG 1992 USA
internal primary key if you don't 789 Ice Cube 1989 USA
define one.
Auto-generation of unique integer
primary keys:
→ SEQUENCE (SQL:2003)
→ AUTO_INCREMENT (MySQL)
→ S E R I A L (PostgreSQL)
REL ATI O N AL M O D EL: FO REI GN KEYS
A FOREIGN KEY specifies that an attribute from
one
relation has to map to a tuple in another relation.
• Referencing relation
• Referenced relation
REL ATI O N AL M O D EL: FO REI GN KEYS
Ar t i st (id, name, year, country)
id name year country
123 Wu Tang Clan 1992 USA
456 Notorious BIG 1992 USA
789 Ice Cube 1989 USA
Al bum(id, name, artists, year)
id name artists year
11 Enter the Wu Tang 123 1993
22 St.Ides Mix Tape ??? 1994
33 AmeriKKKa's Most Wanted 789 1990
REL ATI O N AL M O D EL: FO REI GN KEYS
Ar t i st (id, name, year, country)
id name year country
123 Wu Tang Clan 1992 USA
Ar t i st Al bum(artist_id, album_id) 456 Notorious BIG 1992 USA
artist_id album_id 789 Ice Cube 1989 USA
123 11
123 22 Al bum(id, name, artists, year)
789 22 id name artists year
456 22 11 Enter the Wu Tang 123 1993
22 St.Ides Mix Tape ??? 1994
33 AmeriKKKa's Most Wanted 789 1990
REL ATI O N AL M O D EL: FO REI GN KEYS
Ar t i st (id, name, year, country)
id name year country
123 Wu Tang Clan 1992 USA
Ar t i st Al bum(artist_id, album_id) 456 Notorious BIG 1992 USA
artist_id album_id 789 Ice Cube 1989 USA
123 11
123 22 Al bum(id, name, year)
789 22 id name year
456 22 11 Enter the Wu Tang 1993
22 St.Ides Mix Tape 1994
33 AmeriKKKa's Most Wanted 1990
REL ATI O N AL M O D EL: FO REI GN KEYS
Ar t i st (id, name, year, country)
id name year country
123 Wu Tang Clan 1992 USA
Ar t i st Al bum(artist_id, album_id) 456 Notorious BIG 1992 USA
artist_id album_id 789 Ice Cube 1989 USA
123 11
123 22 Al bum(id, name, year)
789 22 id name year
456 22 11 Enter the Wu Tang 1993
22 St.Ides Mix Tape 1994
33 AmeriKKKa's Most Wanted 1990
DATA MAN I PU L AT I O N L AN GU AGES (D M L)
How to store andretrieve information from a
database.
Procedural: · Relational
→ The queryspecifies the (high-level) strategy Algebra
the DBMS shoulduse to find the desired result.
Non-Procedural:
→ The queryspecifies only what data is wanted
and not how to find it.
DATA MAN I PU L AT I O N L AN GU AGES (D M L)
How to store andretrieve information from a
database.
Procedural: · Relational
→ The queryspecifies the (high-level) strategy Algebra
the DBMS shoulduse to find the desired result.
Non-Procedural: · Relational
→ The queryspecifies only what data is wanted Calculus
and not how to find it.
REL ATI O NAL ALGEBRA
Fundamental operations to retrieve σ Select
and manipulate tuples in a relation. Projection
→ Based on set algebra.
∪ Union
Each operator takes one or more ∩ Intersection
relations as its inputs and outputs a
new relation. Difference
→ We can “chain” operators together to create × Product
more complex operations. ⋈ Join
REL ATI O NAL ALGEBRA: SELECT
R(a_i d, b_i d)
Choose a subset of the tuples from a a_id b_id
a1 101
relation that satisfies a selection a2 102
predicate. a2 103
→ Predicate acts as a filter to retain only a3 104
tuples that fulfill its qualifying σa_id='a2' ( R) σa_id='a2'∧ b_id>102(R)
requirement. a_id b_id a_id b_id
→ Can combine multiple predicates using a2 102 a2 103
conjunctions / disjunctions. a2 103
SELECT * FROM R
Syntax: σpredicate(R) WHERE a_id='a2' AND b_id>102;
REL ATI O NAL ALGEBRA: SELECT
Selection predicate can be one of the followings:
• Comparisons (=, , >, , <, )
Several predicates can be combined using one of the
following connectives:
• (and), (or), (not)
REL ATI O NAL ALGEBRA: PRO J EC T I O N
R(a_i d, b_i d)
a_id b_id
Generate a relation with tuples that a1 101
contains only the specified attributes. a2 102
→ Can rearrange attributes’ordering. a2 103
→ Can manipulate the values. a3 104
Πb_id-100,a_id( σa_id='a2'( R) )
Syntax: A1,A2,…,A n (R) b_id-100 a_id
2 a2
3 a2
SELECT b_id-100, a_id
FROM R WHERE a_id = 'a2';
REL ATI O NAL ALGEBRA: U N I O N
R(a_i d, b_i d) S(a_i d, b_i d)
a_id b_id a_id b_id
Generate a relation that contains all
a1 101 a3 103
tuples that appear in either only one a2 102 a4 104
or both input relations. a3 103 a5 105
(R ∪ S)
Syntax: (R ∪ S) a_id b_id
a1 101
a2 102
(SELECT * FROM R) a3 103
UNION ALL a3 103
(SELECT * FROM S); a4 104
a5 105
REL ATI O NAL ALGEBRA: I N TERSEC T I O N
R(a_i d, b_i d) S(a_i d, b_i d)
a_id b_id a_id b_id
Generate a relation that contains only
a1 101 a3 103
the tuples that appear in both of the a2 102 a4 104
input relations. a3 103 a5 105
(R ∩ S)
Syntax: (R ∩ S) a_id b_id
a3 103
(SELECT * FROM R)
INTERSECT
(SELECT * FROM S);
REL ATI O N AL ALGEBRA: D I FFEREN C E
R(a_i d, b_i d) S(a_i d, b_i d)
a_id b_id a_id b_id
Generate a relation that contains only
a1 101 a3 103
the tuples that appear in the first and a2 102 a4 104
not the second of the input relations. a3 103 a5 105
(R – S)
Syntax: (R – S) a_id b_id
a1 101
a2 102
(SELECT * FROM R)
EXCEPT
(SELECT * FROM S);
REL ATI O NAL ALGEBRA: PRO D U C T
R(a_i d, b_i d) S(a_i d, b_i d)
a_id b_id a_id b_id
Generate a relation that contains all
a1 101 a3 103
possible combinations of tuples from a2 102 a4 104
the input relations. a3 103 a5 105
(R × S)
Syntax: (R ×S) R.a_id
a1
R.b_id
101
S.a_id
a3
S.b_id
103
a1 101 a4 104
a1 101 a5 105
SELECT * FROM R CROSS JOIN S; a2 102 a3 103
a2 102 a4 104
a2 102 a5 105
SELECT * FROM R, S; a3 103 a3 103
a3 103 a4 104
a3 103 a5 105
REL ATI O NAL ALGEBRA: J O I N
R(a_i d, b_i d) S(a_i d, b_i d)
Generate a relation that contains all a_id b_id a_id b_id
tuples that are a combination of two a1 101 a3 103
a2 102 a4 104
tuples (one from each input relation) a3 103 a5 105
with a common value(s) for one or
more attributes. (R ⋈ S)
a_id b_id
a3 103
Syntax: (R ⋈ S)
SELECT * FROM R NATURAL JOIN S;
SELECT * FROM R JOIN S USING (a_id, b_id);
REL ATI O NAL ALGEBRA: EXTRA O PERATO RS
Rename (ρ)
Assignment (R←S)
Duplicate Elimination (δ)
Aggregation (γ)
Sorting(τ)
Division (R÷S)
REL ATI O NAL M O D EL: Q U ERI ES
The relational model is independent of any query
language implementation.
SQL is the de facto standard.
for line in file:
SELECT year FROM artists
record = par se(line)
WHERE name = "Ice Cube“;
i f “Ice Cube”== record[0]:
print i nt (record[1])
DATA MODELS
Relational
Key/Value
Graph
Document / Object ← Leading Alternative
Wide-Column / Column-family
Array / Matrix / Vectors
Hierarchical
Network
Multi-Value
DOCUMENT DATA MODEL
Embed data hierarchy into a single object.
Artist R1(id,…)
⨝
ArtistAlbum R2(artist_id,album_id)
⨝
Album R3(id,…)
DOCUMENT DATA MODEL
Embed data hierarchy into a single object.
Artist R1(id,…)
⨝
ArtistAlbum R2(artist_id,album_id)
⨝
Album R3(id,…)
DOCUMENT DATA MODEL
Embed data hierarchy into a single object.
Application Code {
class Artist { int "name": "GZA",
Artist id; String "year": 1990,
"albums": [
name; int
{
year; "name": "Liquid Swords",
Album albums[]; "year": 1995
} },
class Album { {
int id; String "name": "Beneath the Surface",
Album name; int year; "year": 1999
}
}
]
}
CO N CLU SI O N
Databases are ubiquitous.
Relational algebra defines the primitives for
processing queries on a relational database.
We will see relational algebra again when we talk
about query optimization + execution.