CSE 544
Principles of Database
Management Systems
Fall 2016
Lecture 3 –Schema Normalization
CSE 544 - Fall 2016 1
Projects
• We have 27 teams
• Impossible to discuss projects at office hours tomorrow
• Instead, sign up on doodle for a 10’ slot on Monday.
CSE 544 - Fall 2016 2
Database Design
• The relational model is great, but how do I design my
database schema?
CSE 544 - Fall 2016 3
Outline
• Conceptual db design: entity-relationship model
• Problematic database designs
• Functional dependencies
• Normal forms and schema normalization
CSE 544 - Fall 2016 4
Database Design Process
Conceptual SQL
Refinement Files
Modeling Tables
Conceptual Schema
ER diagrams Physical Schema
Relations
CSE 544 - Fall 2016 5
Conceptual Schema Design
name
Conceptual Model: Patient patien_of Doctor
zip name dno
Relational Model:
plus FD’s
(FD = functional dependency)
Normalization:
Eliminates anomalies
CSE 544 - Fall 2016 6
Entity-Relationship Diagram
name
since
dno
pno
Patient patient_of Doctor
zip specialty name
Attributes Entity sets Relationship sets
name Patient patient_of 7
Entity-Relationship Model
• Typically, each entity has a key
Many One
• ER relationships can include multiplicity
– One-to-one, one-to-many, etc.
– Indicated with arrows
• Can model multi-way relationships
• Can model subclasses
• And more...
CSE 544 - Fall 2016 8
Subclasses to Product
Relations Name Price Category
name category Gizmo 99 gadget
Camera 49 photo
price
Toy 39 gadget
Product
Sw.Product Name platforms
isa isa
Gizmo unix
Software Product Educational Product Ed.Product
Age
platforms Age Group Name
Group
Gizmo toddler
Other ways to convert are possible Toy retired
CSE 544 - Fall 2016 9
General approach to Translating
Diagram into Relations
Normally translate as follows:
• Each entity set becomes a relation
• Each relationship set becomes a relation
– Except many-one relationships. Can combine them with entity set.
One bad way to translate our diagram into relations
• PatientOf (pno, name, zip, dno, since)
• Doctor (dno, dname, specialty)
CSE 544 - Fall 2016 10
Outline
• Conceptual db design: entity-relationship model
• Problematic database designs
• Functional dependencies
• Normal forms and schema normalization
CSE 544 - Fall 2016 11
Problematic Designs
• Some db designs lead to redundancy
– Same information stored multiple times
• Problems
– Redundant storage
– Update anomalies
– Insertion anomalies
– Deletion anomalies
CSE 544 - Fall 2016 12
Problem Examples
PatientOf
pno name zip dno since Redundant
1 p1 98125 2 2000 If we update
1 p1 98125 3 2003 to 98119, we
get inconsistency
2 p2 98112 1 2002
3 p1 98143 1 1985
What if we want to insert a patient without a doctor?
What if we want to delete the last doctor for a patient?
Illegal as (pno,dno) is the primary key, cannot have nulls
CSE 544 - Fall 2016 13
Solution: Decomposition
Patient PatientOf
pno name zip pno dno since
1 p1 98125 1 2 2000
2 p2 98112 1 3 2003
3 p1 98143 2 1 2002
3 1 1985
Decomposition solves the problem,
but need to be careful…
CSE 544 - Fall 2016 14
Lossy Decomposition
Patient PatientOf
pno name zip name dno since
1 p1 98125 p1 2 2000
2 p2 98112 p1 3 2003
3 p1 98143 p2 1 2002
p1 1 1985
Decomposition can cause us to lose information!
CSE 544 - Fall 2016 15
Schema Refinement Challenges
• How do we know that we should decompose a relation?
– Functional dependencies
– Normal forms
• How do we make sure decomposition does not lose info?
– Lossless-join decompositions
– Dependency-preserving decompositions
CSE 544 - Fall 2016 16
Outline
• Conceptual db design: entity-relationship model
• Problematic database designs
• Functional dependencies
• Normal forms and schema normalization
CSE 544 - Fall 2016 17
Functional Dependency
• A functional dependency (FD) is an integrity constraint
that generalizes the concept of a key
• An instance of relation R satisfies the FD: X ® Y
– if for every pair of tuples t1 and t2
– if t1.X = t2.X then t1.Y = t2.Y
– where X, Y are two nonempty sets of attributes in R
• We say that X determines Y
• FDs come from domain knowledge
CSE 544 - Fall 2016 18
FD Example
An FD holds, or does not hold on an instance:
EmpID Name Phone Position
E0045 Smith 1234 Clerk
E3542 Mike 9876 Salesrep
E1111 Smith 9876 Salesrep
E9999 Mary 1234 Lawyer
EmpID à Name, Phone, Position
Position à Phone
but not Phone à Position
CSE 544 - Fall 2016 19
FD Terminology
• FD’s are constraints
– On some instances they hold
– On others they do not
• If every instance of R will be one in which a given FD will
hold, then we say that R satisfies the FD
– If we say that R satisfies an FD F, we are stating a constraint on R
• FDs come from domain knowledge
CSE 544 - Fall 2016 20
Decomposition Problems
• FDs will help us identify possible redundancy
– Identify redundancy and split relations to avoid it.
• Can we get the data back correctly ?
– Lossless-join decomposition
• Can we recover the FD’s on the ‘big’ table from the FD’s
on the small tables?
– Dependency-preserving decomposition
– So that we can enforce all FDs without performing joins
CSE 544 - Fall 2016 21
Outline
• Conceptual db design: entity-relationship model
• Problematic database designs
• Functional dependencies
• Normal forms and schema normalization
CSE 544 - Fall 2016 22
Normal Forms
• Based on Functional Dependencies
– 2nd Normal Form (obsolete)
– 3rd Normal Form We only discuss
– Boyce Codd Normal Form (BCNF) these two
• Based on Multivalued Dependencies
– 4th Normal Form
• Based on Join Dependencies
– 5th Normal Form
CSE 544 - Fall 2016 23
BCNF
A simple condition for removing anomalies from relations:
A relation R is in BCNF if:
If A1, ..., An ® B is a non-trivial dependency in R ,
then {A1, ..., An} is a superkey for R
BCNF ensures that no redundancy can be detected
using FD information alone
CSE 544 - Fall 2016 24
Our Example
PatientOf
pno name zip dno since
1 p1 98125 2 2000
1 p1 98125 3 2003
2 p2 98112 1 2002
3 p1 98143 1 1985
pno,dno is a key, but pno ® name, zip
BCNF violation so we decompose
CSE 544 - Fall 2016 25
Decomposition in General
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
R1(A1, ..., An, B1, ..., Bm) R2(A1, ..., An, C1, ..., Cp)
R1 = projection of R on A1, ..., An, B1, ..., Bm
R2 = projection of R on A1, ..., An, C1, ..., Cp
Theorem If A1, ..., An à B1, ..., Bm
Then the decomposition is lossless
Note: don’t necessarily need A1, ..., An à C1, ..., Cp
CSE 544 - Fall 2016 26
BCNF Decomposition Algorithm
Repeat
choose A1, …, Am ® B1, …, Bn that violates BCNF condition
split R into
R1(A1, …, Am, B1, …, Bn) and R2(A1, …, Am, [rest])
continue with both R1 and R2
Until no more violations
Lossless-join decomposition: Attributes common to R1 and
R2 must contain a key for either R1 or R2
CSE 544 - Fall 2016 27
BCNF and Dependencies
Unit Company Product
FD’s: Unit ® Company; Company, Product ® Unit
So, there is a BCNF violation, and we decompose.
CSE 544 - Fall 2016 28
BCNF and Dependencies
Unit Company Product
FD’s: Unit ® Company; Company, Product ® Unit
So, there is a BCNF violation, and we decompose.
Unit Company
Unit ® Company
Unit Product
No FDs
In BCNF we lose the FD: Company, Product ® Unit
CSE 544 - Fall 2016 29
3NF
A simple condition for removing anomalies from relations:
A relation R is in 3rd normal form if :
Whenever there is a nontrivial dep. A1, A2, ..., An ® B for R,
then {A1, A2, ..., An } is a super-key for R,
or B is part of a key.
CSE 544 - Fall 2016 30
3NF Discussion
• 3NF decomposition v.s. BCNF decomposition:
– Complex: see book
• Tradeoffs
– BCNF = no anomalies, but may lose some FDs
– 3NF = keeps all FDs, but may have some anomalies
CSE 544 - Fall 2016 31
Summary
• Database design is not trivial
– Use ER models
– Translate ER models into relations
– Normalize to eliminate anomalies
• Normalization tradeoffs
– BCNF: no anomalies, but may lose some FDs
– 3NF: keeps all FDs, but may have anomalies
– Too many small tables affect performance
CSE 544 - Fall 2016 32