Data Normalization
• Primarily a tool to validate and improve a
logical design so that it satisfies certain
constraints that avoid unnecessary
duplication of data
• The process of decomposing relations with
anomalies to produce smaller, well-
structured relations
Tanvir Ahmed 1
Well-Structured Relations
• A relation that contains minimal data redundancy
and allows users to insert, delete, and update rows
without causing data inconsistencies
• Goal is to avoid anomalies
– Insertion Anomaly – adding new rows forces user to
create duplicate data
– Deletion Anomaly – deleting rows may cause a loss of
data that would be needed for other future rows
– Modification Anomaly – changing data in a row
forces changes to other rows because of duplication
General rule of thumb: a table should not pertain to
more than one entity type
Tanvir Ahmed 2
Example – Figure 5.2b
Question – Is this a relation? Answer – Yes: unique rows and no multivalued
attributes
Question – What’s the primary key? Answer – Composite: Emp_ID,
Course_Title
Tanvir Ahmed 3
Anomalies in this Table
• Insertion – can’t enter a new employee without
having the employee take a class
• Deletion – if we remove employee 140, we lose
information about the existence of a Tax Acc class
• Modification – giving a salary increase to
employee 100 forces us to update multiple records
Why do these anomalies exist?
Because we’ve combined two themes (entity types)
into one relation. This results in duplication, and an
unnecessary dependency between the entities
Tanvir Ahmed 4
Tanvir Ahmed 5
Tanvir Ahmed 6
Tanvir Ahmed 7
Tanvir Ahmed 8
Tanvir Ahmed 9
Tanvir Ahmed 10
Tanvir Ahmed 11
Tanvir Ahmed 12
Functional Dependencies and Keys
• Functional Dependency: The value of one
attribute (the determinant) determines the
value of another attribute
• Candidate Key:
– A unique identifier. One of the candidate keys will
become the primary key
• E.g. perhaps there is both credit card number and SS# in
a table…in this case both are candidate keys
– Each non-key field is functionally dependent on
every candidate key
Tanvir Ahmed 13
5.22 -Steps in
normalization
Tanvir Ahmed 14
First Normal Form
• No multivalued attributes
• Every attribute value is atomic
• Fig. 5-2a is not in 1st Normal Form
(multivalued attributes) it is not a relation
• Fig. 5-2b is in 1st Normal form
• All relations are in 1st Normal Form
Tanvir Ahmed 15
Second Normal Form
• 1NF plus every non-key attribute is fully
functionally dependent on the ENTIRE
primary key
– Every non-key attribute must be defined by the
entire key, not by only part of the key
– No partial functional dependencies
• Fig. 5-2b is NOT in 2nd Normal Form (see fig
5-23b)
Tanvir Ahmed 16
Fig 5.23(b) – Functional Dependencies
in EMPLOYEE2
Dependency on entire primary key
EmpID CourseTitle Name DeptName Salary DateCompleted
Dependency on only part of the key
EmpID, CourseTitle DateCompleted
EmpID Name, DeptName, Salary
Therefore, NOTTanvir
inAhmed
2nd Normal Form!! 17
Getting it into 2nd Normal Form
• See p193 – decomposed into two separate relations
Both are full
functional
EmpID Name DeptName Salary
dependencies
EmpID CourseTitle DateCompleted
Tanvir Ahmed 18
Third Normal Form
• 2NF PLUS no transitive dependencies
(one attribute functionally determines a
second, which functionally determines a
third)
• Fig. 5-24, 5-25
Tanvir Ahmed 19
Figure 5-24 -- Relation with transitive dependency
(a) SALES relation with simple data
Tanvir Ahmed 20
Figure 5-24(b) Relation with transitive dependency
CustID Name
CustID Salesperson BUT
CustID Region
CustID Salesperson Region
All this is OK Transitive dependency
(2nd NF) (not 3rd NF)
Tanvir Ahmed 21
Figure 5.25 -- Removing a transitive dependency
(a) Decomposing the SALES relation
Tanvir Ahmed 22
Figure 5.25(b) Relations in 3NF
Salesperson Region
CustID Name
CustID Salesperson
Now, there are no transitive dependencies…
Both relations are in 3rd NF
Tanvir Ahmed 23
Other Normal Forms
(from Appendix B)
• Boyce-Codd NF
– All determinants are candidate keys…there is no determinant
that is not a unique identifier
• 4th NF
– No multivalued dependencies
• 5th NF
– No “lossless joins”
• Domain-key NF
– The “ultimate” NF…perfect elimination of all possible
anomalies
Tanvir Ahmed 24