IS202 Data Management
Chapter 4
Part 2: Normalization
Motivation for Data Normalization
Why an entity is mapped to a relation?
Why a many-to-many relationship has to
be mapped to a new relation? Why not use
one relation to capture all the information?
Motivation for Data Normalization
Question Is this a relation?
Question Whats the
primary key?
Anomalies in this Table
Insertion cant enter a new employee without
having the employee take a class
Deletion if we remove employee 140, we lose
information about the existence of a Tax Acc class
Modification giving a salary increase to
employee 100 forces us to update multiple
records
Why do these anomalies exist?
Well-Structured Relations
A relation that contains minimal data redundancy
and allows users to insert, delete, and update
rows without causing data inconsistencies
Goal is to avoid anomalies
Insertion Anomaly adding new rows forces user to
create duplicate data
Deletion Anomaly deleting rows may cause a loss of
data that would be needed for other future rows
Modification Anomaly changing data in a row forces
changes to other rows because of duplication
General rule of thumb: a table should not pertain to
more than one entity type
5
Data Normalization
A tool to validate and improve a logical design
so that it satisfies certain constraints that
avoid unnecessary duplication of
data
The problems of having duplication of data
Waste of space
Difficulty in consistency control
Any other?
The process of decomposing relations with
anomalies to produce smaller, wellstructured relations
Data Normalization
Data Normalization
Functional Dependency
1st Normal Form (1NF)
2nd Normal Form (2NF)
3rd Normal Form (3NF)
Figure 4.22 -Steps in normalization
First Normal Form
No multivalued attributes
Every attribute value is atomic
All relations are in 1st Normal Form
Figure 4-2 (a) Table with multivalues attributes
Fig. 4-2a is not a relation (multivalued
attributes).
It is not in 1st Normal Form (1NF).
10
Figure 4-2b Eliminating multivalued attributes
EMPLOYEE2 relation
Fig. 4-2b is in 1st Normal form
11
Second Normal Form
1NF plus every non-key attribute is fully
functionally dependent on the ENTIRE
primary key
Every non-key attribute must be defined by
the entire key, not by only part of the key
No partial functional dependencies
12
STUDENT
True of False (based on data in the table only)
The value of Name determines that of Nationality
The value of Name determines that of Email
The value of Email determines that of AveGPA
The value of Nationality determines that of Email
The value of ID determines that of Nationality
13
Functional Dependencies and Keys
Candidate Key:
An attribute (or a combination of multiple attributes) that
can uniquely identify a row.
One of the candidate keys is chosen as identifier (in ER
stage), which becomes the Primary Key (in LD stage)
E.g. for a student, both email address and student id are unique.
However, only one of them becomes the identifier (and
subsequently the primary key)
Non-key attributes: attributes that are not candidate keys
Functional Dependency: The value of one attribute,
or of combination of attributes, (the determinant)
determines the value of another attribute
Each non-key field is functionally dependent on every
candidate key
14
Student
True or False (assume each student has a unique email
address):
Name is a non-key attribute:
Email is a candidate key:
Name is a candidate key:
Name depends on ID:
Name depends on Email :
Email depends on AveGPA :
Nationality depends on Email :
AveGPA depends on Name :
15
Representing Functional
Dependencies
Student
Graphical representation: a link is issued from
the determinant pointing to the other attribute
For example: ID is the determinant, and Name is
functionally dependent on ID
Text representation:
ID -> Name, AveGPA, Nationality
Email -> Name, AveGPA, Nationality
16
Partial Functional Dependency
A function dependency in which one or
more non-key attributes are functionally
dependent on part (but not in all) of the
primary key is defined as a partial
functional dependency.
17
Fig 4.23(b) Functional Dependencies in EMPLOYEE2
Dependency on entire primary key
Dependencies on part of the key (partial functional dependencies)
EmpID Name, DeptName, Salary
CourseID Course_Title
EmpID, CourseID DateCompleted
Partial dependencies => NOT in 2nd Normal Form!!
18
Getting it into 2nd Normal Form
Decompose the relation into three separate
relations
Emp
EmpID
Emp_Course
Course
Name DeptName Salary
EmpID
CourseID
No partial
functional
dependencies
DateCompleted
CourseID CourseTitle
19
Third Normal Form
2NF PLUS no transitive
dependencies
A transitive dependency is a
functional dependency between two
(or more) non-key attributes.
20
Example -- Relation with transitive dependency
(a) Customer_Order Relation with simple data
21
Example -- Relation with transitive dependency
CUSTOMER_ORDER
BUT
Order_ID Order_Date
Order_ID Customer_ID
Order_ID Customer_Name
Order_ID Customer_Address
Customer_ID Customer_Name
Customer_ID Customer_Address
Transitive Dependency: Not 3rd NF
All this is OK
(2nd NF)
22
Order_ID Customer_ID
Customer_ID Customer_Name, Customer_Address
Now, there are no transitive dependencies
Both relations are in 3rd NF
23
Another Example
Fig 4-26 Invoice relation (1NF)
(Pine Valley Furniture Company)
24
2NF: Remove Partial Functional
Dependency
OrderID OrderDate, CustomerID, CustomerName, CustomerAddress
ProductID ProductDescription, ProductFinish, UnitPrice
OrderID, ProductID OrderQuantity
Therefore, NOT in 2nd Normal Form
25
Getting it into the Second Normal Form
Fig 4-28 Removing Partial Dependencies
No partial functional dependency, and all three
relations are in 2NF
26
3NF: Remove Transitive Dependency
CustomerID CustomerName, CustomerAddress
Therefore, CUSTOMER_ORDER relation is
NOT in 3rd Normal Form
Getting it into the Third Normal Form
Fig 4-29 Removing Transitive Dependencies
Transitive dependencies are removed.
28
In Class Exercise 5.7
PRODUCT
Product
_ID
Product_Description
Product_Finish
Standard_Price
Product_Line_Id
1 End Table
Cherry
$175.00
2 Coffer Table
Natural Ash
$200.00
3 Computer Desk
Natural Ash
$175.00
4 Entertainment Center
Walnut
$650.00
5 Writers Desk
Cherry
$325.00
6 8-Drawer Desk
White Ash
$750.00
7 Dining Table
Natural Ash
$800.00
Walnut
$500.00
Computer Desk
Find all the function dependencies.
Is this in 2NF? If not, get it into 2NF.
Is this in 3NF? If not, get it into 3NF.
29
Data Normalization Summary
1st Normal Form
no multivalued attributes, and every attribute value is
atomic
All relations are in 1st Normal Form
2nd Normal Form
1NF + every non-key attribute is fully functionally
dependent on the ENTIRE primary key
Decomposing the relation into two new relations
3rd Normal Form
2NF PLUS no transitive dependencies
Decomposing the relation into two new relations
30
Other Normal Forms
(from Appendix B)
Boyce-Codd NF
All determinants are candidate keysthere is no
determinant that is not a unique identifier
4th NF
No multivalued dependencies
5th NF
No lossless joins
Domain-key NF
The ultimate NFperfect elimination of all
possible anomalies
31
In Class Exercise 5.8
Transitive Dependency Removal
INVOICE
32
In Class Exercise 5.9
The structure and sample data are provided for following table. Break it into
relations in 3NF (assumption: Dept_Manager must be a Emp who has a unique
Emp_Code, and Emp_Educ is a multi-valued attribute).
33
In Class Exercise 5.10
Consider the following
relation definition and
sample data:
PROJECT (ProjectID,
EmployeeName,
EmployeeSalary)
ProjectID: name of a
work project
EmployeeName: name
of an employee
EmployeeSalary: salary
of the employee
PROJECT
ProjectID Employee
Name
Employee
Salary
100A
Jones
64K
100A
Smith
51K
100B
Smith
51K
200A
Jones
64K
200B
Jones
64K
200C
Parks
28K
200C
Smith
51K
200D
Parks
28K
34
In Class Exercise (contd)
Assuming that all of the functional dependencies
and constraints are apparent in this data, which
of the following statements is true?
ProjectID EmployeeName
ProjectID EmployeeSalary
(ProjectID, EmployeeName)
EmployeeSalary
EmployeeName EmployeeSalary
EmployeeSalary ProjectID
EmployeeSalary (ProjectID,
EmployeeName)
35
In Class Exercise (contd)
What is the key of PROJECT?
Are all non-key attributes (if any)
dependent on all of the key?
In what normal form is PROJECT?
Describe one modification anomaly from
which PROJECT suffers.
Is ProjectID a determinant?
Is EmployeeName a determinant?
36
In Class Exercise (contd)
Is (ProjectID, EmployeeName) a
determinant?
Is EmployeeSalary a determinant?
Does this relation contain a partial
dependency? If so, what is it?
Redesign this relation to eliminate the
modification anomalies.
37
What we have learned
Key concepts:
Functional dependency
Partial functional dependency
Transitive dependency
Data normalization
1NF: must be a relation
2NF: 1NF + no partial functional dependency
3NF: 2NF + no transitive dependency
38
What you need to do
Review Chapter 4 (part 2)
Concepts
Recommendation
Introduction to Normalization
Normalization Example: Pine Valley
Furniture Company
A Final Step for Defining Relational
Keys
: must read, : good for you to read
39