Topic 9
Database Systems
Unit 9 :
Database
Normalization
IT1768 Database Systems
Topic 9
Unit 9 : Database Normalization
Objective :
At the end of this unit, you should be able to:
Understand why and when to normalize tables.
Understand why redundant data can cause
update anomalies.
Define functional dependency and apply it in
normalization.
Perform normalization to place tables in third
normal form (3NF).
IT1768 Database Systems
Topic 9
Introduction to Normalization
Normalization is a design process whereby a
primitive data table (relation) is decomposed into
more simpler tables that more precisely fit the
relational model.
Normalization aims to remove redundant data in
order to eliminate update anomalies.
IT1768 Database Systems
Topic 9
Introduction to Normalization
Objective of Normalization
To avoid redundant data from tables (in order to
increase the integrity of the database design and to
maximize flexibility of data storage).
To help eliminate data anomalies (Insert, update,
Delete)
To produce controlled redundancies to link
tables.
IT1768 Database Systems
Topic 9
Introduction to Normalization
When to use Normalization technique ?
It is used :
when a set of tables is already available but
there has been no time to build an ER model.
to validate the design of existing tables when
user is unsure if the ER model is correct.
If the application of database is simple.
Used as a bottom-up approach to design the
database for the application.
IT1768 Database Systems
Topic 9
Redundant Data
Figure 1- Table with redundant data
It is unnecessary to store an employee's home address in
more than one record or one table. With duplicate data, it
occupies extra storage space and when all values (say
address for the same staff) are not updated completely, .
The value (in this case, the address) for that staff in one
table does not match the corresponding value (address)
of the same staff in another table.
IT1768 Database Systems
Topic 9
Redundant Data removed
Figure 2 Tables without redundant data
IT1768 Database Systems
After deploying
Normalization
technique, it yields
multiple tables with
redundant data
removed.
Topic 9
Database Anomalies
- Insertion
- Update
- Deletion
IT1768 Database Systems
Topic 9
Database Anomalies
An un-normalized table :
Database anomalies are the problems in relations
that occur due to redundancy of information in the
relations.
These anomalies affect the process of inserting,
deleting and modifying of data in the relations.
IT1768 Database Systems
Topic 9
Database Anomalies
Some important information may be lost if a relation
is updated that contains database anomalies.
It is important to remove these anomalies in order
to perform different processing on the relations
without difficulties.
3 Types of database anomalies
o Insertion anomaly
o Deletion anomaly
o Update anomaly
IT1768 Database Systems
10
Topic 9
Anomalies in un-Normalized table
Observation: PRO_NUM intended to be primary key
IT1768 Database Systems
Table entries invite data inconsistencies
Table displays data anomalies:
Update anomaly occurs when
Modifying JOB_CLASS
Insertion anomaly occurs when
Adding a new project: Need to assign an employee.
Deletion anomaly occurs if
Employees (103, 114) deleted, other vital data lost
11
Topic 9
Anomalies in Un-Normalized Table
null
null
20
Bay Shore
120
null
Bill MCan
Administrator
60.75
18.5
null
null
null
null
Observation: PRO_NUM intended to be primary key
IT1768 Database Systems
Table entries invite data inconsistencies
Table displays data anomalies:
Update anomaly occurs when
Modifying JOB_CLASS
Insertion anomaly occurs when
Adding a new project: Need to assign an employee.
Deletion anomaly occurs if
Employees (103, 114) deleted, other vital data lost
12
Topic 9
Insertion Anomalies
An insertion anomaly is a failure to place
information about a new database entry into all the
places in the database where information about
that new entry needs to be stored.
The insertion anomaly occurs when a new record is
inserted in the relation. In this anomaly, the user
cannot insert ALL the fact about an entity until he
has an additional fact about other entity.
In a properly normalized database, information
about a new entry needs to be inserted into only
one place in the database.
IT1768 Database Systems
13
Topic 9
Insertion Anomalies
Base on the relation below, insertion anomalies occur when
inserting a new staff member will force user to create branch data.
e.g. Add a new staff member SG99
Inserting a new branch (is not allowed) if there is no staff member for
the branch yet.
e.g. Add a new branch B008
SG99
Michael
Designer
30000 null
null
null
null
null
null
33 Clarence Road
IT1768 Database Systems
B008
14
Topic 9
Insertion Anomalies - Example
In the following table, if a new activity Golf of fee $500
is to be added, it has to wait until some new member
registers for Golf.
e.g. Add a new staff member M250
Mem_ID
IT1768 Database Systems
Fee
M100
Activity
Skiing
M150
Swimming
50
M175
Squash
100
M200
Swimming
50
M200
Squash
100
null
Golf
500
200
15
Topic 9
Insertion Anomalies - Example
To resolve insertion anomalies, it is to break-up the
relation into two separate relations; such as :
- Member (Key : Mem_ID)
- Fee (Key : Activity)
Mem_ID Activity
Activity
Fee
M100
Skiing
Skiing
5200
M150
Swimming
Swimming
50
M175
Squash
Squash
100
M200
Swimming
M200
Squash
Table : Fee
Table : Member
IT1768 Database Systems
16
Topic 9
Deletion Anomalies
A deletion anomaly is a failure to remove
information about an existing database entry
when it is time to remove that entry.
The deletion anomaly occurs when a record is
deleted from the relation. In this anomaly, the
deletion of fact about an entity automatically
deleted the fact of another entity.
In a properly normalized database, information
about an old, to-be-gotten-rid-of entry needs to
be deleted from only one place in the database.
IT1768 Database Systems
17
Topic 9
Deletion Anomalies
Deleting rows may cause a loss of data that would
be needed for other future rows.
e.g. Deletion of staff member SA9 will result in branch
details of B007 being lost.
IT1768 Database Systems
18
Topic 9
Modification (Update) Anomalies
The modification anomaly occurs when the record
is updated in the relation. In this anomaly, the
modification in the value of specific attribute
requires modification in all records in which that
value occurs.
IT1768 Database Systems
19
Topic 9
Modification (Update) Anomalies
A same record can be expressed in multiple records;
therefore updates to the table may result in logical
inconsistencies. For example; there could be two
different addresses in the Employee table when
updating the address of an employee who had
shifted his residence.
Employee
EmpID EmpAddress
Contact
E-55
Blk 16 Bedok South
64453638
E-56
12 Clementi Ave 1
66138899
Old record
E-56
180 AMK Ave 5
64563782
Updated record
IT1768 Database Systems
20
Topic 9
Update Anomaly Example 2
As shown in the example below, changing data in a
row forces changes of data in the duplicated rows.
e.g. Update address of B003
Need to update
all 3 records.
Failing to update
all affected
records will lead
to inconsistency
in the database.
Normalization and Update Anomalies
Normalization aims to remove redundant data, which
helps to eliminate update anomalies.
IT1768 Database Systems
21
Topic 9
Normalization
Normalization vs. Normalized Data
All anomalies are highly undesirable, their
occurrence constitutes corruption of the
database. Normalized data do not contain data
redundancies and therefore, are less
susceptible to data corruption.
Normalization is the activity, the process that
leads to a normalized data structure to achieve
normalized data.
IT1768 Database Systems
22
Topic 9
Normalization
Normal Forms
Normalization consists of a series of rules that must
be applied in steps to reach a higher level of
normalization. These levels are called normal forms.
There are many normal forms:
1NF -> 2NF -> 3NF -> BCNF -> 4NF -> 5NF
In general, the IT industry considers
normalization to the 3NF an acceptable level
to remove redundancy.
Normal forms higher than the 3NF deal with more
subtle anomalies.
IT1768 Database Systems
23
Topic 9
Repeating Group in Tables
A repeating group is an attribute, or group of
attributes, that occurs with multiple values for a
single occurrence of another attribute(s).
Project
ProjNum
Student
ProjLocation
EmpNo
PS-025
Sentosa
EE-03
073248K
Obama
165
PS-025
Sentosa
ED-15
074565L
Bush
172
PS-025
Sentosa
EP-21
076386E
Raymond
148
PC-344
Clementi
ES-26
075473P
Bryan
182
PC-344
Clementi
EE-32
075412M
Vanessa
163
Project table with repeating group
(ProjNum, ProjLocation)
AdminNum StudName
Height
Student table with no repeating groups
In normalization, 1NF eliminates repeating groups
IT1768 Database Systems
24
Topic 9
Repeating Group
The repeating group for Supplier in the table below
failed to distinguish PartNo and PDesc in the
database.
Part_Inventory
PartNo
PDesc
SuppNo
SName
SAddr
Cost
P1
Printer
S1
S2
Bright
SuperGood
Jurong
200
Senkang 220
P2
Speaker
S1
S2
Bright
SuperGood
Jurong
300
Senkang 350
Multiple occurrences of vendors for each single occurrence of part.
Repeating group for (SuppNo, sName, sAddr, cost)
IT1768 Database Systems
25
Topic 9
Repeating Group Remedy
(option1)
In order to create a table that is in first normal form we must
extract the repeating groups and place them in a separate
tables.
Part-Inventory
PartNo
PDesc
SuppNo
SName
SAddr
Cost
P1
Printer
S1
Bright
Jurong
200
P1
Printer
S2
SuperGood
Senkang
220
P2
Speaker
S1
Bright
Jurong
300
P2
Speaker
S2
SuperGood
Senkang
350
Part-Inventory(PartNo, Pdesc, SuppNo, SName, SAddr, Cost)
Splitting into two relations, (PartNo, Pdesc, SuppNo, Cost) eliminates repeating
groups and (SuppNo, SName, Saddr) contains no repeating group also.
IT1768 Database Systems
26
Topic 9
Repeating Group - Remedy
(option 2)
In order to create a table that is in first normal form we must
extract the repeating groups and place them in a separate
tables.
Part-Line (PartNo, PDesc, SuppNo, Cost )
PartNo
PDesc
SuppNo Cost
P1
Printer
S1
200
P1
Printer
S2
220
P2
Speaker
S1
300
P2
Speaker
S2
350
Splitting of the relation
into two tables
eliminates repeating
groups
Supplier (SuppNo, SName, SAddr )
SuppNo
IT1768 Database Systems
SName
SAddr
S1
Bright
Jurong
S2
SuperGood
Senkang
27
Topic 9
Functional Dependency
Functional dependency describes the relationship
between attributes (columns) in a single relation.
An attribute is functionally dependant on another if
we can use the value of one attribute to determine
the value of another.
For example:
STUDENT( StudNum, StudName, StudAddress )
o StudNum determines ( ) StudName and StudAddress.
o StudName, StudAddress are functionally dependent on
StudNum.
In normalization, 2NF accounts for functional dependency
in the relation.
IT1768 Database Systems
28
Topic 9
Functional Dependency
Example for further illustration:
StudNum
StudName
StudAddress
023344C
Kenny Ang
28 Woodland Rd
023817F
Lawrence Teo
211 Tampines Ave 7
028893E
Jenny Huang
11 Ang Mo Kio Ave 5
StudNum --> StudName, StudAddress
In schema, STUDENT ( StudNum, StudName, StudAddress )
StudNum is a key field. It functionally determines StudName
and StudAddress.
In other words, StudName and StudAddress are functionally
dependent on StudNum which is the key field.
Two types of functional dependency: Full and Partial
IT1768 Database Systems
29
Topic 9
Types of Functional Dependency
a) Full dependency (Full functional dependency)
Every Non-key column is functionally dependent
on all parts of the primary key (Whole key).
Example:
Part_Supplier
PartNo
SuppNo
Cost
P1
S1
200
P1
S2
220
P2
S1
300
P2
S2
350
IT1768 Database Systems
PartNo, SuppNo --> Cost
In schema,
Part_Supplier( PartNo, SuppNo , Cost )
30
Topic 9
Types of Functional Dependency
b) Partial dependency
There is Non-key column depend on some part of
the primary key.
Example:
Part_Supplier
Whole key : PartNo, SupplierNo
PartNo SuppNo SName
SAddr
Cost
P1
S1
Bright
Jurong
200
P1
S2
SuperGood
Senkang
220
P2
S1
Bright
Jurong
300
P2
S2
SuperGood
Senkang
350
SuppNo --> Sname, SAddr
IT1768 Database Systems
In schema,
Supplier( SuppNo, Sname, SAddr)
31
Topic 9
Types of Functional Dependency
Transitive dependency
There is Non-key column depend on another
Non-key column.
Example :
A Non-key attribute
Part
PartNo
PartDesc
Storeman
SContact
P1
Printer
Ken
98553344
P2
Speaker
Ken
98553344
P3
Ribbon
Alvin
97334400
P4
Cartridge
Alvin
97334400
In schema,
Storeman --> SContact Store-Person( Storeman, Scontact)
In normalization, 3NF accounts for transitive dependency in the relation.
IT1768 Database Systems
32
Topic 9
Normalization Rules
First Normal Form Rule, (1st Normal Form, 1NF)
Description
The table must express a set of unordered, twodimensional (2-D) tables.
The table cannot contain repeating groups (No two rows in
a table can be identical)
PartNo
Cost
P1
200
P1
P1
220
P1
Repeating groups
PartNo SuppNo
SName
SAddr
Cost
S1
Bright
Jurong
200
S2
SuperGood Senkang 220
No repeating groups
Part-Supplier( PartNo, SupNo, SName, SAddr, Cost)
IT1768 Database Systems
33
Topic 9
Normalization Rules
Second Normal Form Rule, (2nd Normal Form, 2NF)
Description
The table must be in 1NF. Every non-key attribute must be
functionally dependant on all of the key attribute(s).
PartNo
SuppNo
Cost
P1
S1
200
P1
S2
220
Fully dependant on the whole key attributes
2NF : Full Functional dependant :
PartNo, SuppNo --> Cost
Part-Supp( PartNo, SuppNo, Cost)
IT1768 Database Systems
34
Topic 9
Normalization Rules
Second Normal Form Rule:
(Partially functional Dependent)
PartNo Supplier_No Supplier_Name
Supplier_Addr
P1
S1
Ken
98553344
P2
S2
Ken
98553344
Partially dependant on the key attribute
2NF : Partial Functional dependant :
Supplier_No --> Supplier_Name, Supplier_Addr
Supplier( Supplier_No, Supplier_Name, Supplier_Addr)
IT1768 Database Systems
35
Topic 9
Normalization Rules
Third Normal Form Rule, (3rd Normal Form, 3NF)
Description
The table must be in 2NF and it contains no transitive
dependency. i.e. No non-key column transitively dependant
on another non-key column.
Inventory
PartNo
PartDesc
Storeman SContact
P1
Printer
Ken
98553344
P2
Speaker
Ken
98553344
3NF : Transitive dependant : Storeman --> SContact
Store-person( storeman, sContact)
IT1768 Database Systems
36
Topic 9
Normalization Rules
Normalization Process : Summary of Schemas
Part
No
PDesc
Supp
No
SName
SAddr
P1
Printer
S1
Bright
Jurong
P1
Printer
S2
P2
Speaker
Storeman
Scontact
200
Ken
98553344
SuperGood Senkang
220
Ken
98553344
S1
Bright
300
Alvin
97334400
P2
Speaker
S2
SuperGood Senkang
350
Alvin
97334400
Jurong
Cost
Inventory( PartNo, PDescr, SuppNo, SName, SAddr, Cost, Storeman, Scontact)
Part( PartNo, PDesc, Storeman)
Part-Supp( PartNo, SuppNo, Cost)
Supplier( SuppNo, SName, SAddr)
Store-person( Storeman, Scontact)
IT1768 Database Systems
37
Topic 9
Normalization Process
Table with
repeating groups
Remove repeating
groups
1NF
Handle functional
dependencies
.
2NF
3NF
IT1768 Database Systems
Handle transitive
dependencies
38
Topic 9
Normalization : Illustration Example 1
Relation Schema :
PROJECT ( Proj_Num, Proj_Name, Emp_Num, Emp_Name,
JOB_CLASS, CHG_HOUR, HOURS)
Its intended to use PROJ_NUM and EMP_NUM as the whole key attributes
IT1768 Database Systems
39
Topic 9
Normalization : 1NF
1NF Summary:
Define all key attributes by underlining them.
No repeating groups in table
All attributes dependent on primary key
PROJECT (PROJ_NUM, PROJ_NAME, EMP_NUM, EMP_NAME, JOB_CLASS,
CHG_HOUR, HOURS) OR
PROJECT (PROJ_NUM, PROJ_NAME)
EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS)
IT1768 Database Systems
40
Topic 9
Normalization : 2NF
2NF Summary:
In 1NF
Includes no partial dependencies
- No attribute dependent on a
portion of primary key
Still possible to exhibit transitive
dependency
- Attributes may be functionally
dependent on nonkey attributes
PROJECT (PROJ_NUM, PROJ_NAME)
EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR)
ASSIGN (PROJ_NUM, EMP_NUM, ASSIGN_HOURS)
IT1768 Database Systems
41
Topic 9
Normalization : 3NF
Create separate table(s) to eliminate
transitive functional dependencies
PROJECT (PROJ_NUM, PROJ_NAME)
ASSIGN (PROJ_NUM, EMP_NUM, ASSIGN_HOURS)
EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS)
JOB (JOB_CLASS, CHG_HOUR)
3NF Summary:
In 2NF
Contain no transitive dependencies
IT1768 Database Systems
42
Topic 9
Normalized :
Illustration Example 2
Example of Un-normalized table:
Relational Schema:
ORDERS (OrderID, OrderDate, CustID, CustName, CustAddr, CustCity, CustState,
ItemID, ItemDescription, ItemQty, ItemPrice)
Item_total_price and Order_total_price are derived values, they can
be derived as follows:
o Item_total_price = ItemPrice*ItemQty and
o Order_total_price = Item_total_price1 + Item_total_price2 + .
IT1768 Database Systems
43
Topic 9
Removing the repeating groups
Relational Schema:
ORDERS (OrderID, OrderDate, CustID, CustName, CustAddr, CustCity, CustState,
ItemID, ItemDescription, ItemQty, ItemPrice)
The database schema now satisfies the two
requirements of 1st Normal Form atomicity and
uniqueness. Thus it fulfills the most basic of
creation of a relational database.
IT1768 Database Systems
44
Topic 9
Functional dependency with key attributes
ORDERS (OrderID, OrderDate, CustID, CustName, CustAddr, CustCity, CustState,
ItemID, ItemDescription, ItemQty, ItemPrice)
ORDERS
(OrderID, OrderDate, CustID, CustName, CustAddr, CustCity, CustState, ItemID, ItemDescription, ItemQty, ItemPrice)
Next, we test the table for functional dependency (FD); this
consists of full functional dependency and partial functional
dependency.
Full FD are attribute(s) dependents on all key attributes (in
this case, OrderID and ItemID together.
Partial FD refers attributes that are dependents on any of
the key attributes alone.
IT1768 Database Systems
45
Topic 9
Functional dependency with key attributes
ORDERS (OrderID, OrderDate, CustID, CustName, CustAddr, CustCity, CustState,
ItemID, ItemDescription, ItemQty, ItemPrice)
ORDERS
(OrderID, OrderDate, CustID, CustName, CustAddr, CustCity, CustState, ItemID, ItemDescription, ItemQty, ItemPrice)
ORDERS (OrderID, OrderDate, CustID, CustName, CustAddr, CustCity, CustState)
ORDERS_ITEMS (OrderID, ItemID, ItemQty)
ITEMS
(ItemID, ItemDescription, ItemPrice)
IT1768 Database Systems
46
Topic 9
Functional dependency with key attributes
ORDERS (OrderID, OrderDate, CustID, CustName, CustAddr, CustCity, CustState)
ORDERS_ITEMS (OrderID, ItemID, ItemQty)
ITEMS
(ItemID, ItemDescription, ItemPrice)
IT1768 Database Systems
47
Topic 9
Transitive Functional Dependency
ORDERS
(OrderID, OrderDate, CustID, CustName, CustAddr, CustCity, CustState, ItemID, ItemDescription, ItemQty, ItemPrice)
Next, we test the table for transitive functional dependency ;
i.e. any non-key attributes dependent on other non-key
attribute.
In this case, CustName, CustAddr, CustCity, CustState are
dependent on CustID. All are non-key attributes.
CUSTOMER (CustID, CustName, CustAddr, CustCity, CustState)
IT1768 Database Systems
48
Topic 9
Un-normalized to Normalized tables
ORDERS (OrderID, OrderDate, CustID, CustName, CustAddr, CustCity, CustState,
ItemID, ItemDescription, ItemQty, ItemPrice)
ORDERS (OrderID, OrderDate, CustID, CustName, CustAddr, CustCity, CustState)
ORDERS_ITEMS (OrderID, ItemID, ItemQty)
ITEMS
(ItemID, ItemDescription, ItemPrice)
CUSTOMER (CustID, CustName, CustAddr, CustCity, CustState)
IT1768 Database Systems
49
Topic 9
Finally - Normalized tables
IT1768 Database Systems
50
Topic 9
Design Trade-offs
De-Normalization
Sometimes, the relations could have a very low update rate
or very high query rate and the performance is
unsatisfactory. It is not worth the cost of normalizing it.
One example is as follows:
CUSTOMER (CustNumber, CustName, City, State, Zip)
Key : CustNumber
The Normalized relations are as follows:
CUSTOMER (CustNumber, CustName, Zip)
CODES ( Zip, City, State )
The two tables above are in normal form, but they most likely do not represent a
better design. The un-normalized table is probably better because it is easier to
process and the disadvantages of duplicating the City and State data are not very
important.
IT1768 Database Systems
51
Topic 9
Summary
Explain data redundancy and database
anomalies
Introduce normalization process
1NF to remove repeating groups in relations
2NF with Functional dependency among key
and non-key attributes
3NF with transitive dependency among nonkey attributes
IT1768 Database Systems
52