Chapter 14
Normalization
Ahmed M. Zeki
ITIS 213
Objectives
The purpose of normalization.
How normalization can be used when designing a relational database.
The potential problems associated with redundant data in base relations.
The concept of functional dependency, which describes the relationship
between attributes.
The characteristics of functional dependencies used in normalization.
How to identify functional dependencies for a given relation.
How functional dependencies identify the primary key for a relation.
How to undertake the process of normalization.
How normalization uses functional dependencies to group attributes into
relations that are in a known normal form.
How to identify the most commonly used normal forms, namely First Normal
Form (1NF), Second Normal Form (2NF), and Third Normal Form (3NF).
The problems associated with relations that break the rules of 1NF, 2NF, or
3NF.
How to represent attributes shown on a form as 3NF relations using
normalization.
2 / 56
Introduction
The main objective of DB design is to create
an accurate representation of:
◦ data
◦ relationships between the data
◦ constraints on the data that is pertinent to the
enterprise
To do so we used DB design techniques:
◦ ER modeling
◦ Normalization
3 / 56
Introduction
Attributes describe some property of the data
or of the relationships between the data that
is important to the enterprise.
Normalization examines the relationships
(called functional dependencies) between
attributes.
It uses a series of tests (normal forms) to help
identify the optimal grouping for these
attributes to ultimately identify a set of
suitable relations that supports the data
requirements of the enterprise.
4 / 56
The Purpose of Normalization
Normalization is a technique for producing a
set of relations with describe properties,
given the data requirements of an enterprise.
Purpose:
◦ To identify a suitable set of relations that support
the data requirements of an enterprise.
5 / 56
The Purpose of Normalization
Characteristics of a suitable set of relations:
◦ The minimal number of attributes necessary to
support the data requirements of the enterprise;
◦ Attributes with a close logical relationship are
found in the same relation;
◦ Minimal redundancy with each attribute
represented only once with the important exception
of attributes that form all or part of foreign keys
6 / 56
The Purpose of Normalization
Benefit of a DB that has a suitable set of
relations:
◦ The DB will be easier for the user to access
◦ Easier to maintain the data
◦ Take up minimal storage space
7 / 56
Data Redundancy and Update
Anomalies
Major aim of RDB design is to group
attributes into relations to minimize data
redundancy.
benefit:
◦ Updates to the data stored in the DB are achieved
with a minimal number of operations reducing
the opportunities for data inconsistencies.
◦ Reduction in the file storage space required by the
base relations thus minimizing costs.
8 / 56
Data Redundancy and Update
Anomalies
RDB relies on the existence of a certain
amount of data redundancy.
This redundancy is in the form of copies of
PKs acting as FKs in related relations.
StaffBranch is the
alternative format of the
Staff and Branch
relations, which contains
redundancy.
9 / 56
Problems Associated with Data
Redundancy
Relations that have redundant data may have
problems called update anomalies, which are
classified as:
◦ Insertion anomalies
◦ Deletion anomalies
◦ Modification anomalies
10 / 56
Insertion Anomalies
Types:
1. Every time we add a new staff into the
StaffBranch, we must include the correct branch.
May lead to inconsistency !
While Staff and Branch relations don’t suffer from this
potential inconsistency
2. To insert details of a new branch that currently
has no staff into the StaffBranch relation, it is
necessary to enter nulls into the attributes for
staff, such as staffNo. But staffNo is the PK
filling it with nulls will violates entity integrity,
which is not allowed!
While Staff and Branch relations don’t suffer from this
problem
11 / 56
Deletion Anomalies
Deleting a tuple from StaffBranch which is the
last member of staff located at a branch
the details about that branch are lost from DB
◦ While Staff and Branch relations don’t suffer from
this problem
12 / 56
Modification Anomalies
Updating the address of a branch we must
update the tuples of all staff located at that
branch, if not inconsistency
Using Staff and Branch instead of StaffBranch
is called decomposition of a larger relation
into smaller relations
13 / 56
Modification Anomalies
Prosperities of decomposition:
◦ Loosless-join property: ensures that any instance of
the original relation can be identified from
corresponding instances in the smaller relations
◦ Dependency preservation property: ensures that a
constraint on the original relation can be
maintained by simply enforcing some constraint on
each of the smaller relations.
14 / 56
Functional Dependencies (FD)
Describe the relationship between attributes
in a relation.
◦ Ex: If A and B are attributes of relation R, B is
functionally dependent on A (represented as A
B), if each value of A is associated with exactly one
value of B.
The reverse of FD is determinant, i.e “A is a
determinant of B”
15 / 56
Functional Dependencies (FD)
FD is a property of the meaning (semantics)
of the attributes in a relation. The semantics
indicate how attributes relate to one another,
and specify the FDs between attributes.
When a FD is present, the dependency is
specified as a constraint between the
attributes.
16 / 56
Functional Dependencies (FD)
A B means
◦ for a given value of A we find only one value of B
(i.e. when two tuples have the same value of A they
must have the same value of B)
◦ but for a given value of B there may be several
values of A
17 / 56
Ex1: FD
For a given staffNo we can
determine the position of that staff
◦ Ex: SL21 Manager but Manager ?
i.e staffNo functionally determines position (1:1)
but not the opposite (1:*)
In Normalization we are
interested to identify FDs
between attributes of a
relation that have a 1:1
relationship between the
attributes that makes up the
determinant (left) and the
attributes (right). 18 / 56
Ex2 FD that holds for all time
Given one staffNo, we can
determine the sName of the staff
◦ The relationship between staffNo and sNAme is 1:1
i.e for each staffNo there is only one name
Also given one staff name we can determine
the staffNo, but this is true in this example
but may not be true all the time.
◦ The relationship between sNAme and staffNo is 1:*
i.e there can be several staffNo associated with a
name
19 / 56
Ex2: FD that Holds for All Time
So we have to clearly understand the purpose
of each attribute in that relation.
◦ Ex: the purpose of staffNo is to uniqely identify
each staff, but the purpose of sName is to hold the
names of staff.
FD is a property of a relational schema
(intension) and not a property of a particular
instance of the schema (extension)
20 / 56
Full FD
The determinants should have the minimal
number of attributes necessary to maintain
the FD with the attributes on the right hand-
side.
If A and B are attributes in a relation, B is fully
FD on A if B is FD on A but not on any proper
set of A.
◦ a FD A B is a full FD if removal of any attribute
from A result in the dependency no longer existing.
◦ a FD A B is a partially dependency if there is
some attribute that can be removed from A and yet
the dependency still holds.
21 / 56
Ex3: Full FD
Consider the following FD
that exists in the Staff relation
staffNo, sName branchNo
It is partial dependency because each value of
(staffNo, sName) is associated with a single
value of branchNo, but it is not a full FD
because branchNo is also FD on a subset of
(Staff, sName), namely staffNo.
So staffNo branchNo is a full FD
22 / 56
Summary: Characteristics of FDs we
use in Normalization
There is a 1:1 relationship between the
attribute(s) on the left side and those on the
right side of the FD (not the other direction).
They hold for all time
The determinant has the minimal number of
attributes necessary to maintain the
dependency with the attribute(s) on the right
side. i.e there must be a full FD between the
attribute(s) on the left and right sides of the
dependency.
23 / 56
Transitive Dependency (TD)
Its existence in a relation can potentially
cause the types of update anomaly.
Describes a condition where A, B, and C are
attributes of a relation such that if A → B and
B → C, then C is transitively dependent on A
via B (provided that A is not functionally
dependent on B or C).
24 / 56
Ex4: of TD
staffNo sName, position, salary, branchNo, bAddress
branchNo bAddress
The TD branchNo bAddress exists on
staffNo via branchNo.
i.e staffNo FD the bAddress via branchNo and
neither branchNo nor bAddress FDs staffNo.
25 / 56
Identifying FDs
If the meaning of the attributes and
relationships are well understood the
identification of FD should be simple.
If information is not available use common
sense.
26 / 56
EX5: Identify FDs
Examine the semantic of the attributes in the
StaffBranch
Assume that the position held and the branch
determine the salary.
staffNo sName, postion, salary, branchNo, bAddress
branchNo bAddress
bAddress branchNo
branchNo, position salary
bAddress, position salary
27 / 56
Ex6: absence of Info, using sample
data
28 / 56
Ex6: absence of Info, using sample
data
Assume that the data values in the table are
representative of all possible values that can
be held by attributes A, B, C, D, and E.
Examine the Sample relation and identify
when values in one column are consistent
with the presence of particular values in other
columns. Go from left to right.
Then look at combinations of columns, i.e
when values in 2 or more columns are
consistent with the appearance of values in
other columns.
29 / 56
Ex6: absence of Info, using sample
data
When a appears in A, z appears in C, and
when e appears in A, r appears in C.
Hence there is 1:1 relationship between A & C
i.e A functionally determines C (fd1)
:
:
A C (fd1)
C A (fd2)
B D (fd3)
A, B E (fd4)
30 / 56
Identifying the PK for a Relation
using FD
Main purpose of identifying a set of FDs for a
relation is to specify the set of integrity
constraints that must hold on a relation.
◦ Most important one is the PK
31 / 56
Ex7: Identifying the PK for a Relation
using FD
Refer to Ex5, we have found 5 FDs
◦ Identify attributes (or group of them) that uniquely
identifies each tuple in this relation.
◦ The only candidate key of the StaffBranch relation
and therefore the PK is staffNo, as all other
attributes are FD on staffNo.
◦ If a relation has more than one candidate key, we
identify the one that is to act as the PK for the
relation.
◦ All attributes that
are not part of the
PK should be FD on
the key.
32 / 56
Ex8: Identifying PK for Sample
Relation
Refer to Ex6. We have found 4 FDs
Examine each determinant for each FD to
identify the candidate key.
A suitable determinant must functionally
determine the other attributes. The only one
that functionally determine all the other
attributes is (A,B).
i.e. the attributes that make
up the determinant (A,B) can
determine all the other
attributes in the relation either separately as
A or B or together as (A,B). 33 / 56
The Process of Normalization
Normalization is a formal technique of
analyzing relations based on their PK (or
candidate keys) and FDs.
Involves a series of rules to test individual
relations.
When requirement is not met, the relation
violating the requirement must be
decomposed into relations that individually
meet the requirement of normalization.
34 / 56
The Process of Normalization
All NF are based on the concept of FDs
except 1NF.
NF beyond 3NF are very rare.
As normalization proceeds, the relations
become progressively more restricted
(stronger) in format and also less vulnerable
to update anomalies.
35 / 56
The Process of Normalization
It is a bottom-up approach
The output of 1NF in
some cases is already
in 2NF.
36 / 56
Unnormalized Form (UNF)
A table that contains one or more repeating
groups.
37 / 56
First Normal Form (1NF)
A relation in which the intersection of each
row and column contains one and only one
value.
38 / 56
UNF to 1NF
Identify and remove repeating groups.
◦ Repeating groups: is an attribute, or group of
attributes, within a table that occurs with multiple
values for a single occurrence.
39 / 56
UNF to 1NF
Two approaches to remove repeating groups:
◦ Flattening the table: entering appropriate data in
the empty columns of rows containing the
repeating data. i.e we fill in the blanks by
duplicating the non-repeating data, where
required.
◦ Placing the repeating data, along with a copy of the
original key attribute(s), in a separate relation.
Sometimes the unnormalized table may contain more
than one repeating group, or repeating groups within
repeating groups. In such cases, this approach is
applied repeatedly until no repeating groups remain.
40 / 56
UNF to 1NF
Both approaches are correct and in 1NF, but
the 1st introduces more redundancy than the
original UNF. While the 2nd produces more
tables with less redundancy than in the
original UNF table.
i.e. approach 2 moves the original UNF table
further along the normalization process than
approach 1.
41 / 56
Ex9: 1NF
John Kay is leasing a property
Assume that a client rents a
given property only once and
can’t rent more than one
property at any one time.
Unnormalized Table
1. Identify the key attribute 2. Identify the repeating group
42 / 56
Ex9: 1NF
UNF 1NF using approach 1:
43 / 56
Ex9: 1NF
rentFinish is not appropriate as
a component of a candidate
Identify the FDs key as it may contain nulls
Use FDs to identify candidate keys:
◦ clientNo, propertyNo chose this as PK)
◦ clientNo, rentStart
◦ propertyNo, rentStart 44 / 56
Ex9: 1NF
UNF 1NF using approach 2:
Remove repeating group by placing them
along with the original key attribute (clientNo)
in a separate table.
With the help of the FDs, identify
the PK for both relations.
Now both tables are in 1NF.
45 / 56
Second Normal Form (2NF)
The 1NF produced (from both approaches)
contains significant amount of redundancy.
2NF is based on the concept of full FD.
2NF applies to relations with composite keys.
A relation with a single-attribute PK is already
in at least 2NF.
A relation not in 2NF may suffer from the
update anomalies.
2NF is a relation that is in 1NF and every non
PK attribute is fully FD on the PK, i.e. no PDs.
46 / 56
2NF
Normalization of 1NF relation to 2NF involves
the removal of PD by placing them in a new
relation along with a copy of their
determinant.
47 / 56
Ex10: 2NF
Rewriting the FDs
fd1 clientNo, propertyNo rentStart, rentFinish
(PK)
fd2 clientNo cName
(PD)
fd3 propertyNo pAddress, rent, ownerNo, oName
(PD)
fd4 ownerNo oName
(TD)
fd5 clientNo, rentStart propertyNo, pAddress, rentFinish,
rent, ownerNo, oName (CK)
fd6 prepertyNo, rentStart clientNo, cName, rentFinish
(CK)
48 / 56
Ex10: 2NF
Use those FDs
to test whether
the ClientRental relation is in 2NF by
identifying the presence of any PDs on the
PK.
Note that:
fd2 clientNo cName
(PD)
fd3 propertyNo pAddress, rent, ownerNo, oName
(PD)
i.e it is not in 2NF
49 / 56
Ex10: 2NF
To transform it
to 2NF:
◦ Create new
tables so that the
non-PK
attributes are
removed along with a copy of the part of the PK on
which they are fully FD.
Client (clientNo, cName)
Rental (clientNo, propertyNo, rentStart, rentFinish)
PropertyOwner (propertyNo, pAddress, rent, ownerNo, oName)
50 / 56
Third Normal Form (3NF)
2NF still suffers from update
anomalies.
◦ Eg. Updating the name of Tony Shaw, we need to
do it twice in the PropertyOwner table
This problem is because the TD
3NF is a relation that is in the 1NF and 2NF
and in which no non-PK attribute is TD on the
PK.
To transform 2NF to 3NF remove the TD by
placing the attributes in a new relation along
with a copy of the determinant.
51 / 56
Ex11. 3NF
From Ex10:
Client
fd2 clientNo cName
(PK)
Client & Rental relations have no TD,
hence already in 3NF
Rental
fd1 clientNo, propertyNo rentStart, rentFinish
(PK)
Fd5’ clientNo, rentStart PropertyNo, rentFinish
(CK)
Fd6’ prepertyNo, rentStart clientNo, rentFinish
(CK)
Remove the TD by creating two new
PropertyOwner relations PropertyForRent & Owner
fd3 propertyNo pAddress, rent, ownerNo, oName 52 / 56
Ex11. 3NF
Both tables are in 3NF because no more TD
53 / 56
Summary
The ClientRental table has been
transformed by normalization into
4 relations in 3NF
Client (clientNo, cName)
Rental (clientNo, propertyNo, rentStart, rentFinish)
PropertyForRent (propertyNo, pAddress, rent, ownerNo)
Owner (ownerNo, oName)
54 / 56
General Definitions of 2NF & 3NF
Those definitions take into account other
candidate keys of a relation if exist.
2NF: a relation that is in 1NF and every non-
candidate key attribute is fully FD on any
candidate key.
3NF: a relation that is in 1NF and 2NF and in
which no non-candidate-key attribute is TD
on any candidate key.
i.e when using those definitions, we must be
aware of PDs and TDs on all candidate keys
and not just the PK.
55 / 56
General Definitions of 2NF & 3NF
The general definitions may reveal some
hidden redundancy.
Following the general definitions may
increase the complexity.
Tradeoff is possible.
In most of the cases the same decomposition
of relations will be produced in both cases.
56 / 56