Mansoura University
Faculty of Computers and Information
Department of Information System
[IS211p] Database System
Dr. Amira Rezk
NORMALIZATION
Formal tool for analysis of relational
schemas
Enables us to detect and describe some
of the design problems in precise terms INTRODUCTION
Theory of functional dependency
DEFINITION OF FUNCTIONAL DEPENDENCY
Constraint between two sets of attributes from the database
Property of semantics or meaning of the attributes
Legal relation states
Satisfy the functional dependency constraints
DEFINITION OF FUNCTIONAL DEPENDENCY (CONT’D.)
Given a populated relation
Cannot determine which FDs hold and which do not Unless meaning of and relationships among
attributes known
Can state that FD does not hold if there are tuples that show violation of such an FD
The following FDs may hold :
B → C; C → B; {A, B} → C; {A, B} → D; and {C, D} → B.
However, the following do not hold
A → B , B → A , D→C
NORMAL FORMS BASED ON PRIMARY KEYS
Normalization process
Approaches for relational schema design
Perform a conceptual schema design using a conceptual model then map conceptual design into a set
of relations
Design relations based on external knowledge derived from existing implementation of files or forms
or reports
NORMALIZATION OF RELATIONS
Takes a relation schema through a series of tests
Certify whether it satisfies a certain normal form
Proceeds in a top-down fashion
Definition. The normal form of a relation refers to the highest normal form condition
that it meets, and hence indicates the degree to which it has been normalized
Normal form tests
NORMALIZATION OF RELATIONS (CONT’D.)
Properties that the relational schemas should have:
Nonadditive join property
Extremely critical
Dependency preservation property
Desirable but sometimes sacrificed for other factors
PRACTICAL USE OF NORMAL FORMS
Normalization carried out in practice
Resulting designs are of high quality and meet the desirable properties stated previously
Pays particular attention to normalization only up to 3NF, BCNF, or at most 4NF
Do not need to normalize to the highest possible normal form
Definition. Denormalization is the process of storing the join of higher normal form relations as a base
relation, which is in a lower normal form.
DEFINITIONS OF KEYS AND ATTRIBUTES PARTICIPATING IN KEYS
Definition of superkey and key
Candidate key
If more than one key in a relation schema
One is primary key
Others are secondary keys
Definition. An attribute of relation schema R is called a prime attribute of R if it is a
member of some candidate key of R. An attribute is called nonprime if it is not a
prime attribute—that is, if it is not a member of any candidate key.
FIRST NORMAL FORM
Part of the formal definition of a relation in the basic (flat) relational model
Only attribute values permitted are single atomic (or indivisible) values
Techniques to achieve first normal form
Remove attribute and place in separate relation
Expand the key
Use several atomic attributes
FIRST NORMAL FORM (CONT’D.)
Does not allow nested relations
Each tuple can have a relation within it
To change to 1NF:
Remove the attribute that violates 1NF and place it in a separate relation along with the primary key of the
original relation. The primary key of this relation is the combination of the P.K of original relation and the
removed attribute.
Expand the key so that there will be a separate tuple in the original relation for each value In this case, the
primary key becomes the combination → redundancy
If a maximum number of values is known for the attribute replace the attribute by set of atomic attributes →
null
SECOND NORMAL FORM
Based on concept of full functional dependency
Versus partial dependency
Definition. A relation schema R is in 2NF if every nonprime attribute A in R is fully functionally
dependent on the primary key of R.
Second normalize into a number of 2NF relations
Nonprime attributes are associated only with part of primary key on which they are fully functionally
dependent
THIRD NORMAL FORM
Based on concept of transitive dependency
Definition. According to Codd’s original definition, a relation schema R is in 3NF if it
satisfies 2NF and no nonprime attribute of R is transitively dependent on the primary
key.
Problematic FD
Left-hand side is part of primary key
Left-hand side is a nonkey attribute
GENERAL DEFINITIONS OF SECOND
AND THIRD NORMAL FORMS
GENERAL DEFINITIONS OF SECOND
AND THIRD NORMAL FORMS (CONT’D.)
Prime attribute
Part of any candidate key will be considered as prime
Consider partial, full functional, and transitive dependencies with respect to all candidate keys of a
relation
GENERAL DEFINITION OF SECOND NORMAL FORM
Definition. A relation schema R is in
second normal form (2NF) if every
nonprime attribute A in R is not
partially dependent on any key of R.
GENERAL DEFINITION OF THIRD NORMAL FORM
Definition. A relation schema R is in third normal form (3NF) if, whenever a nontrivial
functional dependency X→A holds in R, either (a) X is a superkey of R, or (b) A is a
prime attribute of R.
Alternative Definition. A relation schema R is in 3NF if every nonprime attribute of R
meets both of the following conditions:
It is fully functionally dependent on every key of R.
It is nontransitively dependent on every key of R.
QUIZ
Consider the following relation:
CAR_SALE(Car#, Date_sold, Salesperson#, Commission%, Discount_amt)
Assume that a car may be sold by multiple salespeople, and hence {Car#,Salesperson#} is the primary
key.
Additional dependencies are :
Date_sold → Discount_amt and Salesperson# → Commission%
Based on the given primary key, is this relation in 1NF, 2NF, or 3NF? Why or why not? How would you
successively normalize it completely?
ANSWER
CAR_SALE(Car#, Date_sold, Salesperson#, Commission%, Discount_amt)
Date_sold → Discount_amt and Salesperson# → Commission%
this relation is in 1NF? Why or why not?
2NF: (Car#, Salesperson#, Date_Sold, Discount_amt), (Salesperson#,
Commission%)
3NF: (Car#, Salesperson#, Date_Sold), (Date_Sold, Discount_amt), (Salesperson#,
Commission%)