MODULE 3
NORMALIZATION
Introduction
Normalization is a process of organizing data in a database to reduce redundancy and
inconsistency, while improving data integrity and efficiency. It involves breaking down large
tables into smaller, more manageable tables, linked through relationships.
Advantages
• Reduces Data Redundancy:
• Improves Data Integrity:
• Simplifies Database Design:
• Optimizes Performance:
Disadvantage
Performance decreases when normalizing the table to higher normal forms like 4NF
and 5NF
1NF disallows the multivalued attributes.
Informal design guideline for relational Databases
It is used as measure to determine the quality of relation schema design.
1. Semantics of the relation Attributes
2. Reducing Redundant information in tuples and update anamolies
3. Reducing Null values in tuples
4. Does not allow generation of Spurious tuples
Functional Dependencies
Given a relation R, a set of attributes X in R is said to functionally determine another
attribute Y, also in R, (written X Y ) if and only if each X value is associated with at
most one Y value.
XY
Here X—determinant
Y – Dependent
Example SSN Ename
{SSN,Pno} Hours
Fig: diagrammatic notation for displaying FDs
Types of Functional Dependencies in DBMS
1. Trivial functional dependency
In Trivial Functional Dependency, a dependent is always a subset of the determinant.
i.e. If X → Y and Y is the subset of X, then it is called trivial functional dependency.
Example 1 :
• ABC -> AB
• ABC -> A
• ABC -> ABC
2. Non-Trivial functional dependency
In Non-trivial functional dependency, the dependent is strictly not a subset of the
determinant. i.e. If X → Y and Y is not a subset of X, then it is called Non-trivial
functional dependency.
Example:
ABCX
3. Transitive functional dependency
In transitive functional dependency, dependent is indirectly dependent on
determinant. i.e. If a → b & b → c, then according to axiom of transitivity, a → c. This is a
transitive functional dependency.
Example:
enrol_no → dept
dept → building_no.
enrol_no → building_no is a valid Transitive functional dependency.
4. Single valued FD
Single-valued dependency occurs when one attribute uniquely determines another
attribute within the same relation
Example
Student_ID → Name is a functional dependency. For a given Student_ID, there's only
one corresponding Name
5. Multivalued functional dependency
A multivalued functional dependency, on the other hand, means that a set of attributes
determines a set of values for another attribute, and these values are mutually independent.
Example
For a dependency A -> B, if for a single value of A, multiple values of B exist, then
the table may have a multi-valued dependency.
Normalization of Relations
The normalization process, as first proposed by Codd (1972a), takes a
relation schema through a series of tests to certify whether it satisfies a certain
normal form.
Initially, Codd proposed three normal forms, which he called first, second,
and third normal form
All these normal forms are based on a single analytical tool: the functional
dependencies among the attributes of a relation
A fourth normal form (4NF) and a fifth normal form (5NF) were proposed,
based on the concepts of multivalued dependencies and join dependencies,
respectively
Normalization of data can be considered a process of analyzing the given
relation schemas based on their FDs and primary keys to achieve the desirable
properties of
(1)minimizing redundancy and
(2) minimizing the insertion, deletion, and update anomalies
Denormalization is the process of storing the join of higher normal form
relations as a base relation, which is in a lower normal form.
Definitions of Keys and Attributes Participating in Keys
Super Key in DBMS is a group of one or more attributes in a table that can
uniquely identify every row in that table. It ensures no two rows have the same
combination of values for those attributes.
Example:
Prime Attributes:
These are the attributes that belong to a candidate key.
They are crucial for uniquely identifying rows in a table.
Examples include Student ID in a Student relation or Employee ID in Employee relation.
Non-Prime Attributes:
These are attributes that are not part of any candidate key.
They can still hold useful information about the data, but they don't uniquely identify
tuples.
Examples include Name, Address, or Phone Number in a Student relation.
First Normal Form:
A relation /table is seid to be in 1NF if it satisfies the following condition
Each column in a table must contain atomic values
i.e No multivalued attributes
Ex:
Example 2:
Second Normal Form
It is based on the concept of full functional dependency
RULE 1: Relation must be in 1NF
RULE 2: All non prime attributes are fully functional dependent on PK
No partial dependency exists (every non-prime attribute (non-key attribute)
must depend on the entire primary key)
2NF 1NF+ No Partial Dependency
Example:
Example
Third Normal Form
Definition. a relation schema R is in 3NF if it satisfies 2NF and no
nonprime attribute of R is transitively dependent on the primary key
• 3NF 2NF+No Transitive Dependency
Boyce-Codd Normal Form
Boyce-Codd normal form (BCNF) was proposed as a simpler form of 3NF,
but it was found to be stricter than 3NF
Every relation in BCNF is also in 3NF; however, a relation in 3NF is not
necessarily in BCNF
Definition. A relation schema R is in BCNF if whenever a nontrivial functional
dependency X → A holds in R, then X is a superkey of R
every relation in BCNF is also in 3NF; however, a relation in 3NF is not
necessarily in BCNF.
• Ex: XYZ
BCNF3NF+NonTrivial FD
Candidate keys are (student, teacher) and (student, subject).
Two of the highest levels of database normalization are the fourth normal form
(4NF) and the fifth normal form (5NF). Multivalued dependencies are handled by 4NF,
whereas join dependencies are handled by 5NF.
For a dependency A -> B, if for a single value of A, multiple values of B exist, then
the table may have a multi-valued dependency.
Multivalued Dependency and Fourth Normal Form
A relation R is in 4NF if and only if the following conditions are satisfied:
1. It should be in the Boyce-Codd Normal Form (BCNF).
2. The table should not have any Multi-valued Dependency.
4NF BCNF+No multivalued Dependency
Join Dependencies and Fifth Normal Form
Join Dependency arises when the attributes in one relation are dependent on
attributes in another relation, which means certain rows will exist in the table if there is
the same row in another table.
Multiple tables are joined to create a single table where one of the attributes is
common in the sub-tables.
Join dependency on a database is denoted by:
Types of Join Dependency
There are two types of Join Dependencies:
Lossless Join Dependency: It means that whenever the join occurs between the
tables, then no information should be lost, the new table must have all the content in
the original table.
Lossy Join Dependency: In this type of join dependency, data loss may occur at some
point in time which includes the absence of a tuple from the original table or duplicate
tuples within the database.
Importance
Data Integrity:
Query Optimization:
Fifth normal form (project-join normal form)
A Relation is said to be in 5NF if both conditions are satisfied.
1) Relation should be already in 4NF
2) It cannot be further non-loss decomposed (JoindDependency should not be present)
The Fifth Normal Form (5NF) is also known as the Project-Join Normal Form
(PJNF).
5NF gets satisfied when the table is broken down into as many parts as possible
to avoid data redundancy.
5NF 4NF+ No join dependency
newly composed table from R1 and R2 and the original table, an extra tuple is
added that did not exist in the original data, This breaks the second rule of 5NF
i.e. non-loss decomposition.
This type of unwanted tuple is known as Spurious tuple.
Here we will decompose the given table in another relation R3 where it will have
2 columns i.e. subject and teacher.
Re-composed table and the original table, there is no loss of data.
Here all the tables, R1, R2 and R3 had a natural join which resulted in the table R.
After the natural join, the original table is retained as it is. There is no loss of the data.
Given Table R1, R2 and R3 are in the Fifth Normal Form(5NF).