0% found this document useful (0 votes)
8 views66 pages

Unit 4 Normalization

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
8 views66 pages

Unit 4 Normalization

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 66

UNIT-IV

Schema Refinement(Normalization)

Syllabus:
Purpose of Normalization or schema Refinement, Concept of
Functional dependency, Normal forms based on functional dependency
Lossless join and dependency preserving decomposition, (1NF, 2NF
and 3NF ), Concept of Surrogate key, Boyce Codd normal form(BCNF).

1
Normalization
• Normalization is a process of decomposing relations
without anomalies to produce smaller and well structures
relations.
• The main aim of normalization is to eliminate
redundancy and inconsistency.

Schema refinement involves improving a database


schema to eliminate redundancy, reduce anomalies
(insertion, update, deletion), and ensure data consistency
by organizing data into tables and establishing
relationships, often through normalization.

2
Purpose of Normalization
• Eliminate anomalies(insert, update and delete)
• Avoid duplication.
• Increase flexibility and stability.

Advantages:
• Improved data consistency
• Simplified database design
• Improved query performance
• Easier database maintenance

3
TYPES OF NORMALFORMS
• FIRST NORMAL FORM(1NF)
• SECOND NORMAL FORM(2NF)
• THIRD NORMAL FORM(3NF)
• BCNF(BOYCE CODD NORMAL FORM)

• A relation in BCNF is
also in 3NF , a relation in
3NF is also in 2NF and a
relation in 2NF is also in
1NF.
• A relation in BCNF is
considered fully
normalized.
4
Types of Normal Forms
First Normal Form (1NF): This is the most basic level of
normalization. In 1NF, each table cell should contain only a
single value, and each column should have a unique name.
The first normal form helps to eliminate duplicate data and
simplify queries.

Second Normal Form (2NF): 2NF eliminates redundant data


by requiring that each non-key attribute be dependent on the
primary key. This means that each column should be directly
related to the primary key, and not to other columns.

Third Normal Form (3NF): 3NF builds on 2NF by requiring


that all non-key attributes are independent of each other. This
means that each column should be directly related to the
primary key, and not to any other columns in the same table. 5
Boyce-Codd Normal Form (BCNF): BCNF is a stricter
form of 3NF that ensures that each determinant in a table is
a candidate key. In other words, BCNF ensures that each
non-key attribute is dependent only on the candidate key.

Fourth Normal Form (4NF): 4NF is a further refinement of


BCNF that ensures that a table does not contain any multi-
valued dependencies.

Fifth Normal Form (5NF): 5NF is the highest level of


normalization and involves decomposing a table into
smaller tables to remove data redundancy and improve data
integrity.

6
Functional dependency

What is functional dependency ?

• Functional dependency is a relationship that exists when one


attribute uniquely determines another attribute.

• In a relation R, X and Y are attributes. Attribute Y is


functionally dependent on attribute X if each value of X
determines EXACTLY ONE value of Y. This is represented
as: X -> Y this type of relation is called functional
dependency.

7
Functional dependency

Example:

• Reports(Student#, Course#, Course Name, Iname,


Room#, Marks, Grade)
• In this relation, {Student#, Course#} together can
determine exactly one value of Marks.
• This can be symbolically represented as
{Student#, Course#} Marks

8
Functional dependency

9
Functional dependency cont…

Types of functional dependencies

Functional dependencies are classified in to


1. Full functional dependency.
2. Partial functional dependency.
3. Transitive functional dependency.
4. Trival functional dependency.

10
Functional dependency cont…

1. Full Functional Dependency:-

In a given relation R, X and Y are attributes. Y is fully


functionally dependent on attribute X only if it is not
functionally dependent on sub-set of X. However X may be
composite in nature.
Ex:
 x,y,z are attributes then x,y attributes are pair then xyz.
 z is does not dependent on subset of xy.

11
Example:
 In the above example, Marks is fully functionally
dependent on {Student#, Course#} and not on sub-set of
{Student#, Course#}.
 This means Marks cannot be determined either by
Student# or by Course#.
 It can be determined using Student# and Course#
together.
 Hence, Marks is fully functionally dependent on
{Student#, Course#}.

12
Functional dependency cont…
2.Partial Functional Dependency:
In a given relation R, X and Y are attributes. Attribute Y is
partially dependent on the attribute X only if it is dependent on
subset of attribute X. However X may be composite in nature.
Ex:
 x,y,z are attributes then x,y attributes are pair then xyz.
 z is dependent on subset of xy.
( xz yz)
Example:
In the above example, CourseName, IName, Room# are
partially dependent on {Student#, Course#} because Course# alone
determines the CourseName, IName, Room#.
13
Functional dependency cont…
3.Transitive Functional Dependency:-

In a given relation R, X and Y, Z are attributes. Attribute x


determines exact one value of y then y determines exact one
value of z then z is functionally dependent on attribute x.

Ex:
 x,y,z are attributes then x,y attributes are
xy
yz
xz
 z is transitively functional dependent on x.
14
Transitive Functional Dependency
Example

15
Functional dependency cont…
4.Trival Functional Dependency:

In a given relation R, X and Y are attributes. XY attribute


determines a set and derives their own subset values.
Ex: xyx (or) xyy (or)xy  xy
Example:
{Student#, Course#}  Student#
{Student#, Course#}  Course#

16
Attribute closure

 The set of all those attributes which can be


functionally determined from an attribute set is
called as closure of that attribute set.
 Closure of an attribute set {X} is denoted as {x}+.

Steps to find closure of an attribute set:

1. Add the attributes contained in the attribute set for


which closure is being calculated to the result set.

2. Recursively add the attributes to the result set


which can be functionally determined from the
attributes already contained in the result set.

17
Attribute closure

Example:Consider a relation R(A,B,C,D,E,F,G) with the


functional dependencies
A->BC
BC->DE
D->F
CF->G

Finding the Closure of Attribute A:


A+ = {A}
= {ABC} ∵A->BC
= {ABCDE} ∵BC->DE
= {ABCDEF} ∵D->F
= {ABCDEFG} ∵CF->G

18
Attribute closure

Example:Consider a relation R(A,B,C,D,E,F,G) with the


functional dependencies
A->BC
BC->DE
D->F
CF->G

Finding Closure of Attribute set BC:


BC+ ={BC}
={BCDE} ∵BC->DE
={BCDEF} ∵D->F
={BCDEFG} ∵CF->G

19
Normalization
 Normalization is a systematic approach of decomposing
tables to eliminate data redundancy and undesirable
characteristics like Insertion, Update and Deletion anomalies

 Include two properties after decomposition of a relation.

 Lossless join property- which guarantees that the


generation of unwanted tuples will not occur.
 Dependency preservation property - This ensures that
each functional dependency is represented in some individual
relation resulting after decomposition.

20
Normalization
 Candidate key: It is a minimal set of attributes (columns) that can
uniquely identify each row (tuple) in a table, and it's a potential
candidate for becoming the primary key.

 Key or Prime attribute - An attribute of relation schema R is


called a prime attribute of R if it is a member of some
candidate key of R.

 Non (key)prime attribute - An attribute is called nonprime if it


is not a prime attribute—that is, if it is not a member of any
candidate key.

21
First Normal Form (1NF)
Definition of 1 NF: A relation R is said to be in the first
normal form if and only if all the attributes of the relation R
are atomic in nature.

Example:
• Consider the Department table given in figure. It is not in
1NF because one of its attributes Dlocations is non-
atomic as it contains more than one value in 1st row.
• To make it 1NF compliant, create a separate row for
each value of Dlocations of 1st row as shown in figure.

22
23
Second Normal Form (2NF)

Definition of 2NF: A Relation is said to be in Second


Normal Form (2NF) if and only if:
• It is in First normal form (1NF)
• No partial dependency exists between non-key attributes
and key attributes

24
Second Normal Form (2NF)

• As an example, consider the EMP_PROJ schema shown


below;
• For this table, the key is {Ssn, Pnumber}
• The functional dependencies are as follows;

It is clear from these functional dependencies that the


table has partial dependencies
(Ssn->Ename, Pnumber->Pname, Pnumber-
>Plocation) 25
Second Normal Form (2NF)

26
Example1:Consider a relation R(A,B,C,D,E,F) with the functional
dependencies
A->BCDEF
BC->ADEF
B->F
D->E Find this relation is in 2NF or not
Solution:
• Candidate keys are {A,BC}
• So, prime attributes are {A,B,C}
• Non-prime attributes are {D,E,F}
• In the third functional dependency (B->F) partial
dependency (non-prime attribute can’t be derived
from subset of candidate key) exists.
• Therefore, the given relation R is not in 2NF So,
the relation R can be divided into two relations
R1(A,B,C,D,E) and R2(B,F).
27
Finding candidate keys:

Closure of A (A)+ =ABCDEF


• (A)+ determines all the attributes of Relation, {A} is
candidate key or Superkey.
Closure of BC (BC)+ =BCADEF
• (BC)+ determines all the attributes of Relation, {BC} is
candidate key or Superkey.
Closure of B (B)+ =BF
(B)+ determines only few attributes of Relation, so {B} is
not candidate key or Superkey.

28
Third Normal Form (3NF)
Third Normal Form (3NF):

• A Relation is said to be in Third Normal


Form (3NF) if and only if:
• It is in Second Normal Form (2NF)
• No transitive dependency exists between
non-key attributes and key attributes
Third Normal Form (3NF):
• Example: Consider a relation student
(rollno, game, feestructure)
Rollno Game Feestructure
1 Basketball 500
2 Basketball 500
3 Basketball 500
4 Cricket 600
5 Cricket 600
6 Cricket 600
7 Tennis 400
Third Normal Form (3NF):
Functional Dependencies are:
{rollno -> game,
rollno -> feestructure,
game -> fee}

• Rollno+= {rollno, game, feestructure}


• rollno is primary key
Third Normal Form (3NF):

• The above student table is in 1NF because


there are no multivalue attributes.
• Student table is also in 2NF because all
non-key attributes are fully functional
dependent on the primary key (rollno).
Third Normal Form (3NF):

• But the table is not in 3NF because there


is transitive dependency i.e. game->
feestructure.
• feestructure has transitive/indirect
dependency on rollno via game.
Third Normal Form (3NF):

• Decomposition for 3NF


• In order to satisfy 3 NF rules the student table
should be divided into
• So divide the student table into
R1(game, feestructure) and
R2 (rollno, game),
R3(rollno,fee) smaller tables.
Example for 3NF
• Example: Consider a relation student
(sid,sname,city,pin)
Sid Sname City Pin
1 A MTM 123
2 B GDV 456
3 C GDL 895
4 D ELR 758
5 E MTM 123
Example for 3NF
• For student relation consider ‘sid’ is
primary key.remaining attributes are non-
prime attributes.
• The table consists of atomic values, so the
table is in 1 NF.
• Sid is the only primary key of the table, we
can’t have any proper subset of sid,so
obviously relation is in 2 NF.
Example for 3NF
• For the relation to be in 3NF, there should be no
transitive dependency.
• The FD’s in the table are
Sid--> sname
Sid-->city
Sid-->pincode
Pincodecity
• From the FD’s there exists a traansitive
dependency
• Sidpincode, pincodecity, sidcity
Example for 3NF
 pincode and city both are non prime
attributes so the relation is not in 3 NF.
To convert the relation to 3 NF , Decompose
the reation into
student(sid,sname,pincode)
Postaldetails(pincode,city name)
Example: Consider a relation R(A,B,C,D,E)
with the functional dependencies find this
relation is in 3NF or not.
A->BCDE
BC->ADE
D->E

Solution:
• Finding candidate keys :
Closure of A (A)+ =ABCDEF
(A)+ determines all the attributes of Relation,
{A} is candidate key or Superkey.
40
Closure of BC (BC)+ =BCADEF
(BC)+ determines all the attributes of Relation, {BC} is
candidate key or Superkey.

Closure of D (D)+ =DE


(D)+ determines only few attributes of Relation, so {D} is not
candidate key or Superkey.

41
 Candidate keys are {A,BC}
 So, Prime attributes are {A,B,C}
 Non-prime attributes are {D,E}
 FD1,FD2 satisfies 3NF rule (i.e every non prime attribute
can derived from prime attribute).
 FD3(D->E) not satisfies 3NF rule(L.H.S must be
candidate key or R.H.S must be prime attribute then this
is in 3NF)
 Therefore, the given relation R is not in 3NF So, the
relation R can be divided into two relations R1(A,B,C,D)
and R2(D,E).

42
Boyce-Codd Normal Form (BCNF)

• A relation is said to be in Boyce Codd Normal Form (BCNF)


-> it is in 3NF.
-> if and only if all the determinants are candidate keys.
• BCNF relation is a strong 3NF, but not every 3NF relation is
BCNF.

What is candidate key?


Candidate keys are minimal super keys
What is super key?
A Super Key is a set of one or more attributes that are
taken collectively and can identify all other attributes uniquely.
Boyce-Codd Normal Form
(BCNF):
Boyce-Codd Normal Form
(BCNF):
This table has two candidate keys –
{Student#, Course#} and {EmailID, Course#}.
This table is in 3NF because it has no partial
and transitive dependencies between key
attributes and non-key attributes.
But this table is not in BCNF because all
determinants are not candidate keys.
Boyce-Codd Normal Form
(BCNF):
This can be observed from the following FDs;
{Student#, Course#} --> Marks
{EmailID, Course#} -->Marks
Student# -->EmailID
EmailID-->Student#
– the determinants of first two FDs are
candidate keys
– but the determinants of the last two FDs are
not candidate keys.
– Thus it is violating the BCNF condition.
Boyce-Codd Normal Form
(BCNF):
To make this table into BCNF , we need to
decompose the Result table into two tables;
Another Example
Find this Relation is in BCNF or Not?
Student Teacher Subject

• Consider a relation R with attributes (student, subject,


teacher).
• FD’s are
• (student, Teacher) -> subject
• (student, subject) -> Teacher
• Teacher -> subject
• Candidate keys are (student, teacher) and (student,
subject).
• The above relation is in 3NF [since there is no transitive
dependency].
• The above relation is not in BCNF, because in the FD
(teacher->subject), teacher is not a key.
• Converts this relation into BCNF we decompose the
relation into two relations R1(Teacher, subject) and
R2(student, Teacher).
Example: Consider a relation R(A,B,C,D) with the
functional dependencies
A->BCD
BC->AD
D->B Find the relation is in BCNF or not?

Solution:
Finding candidate keys :
• Closure of A (A)+ =ABCD
• (A)+ determines all the attributes of Relation,
• {A} is candidate key or Superkey.

50
Closure of A (BC)+ =ABCD
(BC)+ determines all the attributes of Relation, {BC} is candidate key or
Superkey.
Closure of D (D)+ =BD
(D)+ determines some attributes of Relation, {D} is candidate key or
Superkey.
• Candidate keys are {A,BC}
• So, prime attributes are {A,B,C}
• Non-prime attributes are {D}
• FD1,FD2 are satisfied BCNF rule( All the determinants are candidate
keys)
• The third functional dependency (D->B) Violates BCNF
condition(because D is not a candidate key).
• Therefore, the given relation R is not in BCNF
• So, the relation R can be divided into two relations R1(A,C,D) and
R2(B,D).

51
Example:
Identify the minimal key for the relational scheme R(A, B,
C, D, E) with functional dependencies F = {A->B, B->C,
AC->D}.
A) A B) AE C) BE D) CE
Closure of A (A)+ =ABCD
(A)+ determines some attributes of Relation, {A} is not candidate key or
Superkey.
Closure of AE (AE)+ =ABCDE
(AE)+ determines all the attributes of Relation, so {AE} is candidate key or
Superkey.
Closure of BE (BE)+ =BCE
(BE)+ determines some attributes of Relation, {BE} is not a candidate key or
Superkey.
Closure of CE (CE)+ =CE
(CE)+ determines some attributes of Relation, {CE} is not a candidate key or
Superkey.

Here candidate keys are {AE} that is the minimal key


52
Example: The best normal form of relation scheme R(A, B,
C, D) along with the set of functional dependencies F =
{AB->C, AB->D, C->A, D->B} is
a) BCNF b)3NF c)2NF d)1NF

Solution:
• In order to find the best normal form
• Find candidate keys initially ,C.K are AB,CD,BC,AD
• A,B,C,D all are prime attributes
• Rule for BCNF:All the determinants are candidate keys.
• Rule for 3NF: Either L.H.S is superkey or R.H.S is prime
attribute
• Rule for 2NF: A part of candidate key is detemining non –
prime attribute then that FD is in Partial dependency .So it is
not in 2NF.
• This Relation is in 3NF

AB->C AB->D C->A D->B


BCNF YES YES NO NO
53
3NF YES YES YES YES
Example:
A relation R={A,B,C,D,E,F,G} is given with
the following set of functional dependencies:
F={AD→E, BE→F, B→C, AF→G}. Which of
the following is a candidate key?

A) A B) AB C) ABC D) ABD

54
Example:
Consider a relational schema R= (A, B, C, D, E, F, G, H) on
which of the following functional dependencies hold: {A->B,
BC->D, E->C, D->A}.
What are the candidates keys for R?
A)AE, BE
B) AEH, BEH, DEH
C) AEH, BEH, BCH
D) AE, BE, DE

55
Example:
The relational schema
student_performance(name, courseno,
rollno, grade) has the following functional
independencies. The highest normal form of
this relation is______.
{name, courseno}->grade
{rollno, courseno}->grade
name->rollno
rollno->name
Ans: 3NF

56
Example: Identify the highest normal form from relation R(
A,B,C,D,E,F,G,H) functional dependencies are ABC-
>DE,E->GH,H->G,G->H,ABCD->EF

BCNF: L.H.S must be candidate key


3NF: L.H.S must be candidate key or R.H.S is primary
attribute.
2NF: subset of candidate key in L.H.S then it is partial
dependency.

By identifying all the above rules this example satisfies


2NF.

57
Relational Decomposition Rules
• When a relation in the relational model is not in appropriate
normal form then the decomposition of a relation is required.
• In a database, it breaks the table into multiple tables.

• If the relation has no proper decomposition, then it may lead


to problems like loss of information.

• Decomposition is used to eliminate some of the problems of


bad design like anomalies, inconsistencies, and redundancy.

• Decomposition allows us to recover original relation after


decomposition.
Types of Decomposition
Lossless Decomposition
• If the information is not lost from the relation that is decomposed,
then the decomposition will be lossless.

• The lossless decomposition guarantees that the join(union) of


relations will result in the same relation as it was decomposed.

• Let us consider a relation X. In case we decompose this relation into


relation X1 and relation X2. This decomposition will be referred
to as a lossless decomposition in case it satisfies these
statements

1. If we union the sub relations X1 and X2, then it should


consist of all the attributes available before the decomposition in the
original relation X.

2. The intersections of X1 and X2 can never be Null. There


must be a common attribute in the sub relation. This common attribute
must consist of some unique data/information.
Here, the common attribute needs to be the super key of the sub relations,
either X1 or X2.
Example:
Dependency Preserving
• In the dependency preservation, at least one decomposed
table must satisfy every dependency.
• If a relation R is decomposed into relation R1 and R2, then the
dependencies of R either must be a part of R1 or R2 or must
be derivable from the combination of functional dependencies
of R1 and R2.
• (F1 ∪ F2 ∪ … ∪ Fm)+ = F+.
• F’ covers F so this relation satisfies dependency preservation
Surrogate Key:
A surrogate key is a system-generated, unique identifier assigned to an entity or
object in a database, typically used when natural keys are not suitable or when a
more stable, unchanging identifier is needed.

Uses of Surrogate Key:

Natural keys are not suitable: When the natural key is complex, prone to
change, or not unique.
Data Warehouse: Surrogate keys are essential for building and maintaining data
warehouses.
Performance Optimization: When performance is critical and simple, unique
identifiers are needed.
Maintaining Data Integrity: When you need to ensure that data is uniquely
identified and that changes to the data do not affect the primary key.

You might also like