0% found this document useful (0 votes)
114 views9 pages

Understanding Database Normalization

The document discusses normalization and normal forms in database design. It defines normalization as organizing data to minimize redundancy and anomalies. There are five normal forms from 1NF to 5NF, each addressing different types of dependencies and anomalies. The document provides examples to illustrate each normal form and how to decompose relations to satisfy the normal forms.

Uploaded by

Sachin Kondawar
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
114 views9 pages

Understanding Database Normalization

The document discusses normalization and normal forms in database design. It defines normalization as organizing data to minimize redundancy and anomalies. There are five normal forms from 1NF to 5NF, each addressing different types of dependencies and anomalies. The document provides examples to illustrate each normal form and how to decompose relations to satisfy the normal forms.

Uploaded by

Sachin Kondawar
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 9

Normalization

o Normalization is the process of organizing the data in the database.


o Normalization is used to minimize the redundancy from a relation or set of
relations. It is also used to eliminate the undesirable characteristics like Insertion,
Update and Deletion Anomalies.
o Normalization divides the larger table into the smaller table and links them using
relationship.
o The normal form is used to reduce redundancy from the database table.

Types of Normal Forms


There are the four types of normal forms:

Normal Description
Form

1NF A relation is in 1NF if it contains an atomic value.

2NF A relation will be in 2NF if it is in 1NF and all non-key attributes are
fully functional dependent on the primary key.

3NF A relation will be in 3NF if it is in 2NF and no transition dependency


exists.

4NF A relation will be in 4NF if it is in Boyce Codd normal form and has no
multi-valued dependency.
5NF A relation is in 5NF if it is in 4NF and not contains any join

EMP_ID dependency and joiningEMP_PHONE


EMP_NAME should be lossless. EMP_STATE

14 John 7272826385, UP
9064738238

First Normal
20 Harry
Form (1NF)
8574783832 Bihar

o A relation will be 1NF if it contains an atomic value.


12 Sam 7390372389, Punjab
o It states that an attribute of a table cannot hold
8589830302 multiple values. It must hold only
single-valued attribute.
o First normal form disallows the multi-valued attribute, composite attribute, and
their combinations.

Example: Relation EMPLOYEE is not in 1NF because of multi-valued attribute


EMP_PHONE.

EMPLOYEE table:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE

14 John 7272826385, UP
9064738238

20 Harry 8574783832 Bihar

12 Sam 7390372389, Punjab


8589830302
The decomposition of the EMPLOYEE table into 1NF has been shown below:

 Example 1 – Relation STUDENT in table 1 is not in 1NF because of


multi-valued attribute STUD_PHONE. Its decomposition into 1NF has
been shown in table 2.

EMP_ID EMP_NAME EMP_PHONE EMP_STATE

14 John 7272826385 UP

14 John 9064738238 UP

20 Harry 8574783832 Bihar

12 Sam 7390372389 Punjab

12 Sam 8589830302 Punjab

 Example 2 –

 ID Name Courses
 ------------------
 1 A c1, c2
 2 E c3
 3 M C2, c3
In the above table Course is a multi valued attribute so it is not in 1NF.
Below Table is in 1NF as there is no multi valued attribute
ID Name Course
------------------
1 A c1
1 A c2
2 E c3
3 M c2
3 M c3

2. Second Normal Form –

To be in second normal form, a relation must be in first normal form and


relation must not contain any partial dependency. A relation is in 2NF if it
has No Partial Dependency, i.e., no non-prime attribute (attributes which
are not part of any candidate key) is dependent on any proper subset of any
candidate key of the table.
Partial Dependency – If the proper subset of candidate key determines
non-prime attribute, it is called partial dependency.
 Example 1 – Consider table-3 as following below.
 STUD_NO COURSE_NO COURSE_FEE
 1 C1 1000
 2 C2 1500
 1 C4 2000
 4 C3 1000
 4 C1 1000
 2 C5 2000
{Note that, there are many courses having the same course fee. }
Here,
COURSE_FEE cannot alone decide the value of COURSE_NO or
STUD_NO;
COURSE_FEE together with STUD_NO cannot decide the value of
COURSE_NO;
COURSE_FEE together with COURSE_NO cannot decide the value of
STUD_NO;
Hence,
COURSE_FEE would be a non-prime attribute, as it does not belong to
the one only candidate key {STUD_NO, COURSE_NO} ;
But, COURSE_NO -> COURSE_FEE , i.e., COURSE_FEE is dependent
on COURSE_NO, which is a proper subset of the candidate key. Non-
prime attribute COURSE_FEE is dependent on a proper subset of the
candidate key, which is a partial dependency and so this relation is not in
2NF.
To convert the above relation to 2NF,
we need to split the table into two tables such as :
Table 1: STUD_NO, COURSE_NO
Table 2: COURSE_NO, COURSE_FEE
Table 1 Table 2
STUD_NO COURSE_NO COURSE_NO
COURSE_FEE
1 C1 C1
1000
2 C2 C2
1500
1 C4 C3
1000
4 C3 C4
2000
4 C1 C5
2000

 NOTE: 2NF tries to reduce the redundant data getting stored in memory.
For instance, if there are 100 students taking C1 course, we dont need to
store its Fee as 1000 for all the 100 records, instead once we can store it
in the second table as the course fee for C1 is 1000.
 Example 2 – Consider following functional dependencies in relation R
(A, B , C, D )
 AB -> C [A and B together determine C]
BC -> D [B and C together determine D]
In the above relation, AB is the only candidate key and there is no partial
dependency, i.e., any proper subset of AB doesn’t determine any non-
prime attribute.

3. Third Normal Form –

A relation is in third normal form, if there is no transitive dependency for


non-prime attributes as well as it is in second normal form.
A relation is in 3NF if at least one of the following condition holds in
every non-trivial function dependency X –> Y
1. X is a super key.
2. Y is a prime attribute (each element of Y is part of some candidate key).

Transitive dependency – If A->B and B->C are two FDs then A->C is
called transitive dependency.
 Example 1 – In relation STUDENT given in Table 4,
FD set: {STUD_NO -> STUD_NAME, STUD_NO -> STUD_STATE,
STUD_STATE -> STUD_COUNTRY, STUD_NO -> STUD_AGE}
Candidate Key: {STUD_NO}
For this relation in table 4, STUD_NO -> STUD_STATE and
STUD_STATE -> STUD_COUNTRY are true. So
STUD_COUNTRY is transitively dependent on STUD_NO. It
violates the third normal form. To convert it in third normal form, we
will decompose the relation STUDENT (STUD_NO, STUD_NAME,
STUD_PHONE, STUD_STATE, STUD_COUNTRY_STUD_AGE)
as:
STUDENT (STUD_NO, STUD_NAME, STUD_PHONE,
STUD_STATE, STUD_AGE)
STATE_COUNTRY (STATE, COUNTRY)
 Example 2 – Consider relation R(A, B, C, D, E)
A -> BC,
CD -> E,
B -> D,
E -> A
All possible candidate keys in above relation are {A, E, CD, BC} All
attribute are on right sides of all functional dependencies are
prime.

4. Boyce-Codd Normal Form (BCNF) –


A relation R is in BCNF if R is in Third Normal Form and for every FD,
LHS is super key. A relation is in BCNF iff in every non-trivial
functional dependency X –> Y, X is a super key.
 Example 1 – Find the highest normal form of a relation
R(A,B,C,D,E) with FD set as {BC->D, AC->BE, B->E}
Step 1. As we can see, (AC)+ ={A,C,B,E,D} but none of its
subset can determine all attribute of relation, So AC will be
candidate key. A or C can’t be derived from any other
attribute of the relation, so there will be only 1 candidate key
{AC}.
Step 2. Prime attributes are those attribute which are part of
candidate key {A, C} in this example and others will be non-
prime {B, D, E} in this example.
Step 3. The relation R is in 1st normal form as a relational
DBMS does not allow multi-valued or composite attribute.
The relation is in 2nd normal form because BC->D is in 2nd
normal form (BC is not a proper subset of candidate key AC)
and AC->BE is in 2nd normal form (AC is candidate key) and
B->E is in 2nd normal form (B is not a proper subset of
candidate key AC).
The relation is not in 3rd normal form because in BC->D
(neither BC is a super key nor D is a prime attribute) and in
B->E (neither B is a super key nor E is a prime attribute) but
to satisfy 3rd normal for, either LHS of an FD should be
super key or RHS should be prime attribute.
So the highest normal form of relation will be 2nd Normal
form.
 Example 2 –For example consider relation R(A, B, C)
A -> BC,
B ->
A and B both are super keys so above relation is in BCNF.
Key Points –
1. BCNF is free from redundancy.
2. If a relation is in BCNF, then 3NF is also also satisfied.
3. If all attributes of relation are prime attribute, then the relation is
always in 3NF.
4. A relation in a Relational Database is always and at least in 1NF
form.
5. Every Binary Relation ( a Relation with only 2 attributes ) is always
in BCNF.
6. If a Relation has only singleton candidate keys( i.e. every
candidate key consists of only 1 attribute), then the Relation is
always in 2NF( because no Partial functional dependency
possible).
7. Sometimes going for BCNF form may not preserve functional
dependency. In that case go for BCNF only if the lost FD(s) is not
required, else normalize till 3NF only.
8. There are many more Normal forms that exist after BCNF, like 4NF
and more. But in real world database systems it’s generally not
required to go beyond BCNF.

Exercise 1: Find the highest normal form in R (A, B, C, D, E) under


following functional dependencies.
ABC --> D
CD --> AE
Important Points for solving above type of question.
1) It is always a good idea to start checking from BCNF, then 3 NF
and so on.
2) If any functional dependency satisfied a normal form then there is
no need to check for lower normal form. For example, ABC –> D is in
BCNF (Note that ABC is a superkey), so no need to check this
dependency for lower normal forms.
Candidate keys in the given relation are {ABC, BCD}
BCNF: ABC -> D is in BCNF. Let us check CD -> AE, CD is not a
super key so this dependency is not in BCNF. So, R is not in BCNF.
3NF: ABC -> D we don’t need to check for this dependency as it
already satisfied BCNF. Let us consider CD -> AE. Since E is not a
prime attribute, so the relation is not in 3NF.
2NF: In 2NF, we need to check for partial dependency. CD which is a
proper subset of a candidate key and it determine E, which is non-
prime attribute. So, given relation is also not in 2 NF. So, the highest
normal form is 1 NF.

You might also like