0 ratings0% found this document useful (0 votes) 60 views29 pagesRDBMS Normalization
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content,
claim it here.
Available Formats
Download as PDF or read online on Scribd
Normalization is a process of organizing the
data in database to avoid data redundancy,
insertion anomaly, update anomaly &
deletion anomaly. Let’s discuss about
anomalies first then we will discuss normal
forms with examples.
Anomalies in DBMS lar t
There are three types of anomalies that
occur when the database is not normalized.
These are - Insertion, update and deletion
anomaly. Let’s take an example to
understand this.
Example: Suppose a manufacturing
company stores the employee details in a
table named employee that has four
attributes: emp_id for storing employee's id,
emp_name for storing employee’s name,
emp_address for storing employee's address
and emp_dept for storing the department
details in which the employee works. At
some point of time the table looks like this:lemp_idjemp_name |jemp_address jemp_dept
101 = Rick Delhi D001
401 ~~“ Rick Dethi D002
123 |Maggie Agra D890
166 Glenn Chennai D900
166 Glenn Chennai Do04
The above table is not normalized. We will
see the problems that we face when a table
is not normalized.Update anomaly: In the above table we
have two rows for employee Rick as he
belongs to two departments of the
company. If we want to update the address
of Rick then we have to update the same in
two rows or the data will become
inconsistent. If somehow, the correct
address gets updated in one department
but not in other then as per the database,
Rick would be having two different
addresses, which is not correct and would
lead to inconsistent data.
Insert anomaly: Suppose a new employee
joins the company, who is under training
and currently not assigned to any
department then we would not be able to
insert the data into the table if emp_dept
field doesn't allow nulls.
Delete anomaly: Suppose, if at a point of
time the company closes the department
D890 then deleting the rows that are
having emp_dept as D890 would also delete
the information of employee Maggie since
she is assigned only to this department.
To overcome these anomalies we need to
normalize the data. In the next section we
will discuss about normalization.Normalization
Here are the most commonly used normal
forms:
e First normal form(1NF)
¢ Second normal form(2NF)
¢ Third normal form(3NF)
e Boyce & Codd normal form (BCNF)
First normal form (INF)
As per the rule of first normal form, an
attribute (column) of a table cannot hold
multiple values. It should hold only atomic
values.
Example: Suppose a company wants to store
the names and contact details of its
employees. It creates a table that looks like
this:emp_idlemp_namelemp_addressjlemp_mobile
101 [Herschel |New Delhi |8912312390
8812121212
102 Jon Kanpur
9900012222!
103 |Ron Chennai 7778881212)
9990000123
104 Lester Bangalore
8123450987
Two employees (Jon & Lester) are having
two mobile numbers so the company stored
them in the same field as you can see in the
table above.This table is not in INF as the rule says
“each attribute of a table must have atomic
(single) values”, the emp_mobile values for
employees Jon & Lester violates that rule.
To make the table complies with 1NF we
should have the data like this:
lemp_idiemp_name|
101
102
102
103
104
104
Herschel
lon
on
Ron
Lester
Lester
lemp_address
New Delhi
Kanpur
Kanpur
Chennai
Bangalore
Bangalore
lemp_mobile
8912312390
8812121212
9900012222,
7778881212
9990000123)
8123450987Second normal form (2NF)
A table is said to be in 2NF if both the
following conditions hold:
© Table is in NF (First normal form)
e No non-prime attribute is dependent
on the proper subset of any candidate
key of table.
An attribute that is not part of any
candidate key is known as non-prime
attribute.
Example: Suppose a school wants to store
the data of teachers and the subjects they
teach. They create a table that looks like
this: Since a teacher can teach more than
one subjects, the table can have multiple
rows for a same teacher.eacher_id — jsubject teacher ae
4114 Maths 38
111 Physics 38
222 Biology 38
333 Physics 40
333 (Chemistry |40
Candidate Keys: {teacher_id, subject}
Non prime attribute: teacher_age
The table is in 1 NF because each attribute
has atomic values. However, it is not in 2NF
because non prime attribute teacher_age Is
dependent on teacher_id alone which is a
proper subset of candidate key. This violates
the rule for 2NF as the rule says “no non-
prime attribute is dependent on the proper
subset of any candidate key of the table”.
To make the table complies with 2NF we
can break it in two tables Like this:
teacher_details table:eacher_id eacher_age
411 38
222 bs
333 to
teacher_subject table:
eacher_id subject
414 Maths
114 Physics
222 Biology
333 Physics
335 Chemistry
Now the tables comply with Second normal
form (2NF).Third Normal form (3NF)
A table design is said to be in 3NF if both
the following conditions hold:
¢ Table must be in 2NF
Transitive functional dependency of
non-prime attribute on any super key
should be removed.
An attribute that is not part of any
candidate key is known as non-prime
attribute.
In other words 3NF can be explained like
this: A table is in 3NF if it is in 2NF and for
each functional dependency X-> Y at least
one of the following conditions hald:
¢ Xis a super key of table
° Yis a prime attribute of table
An attribute that is a part of one of the
candidate keys is known as prime attribute.Example: Suppose a company wants to store
the complete address of each employee,
they create a table named employee_details
that looks like this:
emp_idemp.namelemp zemp sateemp cise
1001 Jon p200s UP hare Dz
1002 [Ajeet b2008 N {enna IM-
1006 jLora 282007 |TN [renna Ur
1101 [Lilly 292008 |UK Pauri [Br
1201 [Steve p22999 be Gwalior |Re
Super keys: {emp_id}, {emp_id, emp_name},
{emp_id, emp_name, emp_zip}...so on
Candidate Keys: {emp_id}
Non-prime attributes: all attributes except
emp_id are non-prime as they are not part
of any candidate keys.Here, emp_state, emp_city & emp_district
dependent on emp_zip. And, emp_zip is
dependent on emp_id that makes non-prime
attributes (emp_state, emp_city &
emp_district) transitively dependent on
super key (emp_id). This violates the rule of
3NF.
To make this table complies with 3NF we
have to break the table into two tables to
remove the transitive dependency:
employee table:
mp_id lemp_name lemp_zip
1001 ohn 282005
14002 Ajeet 222008
1006 Lora ps2007
1104 Lilly p200e
14201 Steve 229991001 ohn 282005
1002 jeet 222008
1006 Lora 282007
1101 Ly 1292008
1201 Steve ba2558
employee_zip table:
lemp_zip jemp_state Jemp_city jemp_district
1282005 vp gra Dayal Bagh
222008 i Chennai |M-City
1282007 ry Chennai |Urrapakkam
292008 |UK Pauri Bhagwan
222999 IMP Gwalior |Ratan
|
|Boyce Codd normal form
(BCNF)
It is an advance version of 3NF that’s why it
is also referred as 3.5NF. BCNF is stricter
than 3NF.A table complies with BCNF if it is
in 3NF and for every functional dependency
X->Y,X should be the super key of the table.
Example: Suppose there is a company
wherein employees work in more than one
department. They store the data like this:
Erp_loemp,ratinalifemp cert dept_typeldept_no_of_em
Production
1001 Austrian land [D001 1200
planning
1001 Austrian stores [D001 250
design and
1002 [American echnical [D134 LOO
support
Purchasing
[1002 jAmerican [D134 600
department
Functional dependencies in the table above:
emp_id-> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}Candidate key: {emp_id, emp_dept}
The table is not in BCNF as neither emp_id
nor emp_dept alone are keys.
To make the table comply with BCNF we can
break the table in three tables Like this:
emp_nationality table:
jemp_id lemp_nationality
11001 ‘Austrian
11002 merican
emp_dept table:
lemp_dept \dept_type|dept_no_of_emp
Production and
IDOO1 200
planning
stores IDOO1 250
\design and
[D134 1100
[technical support
Purchasing
ID134 (600
(departmentemp_dept_mapping table:
lemp_id jemp_dept
10014 Production and planning
1001 stores
1002 = |design and technical support
1002 Purchasing department
Functional dependencies:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate keys:
For first table: emp_id
For second table: emp_dept
For third table: {emp_id, emp_dept}
This is now in BCNF as in both the
functional dependencies left side part is a
key.Fourth normal form (4NF)
o Arelation will be in 4NF if it is in Boyce
Codd normal form and has no multi-
valued dependency.
o Fora dependency A - B, if for a single
value of A, multiple values of B exists,
then the relation will be a multi-vabued
dependency.
Example
STUDENTSTU_ID COURSE HOBBY
| 21 | Computer Dancing |
| 21 | Math Singing |
| 34 “Chemisty Dancing |
74 | iojoay Wl eteket |
59 | Physics Hockey |
The given STUDENT table is in 3NF, but the
COURSE and HOBBY are two independent
entity. Hence, there is no relationship between
COURSE and HOBBY.
In the STUDENT relation, a student with
STULID, 21 contains two courses, Computer
and Math and two hobbies, Dancing and
Singing. So there is a Multi-valued dependency
on STULID, which leads to unnecessary
repetition of data.So to make the above table into 4NF, we can
decompose it into two tables:
STUDENT_COURSE
STU_ID COURSE
21 | Computer |
21 | Math : - 7
34 7 | Chemistry
74 “Biology
Sys |
STUDENT_HOBBY21 Dancing
21 Singing
34 Dancing
74 Cricket
59 HockeyFifth normal form (SNF)
© Arelation is in 5NF if it is in 4NF and not
contains any join dependency and
joining should be lossless.
o 5NF is satisfied when all the tables are
broken into as many tables as possible
in order to avoid redundancy.
© SNF is also known as Project-join
normal form (PJ/NF).
Example
SUBJECT LECTURER SEMESTER
|
Math | Akash Semester 2
Computer | Anshika | Semester 1 |
| Computer | John Semester 1 |
{ Math John | Semester 1 |
|In the above table, John takes both Computer
and Math class for Semester 1 but he doesn't
take Math class for Semester 2. In this case,
combination of all these fields required to
identify a valid data.
Suppose we add a new Semester as Semester
3 but do not know about the subject and who
will be taking that subject so we leave Lecturer
and Subject as NULL. But all three columns
together acts as a primary key, so we can't
leave other two columns blank.
So to make the above table into 5NF, we can
decompose it into three relations P1, P2 & P3:P1
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math
P2
Computer Anshika
Computer John
Math John
Math Akash
Chemistry PraveenP3
SEM LECTURER
Semester 1 Anshika
Semester 1 John
Semester 1 John
Semester 2 Akash
Semester 1 PraveenFunctional dependency in
DBMS
BY CHAITANYA SINGH | FILED UNDER: DBMS.
The attributes of a table is said to be
dependent on each other when an attribute
of a table uniquely identifies another
attribute of the same table.
For example: Suppose we have a student
table with attributes: Stu_ld, Stu_Name,
Stu_Age. Here Stu_Id attribute uniquely
identifies the Stu_Name attribute of student
table because if we know the student id we
can tell the student name associated with it.
This is known as functional dependency and
can be written as Stu_Id->Stu_Name or in
words we can say Stu_Name is functionally
dependent on Stu_lId.
Formally:
If column A of a table uniquely identifies the
column B of same table then it can
represented as A->B (Attribute B is
__ functionally dependent on attribute A)Multivalued dependency in
DBMS
BY CHAITANYA SINGH | FILED UNDER: DBMS.
Multivalued dependency occurs when there
are more than one independent multivalued
attributes in a table.
For example: Consider a bike manufacture
company, which produces two colors (Black
and white) in each model every year.
bike_model manuf_year color
M1001 2007 Black
M1001 2007 Red
M2012 2008 Black
M2012 2008 Red
M2222 2009 Black
M2222 2009 RedM1001 2007 Black
M1001 2007 Red
bou2 boos Black
Maou2 boos Red
Moa poos Black
Moa22 boos Red
Here columns manuf_year and color are
independent of each other and dependent
on bike_model. In this case these two
columns are said to be multivalued
dependent on bike_model. These
dependencies can be represented like this:
bike_model ->> manuf_year
bike_model ->> colorSr.No.
Key
Normalization
Implementation Normalization is used to remove redundant data from
Focus
Number of
Tables
Memory
‘consumption
Data integrity
Where to use
the database and to store non-redundant and consistent
data into it
Normalization mainly focuses on clearing the database
from unused dota and to reduce the data redundancy
‘and inconsistency.
During Normalization as data is reduced so a number of
tables are deleted from the database hence tables are
lesser in number.
Normalization uses optimized memory and hence faster
inperformance.
Normalization maintains data integrity ie. any addition
‘or deletion of data from the table will not create any
mismatch in the relationship ofthe tables.
Normalization is generally used where number of
insert/update/delete operations are performed and joins
of those tables are not expensive.
Denormalization
Denormalization is used to combine multiple table data
into one so that it can be queried quickly.
Denormalization on the other hand facus on to achieve
the faster execution of the queries through introducing
redundancy.
‘On another hand during Denormalization data is
integrated into the same database and hence ¢ number of
tables to store shat data increases in number.
(On the other hand, Denormalization introduces some sort
of wastage of memory.
Denormalization does not maintain any data integrity.
On the other hand Denormalization is used where joins
‘are expensive and frequent query is executed on the
tables.Definition - What does Denormalization
mean? n d
Denormalization is a strategy that database
managers use to increase the performance of a
database infrastructure. It involves adding
redundant data to a normalized database to
reduce certain types of problems with database
queries that combine data from various tables
into a single table. The definition of
denormalization is dependent on the definition
of normalization, which is defined as the
process of organizing a database into tables
correctly to promote a given use.