0% found this document useful (0 votes)
41 views15 pages

Bca DBMS Unit3

The document discusses normalization in databases, emphasizing its importance in minimizing redundancy and maintaining data integrity through various normal forms (1NF, 2NF, 3NF, BCNF). It explains functional dependencies, their properties, and the types of anomalies that can arise from improper data organization. Additionally, it outlines the advantages and disadvantages of normalization, as well as the rules governing functional dependencies.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
41 views15 pages

Bca DBMS Unit3

The document discusses normalization in databases, emphasizing its importance in minimizing redundancy and maintaining data integrity through various normal forms (1NF, 2NF, 3NF, BCNF). It explains functional dependencies, their properties, and the types of anomalies that can arise from improper data organization. Additionally, it outlines the advantages and disadvantages of normalization, as well as the rules governing functional dependencies.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 15

UNIT-3

NORMALIZATION
AND
FUNCTIONAL
DEPENDENCY

Normalization, Need of Normalization, Anomalies associated


with Normalization, Functional Dependencies and its
Properties, Normal form (1NF, 2NF, 3NF, BCNF).
DBMS UNIT-3 Danish Ahmad Khan

DBMS Normalization:
 Normalization is the process of organizing the data in the database.
 Normalization is used to minimize the redundancy from a relation or set of
relations. It is also used to eliminate undesirable characteristics like
Insertion, Updation and Deletion Anomalies.
 Normalization divides the largest table into smaller and links them using
relationships.
 The normal form is used to reduce redundancy from the database table.
Need of Normalization:
1) Data Redundancy: Redundant data occurs when the same information is
stored in multiple places within a database. This redundancy wastes storage
space and makes it challenging to maintain data consistency. By
normalizing the data, redundant information is eliminated, and data is
stored in a more efficient and concise manner.
2) Data Integrity: Normalization helps to ensure data integrity by
minimizing inconsistencies and anomalies in the data. Anomalies, such as
update anomalies (inconsistencies due to partial updates), insertion
anomalies (difficulties in inserting new data), and deletion anomalies (loss
of unrelated data when deleting specific information), can be reduced or
eliminated through the normalization process.
3) Query Efficiency: Normalized data models often lead to improved query
efficiency. When data is properly organized into tables, queries can be
written more efficiently, and indexes can be used effectively to retrieve the
required information. This results in faster query execution times and
improved overall performance.
4) Flexibility and Scalability: Normalization provides flexibility and
scalability to a database design. As new requirements arise or data changes
over time, a normalized database structure allows for easier modifications
and adaptations without impacting the entire system. This flexibility makes
it easier to accommodate changes, additions, or enhancements to the
database without requiring significant restructuring.
5) Data Consistency and Accuracy: Normalization helps maintain data
consistency and accuracy by reducing the likelihood of conflicting or
contradictory information. By storing data in a normalized form, updates
or modifications to data need to be made in only one place, ensuring that
the changes are consistent throughout the database.

[2]
DBMS UNIT-3 Danish Ahmad Khan

6) Simplified Data Maintenance: A normalized database structure


simplifies data maintenance tasks. With a well-designed schema, updates,
inserts, and deletions can be performed with minimal effort and without
introducing inconsistencies or errors.
Data Modification Anomaly:
a) Insertion Anomaly:
When one cannot insert a new tuple into a relationship due to lack of data.
b) Updation Anomaly:
When an update of a single data value requires multiple rows of data to be
updated.
c) Deletion Anomaly:
This anomaly refers to the situation where the deletion of data results in
the unintended (not planned) loss of some other important data.
Functional Dependency:
 The functional dependency is a relationship that exists between two
attributes. It typically exists between the primary key and non-key attribute
within a table.
(Primary Key) X → Y (Another Attribute)
 The left side of FD is known as a determinant, the right side of the
production is known as a dependent.
Types of Functional Dependency:

Functional
Dependency

Non- Trivial
Trivial Functional
Functional
Dependency
Dependency
1) Trivial functional dependency:
 A → B has trivial functional dependency if B is a subset of A.
 The following dependencies are also trivial like:
A → A, B → B
[3]
DBMS UNIT-3 Danish Ahmad Khan

2) Non-trivial functional dependency:


 A → B has a non-trivial functional dependency if B is not a subset
of A.
 When A intersection B is NULL, then A → B is called as complete
non-trivial.
Properties of Functional Dependencies:
1) Determinancy: A functional dependency expresses a determinancy
relationship between two sets of attributes. It states that the value of one
set of attributes determines the value of another set of attributes. In other
words, given the values of a determinant set, the dependent set's values can
be uniquely determined.
2) Uniqueness: Functional dependencies ensure that each combination of
values in the determinant set corresponds to a unique combination of
values in the dependent set. This uniqueness property helps maintain data
integrity and avoids data anomalies.
3) Irreducibility: A functional dependency is considered irreducible or
minimal if removing any attribute from either the determinant or dependent
set would result in a loss of information. Irreducible functional
dependencies are preferred to avoid redundancy and ensure a compact
representation of relationships.
4) Transitivity: Functional dependencies can exhibit transitivity, meaning
that if there is a functional dependency A → B and B → C, then there is
also a functional dependency A → C. This property allows for inferring
additional dependencies based on the existing ones.
5) Closure: The closure of a set of attributes with respect to a set of functional
dependencies represents all the attributes that can be determined using
those dependencies. It is a critical concept in database normalization, as it
helps identify all the functional dependencies that hold within a relation.
6) Preservation: Functional dependencies should be preserved during
operations like data insertion, deletion, and modification to maintain the
integrity of the data. The preservation property ensures that the
dependencies that held before the operation still hold after the operation.
NORMAL FORM:
 Normalization works through a series of stages called Normal Forms.
 The normal form applies to individual relations. The relation is said to be
in particular normal form if it satisfies constraints.

[4]
DBMS UNIT-3 Danish Ahmad Khan

Normal
Condition Decomposition
Form

1NF Eliminate Repeating groups R


2NF Eliminate Partial Functional Dependency R11 R12
3NF Eliminate Transitive Dependency R21 R22 R23
4NF Eliminate Multivalued Dependency R31 R32 R33 R34
5NF Eliminate Join Dependency R41 R42 R43 R44 R45

Normal
Description
Form
1NF A relation is in 1NF if it contains an atomic value.
A relation will be in 2NF if it is in 1NF and all non-key attributes
2NF
are fully functional dependent on the primary key.
A relation will be in 3NF if it is in 2NF and no transition
3NF
dependency exists.
A stronger definition of 3NF is known as Boyce Codd’s Normal
BCNF
Form.
A relation will be in 4NF if it is in BCNF and has no multivalued
4NF
dependency.
A relation will be in 5NF if it is in 4NF and doesn’t contain any
5NF
join dependency.

Advantages:
 Normalization helps to minimize data redundancy.
 Greater overall database organization.
 Data consistency within the database.
 More flexible database design.
Disadvantages:
 We cannot start creating the database before knowing what the user/client
needs.
 The performance degrades when normalizing the relations to higher
normal form i.e.; 4NF and 5NF.
 It is very time consuming and difficult to normalize relations of a higher
degree.

[5]
DBMS UNIT-3 Danish Ahmad Khan

First Normal Form (1NF):


 A relation will be in 1NF if it contains an atomic value.
 It states that an attribute of a table cannot hold multiple values. It must hold
only single valued attribute.
 INF disallows the multi-valued attribute, composite attribute and their
combinations.
ID NAME PHONE STATE
101 A 9876543210 UP
8765432109
102 B UK
7654321098
103 C 9876543211 Gujrat
Decompose into INF:
ID NAME PHONE STATE
101 A 9876543210 UP
102 B 8765432109 UK
102 B 7654321098 UK
103 C 9876543211 Gujrat

Second Normal Form (2NF):


 In the 2NF, relational must be in 1NF.
 All non-key attributes are fully functional dependent on the primary key.
Faculty_Id Subject Age
101 A 29
101 B 29
102 C 35
103 D 31
103 E 31

Decompose in 2NF:
Faculty_Id Age
101 29
102 35
103 31

[6]
DBMS UNIT-3 Danish Ahmad Khan

Faculty_Id Subject
101 A
101 B
102 C
103 D
103 E

Third Normal Form (3NF):


 A relation will be in 3NF if it is in 2NF and not contain any transitive partial
dependency.
 3NF is used to reduce the data duplication. It is also used to achieve the
data integrity.
 If there is no transitive dependency for non-prime attributed, then the
relation must be in 3NF.
X (Super Key) → Y (Prime Attribute)
Pin
Id Name State City
(Assume)
101 A 1234 UP Bareilly
102 B 2345 UP Lucknow
103 C 3456 UK Dehradun
104 D 4567 Maharashtra Mumbai
105 E 5678 Delhi New Delhi

Candidate Key: Id
Non-Prime Attribute: Name, Pin, State, City
Super Key: {Id}, {Id, Name}, {Id, Name, Pin} ... and so on …
Here,
State and City depends on Pin
Pin depends on Id.
So, Non-prime attribute (State & City) transitively dependent on Super Key (Id).
It violates the rules of 3NF, we have to move City & State to new table
(decomposition) with Pin as primary key.
Id Name Pin
101 A 1234
102 B 2345
[7]
DBMS UNIT-3 Danish Ahmad Khan

103 C 3456
104 D 4567
105 E 5678

Pin State City


1234 UP Bareilly
2345 UP Lucknow
3456 UK Dehradun
4567 Maharashtra Mumbai
5678 Delhi New Delhi

Boyce Codd Normal Form (BCNF):


 It is an advance version of 3NF.
 It is stricter than 3NF.
 A table is in BCNF if every functional dependency
X → Y, where X is Super Key of the table
 For BCNF, the table should be in 3NF and for every functional dependency
(FD), Left hand Side is super key.
Id Country Dept Dept_Type Dept_No
101 India Developer D1 1
101 India Designer D1 2
102 Australia Stores D2 3
102 Australia Tester D2 4

Candidate Key: {Id, Dept}


Functional Dependency: Id → Country and Dept→ Dept_Type, Dept_No
The table is not in BCNF because neither “Dept” nor “Id” alone are keys.
To convert the above table in BCNF, we decompose it into tables.

Id Country Dept Dept_Type Dept_No

101 India Developer D1 1

102 Australia Designer D1 2

Stores D2 3

[8]
DBMS UNIT-3 Danish Ahmad Khan

Tester D2 4

Dept_Type Dept_No
D1 1
D1 2
D2 3
D2 4

3NF BCNF

For any Functional Dependency, for a


There should be no transitive
relational A→B, A should be a super
dependency.
key of relation.

It is weaker than BCNF. It is comparatively stronger than 3NF.

The functional dependencies are The functional dependencies are


already in 1NF and 2NF. already in 1NF, 2NF and 3NF.

Redundancy is high. Redundancy is low.

It is easy to achieve. It is difficult to achieve.

Lossless decomposition can be Lossless decomposition is hard to


achieved by 3NF. achieve in BCNF.

Inference Rules for Functional Dependencies:


1. Reflexivity Rule:
If α is a set of attributes and β ⊆ α then α→β holds.
2. Augmentation Rule:
If α→β holds and γ is a set of attributes then γα→γβ holds.

[9]
DBMS UNIT-3 Danish Ahmad Khan

3. Transitivity Rule:
If α→β holds and β→γ holds, then α→γ holds.
4. Complementation Rule:
If α→→β holds, then α→→ {R- (α∪β)} holds.
5. Multivalued Augmentation Rule:
If α→→β holds and γ⊆R and δ⊆ γ holds then γα→→ γβ holds.
6. Multivalued Transitivity Rule:
If α→→β holds, then β→→γ holds then α→→ γ-β holds.
7. Replication Rule:
If α→→β holds and γ⊆β and there is a δ such that δ⊆R and γ∩β = 𝟇 and
δ→γ then α→γ holds.
8. Union Rule:
If α→β holds and α→ γ holds then α→βγ holds.
9. Decomposition Rule:
If α→βγ holds, then α→β holds and α→γ holds.
10.Pseudo-Transitivity Rule:
If α→β holds and βγ→ δ holds then γα→δ holds.
Fully Functional Dependency:
A relational schema R and functional dependency X→Y, Y is fully dependent on
X, if there is no Z where Z is proper subset of Y. So,
Z→Y
Partial Functional Dependency:
A relational schema R with functional dependency F defined an attribute of R and
K as candidate key if X is proper subset of K and if X→A then A is partial
functional dependent on K.
Example:

Name Course Marks Contact RollNo Course_Dept

Candidate Key: Name, Course (Prime Attribute)


Marks is FULLY DEPENDENT on Candidate key.
Contact and RollNo are PARTIALLY FUNCTIONAL DEPENDENT on
Candidate Key.

[10]
DBMS UNIT-3 Danish Ahmad Khan

Inclusion Dependency:
 Multivalued dependency and join dependency can be used to guide
database design although they both are less common than functional
dependencies.
 Inclusion dependencies are quite common. They typically show little
influence on designing of the database.
 The inclusion dependency is a statement in which some columns of a
relation are contained in other columns.
 Inclusion dependencies are defined in order to formalize two types of
interrelation constraints:
a) The foreign key (or referential integrity) constraint cannot be
specified as a functional or multivalued dependency because it
relates attributes across relations.
b) The constraint between two relations that represent a class/subclass
relationship also has no formal definition in terms of the functional,
multivalued, and join dependencies.
Lossless Decomposition:
A decomposition {R1, R2…. Rn} of a relation R is called a lossless decomposition
for R if natural join of R1, R2…Rn produces exactly the relation R.
Condition to show that decompositions are lossless using FD set:
1) Union of attributes of R1 and R2 must be equal to attribute of R. Each
attribute of R must be either in R1 or in R2.
Att(R1) ∪ Att(R2) = Att(R)
2) Intersection of attribute of R1 and R2 must not be NULL.
Att(R1) ∩ Att(R2) ≠ 𝟇
3) Common attribute must be a key for at least one relation.
Att(R1) ∩ Att(R2) → Att(R1)
Att(R1) ∩ Att(R2) → Att(R2)
Canonical Cover:
A canonical cover of a set of functional dependencies F is a simplified set of
functional dependencies that has the same closure as the original set F.
Question- Compute canonical cover for the following:
R = (A, B, C)
F = {A→BC, B→C, A→B, AB→C}

[11]
DBMS UNIT-3 Danish Ahmad Khan

Solution:
In given data, two functional dependencies with same set of attributes:
(A→BC and A→B)
Then, F = {A→BC, B→C, AB→C}
Here, AB→C is Extraneous Attribute. So, AB→C removed…
Then, F = {A→BC, B→C}
Here, A→BC is Extraneous Attribute. So, B→C removed…
Then, F = {A→BC}
By transitivity Rule , F = {A→B, B→C}
Minimal Cover:
A minimal cover of a set of functional dependencies F is a minimal set of
functional dependencies Fmin that is equivalent to F.
Algorithm to find minimal cover F for set of Functional Dependency E:
1) Set F: = E.
2) Replace each functional dependency
X → {A1, A2, ..., An}
in F by the n functional dependencies
X →A1, X →A2, ..., X → An.
3) For each functional dependency X →A in F for each attribute B that is an
element of X
if {{F – {X →A}} ∪ {(X – {B}) →A}} is equivalent to F,
then replace X → A with (X – {B}) → A in F.
4) For each remaining functional dependency X → A in F
if {F – {X → A}} is equivalent to F,
then remove X → A from F.
Multivalued Dependency (MVD):
 This dependency occurs when two or more independent multivalued facts
about the same attribute occur within the same relation.
 Multivalued dependency is denoted by
X→→Y
where X and Y are both subset of R.
 A MVD consist of atleast two attributes that are dependent on the third
attribute i.e., it always required atleast three attributes.

[12]
DBMS UNIT-3 Danish Ahmad Khan

 A MVD X→Y in relation R is called a trivial if:


a) X is a subset of Y
b) X ∪ Y = R
Mobile_Model Manufacture Color
M001 2020 Black
M002 2021 White
M002 2021 Black
M003 2022 White
M003 2022 Black
Here, Color and Manufacture dependent on Mobile_Model. So,
Mobile_Model →→ Manufacture
Mobile_Model →→ Color
Join Dependency (JD):
 Join dependency is a further generalization of multivalued dependencies.
 If the join of sub relation R1 and R2 is equal to relation R then we can say
that join dependency exist.
 A join dependency (R1, R2…Rn) specified on relation schema R is a trivial
join dependency if one of the relation schema Ri in join dependency (R1,
R2…Rn) is equal to R.
 This dependency is called trivial because it has the lossless join property
for any relation state r of R and hence doesn’t specify any constraint on R.
Fourth Normal Form (4NF):
 A relation will be in 4NF if it is in BCNF and has no multivalued
dependency.
 For a dependency A→B if for a single value of A, multiple value of B
exists, then the relation will be a multivalued dependency.
Student_Id Subject Hobby
101 Computer Playing Cricket
101 Maths Singing
102 Physics Dancing
103 Chemistry Dancing
104 Biology Playing Hockey

Here, Subject and Hobby are two independent entities. So, no relation between
Subject and Hobby.

[13]
DBMS UNIT-3 Danish Ahmad Khan

Student_Id = 101 contains two subjects (Computer & Maths) and two different
hobbies (Playing Cricket & Singing). So, there is multivalued dependency.
On Decomposing:
Student_Id Course Student_Id Hobby
101 Computer 101 Playing Cricket
101 Maths 101 Singing
102 Physics 102 Dancing
103 Chemistry 103 Dancing
104 Biology 104 Playing Hockey

Fifth Normal Form (5NF):


 A relation is in 5NF if it is in 4NF and not contain any join dependency
and joining should be lossless.
 This form is also known as Project Join Normal Form (PJ/NF).
 5NF is satisfied when all the tables are broken into as many as possible in
order to avoid redundancy.
Subject Faculty Semester
Computer A 1
Computer B 1
Maths B 1
Maths C 2
Physics D 1
Faculty B takes both Computer and Maths Subject for Semester 1 but he/she
doesn’t take Maths Subject for Semester 2. In this case, Combination of all these
field required to identify a valid data.
On Decomposing:
Semester Subject Subject Faculty Semester Faculty
1 Computer Computer A 1 A
1 Maths Computer B 1 B
1 Physics Maths B 1 B
2 Maths Maths C 2 C
Physics D 1 D

[14]
DBMS UNIT-3 Danish Ahmad Khan

Alternate Approaches to Database Design:


An alternate approach to database design is dangling tuples:
Dangling Tuples:
1) Tuples that “disappear” in computing a join are known as dangling tuples.
a) Let r1(R1), r2(R2), ..., rn (Rn) be a set of relations.

b) A tuple t of relation Ri is a dangling tuple if tuple t is not in the


relation:

πRi (r1 ⋈ r2 ⋈ ... ⋈ rn)

2) The relation r1 ⋈ r2 ⋈ ... ⋈ rn is called a universal relation since it involves


all the attributes in the “universe” defined by

R1 ∪ R2 ∪ ... ∪ Rn

3) If dangling tuples are allowed in the database, instead of decomposing a


universal relation, we may prefer to synthesize a collection of normal form
schemas from a given set of attributes.

***

[15]

You might also like