E - Notes DBMS Unit3
E - Notes DBMS Unit3
VANIYAMBADI
PG and Department of Computer Applications
I MCA – Semester – I
E-Notes (Study Material)
UNIT-III: Function Dependencies & Normalization for Relational Database-Informal design guidelines for
relation schemas, functional Dependencies, Normal forms based on primary keys(1NF,2NF,3NF &
BCNF),Lossless join & dependency preserving decomposition, multivalued dependencies, join
dependencies(4NF & 5NF), Denormalization.
Learning Objectives:
[Link] main objective of this course is to enable students to the fundamental concepts of
database analysis and design.
2. To recognize the importance of database analysis and design in the implementation of
any Database application and to understand the process of drawing the ER-Diagrams.
[Link] also gives the knowledge of the roles of transaction processing and concurrency
control.
Course Outcome:
[Link] the basic concepts of database management systems
[Link] SQL to find solutions to a broad range of queries
3. Apply normalization techniques to improve database design
[Link] a given database application scenario to use ER model for conceptual design
of the database
Overview:
Function Dependencies & Normalization for Relational Database-
Informal design guidelines for relation schemas,
functional Dependencies,
Normal forms based on primary keys(1NF,2NF,3NF & BCNF),
Lossless join & dependency preserving decomposition,
multivalued dependencies,
join dependencies (4NF & 5NF),
Denormalization.
[Link] Dependencies & Normalization for Relational Database:
From the above table we can conclude some valid and invalid
For example,
44 Xyz 18
Here, {roll_no, name} →name is a trivial functional dependency, since the dependent name
is a subset of determinant set {roll_no, name}
Similarly, roll_no→
roll_no is also an example of trivial functional dependency.
In Non-trivial functional dependency, the dependent is strictly not a subset of the determinant.
roll_ name ag
no e
42 Abc 1
7
43 Pqr 1
8
44 Xyz 1
8
For example,
Here, roll_no→name is a non-trivial functional dependency,
Additional Resources:
[Link]
Practice Questions:
[Link] Function Dependency.
2. What are the problems caused by Redundancy? Explain about Normalization and need for
normalization.
3. Define Multivalued Functional Dependencies
4. Discuss about different functional dependencies
5. Define Trivial Functional Dependencies
6. Define non-Trivial Functional Dependencies
[Link] design guidelines for relation schemas, functional Dependencies
[Link] TO SCHEMA REFINEMENT
The Schema Refinement refers to refine the schema by using some technique. The best technique
of schema refinement is decomposition.
Normalization means “split the tables into small tables which will contain less number of attributes in
such a way that table design must not contain any problem of inserting, deleting, updating anomalies and
guarantees no redundancy”.
Normalization or Schema Refinement is a technique of organizing the data in the database. It is a
systematic approach of decomposing tables to eliminate data redundancy and undesirable characteristics
like Insertion, Update and Deletion Anomalies.
Redundancy: refers to repetition of same data or duplicate copies of same data stored in different
locations.
Anomalies: Anomalies refers to the problems occurred after poorly planned and normalized
databases where all the data is stored in one table which is sometimes called a flat file database.
Anomalies refers to the problems occurred after poorly planned and unnormalized databases
where all the data is stored in one table which is sometimes called a flat file database. Let us consider
such type of schema
Here all the data is stored in a single table which causes redundancy of data or say anomalies
as SID and Sname are repeated once for same CID . Let us discuss anomalies one by one.
Due to redundancy of data we may get the following problems, those are-
1. insertion anomalies : It may not be possible to store some information unless some other information
is stored as well.
2. redundant storage: some information is stored repeatedly
3. update anomalies: If one copy of redundant data is updated, then inconsistency is created unless
all redundant copies of data are updated.
4. deletion anomalies: It may not be possible to delete some information without losing some other
information as well.
Problem in updation / updation anomaly – If there is updation in the fee from 5000 to 7000, then we
have to update FEE column in all the rows, else data will become inconsistent.
Insertion Anomaly and Deletion Anomaly- These anomalies exist only due to redundancy, otherwise
they do not exist.
Insertion Anomalies: New course is introduced C4, But no student is there who is having C4 subject.
Because of insertion of some data, It is forced to insert some other dummy data.
Deletion Anomaly:
Deletion of S3 student cause the deletion of course. Because of deletion of some data forced to delete
some other useful data.
Solutions To Anomalies : Decomposition of Tables – Schema Refinement, shown below.
Purpose of Normalization:
Advantages of Normalization:
1. Greater overall database organization will be gained.
2. The amount of unnecessary redundant data reduced.
3. Data integrity is easily maintained within the database.
4. The database & application design processes are much for flexible.
5. Security is easier to maintain or manage.
Disadvantages of Normalization:
1. The disadvantage of normalization is that it produces a lot of tables with a relatively small
number of columns. These columns then have to be joined using their primary/foreign key
relationship.
2. This has two disadvantages.
Performance: all the joins required to merge data slow processing & place additional
stress on your hardware.
Complex queries: developers have to code complex queries in order to merge data from
different tables.
Concept of Functional Dependency:
Functional Dependencies are fundamental to the process of Normalization i.e., Functional
Dependency plays key role in differentiating good database design from bad database designs. A
functional dependency is a “type of constraint that is a generalization of the notation of the key”.
Functional Dependency describes the relationship between attributes (columns) in a table. Functional dependency
is represented by an arrow sign (→).
In other words, a dependency FD: “X → Y” means that the values of Y are determined by the values
of X. Two tuples sharing the same values of X will necessarily have the same values of
Case1: A →B
Here A1 belongs to B1 & B2. So A1 does not have unique value in B. So it is not in FD.
Case1: A →C
Here A1→C1 and A2, A3→C2. So A has unique values in B. So it is in FD.
Note: try to find all the possibilities. i.e., A→D, B→C, B→D, and C→D
Reasoning about functional dependencies:Armstrong Axioms (Inference Rules ) : The term Armstrong axioms refers
to the sound and complete set of inference rules or axioms, introduced by William W. Armstrong, that is used to
test logical implication of functional dependencies.
Armstrong axioms define the set of rules for reasoning about functional dependencies and also to infer
all the functional dependencies on a relational database.
Various axioms rules or inference rules:
Primary axioms:
Attribute closure of an attribute set can be defined as set of attributes which can be functionally
determined from it.
The set of FD’s that is logically implied by F is called the closure of F and written as F +. And it is
defined as “If F is a set FD’s on a relation R, the F+, the closure of F by using the inferences axioms
that are not contained in F+.
Example: R (A, B, C, D) and set of Functional Dependencies are A→B, B→D, C→B then what is
the Closure of A, B, C, D?
Solution: A+ is
1. Fully Functional Dependency: A functional dependency is said to be full dependency “if and
only if the determinant of the functional dependency if either candidate key or super key, and
the dependent can be either prime or non-prime attribute”.
(OR)
Let’s take the functional dependency X → Y (i.e., X determines y). Here Y is said to be fully
determinant, if it cannot determine any subset of X.
Example: Consider the following determinant ABC → D i.e., ABC determines D but D is
not determined by any subset of A/ BC/C/B/AB i.e., BC→D, C→D, A→D Functional
dependencies are not exists. So D is Fully Functional Dependent.
2. Partial Functional Dependency: If a non-prime attribute of the relation is getting derived by only
a part of the candidate key, then such dependency is known as Partial Dependency.
(OR)
In a relation having more than one key field, a subset of non key fields may depend on all key
fields but another subset or a particular non-key field may depend on only one of the key
fields. Such dependency is defined as Partial Dependency.
Example: Consider the following determinants AC→P, A→D, D→P. From these determinants P is
not fully FD on AC. Because, If we find A+ (means A’s Closure) A→D, D→P i.e., A→P. But we
don’t have any requirement of C. C attribute is removed completely. So P is Partially Dependent
on AC.
Under the following conditions a table cannot have partial F.D
(1) If primary key consists a single attribute
(2) If table consists only two attributes
(3) If all the attributes in the table are part of the primary key
3. Transitive Functional Dependency: If a non-prime attribute of a relation is getting derived
by either another non-prime attribute or the combination of the part of the candidate key
along with non-prime attribute, then such dependency is defined as Transitive
dependency. i.e., in a relation, there may be dependency among non-key fields. Such
dependency is called Transitive Functional Dependency.
Example: X→Y, and Y→Z then we can determine X→Z holds.
Under the following Circumstances, a table cannot have transitive F.D
(1) If table consists only two attributes
(2) If all the attributes in the table are part of the primary key.
4. Trivial Functional Dependency: It is basically related to Reflexive rule. i.e., if X is a set of
attributes, and Y is subset of X then X→Y holds.
Example: ABC→BC is a Trivial Dependency.
Candidate Key:
Candidate Key is minimal set of attributes of a relation which can be used to identify a tuple
uniquely.
Consider student table: student(sno, sname,sphone,age)
we can take sno as candidate key. we can have more than 1 candidate key in a table. types of
candidate keys:
1. simple(having only one attribute)
2. composite(having multiple attributes as candidate key)
Super Key:
Super Key is set of attributes of a relation which can be used to identify a tuple uniquely.
➢ Adding zero or more attributes to candidate key generates super key.
➢ A candidate key is a super key but vice versa is not true.
Consider student table: student(sno, sname,sphone,age)
we can take sno, (sno, sname) as super key
Examples:
1. In a schema with attributes A,B,C,D and E the following set of attributes are given A B, A C,
CD E, B D, E A. Find CD AC determines from the given FDs or not.
Sol: Given FD is CD AC find the closure set of CD.
CD+ = CDE (∵ CD E)
= CDEA (∵ E A)
= CDEAB (∵ A B)
From the closure set the attributes AC are determined by CD so CD AC.
(i) Primary key: It is an unique value attribute in a table to enforce entity integrity and ti
identify rows in the table uniquely.
(ii) Composite Primary Key: Sometimes single attribute is not sufficient to identify uniquely the
rows in the table so, we combine 2 or more attributes to identify the rows uniquely.
(iii) Candidate keys: Sometimes 2 or more independent attribute or attributes can be used to
identify the rows uniquely Eg :( vech no,veng no,purchase date) Either vehicle no or
vehicle engine no can be used as a key attribute then they are called as candidate keys one
of the candidate key can be elected as primary key.
Example 1: Find candidate keys for the relation R(ABCD) having following FD’s AB CD, C A,
D A.
Sol: From the given FD’s, the attribute B is key attribute because it is not in RHS of
functional dependency.
CD+ = CDA (∵ D A
) AC+ = AC
AD+ =AD
From the above attributes AB and BC determines all attributes. AB,
BC are candidate keys.
Example 2: Find candidate keys for the relation R(ABCDE) having following FD’s A BC, CD E, B D,
E A.
Sol: From the given FD’s, no attribute is key attribute because all are in RHS of
functional dependency. So check for all attributes of LHS.
A+ = ABC (∵ A BC)
= ABCD (∵ B D)
= ABCDE (∵ CD E)
B+ = BD (∵ B D)
E+ = EA (∵ E A)
= EABC (∵ A BC)
= EABCD (∵ B D)
C+ = C
D+ = D
CD+ = CDE (∵ CD E)
= CDEA (∵ E A)
= CDEAB (∵ A BC)
BC+ = BCD (∵ B D)
= BCDE (∵ CD E)
= BCDEA (∵ E A)
From the above attributes A, E, CD and BC determines all attributes. A, E,
CD, BC are candidate keys.
(3) To identify equivalence of F.D’s :
Different database designers may define different F.D’s sets from the same requirements. To evaluate
whether they are equivalent if we are able to derive all F.D’s in G from F and vice- versa.
Step 1: Take set F and enclose all FD’s in G that can be derived from F. A CD
A+ from F
=A
=AC (∵ A C)
=ACD (∵ AC D)
A CD can be derived from F
E AH
E+ from F
=E
=EAD
=EADH
E AH can be derived from F
Step 2: Take set G and enclose all F.D’s in F that can be derived from
G. A C
A+ from G
=A
=ACD
A C can be derived from G
E AD
E+ from G
=E
=EAH
=EAHCD
E AH & E AD can be derived from G
G and F are equivalent.
(4) To identify the irreducible form of FD’s /canonical Form (minimal cover):
We try to minimize the functional dependency. The minimize FD should be equivalent to original FD,
Procedure to find minimal set:
Step 1: Have single attributes on the RHS for every FD.
Step 2: Evaluate all F.D’s in step 1 for their necessity. If they are not necessary, remove them from
the list.
Step 3: Evaluate the necessity of the LHS attributes in FD’s obtained from step [Link] they are not necessary
remove from FD.
Step 4: Apply the union rule for common to LHS attribute in the FD’s obtained from step
[Link] we will get irreducible set.
Remove 4 and compute D+ from 1, 2, 4, 5&6
Step 4:
Additional Resources:
[Link]
Practice Questions:
[Link] Schema.
Normal forms based on functional dependency (1NF, 2NF and 3 NF, Boyce- Codd normal form (BCNF),
4NF)
Normalization means “split the tables into small tables which will contain less number of attributes in such
a way that table design must not contain any problem of inserting, deleting, updating anomalies and
guarantees no redundancy”.
The evolution of Normalization theories / Steps of Normalization / Different Normal Forms
is illustrated below-
1. First Normal Form (1NF)
2. Second Normal Form (2NF)
3. Third Normal Form (3NF)
4. Boyce-Codd Normal Form (BCNF)
5. Fourth Normal Form (4NF)
6. Fifth Normal Form (5NF).
Points to be Remember
Solution: This table is not in first normal form because the [Color] column can contain multiple
values. For example, the first row includes values "red" and "green." To bring this table to first
normal form, we split the table into two tables and now we have the resulting tables:
Second Normal Form (2NF): A relation is said to be in 2NF, if it is already in 1st NF and it has no
Partial Dependency i.e., no non-prime attribute is dependent on the only a part of the candidate key.
(OR)
A relation is in second normal form if it satisfies the following conditions:
• It is in first normal form
• All non-key attributes are fully functional dependent on the primary key.
Note: Partial Functional Dependency: If a non-prime attribute of the relation is getting derived by
only a part of the candidate key, then such dependency is known as Partial Dependency
➔ This table has a composite primary key [Customer ID, Store ID]. The non-key attribute is [Purchase
Location]. In this case, [Purchase Location] only depends on [Store ID], which is only part of the
primary key. Therefore, this table does not satisfy second normal form.
➔ To bring this table to second normal form, we break the table into two tables, and now we have
the following:
AB BD
C
B D
AB C
Q2 Consider the relation R=ABCDEF and set of FDs are A FC, C D, B E Find the key
and normalize into 2NF
Third Normal Form (3NF): A database is in third normal form if it satisfies the following conditions:
• It is in 2NF.
• There is no transitive functional dependency
➢ By transitive functional dependency, we mean we have the following relationships in the table:
A is functionally dependent on B, and B is functionally dependent on C. In this case, C is
transitively dependent on A via B. and A non-key attribute is depending on
a non-key attribute.
➔ In the table able, [Book ID] determines [Genre ID], and [Genre ID] determines [Genre Type].
Therefore, [Book ID] determines [Genre Type] via [Genre ID] and we have transitive functional
dependency, and this structure does not satisfy third normal form.
➔ To bring this table to third normal form, we split the table into two as follows:
Q1 Given relation R(ABCDE) and F:{AB C, B D, D E} Decompose in into 3NF. from the
given FDs determine primary key. Necessary attributes to include in the key are A, B
(because this attributes are not in RHS of FD).
ABCD
B+
AB BDE
C
B D, D E
AB C
D E
Table is 3NF.
The relations after decomposing into 3NF.
R1: ABC
R2: BD
R3: DE
Q2 Given relation R=ABCDEFGHIJ and the set of FDs are AB C, A DE, B F, F GH, D
IJ Decompose R into 3NF.
Q3(a) Given a set of FDs for the relation schema R(ABCD) with primary key AB under which R is 1NF
but not in 2NF
Sol: R=ABCD
Key=AB
(a) Atomic values are allowed in 1NF and partial dependency is not allowed in 2NF. The
following FDs are allowed.
B C, A C, B D, A D
(show the FDs which is having partial dependency)
(b) According to question partial dependencies are not allowed and transitivity
dependency is allowed. The following FDs are allowed.
C D, D C
Boyce-Codd normal form (BCNF): A relation is said to be in BCNF, if and only if every determinant
should be a candidate key.
✓ BCNF is the advance version of 3NF. It is stricter than 3NF.
✓ A table is in 3NF if for every functional dependency X → Y, X is the super key of the table.
✓ For BCNF, the table should be in 3NF and for every FD, LHS is super key.
Example: Let's assume there is a company where employees work in more than one department.
EMPLOYEE table:
Fourth Normal Form (4NF): A relation said to be in 4NF if it is in Boyce Codd normal form and
should have no multi-valued dependency.
✓ For a dependency A→ B, if for a single value of A, multiple value of B exists then the
relation will be multi-valued dependency.
✓ Note: Multi Valued Dependency: A table is said to have multi-valued dependency, if the following
conditions are true,
1. For a dependency A → B, if for a single value of A, multiple value of B exists, then the
table may have multi-valued dependency.
2. Also, a table should have at-least 3 columns for it to have a multi-valued dependency.
3. And, for a relation R (A, B, C), if there is a multi-valued dependency between, A and B,
then B and C should be independent of each other.
■ If all these conditions are true for any relation (table), it is said to have multi-valued
dependency.
Example
➢ The given STUDENT table is in 3NF but the COURSE and HOBBY are two independent
entity. Hence, there is no relationship between COURSE and HOBBY. In the STUDENT
relation, student with STU_ID, 21 contains two courses, Computer and Math and two
hobbies, Dancing and Singing. So there is a Multi-valued dependency on STU_ID, which
leads to un-necessary repetition of data.
➢ So to make the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
STUDENT_HOBBY
res
erving decomposition:
Decomposition of a relation is done when a relation in a relational model is not in appropriate
normal form. Relation R is decomposed into two or more relations if decomposition
is lossless join as well as dependency preserving.
Lossless Join Decomposition
Additional Resources:
[Link]
Practice Questions:
1. Why do we need Normalization in DBMS?
2. What are the different Normal Forms in DBMS? Following are the different Database
Normal Forms.
3. What is a Primary Key in DBMS?
4. What are non-key attribute in a Table?
5. What is the fullform of BCNF?
6. What is First Normal Form (1NF)?
7. What is Dependency?
8. How to satisfy BCNF?
[Link]
Denormalization is used to alter the structure of a database. Denormalization focuses on adding
redundancy which means combining multiple tables so that execute queries quickly. In this article,
we’ll explore Denormalization and how it impacts database design.
In a traditional normalized database, we store data in separate logical tables and attempt to minimize
redundant data. We may strive to have only one copy of each piece of data in a database.
For example, in a normalized database, we might have a Courses table and a Teachers table. Each
entry in Courses would store the teacherID for a Course but not the teacherName. When we need
to retrieve a list of all Courses with the Teacher’s name, we would do a join between these two
tables.
In some ways, this is great; if a teacher changes his or her name, we only have to update the name
in one place.
The drawback is that if tables are large, we may spend an unnecessarily long time doing joins on
tables.
Denormalization, then, strikes a different compromise. Under denormalization, we decide that
we’re okay with some redundancy and some extra effort to update the database in order to get the
efficiency advantages of fewer joins.
Additional Resources:
[Link]
Practice Questions:
1. What is Denormalization in Databases?
2. What is the drawback of denormalization and how do you maintain?
3. What are the Disadvantages of Database denormalization: