What is functional dependency?
● Functional Dependency is a relationship that exists between multiple
attributes of a relation.
● This concept is given by E. F. Codd.
● Functional dependency represents a formalism on the infrastructure of
relation.
● It is a type of constraint existing between various attributes of a relation.
● It is used to define various normal forms.
● These dependencies are restrictions imposed on the data in database.
● If P is a relation with A and B attributes, a functional dependency
between these two attributes is represented as {A → B}. It specifies
that,
A It is a determinant set.
B It is a dependent attribute.
{A → B} A functionally determines B.
B is a functionally dependent on A.
● Each value of A is associated precisely with one B value. A functional
dependency is trivial if B is a subset of A.
● 'A' Functionality determines 'B' {A → B} (Left hand side attributes
determine the values of Right hand side attributes).
For example: <Employee> Table
EmpId EmpName
● In the above <Employee> table, EmpName (employee name) is
functionally dependent on EmpId (employee id) because the EmpId is
unique for individual names.
● The EmpId identifies the employee specifically, but EmpName cannot
distinguish the EmpId because more than one employee could have the
same name.
● The functional dependency between attributes eliminates the repetition
of information.
● It is related to a candidate key, which uniquely identifies a tuple and
determines the value of all other attributes in the relation.
Advantages of Functional Dependency
● Functional Dependency avoids data redundancy where same data should
not be repeated at multiple locations in same database.
● It maintains the quality of data in database.
● It allows clearly defined meanings and constraints of databases.
● It helps in identifying bad designs.
● It expresses the facts about the database design.
Introduction to Axioms Rules
● Armstrong's Axioms is a set of rules.
● It provides a simple technique for reasoning about functional
dependencies.
● It was developed by William W. Armstrong in 1974.
● It is used to infer all the functional dependencies on a relational
database.
Various Axioms Rules
A. Primary Rules
Rule Reflexivity
1 If A is a set of attributes and B is a subset of A, then A
holds B. { A → B }
Rule Augmentation
2 If A hold B and C is a set of attributes, then AC holds BC.
{AC → BC}
It means that attribute in dependencies does not change
the basic dependencies.
Rule Transitivity
3 If A holds B and B holds C, then A holds C.
If {A → B} and {B → C}, then {A → C}
A holds B {A → B} means that A functionally determines
B.
B. Secondary Rules
Rule 1 Union
If A holds B and A holds C, then A holds BC.
If{A → B} and {A → C}, then {A → BC}
Rule 2 Decomposition
If A holds BC and A holds B, then A holds C.
If{A → BC} and {A → B}, then {A → C}
Rule 3 Pseudo Transitivity
If A holds B and BC holds D, then AC holds D.
If{A → B} and {BC → D}, then {AC → D}
Sometimes Functional Dependency Sets are not able to reduce if
the set has following properties,
1. The Right-hand side set of functional dependency holds only one
attribute.
2. The Left-hand side set of functional dependency cannot be reduced, it
changes the entire content of the set.
3. Reducing any functional dependency may change the content of the
set.
A set of functional dependencies with the above three properties are also
called as Canonical or Minimal.
Trivial Functional Dependency
Trivial If A holds B {A → B}, where A is a subset of B, then it is called
a Trivial Functional Dependency. Trivial always holds
Functional Dependency.
Non-Trivial If A holds B {A → B}, where B is not a subset A, then it is called
as a Non-Trivial Functional Dependency.
Completely If A holds B {A → B}, where A intersect B = Φ, it is called as
Non-Trivial a Completely Non-Trivial Functional Dependency.
Introduction to Normalization
● Normalization is a process of organizing the data in the database.
● It is a systematic approach of decomposing tables to eliminate data
redundancy.
● It was developed by E. F. Codd.
● Normalization is a multi-step process that puts the data into a tabular
form by removing the duplicate data from the relation tables.
● It is a step by step decomposition of complex records into simple
records.
● It is also called as Canonical Synthesis.
● It is the technique of building database structures to store data.
Definition of Normalization
“Normalization is a process of designing a consistent database by
minimizing redundancy and ensuring data integrity through
decomposition which is lossless.”
Features of Normalization
● Normalization avoids the data redundancy.
● It is a formal process of developing data structures.
● It promotes the data integrity.
● It ensures data dependencies make sense that means data is logically
stored.
● It eliminates the undesirable characteristics like Insertion, Updation and
Deletion Anomalies.
Types of Normalization
Following are the types of Normalization:
1. First Normal Form
2. Second Normal Form
3. Third Normal Form
4. Fourth Normal Form
5. Fifth Normal Form
6. BCNF (Boyce – Codd Normal Form)
7. DKNF (Domain Key Normal Form)
1. First Normal Form (1NF)
● First Normal Form (1NF) is a simple form of Normalization.
● It simplifies each attribute in a relation.
● In 1NF, there should not be any repeating group of data.
● Each set of column must have a unique value.
● It contains atomic values because the table cannot hold multiple values.
Example: Employee Table
ECode Employee_Name Department_Name
1 ABC Sales, Production
2 PQR Human Resource
3 XYZ Quality Assurance, Marketing
Employee Table using 1NF
ECode Employee_Name Department_Name
1 ABC Sales
1 ABC Production
2 PQR Human Resource
3 XYZ Quality Assurance
3 XYZ Marketing
2. Second Normal Form (2NF)
● In 2NF, the table is required in 1NF.
● The main rule of 2NF is, 'No non-prime attribute is dependent on the proper subset
of any candidate key of the table.'
● An attribute which is not part of candidate key is known as non-prime attribute.
Example : Employee Table using 1NF
ECode Employee_Name Employee_Age
1 ABC 38
1 ABC 38
2 PQR 38
3 XYZ 40
3 XYZ 40
Candidate Key: ECode, Employee_Name
Non prime attribute: Employee_Age
● The above table is in 1NF. Each attribute has atomic values. However, it
is not in 2NF because non prime attribute Employee_Age is dependent
on ECode alone, which is a proper subset of candidate key. This violates
the rule for 2NF as the rule says 'No non-prime attribute is dependent on
the proper subset of any candidate key of the table'.
2NF (Second Normal Form) : Employee1 Table
ECode Employee_Age
1 38
2 38
3 40
Employee2 Table
ECode Employee_Name
1 ABC
1 ABC
2 PQR
3 XYZ
3 XYZ
● Now, the above tables comply with the Second Normal Form (2NF).
3. Third Normal Form (3NF)
● Third Normal Form (3NF) is used to minimize the transitive redundancy.
● In 3NF, the table is required in 2NF.
● While using the 2NF table, there should not be any transitive partial
dependency.
● 3NF reduces the duplication of data and also achieves the data integrity.
Example : <Employee> Table
EId Ename DOB City State Zip
001 ABC 10/05/1990 Pune Maharashtra 411038
002 XYZ 11/05/1988 Mumbai Maharashtra 400007
● In the above <Employee> table, EId is a primary key but City, State
depends upon Zip code.
● The dependency between Zip and other fields is called Transitive
Dependency.
● Therefore we apply 3NF. So, we need to move the city and state to the
new <Employee_Table2> table, with Zip as a Primary key.
<Employee_Table1> Table
EId Ename DOB Zip
001 ABC 10/05/1990 411038
002 XYZ 11/05/1988 400007
<Employee_Table2> Table
City State Zip
Pune Maharashtra 411038
Mumbai Maharashtra 400007
● The advantage of removing transitive dependency is, it reduces the amount
of data dependencies and achieves the data integrity.
● In the above example, using with the 3NF, there is no redundancy of data
while inserting the new records.
● The City, State and Zip code will be stored in the separate table. And
therefore the updation becomes more easier because of no data
redundancy.
4. BCNF (Boyce – Code Normal Form)
● BCNF which stands for Boyce – Code Normal From is developed by
Raymond F. Boyce and E. F. Codd in 1974.
● BCNF is a higher version of 3NF.
● It deals with the certain type of anomaly which is not handled by 3NF.
● A table complies with BCNF if it is in 3NF and any attribute is fully
functionally dependent that is A → B. (Attribute 'A' is determinant).
● If every determinant is a candidate key, then it is said to be BCNF.
● Candidate key has the ability to become a primary key. It is a column in
a table.
Example : <EmployeeMain> Table
Empid Ename DeptName DepType
E001 ABC Production D001
E002 XYZ Sales D002
The functional dependencies are:
Empid → EmpName
DeptName → DeptType
Candidate Key:
Empid
DeptName
● The above table is not in BCNF as neither Empid nor DeptName alone are
keys.
● We can break the table in three tables to make it comply with BCNF.
<Employee> Table
Empid EmpName
E001 ABC
E002 XYZ
<Department> Table
DeptName DeptType
Production D001
Sales D002
<Emp_Dept> Table
Empid DeptName
E001 Production
E002 Sales
Now, the functional dependencies are:
Empid → EmpName
DeptName → DeptType
Candidate Key:
<Employee> Table : Empid
<Department> Table : DeptType
<Emp_Dept> Table : Empid, DeptType
● So, now both the functional dependencies left side part is a key, so it is
in the BCNF.
5. Fourth Normal Form (4NF)
● Fourth Normal Form (4NF) does not have non-trivial multivalued
dependencies other than a candidate key.
● 4NF builds on the first three normal forms (1NF, 2NF and 3NF) and the
BCNF.
● It does not contain more than one multivalued dependency.
● This normal form is rarely used outside of academic circles.
For example : A table contains a list of three things that is 'Student',
'Teacher', 'Book'. Teacher is in charge of Student and recommended book
for each student. These three elements (Student, Teacher and Book) are
independent of one another. Changing the student's recommended book,
for instance, has no effect on the student itself. This is an example of
multivalued dependency, where an item depends on more than one
value. In this example, the student depends on both teacher and book.
● Therefore, 4NF states that a table should not have more than one
dependencies.
6. Fifth Normal Form (5NF)
● 5NF is also knows as Project-Join Normal Form (PJ/NF).
● It is designed for reducing the redundancy in relational databases.
● 5NF requires semantically related multiple relationships, which are rare.
● In 5NF, if an attribute is multivalued attribute, then it must be taken out
as a separate entity.
● While performing 5NF, the table must be in 4NF.
What is decomposition?
● Decomposition is the process of breaking down in parts or elements.
● It replaces a relation with a collection of smaller relations.
● It breaks the table into multiple tables in a database.
● It should always be lossless, because it confirms that the information in the
original relation can be accurately reconstructed based on the decomposed
relations.
● If there is no proper decomposition of the relation, then it may lead to problems
like loss of information.
Properties of Decomposition
Following are the properties of Decomposition,
1. Lossless Decomposition
2. Dependency Preservation
3. Lack of Data Redundancy
1. Lossless Decomposition
● Decomposition must be lossless. It means that the information should not get lost
from the relation that is decomposed.
● It gives a guarantee that the join will result in the same relation as it was
decomposed.
Example:
Let's take 'E' is the Relational Schema, With instance 'e'; is decomposed into: E1,
E2, E3, . . . . En; With instance: e1, e2, e3, . . . . en, If e1 ⋈ e2 ⋈ e3 . . . . ⋈ en,
then it is called as 'Lossless Join Decomposition'.
● In the above example, it means that, if natural joins of all the decomposition give
the original relation, then it is said to be lossless join decomposition.
Example: <Employee_Department> Table
Eid Ename Age City Salary Deptid DeptName
E001 ABC 29 Pune 20000 D001 Finance
E002 PQR 30 Pune 30000 D002 Production
E003 LMN 25 Mumbai 5000 D003 Sales
E004 XYZ 24 Mumbai 4000 D004 Marketing
E005 STU 32 Bangalore 25000 D005 Human Resource
● Decompose the above relation into two relations to check whether a
decomposition is lossless or lossy.
● Now, we have decomposed the relation that is Employee and
Department.
Relation 1 : <Employee> Table
Eid Ename Age City Salary
E001 ABC 29 Pune 20000
E002 PQR 30 Pune 30000
E003 LMN 25 Mumbai 5000
E004 XYZ 24 Mumbai 4000
E005 STU 32 Bangalore 25000
● Employee Schema contains (Eid, Ename, Age, City, Salary).
Relation 2 : <Department> Table
Deptid Eid DeptName
D001 E001 Finance
D002 E002 Production
D003 E003 Sales
D004 E004 Marketing
D005 E005 Human Resource
● Department Schema contains (Deptid, Eid, DeptName).
● So, the above decomposition is a Lossless Join Decomposition, because
the two relations contains one common field that is 'Eid' and therefore
join is possible.
● Now apply natural join on the decomposed relations.
Employee ⋈ Department
Eid Ename Age City Salary Deptid DeptName
E001 ABC 29 Pune 20000 D001 Finance
E002 PQR 30 Pune 30000 D002 Production
E003 LMN 25 Mumbai 5000 D003 Sales
E004 XYZ 24 Mumbai 4000 D004 Marketing
E005 STU 32 Bangalore 25000 D005 Human Resource
Hence, the decomposition is Lossless Join Decomposition.
● If the <Employee> table contains (Eid, Ename, Age, City, Salary) and
<Department> table contains (Deptid and DeptName), then it is not possible to
join the two tables or relations, because there is no common column between
them. And it becomes Lossy Join Decomposition.
2. Dependency Preservation
● Dependency is an important constraint on the database.
● Every dependency must be satisfied by at least one decomposed table.
● If {A → B} holds, then two sets are functional dependent. And, it becomes more
useful for checking the dependency easily if both sets in a same relation.
● This decomposition property can only be done by maintaining the functional
dependency.
● In this property, it allows to check the updates without computing the natural join
of the database structure.
3. Lack of Data Redundancy
● Lack of Data Redundancy is also known as a Repetition of Information.
● The proper decomposition should not suffer from any data redundancy.
● The careless decomposition may cause a problem with the data.
● The lack of data redundancy property may be achieved by Normalization process