Functional Dependency
and Normalization
• The functional dependency is a relationship that exists
between two attributes.
• It typically exists between the primary key and non-key
attribute within a table.
•X → Y
• Inference Rule (IR):
• The Armstrong's axioms are the basic inference rule.
• Armstrong's axioms are used to conclude functional
dependencies on a relational database.
• The inference rule is a type of assertion. It can apply to
a set of FD(functional dependency) to derive other FD.
• Using the inference rule, we can derive additional
functional dependency from the initial set.
1. Reflexive Rule (IR1)
• In the reflexive rule, if Y is a subset of X, then X
determines Y.
• If X ⊇ Y then X → Y
X = {a, b, c, d, e}
Y = {a, b, c}
2. Augmentation Rule (IR2)
• The augmentation is also called as a partial
dependency. In augmentation, if X determines Y, then
XZ determines YZ for any Z.
If X → Y then XZ → YZ
For R(ABCD), if A → B then AC → BC
3. Transitive Rule (IR3)
• In the transitive rule, if X determines Y and Y determine Z,
then X must also determine Z.
• If X → Y and Y → Z then X → Z
4. Union Rule (IR4)
• Union rule says, if X determines Y and X determines Z, then
X must also determine Y and Z.
• If X → Y and X → Z then X → YZ
1. X → Y (given)
2. X → Z (given)
3. X → XY (using IR2 on 1 by augmentation with X. Where XX =
X)
4. XY → YZ (using IR2 on 2 by augmentation with Y)
5. X → YZ (using IR3 on 3 and 4)
5. Decomposition Rule (IR5)
• Decomposition rule is also known as project rule. It is
the reverse of union rule.
• This Rule says, if X determines Y and Z, then X
determines Y and X determines Z separately.
• f X → YZ then X → Y and X → Z
6. Pseudo transitive Rule (IR6)
• In Pseudo transitive Rule, if X determines Y and YZ
determines W, then XZ determines W.
• If X → Y and YZ → W then XZ → W
1. X → Y (given)
2. WY → Z (given)
3. WX → WY (using IR2 on 1 by augmenting with W)
4. WX → Z (using IR3 on 3 and 2)
What is Normalization?
• Normalization is the process of organizing the data in
the database.
• Normalization is used to minimize the redundancy from
a relation or set of relations. It is also used to eliminate
undesirable characteristics like Insertion, Update, and
Deletion Anomalies.
• Normalization divides the larger table into smaller and
links them using relationships.
• The normal form is used to reduce redundancy from the
database table.
• Data modification anomalies can be categorized
into three types:
• Insertion Anomaly: Insertion Anomaly refers to when
one cannot insert a new tuple into a relationship due to
lack of data.
• Deletion Anomaly: The delete anomaly refers to the
situation where the deletion of data results in the
unintended loss of some other important data.
• Updatation Anomaly: The update anomaly is when an
update of a single data value requires multiple rows of
data to be updated.
First Normal Form (1NF)
• A relation will be 1NF if it contains an atomic value.
• It states that an attribute of a table cannot hold multiple
values. It must hold only single-valued attribute.
• First normal form disallows the multi-valued attribute,
composite attribute, and their combinations.
• Relation EMPLOYEE is not in 1NF because of multi-
valued attribute EMP_PHONE.
EMPLOYEE table:
EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385, UP
9064738238
20 Harry 8574783832 Bihar
12 Sam 7390372389, Punjab
8589830302
EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385 UP
14 John 9064738238 UP
20 Harry 8574783832 Bihar
12 Sam 7390372389 Punjab
12 Sam 8589830302 Punjab
Second Normal Form (2NF)
• In the 2NF, relational must be in 1NF.
• In the second normal form, all non-key attributes are
fully functional dependent on the primary key
Example: Let's assume, a school can store the data of
teachers and the subjects they teach.
In a school, a teacher can teach more than one subject.
TEACHER_ID SUBJECT TEACHER_AGE
25 Chemistry 30
25 Biology 30
47 English 35
83 Math 38
83 Computer 38
TEACHER_ID TEACHER_AGE
25 30
47 35
83 38
TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer
Third Normal Form (3NF)
• A relation will be in 3NF if it is in 2NF and not contain
any transitive partial dependency.
• 3NF is used to reduce the data duplication. It is also
used to achieve the data integrity.
• If there is no transitive dependency for non-prime
attributes, then the relation must be in third normal
form.
EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY
222 Harry 201010 UP Noida
333 Stephan 02228 US Boston
444 Lan 60007 US Chicago
555 Katharine 06389 UK Norwich
666 John 462007 MP Bhopa
• Here, EMP_STATE & EMP_CITY dependent on EMP_ZIP
and EMP_ZIP dependent on EMP_ID.
• The non-prime attributes (EMP_STATE, EMP_CITY)
transitively dependent on super key(EMP_ID). It violates
the rule of third normal form.
• That's why we need to move the EMP_CITY and
EMP_STATE to the new <EMPLOYEE_ZIP> table, with
EMP_ZIP as a Primary key.
EMP_ID EMP_NAME EMP_ZIP
222 Harry 201010
333 Stephan 02228
444 Lan 60007
555 Katharine 06389
666 John 462007
EMP_ZIP EMP_STATE EMP_CITY
201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal