Normalization
The process of decomposing relations with anomalies to produce smaller, well-structured relations
Well-structured Relation
A relation that contains minimal redundancy and allows users to insert, modify, and delete the rows in a table without errors
or inconsistencies.
First, we will discuss: Problems we encounter if we don’t perform Normalization. In other words, problems we encounter when
there is redundancy.
Then, functional dependency.
Later: How to do Normzalization? How Normalization solves these problems?
What is redundancy?
Having multiple copies (i.e., repetition) of the same data in the database.
https://www.geeksforgeeks.org/the-problem-of-redundancy-in-database/
Do you see redundancy in the following table? Yes, the faculty id
FacultyID FacultyName DeptID DeptName DeptPhone
1 Srikanth 1 MIS 607-000-1111
2 Jinglu 1 MIS 607-000-1111
3 Surinder 1 MIS 607-000-1111
3 Murali 2 FIN 607-000-2222
4 Joey 2 FIN 607-000-2222
What are the issues with redundancy?
*Increases the size of database
* anomaly: An error or inconsistency that may result when a user attemps to update a table that contains redundant data.
1. Insertion Anomaly
2. Deletion Anomaly
3. Modification Anomaly
1. Insertion Anomaly:
Let us try to insert a new faculty member
FacultyID FacultyName DeptID DeptName DeptPhone
1 Srikanth 1 MIS 607-000-1111
2 Jinglu 1 MIS 607-000-1111
3 Surinder 1 MIS 607-000-1111
3 Murali 2 FIN 607-000-2222
4 Joey 2 FIN 607-000-2222
6 Mary 1 MIS 607-000-1111
7 John 1 MIS 607-000-1111
To insert redundant data for every new row is a data insertion problem or anomaly.
What information is needed for entering a new faculty? Faculty Info Dept Info
However, in a normal business situation, a new faculty should be allowed to be added without the need of a Dept.
What is the reason? redundancy, two different but related data in the same table
2. Deletion Anomaly:
Let us try to delete faculty data.
FacultyID FacultyName DeptID DeptName DeptPhone
1 Srikanth 1 MIS 607-000-1111
2 Jinglu 1 MIS 607-000-1111
3 Surinder 1 MIS 607-000-1111
3 Murali 2 FIN 607-000-2222
4 Joey 2 FIN 607-000-2222
What happenned? Dept info deleted along with student data
Loss of a related data when some other data is deleted is called deletion anomaly
3. Modification Anomaly
The MIS Dept phone number changes to 607-000-3333. Lets update!
FacultyID FacultyName DeptID DeptName DeptPhone
1 Srikanth 1 MIS 607-000-1111
2 Jinglu 1 MIS 607-000-1111
3 Surinder 1 MIS 607-000-8888 **If I change for one dept, I need to change for every in
3 Murali 2 FIN 607-000-2222
4 Joey 2 FIN 607-000-2222
Modification should be performed for every row corresponding to that entity instance; otherwise, the data will
be inconsistent.
How will normalization solve this problem?
The process of decomposing relations with anomalies to produce smaller, well-structured relations
Well-structured Relation: A relation that contains minimal redundancy and allows users to insert, modify, and delete the rows in a table
without errors or inconsistencies.
A technique of organizing the data into multiple related tables, to minimize DATA REDUNDANCY.
DIVIDE and RULE
FacultyID FacultyName DeptID DeptName DeptPhone
1 Srikanth 1 MIS 607-000-1111 normalization means breakdown
2 Jinglu 1 MIS 607-000-1111
3 Surinder 1 MIS 607-000-1111
3 Murali 2 FIN 607-000-2222
4 Joey 2 FIN 607-000-2222
FacultyID FacultyName DeptID
1 Srikanth 1
2 Jinglu 1 DeptID DeptName DeptPhone
3 Surinder 1 1 MIS 607-000-3333
4 Murali 2 2 FIN 607-000-2222
5 Joey 2
6 Mary 1
What about the Anomalies?
1. Insertion Anomaly? Let us try to insert a new faculty member mary
2. Deletion Anomaly? Let us try to delete faculty data. the strikethrough
3. Modification Anomaly? The MIS Dept phone number changes to 607-000-3333. Lets update! easily changed
pt, I need to change for every instance in the dept
rows in a table
lization means breakdown
Functional Dependency determinant vs candidate key
A constraint between two attributes in which the value of one attribute is determined by the value of another.
For a relation R, attribute B is functionally dependent on attribute A, if for every valid instance of A, that value A uniquely determines
the value of B.
· Representation: A → B
· Functional dependency is not a mathematical dependency: B cannot be computed from A.
· If you know the value of A, there can be only one value for B
Examples:
1. Bnumber -> Name, Age, Gender, Major, DateEnrolled B0001 --> Jim, 23, M, MIS this entire statement is con
2. DateCompleted EmpID, MIS333 -> a students course completion date of a course
The functional dependency in this statement implies that the date when a course is completed is determined by the identity of the
employee and the title of the course.
3. SSN --> Name, Address, BirthDate SSN is the determinant here
If someone looks up an SSN in the government database, can’t we find the corresponding person’s name, address, and birth date?
Yes. Hence, a person’s name, address, and birth date are functionally dependent on that person’s Social Security number.
There can be only one Name, one Address, and one Birthdate for each SSN.
4. Vin --> Make, Model, Color
The make, model, and the original color of a vehicle are functionally dependent on the vehicle identification number (VIN).
As above (3), there can be only one value of Make, Model, and Color associated with each VIN.
5. ISBN - --> FirstAuthor, Publisher, Title
The title of a book, the name of the first author, and the publisher are functionally dependent on the book’s international standard
book number (ISBN).
the attribute on the left hand side of the arrow = determinant
Determinant
The attribute on the left side of the arrow (→) in a functional dependency. BUNumber, SSN, VIN, and ISBN, are determinants in the
preceding three examples 1, 3, 4, 5, respectively.
EmpID and CourseTitle together become determinant for example 2.
Candidate Key
A candidate key is an attribute, or combination of attributes, that uniquely identifies a row in a relation. It satisfies the following properties.
1. Unique identification: For every row, the value of the key must uniquely identify that row. This property implies that each
nonkey attribute is functionally dependent on that key.
2. Nonredundancy: No attribute in the key can be deleted without destroying the property of unique identification.
All candidate keys are determinants. Not all determinants are candidate keys.
What are the candidate keys in the following tables?
EMPLOYEE1
The whole statement below is the functional dependency in EMPLOYEE1.
EmpID -> Name, DeptName, Salary the candidate key and the determinant are the same in this table
EmpID is the determinant in EMPLOYEE1
EmpID is the candidate key.
Pictorial Notation for Functional Dependencies in EMPLOYEE1
EMPLOYEE2
The two statements below represent the functional dependencies in EMPLOYEE2.
EmpID --> Name, DeptName, Salary
EmpID, CourseTItle --> Date Completed
The determinants are: (1) EmpID , and (2) EmpID,Course ID
Emp ID and Course Title is the candidate key.
candidate keys and determinants are not the same here
Pictorial Notation for Functional Dependencies in EMPLOYEE2
s entire statement is considered a functional dependency
Normalization Steps
Normalization can be accomplished and understood in stages, each of which corresponds to a normal form.
Normal form is a state of a relation that requires certain rules regarding relationships between attributes (or functio
1. First normal form
2. Second normal form
3. Third normal form
4. Boyce-Codd normal form
5. Fourth normal form
6. Fifth normal form
Let us start with a bill and think about a table to store the data.
If we were to store the bill data in a table, it will result in the following table.
What is the issue here? multivalued attribute or repeating groups
What is the consequence?
every bill is a row in the table
OrderID OrderDate CustID CusName CustAddress ProdID ProdDesc
1006 10/24/2015 2 Value Furniture Plano, TX 7 Dining Table
5 Writer's Desk
Entertainment
4 Center
Furniture
1007 10/25/2015 6 Gallery Boulder, CO 11 4-Dr Dresser
Entertainment
4 Center
convert this table to first normal form --> the relation does not have multival
Step 1: Convert to First Normal Form
A relation that has a primary key and in which there are no repeating groups.
a. Remove repeating groups.
b. Define a primary key that uniquely identifies each row.
OrderID OrderDate CustID CusName CustAddress ProdID ProdDesc
1006 10/24/2015 2 Value Furniture Plano, TX 7 Dining Table
1006 10/24/2015 2 Value Furniture Plano, TX 5 Writer's Desk
Entertainment
1006 10/24/2015 2 Value Furniture Plano, TX 4 Center
Furniture
1007 10/25/2015 6 Gallery Boulder, CO 11 4-Dr Dresser
Furniture Entertainment
1007 10/25/2015 6 Gallery Boulder, CO 4 Center
convert this table to first normal form --> the relation does not have multival
Relational Model
INVOICE
OrderID OrderDate CustID CusName CustAddress ProdID ProdDesc
What is the issue here? redundancy
What is the consequence for database? All anomalies apply.
1. Insertion Anomaly: Let us say the company wants to introduce a new product (Product ID 8).
Customer wants to change his order (orderID 1007). Results in data replication and data en
2. Deletion Anomaly: Customer wants to delete “Dining Table” from orderID 1006. Entire row should be de
Crucial Item information is lost.
3. Update Anomaly: Price increase of ProductID 4 to $750. Change has to propagate in all rows.
Why? in the 1NF you have partial functional dependency
Partial Functional Dependency: one or more nonkey attributes are functionally dependent on part (but not all) of th
primary key.
Non Key attributes OrderDate CustID CusName CustAddrProdDesc
Key attributes OrderID ProdID even tho both are key attributes, there are so
OrderID → OrderDate, CustID, CusName, and CustAddress ex. I only need order id for order date, I only n
ProdID → ProdDesc, ProdFinish, ProdStdPrice I need orderid and prodid for ordered quantit
Only OrderedQty depends on OrderID, ProdID i.e, OrderedQty → OrderID, ProdID
Step 2: Convert to Second Normal Form
A relation in first normal form in which every nonkey attribute is fully functionally dependent on the primary key.
Simply: the relation should be in first normal form and there should be no partial functional dependency.
a. Create a new relation for each primary key attribute that is a determinant in the partial dependency.
That attribute is the primary key in the new relation.
b. Move those nonkey attributes that are only dependent on this primary key attribute from the old relatio
to the new relation.
ORDER
OrderID OrderDate CustID CusName CustAddress
Order Line this database is in 2nd normal form
OrderID ProdID OrderedQty - it should be in 1nf
- should have no partial dependency
PRODUCT
ProdID ProdDesc ProdFinish ProdStdPrice
What is the problem here?
OrderID OrderDate CustID CusName CustAddress
1006 10/24/2015 2 Value Furniture Plano, TX
1006 10/24/2015 2 Value Furniture Plano, TX transitive dependency
1006 10/24/2015 2 Value Furniture Plano, TX non key attributes depend on other n
Furniture
1007 10/25/2015 6 Gallery Boulder, CO OrderID
Furniture
1007 10/25/2015 6 Gallery Boulder, CO key attribute
Step 3: Convert to Third Normal Form
A relation that is in second normal form and has no transitive dependencies.
a. Create a new relation for each determinant (CustomerID).
b. Move the attributes (customername, customer address) dependent on this determinant.
c. Keep customerID as the foreign key in the old relation (ORDER).
d. Associate two relations.
Final Relational Diagram After Normalization
CUSTOMER DB is in 3NF
CustID CusName CustAddress -- 2NF
-- no transitive
ORDER
OrderID OrderDate CustID
Order Line
OrderID ProdID OrderedQty
PRODUCT
ProdID ProdDesc ProdFinish ProdStdPrice
nds to a normal form.
between attributes (or functional dependencies are satisfied).
ProdFinish ProdStdPrice OrderedQty
Natural Ash 800.00 2
Cherry 325.00 2
Natural Maple 650.00 1
Oak 500.00
Natural Maple 650.00 3
elation does not have multivalued attributes (repeating groups), primary key needs to be present
ProdFinish ProdStdPrice OrderedQty
Natural Ash 800.00 2
Cherry 325.00 2
Natural Maple 650.00 1
Oak 500.00
Natural Maple 650.00 3
elation does not have multivalued attributes (repeating groups), primary key needs to be present
ProdFinish ProdStdPrice OrderedQty
uct (Product ID 8).
n data replication and data entry errors.
1006. Entire row should be deleted.
pagate in all rows.
dent on part (but not all) of the
ProdFinish ProdStdPrice OrderedQty
re key attributes, there are some attributes you can get with partial key attributes
rder id for order date, I only need prodid for proddesc
nd prodid for ordered quantity
endent on the primary key.
tional dependency.
in the partial dependency.
attribute from the old relation
tabase is in 2nd normal form
uld be in 1nf
d have no partial dependency
ve dependency
y attributes depend on other non-key attributes
OrderDate CustID CusName CustAddress
non key attribut non key attribut non key attri non key attribute
CustID --> CUsName CustAddress
key attribute non key attri non key attribute
determinant.