Normalization
use [Link]
[Link]
The theory of Normalization
process
o Identifying potential problems, called updating anomalies, in the design
of a relational database
o Methods for correcting these problems
Normal form: table has desirable properties
o First normal form (1NF)
o Second normal form (2NF)
o Third normal form (3NF)
o Fourth normal form (4NF)
Normalization Rule of Thumb
o Table in first normal form better than table not in first normal form
o Table in second normal form better than table in first normal form, and
so on
o Goal: new collection of tables that is free of update anomalies
Why do we normalize?
free of update anomalies (problems)
o if I update (change, delete or edit), I shouldn’t have to throughout the
table
tables are smaller
save space
o there may be more records! But less overall data since repeated less
Functional Dependence
Column B is functionally dependent on column A
o Each value for A is associated with exactly one value of B
A functionally determines B
o Column B is functionally dependent on another column A (or collection
of columns) when each value for A in the database is associated with
exactly one value of B
The only way to determine the functional dependenices that exist is to
examine the user’s polices
StudentNum StudentL StudentF HighSchoolNum HighSchoolName Adnum AdL AdF
1. Identify columns that will be unigiue (as in only one person could have a #)
2. for those columns identified, determine what other columns
Keys
Column A (or a collection of columns) is the primary key for a relation R
o Property 1: all columns in R are functionally dependent on A
o Property 2: no subcollection of columns in A also have Property 1
Candidate key: column(s) on which all columns in table are functionally
dependent
Alternate keys: candidate keys not chosen as primary key
Column(s) A is the primary key if all other columns are functionally
dependent on A and no subcollection of columns in A also have this
property
First Normal Form
A Table in first normal form (1NF) does not contain repeating groups
o Repeating group: multiple entries (data) for a single record
o Unnormalized relation: contains a repeating group
o Some may already come in 1NF!!
1NF Example
Unnormalized Normalized
Orders(OrderNum, OrderDate, PartNum, NumOrdered)
(dates are all wrong - typo)
Second Normal Form
Table (relation) in second normal form (2NF) when
o Table is in first normal form (1NF)
o No nonkey column is dependent on only a portion of primary key
Nonkey column (nonkey attribute): not part of primary key
Worries about redundant data, that COULD have update anomalies or
inconsistent data
o when rows are Updated, Additions, Deleted
an ID# will never be changed, even if the person/part does
o Inconsistent data, could you change one, and the rest would not
update?
You will have to create more tables (but smaller in size and data)
You may have to ADD columns (like an ID)
Non-2NF Table and possible problems
Notice that if I made ONE change in a record, I would have to find OTHERS to
change as well!!
o Example, what if I needed to change the Gas Range’s part number to
something new. It’d have to go through ALL of the records and change
them!!
o No no!!
o So let’s break it down so if I make a change, it will be made DB wide.
Converting to 2NF
o Table must FIRST be in 1NF
o Dependency Diagrams in Normalization
arrows indicate all functional dependencies
Arrows above boxes: normal dependencies
Arrows below boxes: partial dependencies
dependencies are formed by logic
Dependency diagrams for Orders Table
Finalized 2NF for OrderLine to these tables
Orders(OrderNum, OrderDate)
Part (PartNum, Description)
OrderLine(OrderNum, PartNum, NumOrdered, QuotedPrice)
Third Normal Form (3NF)
2NF tables may still contain problems
tables may have wasted space and Redundancy
could also have the same update anomalies and redundancy as before!!
so we look for Determinants:
o column(s) that determines another column
o Its only determinants are candidate keys
Customer Table Example
Customer Table Dependencies
Correction procedure
1. For each determinant that is not a candidate key, remove from table the
columns that depend on this determinant
2. Create new table containing all columns from the original table that depend
on this determinant
3. Make determinant the primary key of new table
Customer Table Normalized to 3NF
Customer(CustomerNum, CustomerName, Balance, CreditLimit, RepNum)
Rep(RepNum, LastName, FirstName)
Fourth Normal Form (4NF)
It is in third normal form
No multivalued dependencies
Converting table to fourth normal form
Split third normal form table into separate tables, each containing the
column that multidetermines the others
Avoiding the Problem with Multivalued Dependencies
o Place each repeating group in separate table
o Each table will contain all columns of a repeating group, and primary
key of the original table
o Primary key to each new table will be the concatenation of the
primary keys of the original table and the repeating group
Conversion to 4NF
Overall Idea
From start to Finish Example
Produce the Third Normal Form of this document by normalization.
0NF
o ORDER(order#, customer#, name, address, order_line(product#,
description, quantity, unitprice))
1NF
o ORDER(order#, customer#, name, address, orderdate)
o ORDER_LINE(order#, product#, description, quantity, unitprice)
2NF
o ORDER(order#, customer#, name, address, orderdate)
o ORDER_LINE(order#, product#, quantity)
o PRODUCT(product#, description, unitprice)
3NF
o ORDER(order#, customer#, orderdate)
o CUSTOMER(customer#, name, address)
o ORDER_LINE(order#, product#, quantity)
o PRODUCT(product#, description, unitprice)
Draw this out.