5.
Normalization
• Normalization is a systematic approach to organize data in a database to eliminate
redundancy, avoid anomalies and ensure data consistency. The process involves breaking
down large tables into smaller, well-structured ones and defining relationships between
them
• A large set of data is structured into a bunch of smaller tables, and the tables have a proper
Relationship between them.
• It is a multi-step process that puts data into tabular form, removes duplicate data, and set up
the relationship between tables.
• Types of Normal Forms
First Normal Form (1NF): This is the most basic level of normalization. In 1NF, each
table cell should contain only a single value, and each column should have a unique name. The
first normal form helps to eliminate duplicate data and simplify queries.
Second Normal Form (2NF): 2NF eliminates redundant data by requiring that each
non-key attribute be dependent on the primary key. This means that each column should be
directly related to the primary key, and not to other columns.
Third Normal Form (3NF): 3NF builds on 2NF by requiring that all non-key attributes
are independent of each other. This means that each column should be directly related to the
primary key, and not to any other columns in the same table.
Boyce-Codd Normal Form (BCNF): BCNF is a stricter form of 3NF that ensures
that each determinant in a table is a candidate key. In other words, BCNF ensures that each non-
key attribute is dependent only on the candidate key.
Fourth Normal Form (4NF): 4NF is a further refinement of BCNF that ensures that a
table does not contain any multi-valued dependencies.
Fifth Normal Form (5NF): 5NF is the highest level of normalization and involves
decomposing a table into smaller tables to remove data redundancy and improve data integrity.
1NF
Prod_Id Sup_Id Price Prod_name Sup_Name Cust_Id Cust_name Date
Sales
Primary key: ( Prod_Id,Sup_Id)
The above table is in 1NF
But the above table is not in 2NF because there is a partial dependency
If you know Pro_Id, we can write Prod_name that is
Prod_Id->Prod_name ( Partial dependency)
If you know Sup_Id, we can write Sup_Name that is
Sup_Id->Sup_Name (Partial Dependency)
Hence the table is not in 2N
Decompose the table to convert into 2NF
Sales:
Prod_Id Sup_Id Price Cust_Id Cust_name Date
Primary key: ( Prod_Id,Sup_Id)
Product:
Prod_Id Prod_name
Primary key: Prod_Id
Supplier:
Sup_Id Sup_Name
Primary key: Sup_Name
Product and Supplier tables are in 3NF but Sales table is not in 3NF because there is a transitive
dependency
Sales:
Prod_Id Sup_Id Price Cust_Id Cust_name Date
If you know the Cust_Id we can write Cust_name,
Cust_Id->Cust_name
that is a non key attribute determines an attribute value that non key attribute Cust_nmae
transitively dependent on primary key, this is not allowed in 3NF.
1) (Prod_Id,Sup_Id ) -> Cust_Id
2) Cust_Id->Cust_name
From 1 and 2 : (Prod_Id,Sup_Id )->Cust_name
To convert into 3NF decompose the table
Sales:
Prod_Id Sup_Id Price Date
Primary key: ( Prod_Id,Sup_Id)
Customer:
Cust_Id Cust_name
Primary key: Cust_Id
Product:
Prod_Id Prod_name
Primary key: Prod_Id
Supplier:
Sup_Id Sup_Name
Primary key: Sup_Name
Now the tables Sales,Customer,Product and supplier are in 3NF