Introduction of Database Normalization
Normalization is an important process in database design that helps improve the database’s
efficiency, consistency, and accuracy. It makes it easier to manage and maintain the data and
ensures that the database is adaptable to changing business needs.
Database normalization is the process of organizing the attributes of the database to reduce or
eliminate data redundancy (having the same data but at different places).
Data redundancy unnecessarily increases the size of the database as the same data is repeated
in many places. Inconsistency problems also arise during insert, delete, and update operations.
In the relational model, there exist standard methods to quantify how efficient a databases is.
These methods are called normal forms and there are algorithms to covert a given database
into normal forms.
Normalization generally involves splitting a table into multiple ones which must be linked each
time a query is made requiring data from the split tables.
Why do we need Normalization?
The primary objective for normalizing the relations is to eliminate the below anomalies. Failure
to reduce anomalies results in data redundancy, which may threaten data integrity and cause
additional issues as the database increases. Normalization consists of a set of procedures that
assist you in developing an effective database structure.
Insertion Anomalies: Insertion anomalies occur when it is not possible to insert data
into a database because the required fields are missing or because the data is
incomplete. For example, if a database requires that every record has a primary key, but
no value is provided for a particular record, it cannot be inserted into the database.
Deletion anomalies: Deletion anomalies occur when deleting a record from a database
and can result in the unintentional loss of data. For example, if a database contains
information about customers and orders, deleting a customer record may also delete all
the orders associated with that customer.
Updation anomalies: Updation anomalies occur when modifying data in a database and
can result in inconsistencies or errors. For example, if a database contains information
about employees and their salaries, updating an employee’s salary in one record but not
in all related records could lead to incorrect calculations and reporting.
Before Normalization: The table is prone to redundancy and anomalies (insertion, update, and
deletion).After Normalization: The data is divided into logical tables to ensure consistency, avoid
redundancy and remove anomalies making the database efficient and reliable.
Features of Database Normalization
Elimination of Data Redundancy: One of the main features of normalization is to
eliminate the data redundancy that can occur in a database. Data redundancy refers to
the repetition of data in different parts of the database. Normalization helps in reducing
or eliminating this redundancy, which can improve the efficiency and consistency of the
database.
Ensuring Data Consistency: Normalization helps in ensuring that the data in the
database is consistent and accurate. By eliminating redundancy, normalization helps in
preventing inconsistencies and contradictions that can arise due to different versions of
the same data.
Simplification of Data Management: Normalization simplifies the process of managing
data in a database. By breaking down a complex data structure into simpler tables,
normalization makes it easier to manage the data, update it, and retrieve it.
Improved Database Design: Normalization helps in improving the overall design of the
database. By organizing the data in a structured and systematic way, normalization
makes it easier to design and maintain the database. It also makes the database more
flexible and adaptable to changing business needs.
Avoiding Update Anomalies: Normalization helps in avoiding update anomalies, which
can occur when updating a single record in a table affects multiple records in other
tables. Normalization ensures that each table contains only one type of data and that
the relationships between the tables are clearly defined, which helps in avoiding such
anomalies.
Standardization: Normalization helps in standardizing the data in the database. By
organizing the data into tables and defining relationships between them, normalization
helps in ensuring that the data is stored in a consistent and uniform manner.
Anomalies example:
1. Update anomaly:
Consider a college database that keeps student information in a table called student, which
contains four columns: stu_id for the student's id, stu_name for the student's
name, stu_address for the student's address, and stu_club for the student's club. Eventually, the
table will appear as follows:
stu_id stu_name stu_address stu_club
330 Muthu Rajasthan Literature
330 Muthu Rajasthan Finance
331 Mukesh Mumbai Crypto
332 Nanda Karnataka Public Speaking
332 Nanda Karnataka Arts
For student Muthu, we have two columns in the above table as he belongs to two clubs at the
college. If we want to change Muthu's address, we must update it twice otherwise the data will
be inconsistent.
When the correct address gets updated in one club but not in another, Muthu would possess
two different addresses, which is not acceptable and could result in inconsistent data.
2. Insert anomaly:
Example
We use the same table in the previous example with modified data
stu_id stu_name stu_address stu_club
220 Annamalai Kerala yoga
220 Muthu Kerala Music
231 Mukesh Mumbai Crypto
232 Muni Karnataka Public Speaking
232 Muni Karnataka Arts
For example, in the above table if a new student named Nanda has joined the college and he
has no department affiliation as the club allows intake of students only from second year. Then
we can't insert the data of Nanda into the table since the stu_club field cannot accept null
values.
3. Delete anomaly:
Example In this example, we use modified data from the previous example
stu_id stu_name stu_address stu_club
120 Nanthu Maharasthra yoga
stu_id stu_name stu_address stu_club
122 Nanthu Maharashthra Music
131 Mukesh Mumbai Crypto
132 Muni Karnataka Public Speaking
132 Muni Karnataka Arts
Suppose, for instance, the college at some point closes the club crypto, then deleting the rows
that contain s_club as crypto would also delete the information of student Mukesh since he
belongs only to this department.
Normal Forms in DBMS
Normal
Forms Description of Normal Forms
First Normal A relation is in first normal form if every attribute in that relation is single-
Form (1NF) valued attribute.
Second A relation that is in First Normal Form and every non-primary-key attribute is
Normal Form fully functionally dependent on the primary key, then the relation is
(2NF) in Second Normal Form (2NF).
A relation is in the third normal form, if there is no transitive dependency for
non-prime attributes as well as it is in the second normal form. A relation is
in 3NF if at least one of the following conditions holds in every non-trivial
function dependency X –> Y.
Third Normal X is a super key.
Form (3NF) Y is a prime attribute (each element of Y is part of some candidate key).
For BCNF the relation should satisfy the below conditions
Boyce-Codd The relation should be in the 3rd Normal Form.
Normal Form X should be a super-key for every functional dependency (FD) X−>Y in a given
(BCNF) relation.
Normal
Forms Description of Normal Forms
Fourth A relation R is in 4NF if and only if the following conditions are satisfied:
Normal Form It should be in the Boyce-Codd Normal Form (BCNF).
(4NF) The table should not have any Multi-valued Dependency.
A relation R is in 5NF if and only if it satisfies the following conditions:
Fifth Normal R should be already in 4NF.
Form (5NF) It cannot be further non loss decomposed (join dependency).
Advantages of Normalization
Normalization eliminates data redundancy and ensures that each piece of data is stored
in only one place, reducing the risk of data inconsistency and making it easier to
maintain data accuracy.
By breaking down data into smaller, more specific tables, normalization helps ensure
that each table stores only relevant data, which improves the overall data integrity of
the database.
Normalization simplifies the process of updating data, as it only needs to be changed in
one place rather than in multiple places throughout the database.
Normalization enables users to query the database using a variety of different criteria,
as the data is organized into smaller, more specific tables that can be joined together as
needed.
Normalization can help ensure that data is consistent across different applications that
use the same database, making it easier to integrate different applications and ensuring
that all users have access to accurate and consistent data.
Disadvantages of Normalization
Normalization can result in increased performance overhead due to the need for
additional join operations and the potential for slower query execution times.
Normalization can result in the loss of data context, as data may be split across multiple
tables and require additional joins to retrieve.
Proper implementation of normalization requires expert knowledge of database design
and the normalization process.
Normalization can increase the complexity of a database design, especially if the data
model is not well understood or if the normalization process is not carried out correctly.
EXAMPLES:
https://www.tpointtech.com/anomalies-in-dbms
https://www.geeksforgeeks.org/introduction-of-database-normalization/
Types of Keys in Relational Model (Candidate, Super, Primary, Alternate and Foreign)
Keys are one of the basic requirements of a relational database model. It is widely used to
identify the tuples(rows) uniquely in the table. We also use keys to set up relations amongst
various columns and tables of a relational database.
Different Types of Database Keys
Super Key
The set of one or more attributes (columns) that can uniquely identify a tuple (record) is known
as Super Key. For Example, STUD_NO, (STUD_NO, STUD_NAME), etc.
A super key is a group of single or multiple keys that uniquely identifies rows in a table. It
supports NULL values in rows.
A super key can contain extra attributes that aren’t necessary for uniqueness. For example, if
the “STUD_NO” column can uniquely identify a student, adding “SNAME” to it will still form a
valid super key, though it’s unnecessary.
Candidate Key
The minimal set of attributes that can uniquely identify a tuple is known as a candidate key. For
Example, STUD_NO in STUDENT relation.
A candidate key is a minimal super key, meaning it can uniquely identify a record but contains
no extra attributes.
It is a super key with no repeated data is called a candidate key.
The minimal set of attributes that can uniquely identify a record.
A candidate key must contain unique values, ensuring that no two rows have the same value in
the candidate key’s columns.
Every table must have at least a single candidate key.
A table can have multiple candidate keys but only one primary key.
The candidate key can be simple (having only one attribute) or composite as well.
Primary Key
There can be more than one candidate key in relation out of which one can be chosen as the
primary key. For Example, STUD_NO, as well as STUD_PHONE, are candidate keys for relation
STUDENT but STUD_NO can be chosen as the primary key (only one out of many candidate
keys).
A primary key is a unique key, meaning it can uniquely identify each record (tuple) in a table.
It must have unique values and cannot contain any duplicate values.
A primary key cannot be NULL, as it needs to provide a valid, unique identifier for every record.
A primary key does not have to consist of a single column. In some cases, a composite primary
key (made of multiple columns) can be used to uniquely identify records in a table.
Databases typically store rows ordered in memory according to primary key for fast access of
records using primary key.
Alternate Key
An alternate key is any candidate key in a table that is not chosen as the primary key. In other
words, all the keys that are not selected as the primary key are considered alternate keys.
An alternate key is also referred to as a secondary key because it can uniquely identify records
in a table, just like the primary key.
An alternate key can consist of one or more columns (fields) that can uniquely identify a record,
but it is not the primary key
Foreign Key
A foreign key is an attribute in one table that refers to the primary key in another table. The
table that contains the foreign key is called the referencing table, and the table that is
referenced is called the referenced table.
A foreign key in one table points to the primary key in another table, establishing a relationship
between them.
It helps connect two or more tables, enabling you to create relationships between them. This is
essential for maintaining data integrity and preventing data redundancy.
They act as a cross-reference between the tables.
For example, DNO is a primary key in the DEPT table and a non-key in EMP
It may be worth noting that, unlike the Primary Key of any given relation, Foreign Key can be
NULL as well as may contain duplicate tuples i.e. it need not follow uniqueness constraint. For
Example, STUD_NO in the STUDENT_COURSE relation is not unique. It has been repeated for
the first and third tuples. However, the STUD_NO in STUDENT relation is a primary key and it
needs to be always unique, and it cannot be null.
Composite Key
Sometimes, a table might not have a single column/attribute that uniquely identifies all the
records of a table. To uniquely identify rows of a table, a combination of two or more
columns/attributes can be used. It still can give duplicate values in rare cases. So, we need to
find the optimal set of attributes that can uniquely identify rows in a table.
It acts as a primary key if there is no primary key in a table
Two or more attributes are used together to make a composite key .
Different combinations of attributes may give different accuracy in terms of identifying the rows
uniquely.
https://www.youtube.com/watch?v=_UZLrD_R0T4