
Normalization and Data Analysis - Introduction

Data normalization is the process of cleaning and organizing data to make it clear and machine-readable, involving the use of primary, foreign, and composite keys. It consists of three normal forms (1NF, 2NF, 3NF) that eliminate redundancy and ensure data integrity by structuring data into related tables. The main purpose of normalization is to avoid complexities, eliminate duplicates, and optimize data storage while improving access and consistency.

Uploaded by

Tommy Zitha

Normalization and Data Analysis
What is data normalization?
Data normalization is the process of cleaning up and organizing collected data so that
it is clearer, more consistent, and machine-readable.

[Figures: Format 1 and Format 2]
Keys
• A primary key is a column that uniquely identifies each row of data in
a table. It’s a unique identifier such as an employee ID, student ID,
voter’s identification number (VIN), and so on.

• A foreign key is a field that refers to the primary key of another table.

• A composite key is just like a primary key, but instead of having a
single column, it has multiple columns.
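
The three kinds of key can be sketched with Python's built-in sqlite3 module. This is only an illustration: the student and enrollment tables and their columns are hypothetical names, not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,          -- primary key: one column
    name       TEXT NOT NULL
);
CREATE TABLE enrollment (
    student_id  INTEGER NOT NULL
                REFERENCES student(student_id),  -- foreign key
    course_code TEXT NOT NULL,
    PRIMARY KEY (student_id, course_code)        -- composite key: two columns
);
""")
conn.execute("INSERT INTO student VALUES (1, 'Mark')")
conn.execute("INSERT INTO enrollment VALUES (1, 'DB101')")


def try_insert(sql):
    """Return True if the insert is accepted, False if a key rule rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_pk = try_insert("INSERT INTO student VALUES (1, 'Paul')")
unknown_student = try_insert("INSERT INTO enrollment VALUES (99, 'DB101')")
duplicate_pair = try_insert("INSERT INTO enrollment VALUES (1, 'DB101')")
second_course = try_insert("INSERT INTO enrollment VALUES (1, 'DB102')")
```

Only the last insert is accepted: the same student may take a second course, but a repeated primary key, an unknown student, or a repeated (student, course) pair is rejected.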
Types of Data
How does data normalization work?
Example:
First Normal Form (1NF)
For a table to be in 1NF, each cell should hold only a single value, and every record
needs to be unique.

CustID  CustName  CustAge  Purchases
1       Mark      19       Sugar
1       Mark      19       Rice
2       Paul      18       Milk
2       Paul      18       Carrot
3       Suzanne   17       Banana
3       Suzanne   17       Icecream
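
As a rough sketch of how such a table might be produced, the Python below splits multi-valued cells into one row per value. The un-normalised input layout (a comma-separated Purchases cell) and the function name to_1nf are assumptions made for illustration; the slides show only the 1NF result.

```python
# Hypothetical un-normalised input: one row per customer, several
# purchases packed into a single cell.
raw = [
    {"CustID": 1, "CustName": "Mark", "CustAge": 19, "Purchases": "Sugar, Rice"},
    {"CustID": 2, "CustName": "Paul", "CustAge": 18, "Purchases": "Milk, Carrot"},
    {"CustID": 3, "CustName": "Suzanne", "CustAge": 17, "Purchases": "Banana, Icecream"},
]


def to_1nf(records):
    """Emit one row per purchase so that every cell holds a single value."""
    rows = []
    for rec in records:
        for item in rec["Purchases"].split(","):
            row = (rec["CustID"], rec["CustName"], rec["CustAge"], item.strip())
            if row not in rows:  # keep every record unique
                rows.append(row)
    return rows


rows_1nf = to_1nf(raw)
```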
Second Normal Form (2NF)
For a database to be in 2NF, it must first be in 1NF. In addition, all non-key columns (columns that can’t be used to
identify a record) should be fully functionally dependent on the primary key (the column used to identify a record
uniquely, in this case, CustID). CustName and CustAge depend only on CustID, so the purchases are listed separately against CustID:

CustID  Purchases
1       Sugar
1       Rice
2       Milk
2       Carrot
3       Banana
3       Icecream
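
The split can be sketched in Python as follows. The function name split_to_2nf and the shape of the two result tables are assumptions; the slides show only the CustID/Purchases table, and the customer table here is the natural counterpart holding the columns that depend on CustID alone.

```python
# The 1NF rows from the example above.
rows_1nf = [
    (1, "Mark", 19, "Sugar"),
    (1, "Mark", 19, "Rice"),
    (2, "Paul", 18, "Milk"),
    (2, "Paul", 18, "Carrot"),
    (3, "Suzanne", 17, "Banana"),
    (3, "Suzanne", 17, "Icecream"),
]


def split_to_2nf(rows):
    """Move columns that depend only on CustID into their own table."""
    customers = {}   # CustID -> (CustName, CustAge), stored once per customer
    purchases = []   # (CustID, Purchases) pairs
    for cust_id, name, age, item in rows:
        customers[cust_id] = (name, age)
        purchases.append((cust_id, item))
    return customers, purchases


customers, purchases = split_to_2nf(rows_1nf)

# Joining the two tables on CustID gives back the original rows,
# so no information is lost by the split.
rejoined = [(cid, *customers[cid], item) for cid, item in purchases]
```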
Third Normal Form (3NF)
This level removes columns that depend on another non-key column rather than
directly on the primary key. After this step, every non-key column provides
information about the primary key itself.

• In the example, no column depends on a non-key attribute, so no further
changes are needed.
Example 2

Is this relation un-normalised, in 1NF, 2NF or 3NF? Provide reasons.


Example 2: 2NF
• Relation is already in 1NF.
• Normalise into 2NF
Example 2: 3NF
What is the Purpose of Normalization?
• The main purpose of database normalization is to avoid complexities, eliminate
duplicates, and organize data in a consistent way. In normalization, the data is
divided into several tables linked together with relationships.

• Database administrators are able to achieve these relationships by using
primary keys, foreign keys, and composite keys.

• In practice, a primary key in one table (for example, employee_wages) is
matched to the corresponding value in another table (for instance, employee_data).
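
A minimal sketch of such a link, using Python's built-in sqlite3 module. Only the table names employee_wages and employee_data come from the slide; the columns and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee_data (
    employee_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE employee_wages (
    -- Primary key here, and also a reference to employee_data.
    employee_id  INTEGER PRIMARY KEY REFERENCES employee_data(employee_id),
    monthly_wage INTEGER NOT NULL
);
INSERT INTO employee_data  VALUES (1, 'Mark'), (2, 'Paul');
INSERT INTO employee_wages VALUES (1, 3000), (2, 3200);
""")

# The shared employee_id value is what links the two tables.
rows = conn.execute("""
    SELECT d.name, w.monthly_wage
    FROM employee_data AS d
    JOIN employee_wages AS w ON w.employee_id = d.employee_id
    ORDER BY d.employee_id
""").fetchall()
```

Each fact is stored once (names in one table, wages in the other), and the join reassembles them when needed.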
Rules – 1NF
• For a table to be in the first normal form, it must meet the following
criteria:

- a single cell must not hold more than one value (atomicity)
- there must be a primary key for identification
- no duplicated rows or columns
- each column must have only one value for each row in the table
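
These criteria can be checked mechanically. The sketch below is one possible checker, not a standard routine: the function name violates_1nf and the convention that a multi-valued cell is stored as a Python list, tuple, or set are assumptions. The key columns themselves are expected to hold single values.

```python
def violates_1nf(rows, key):
    """Return a list of 1NF problems found in rows (dicts), given the key columns."""
    problems = []
    seen_keys = set()
    for i, row in enumerate(rows):
        # Atomicity: a cell must not hold more than one value.
        for col, value in row.items():
            if isinstance(value, (list, tuple, set)):
                problems.append(f"row {i}: column {col!r} holds several values")
        # Identification: the key must be unique across rows.
        k = tuple(row[c] for c in key)
        if k in seen_keys:
            problems.append(f"row {i}: duplicate key {k}")
        seen_keys.add(k)
    return problems


good = [
    {"CustID": 1, "Purchases": "Sugar"},
    {"CustID": 1, "Purchases": "Rice"},
]
bad = [
    {"CustID": 1, "Purchases": ["Sugar", "Rice"]},
    {"CustID": 1, "Purchases": "Milk"},
]

good_report = violates_1nf(good, key=("CustID", "Purchases"))
bad_report = violates_1nf(bad, key=("CustID",))
```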
Rules – 2NF
• 1NF only eliminates repeating groups, not redundancy. That’s
why there is 2NF.

• A table is said to be in 2NF if it meets the following criteria:

- it’s already in 1NF
- it has no partial dependency. That is, all non-key attributes are fully
dependent on the whole primary key, not on just part of it.
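
A partial dependency can be looked for in sample data, as in the sketch below. The helper names determines and partial_dependencies are hypothetical, and the last row (Paul also buying Sugar) is an added assumption so that Purchases alone does not appear to identify a customer. Sample data can only suggest a dependency; whether it truly holds is a design decision.

```python
from itertools import combinations


def determines(rows, lhs, rhs):
    """True if, in this data, equal values of the lhs columns always give the same rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[c] for c in lhs)
        if seen.setdefault(left, row[rhs]) != row[rhs]:
            return False
    return True


def partial_dependencies(rows, key, non_key):
    """List (subset_of_key, column) pairs where part of the key determines the column."""
    found = []
    for size in range(1, len(key)):           # proper subsets of the key only
        for subset in combinations(key, size):
            for col in non_key:
                if determines(rows, subset, col):
                    found.append((subset, col))
    return found


rows = [
    {"CustID": 1, "CustName": "Mark", "CustAge": 19, "Purchases": "Sugar"},
    {"CustID": 1, "CustName": "Mark", "CustAge": 19, "Purchases": "Rice"},
    {"CustID": 2, "CustName": "Paul", "CustAge": 18, "Purchases": "Milk"},
    {"CustID": 2, "CustName": "Paul", "CustAge": 18, "Purchases": "Sugar"},  # assumed row
]

partial = partial_dependencies(
    rows, key=("CustID", "Purchases"), non_key=("CustName", "CustAge")
)
```

Here CustName and CustAge are both determined by CustID alone, which is the partial dependency that 2NF removes.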
Rules – 3NF
• When a table is in 2NF, it has no repeating groups and no partial
dependencies, but it may still contain transitive dependencies.

• A transitive dependency means a non-prime attribute (an attribute that is
not part of any candidate key) is dependent on another non-prime attribute.
This is what the third normal form (3NF) eliminates.

• So, for a table to be in 3NF, it must:

- be in 2NF
- have no transitive dependency.
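
To illustrate, the sketch below removes a transitive dependency from a hypothetical employee table in which EmpID determines DeptID and DeptID in turn determines DeptName. The table, its columns, and the function name to_3nf are assumptions for illustration and do not come from the slides.

```python
# Hypothetical 2NF table: DeptName depends on DeptID, a non-key column.
employees = [
    {"EmpID": 1, "DeptID": 10, "DeptName": "Sales"},
    {"EmpID": 2, "DeptID": 10, "DeptName": "Sales"},
    {"EmpID": 3, "DeptID": 20, "DeptName": "IT"},
]


def to_3nf(rows):
    """Split the table so DeptName is stored once per department."""
    employee = [{"EmpID": r["EmpID"], "DeptID": r["DeptID"]} for r in rows]
    department = {r["DeptID"]: r["DeptName"] for r in rows}
    return employee, department


employee, department = to_3nf(employees)

# Joining on DeptID restores the original rows.
rejoined = [
    {"EmpID": e["EmpID"], "DeptID": e["DeptID"], "DeptName": department[e["DeptID"]]}
    for e in employee
]
```

After the split, renaming a department means changing one entry in the department table instead of every matching employee row.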
Why is it important to normalize data?
• Corrects duplicate data and anomalies
• Removes unwanted data connections
• Prevents unintended data deletion
• Optimizes data storage space
• Makes it easier to add new data
• Improves access and interpretation of data
• Creates a logical map of data
• Increases data consistency
• Creates data connections
• Saves time and money
Resources
• https://www.freecodecamp.org/news/database-normalization-1nf-2nf-3nf-table-examples/

• https://www.indeed.com/career-advice/career-development/why-is-it-important-to-normalize-data

• https://blog.invgate.com/data-normalization
