Complete
Guide on
Data
cleaning
in sql
Email [Link]@[Link]
PART -1
Data
cleaning
IN sql
Email [Link]@[Link]
02
For this tutorial I have used
different Databases and tables.
This data is only for learning
purpose.
Let’s learn how to do Data
Cleaning in SQL.
03
Data Cleaning
INITIAL STEPS
First Step is to create a New table for our
cleaning purpose.
We should never clean data in original
dataset.
So let’s create a new temporary table
and insert our original data into it.
CREATE TABLE temp_employees LIKE employees;
04
Data Cleaning
CREATING TEMP TABLE
Employees is our original table and we
have created a same schema temporary
table from our original table.
SELECT * FROM temp_employees;
05
Data Cleaning
INSERTING VALUES IN TEMP TABLE
Now Let’s Insert data from our original
table to our temporary table to start data
cleaning.
INSERT INTO temp_employees
SELECT * FROM employees;
06
Data Cleaning
UNDERSTANDING SCHEMA
Now we have the data in our temporary
table we can start data cleaning.
Let’s look at the table data types and
constraints.
DESCRIBE temp_employees ;
07
Data Cleaning
UNDERSTANDING SCHEMA
We can now check the data types of the
columns whether it is correct or not.
We can also see all the constraints on our
columns.
08
Data Cleaning
CHANGING DATA TYPES
From the Schema we can clearly see that
hire_date column is of VARCHAR data type
instead of Date Type.
So let’s Convert this to correct data type.
We have 2 ways :
1. Altering the table
ALTER TABLE temp_employees
MODIFY COLUMN hire_date DATE;
09
Data Cleaning
CHANGING DATA TYPES
Casting and Converting Data
We use this when we don’t want to alter
our table.
SELECT STR_TO_DATE(hire_date,'%Y-%m-%d') as
hire_date FROM temp_employees;
We use this function to convert any column
to date data type to year-month-date
format.
10
Data Cleaning
CHANGING DATA TYPES
To Convert VARCHAR to INT we use
SELECT CAST(age as UNSIGNED) as age FROM
temp_employees;
VARCHAR to DATE
SELECT CONVERT(hire_date,DATE) as hire_date FROM
temp_employees;
11
Data Cleaning
CHANGING DATA TYPES
To Convert VARCHAR to DECIMAL
SELECT CONVERT(‘12345’,DECIMAL(10,2)) as salary
FROM temp_employees;
12
This is just the start of data
cleaning we have many things to
cover in this series.
So make sure to follow me and
save this pdf.
After all parts are done I will
merge all this pdf into single
data cleaning SQL Pdf.
For More Education Content
FOLLOW
ME ON
LINKEDIN
Please Like and
Repost if you find
this helpful
LinkedIn [Link]/in/dhruvik-detroja/
PART -2
Data
cleaning
IN sql
Email [Link]@[Link]
02
For this tutorial I have used
different Databases and tables.
This data is only for learning
purpose.
Let’s learn how to do Data
Cleaning in SQL.
Check Part 1 in Comment
Section.
03
Data Cleaning
HANDLING MISSING DATA
We have customers table like this.
First we need to identify missing data.
04
Data Cleaning
IDENTIFYING MISSING DATA
Step 1 : Identify Missing Data
SELECT
COUNT(*) AS TotalRows,
COUNT(CASE WHEN FullName IS NULL THEN 1 END) AS
MissingFullName,
COUNT(CASE WHEN Age IS NULL THEN 1 END) AS
MissingAge,
COUNT(CASE WHEN Email IS NULL THEN 1 END) AS
MissingEmail,
COUNT(CASE WHEN PhoneNumber IS NULL THEN 1 END) AS
MissingPhoneNumber,
COUNT(CASE WHEN SignupDate IS NULL THEN 1 END) AS
MissingSignupDate
FROM Customers;
05
Data Cleaning
IDENTIFYING MISSING DATA
This will return the count of null values
in our table.
06
Data Cleaning
ANALYZING MISSING DATA
Step 2 : Analyzing Missing Values
SELECT * FROM Customers WHERE
FullName IS NULL OR
Age IS NULL OR
Email IS NULL OR
PhoneNumber IS NULL OR
SignupDate IS NULL;
07
Data Cleaning
ANALYZING MISSING DATA
Now we have all the rows having some
kind of null values.
We can look at this data and we should
try to understand what can we do with
this null values.
There is only 2 options we can do to
handle this missing values
Delete Missing Dat
Adding Values to that Missing Data
08
Data Cleaning
HANDLING MISSING DATA
Step 3 : New Values for Missing Data
a) When we have missing data for
numerical columns we can set those
missing value according to mean,
median or mode approach.
Remember how to handle missing
completely depends on what kind of
data you are working on and applying
the domain knowledge on that dataset.
It should make sense to apply those
values when we have big dataset but
very low null values.
09
Data Cleaning
HANDLING MISSING DATA
UPDATE customers
SET Age = (SELECT Age FROM (SELECT AVG(Age) as Age
FROM customers) AS S)
WHERE Age IS NULL;
Using this query we are applying Mean
age to our all null values in age column.
10
Data Cleaning
HANDLING MISSING DATA
b) Imputing with some other value
We have phone number column we don’t
have any value we can put in this row so
we can write unknown to this column.
UPDATE Customers
SET PhoneNumber = 'Unknown'
WHERE PhoneNumber IS NULL;
11
Data Cleaning
HANDLING MISSING DATA
b) Imputing with some other value
Look at this data
12
Data Cleaning
HANDLING MISSING DATA
By Looking at that missing data and
applying simple logic we can say
industry for Airbnb should be Travel
So this time we have not null value we
have blank value which is different thing
We can have blank values as well as Null
Values in our dataset.
So Check it properly also for blank
values.
13
Data Cleaning
HANDLING MISSING DATA
UPDATE customers
SET industry = 'Travel'
WHERE company='Airbnb';
14
Data Cleaning
HANDLING MISSING DATA
Step 4 : Deleting Missing Data
This step should be avoided as this
involve data loss.
This should be our least priority when
handling null values.
If we don’t have any other option then
deleting the data then only we should
use this.
When dataset is bigger and there are
only few null values then it’s fine to
delete this data.
15
Data Cleaning
HANDLING MISSING DATA
Let’s remove data which doesn’t have
FullName.
DELETE FROM Customers
WHERE FullName IS NULL;
16
Data Cleaning
HANDLING MISSING DATA
We have successfully deleted the record.
17
This is just the start of data
cleaning we have many things to
cover in this series.
So make sure to follow me and
save this pdf.
After all parts are done I will
merge all this pdf into single
data cleaning SQL Pdf.
For More Education Content
FOLLOW
ME ON
LINKEDIN
Please Like and
Repost if you find
this helpful
LinkedIn [Link]/in/dhruvik-detroja/
PART -3
Data
cleaning
IN sql
Email [Link]@[Link]
02
For this tutorial I have used
different Databases and tables.
This data is only for learning
purpose.
Let’s learn how to do Data
Cleaning in SQL.
Check Part 2 in Comment
Section.
03
Data Cleaning
HANDLING DUPLICATE DATA
We have customers table like this.
First we need to identify duplicate data.
04
Data Cleaning
IDENTIFYING DUPLICATE DATA
First we need to identify duplicate data.
SELECT
FullName, Age, Email, PhoneNumber, SignupDate, COUNT(*)
FROM
Customers
GROUP BY
FullName, Age, Email, PhoneNumber, SignupDate
HAVING
COUNT(*) > 1;
05
Data Cleaning
IDENTIFYING DUPLICATE DATA
As you can see we are grouping all
columns and also using Count(*) > 1 which
give us records which are duplicate.
We got Count value as 3 this tell us that
this record is written 3 times and so we
need to remove 2 duplicate records from
this table.
06
Data Cleaning
IDENTIFYING DUPLICATE DATA
But we don’t have CustomerID for this
column so we need to write some kind of
code to find it.
We can also delete this record giving all
column values in our delete statement or
we can find CustomerID for records and
then delete records using those
CustomerID.
07
Data Cleaning
IDENTIFYING DUPLICATE DATA
This is the code to find CustomerID
SELECT
CustomerID, FullName, Age, Email, PhoneNumber, SignupDate
FROM
Customers
WHERE
(FullName, Age, Email, PhoneNumber, SignupDate) IN (
SELECT
FullName, Age, Email, PhoneNumber, SignupDate
FROM
Customers
GROUP BY
FullName, Age, Email, PhoneNumber, SignupDate
HAVING
COUNT(*) > 1
ORDER BY
FullName, Age, Email, PhoneNumber, SignupDate;
08
Data Cleaning
REMOVING DUPLICATE DATA
We can keep first occurance and remove
other duplicate records.
DELETE FROM Customers
WHERE CustomerID IN (4,7)
Note: Before doing any Update/Delete
Operation also Commit the database.
This is important when we might have
done something wrong we can always
rollback to that state of database again.
Next Query We can use so we do not need
to hard code any value.
09
Data Cleaning
REMOVING DUPLICATE DATA
WITH CTE AS (
SELECT
CustomerID,
FullName,
Age,
Email,
PhoneNumber,
SignupDate,
ROW_NUMBER() OVER (PARTITION BY FullName, Age, Email,
PhoneNumber, SignupDate ORDER BY CustomerID) AS row_num
FROM
Customers
-- Step 2: Delete duplicates, keeping the first occurrence
DELETE FROM Customers
WHERE CustomerID IN (
SELECT CustomerID
FROM CTE
WHERE row_num > 1
);
10
Data Cleaning
RECHECKING DUPLICATE DATA
Using the same SQL Query we should
double check at the end whether All
duplicate records are removed or not.
For More Education Content
FOLLOW
ME ON
LINKEDIN
Please Like and
Repost if you find
this helpful
LinkedIn [Link]/in/dhruvik-detroja/
PART -4
Data
cleaning
IN sql
Email [Link]@[Link]
02
For this tutorial I have used
different Databases and tables.
This data is only for learning
purpose.
Let’s learn how to do Data
Cleaning in SQL.
Check Part 3 in Comment
Section.
03
Data Cleaning
STANDARDIZING
Standardizing is the process to make
data standardize means ensuring all the
data values are in perfect format.
You will Understand this topic better
with examples.
So we can standardize different
categories of data like text , numbers
and dates.
Let’s start standardizing.
04
Data Cleaning
STANDARDIZING DATES
We have this Orders Table let’s
standardize it into proper date format.
We will Standardize it to yyyy-mm-dd
format.
05
Data Cleaning
STANDARDIZING DATES
-- Format: 'YYYY/MM/DD'
UPDATE Orders
SET OrderDate = DATE_FORMAT(STR_TO_DATE(OrderDate,
'%Y/%m/%d'), '%Y-%m-%d')
WHERE OrderDate LIKE '%/%/%';
-- Format: 'DD-MM-YYYY'
UPDATE Orders
SET OrderDate = DATE_FORMAT(STR_TO_DATE(OrderDate,
'%d-%m-%Y'), '%Y-%m-%d')
WHERE OrderDate LIKE '__-__-____';
STR_TO_DATE() function is used to convert
string to date data type.
06
Data Cleaning
STANDARDIZING DATES
-- Format: 'Month DD, YYYY'
UPDATE Orders
SET OrderDate = DATE_FORMAT(STR_TO_DATE(OrderDate, '%M
%d, %Y'), '%Y-%m-%d')
WHERE OrderDate LIKE '% %,%';
-- Format: '[Link]'
UPDATE Orders
SET OrderDate = DATE_FORMAT(STR_TO_DATE(OrderDate, '%Y.
%m.%d'), '%Y-%m-%d')
WHERE OrderDate LIKE '%.%.%';
After we have date format we can format
this date using DATE_FORMAT() function.
07
Data Cleaning
STANDARDIZING DATES
Here is the result.
08
Data Cleaning
STANDARDIZING TEXT FIELD
We have this employees table and as you
can see text field needs to be standardized
as there are some spaces also some words
are in capital some in smaller letter.
Let’s standardize this text fields.
09
Data Cleaning
STANDARDIZING TEXT FIELD
-- Convert full name to proper case
UPDATE Employees
SET FullName = CONCAT(
UPPER(LEFT(TRIM(FullName), 1)),
LOWER(SUBSTR(TRIM(FullName), 2))
);
-- Trim leading and trailing spaces from email addresses
UPDATE Employees
SET Email = TRIM(Email);
TRIM() Function is used to remove wide
spaces from both sides then LEFT function
is used to retrieve the characters from left
side mentioning index we have used 1 so it
will take 1st character of our FullName.
10
Data Cleaning
STANDARDIZING TEXT FIELD
UPPER() is used to uppercase our
character and here we want 1st character
upper, then SUBSTR() is used to retrieve
substring by mentioning index we have
used 2 that mean start from 2nd index to
end of characters so we will get resultant
substring starting from 2nd position to end
LOWER() is used to lower case the letters.
Then We have used CONCAT to merge
both strings the 1st character which will be
now capital and after string will be in
Lower case
Ex-dhruvik d | hruvik = Dhruvik
11
Data Cleaning
STANDARDIZING TEXT FIELD
This will be our Result.
I have used multiple functions so you get
opportunity to learn all functions try those
functions separately when practicing.
12
Data Cleaning
STANDARDIZING PHONE NUMS
We have Contacts Table like this
Let’s Standardize this.
13
Data Cleaning
STANDARDIZING PHONE NUMS
We have Contacts Table like this
UPDATE Contacts
SET PhoneNumber = CONCAT(
SUBSTR(PhoneNumber, 2, 3), '-',
SUBSTR(PhoneNumber, 7, 3), '-',
SUBSTR(PhoneNumber, 11, 4)
WHERE PhoneNumber LIKE '(%';
-- Remove non-numeric characters and format as '123-456-7890'
UPDATE Contacts
SET PhoneNumber = CONCAT(
SUBSTR(PhoneNumber, 1, 3), '-',
SUBSTR(PhoneNumber, 5, 3), '-',
SUBSTR(PhoneNumber, 9, 4))
WHERE PhoneNumber LIKE '%-%' OR PhoneNumber LIKE '%.%'
OR PhoneNumber LIKE '% %';
14
Data Cleaning
STANDARDIZING PHONE NUMS
Note :
When we talk about using SUBSTR() let’s
understand how it works 1st argument is
column_name , 2nd argument is Index
Position from which we want to retrieve
the string, 3rd argument is how much
characters we want after that index
position.
15
Data Cleaning
STANDARDIZING ADDRESS
Now we have Addresses Table like this.
Let’s Standardize this.
16
Data Cleaning
STANDARDIZING ADDRESS
We have Contacts Table like this
-- Convert state abbreviations to uppercase
UPDATE Addresses
SET State = UPPER(State);
-- Standardize street suffixes (e.g., 'st.' to 'Street')
UPDATE Addresses
SET Street = REPLACE(Street, ' st.', ' Street')
WHERE Street LIKE '% st.';
-- Remove extra spaces and standardize city names
UPDATE Addresses
SET City = CONCAT(
UPPER(LEFT(TRIM(City), 1)),
LOWER(SUBSTR(TRIM(City), 2))
);
-- Standardize ZIP code format (e.g., '90001-1234' to '90001')
UPDATE Addresses
SET ZipCode = SUBSTR(TRIM(ZipCode), 1, 5)
WHERE ZipCode LIKE '%-%';
17
Data Cleaning
STANDARDIZING ADDRESS
This time we have a new REPLACE()
function which has 1st argument as column
name, 2nd argument value we want to
replace and 3rd argument value which we
want to replace with.
We already know other stuff so here is the
result.
18
Data Cleaning
END OF STANDARDIZING
We have covered most of the important
stuffs in our data cleaning series and we
have few topics left in the series.
If you find my content value then please
like , repost and share it this will help me
grow my engagement and provide more
value to the community.
Check Comment Box for other parts of this
series.
Thank You!
Dhruvik Detroja
For More Education Content
FOLLOW
ME ON
LINKEDIN
Please Like and
Repost if you find
this helpful
LinkedIn [Link]/in/dhruvik-detroja/
PART -5
Data
cleaning
IN sql
Email [Link]@[Link]
02
For this tutorial I have used
different Databases and tables.
This data is only for learning
purpose.
Let’s learn how to do Data
Cleaning in SQL.
Check Part 4 in Comment
Section.
LinkedIn [Link]/in/dhruvik-detroja/
03
Data Cleaning
OUTLIERS
Outliers are the data points which
deviate significantly from the rest of the
dataset.
Outliers can distort statistical analysis
and lead to inaccurate conclusions.
So today we are going to learn how to
handle outliers with step by step
process.
Ready?
Let’s Go
LinkedIn [Link]/in/dhruvik-detroja/
04
Data Cleaning
IDENTIFYING OUTLIERS
To identify outliers we have different
method let’s learn them
1. Z Score Method:
Measures how many standard deviations
a data point is from the mean.
Normally we use Z score technique when
data is normally distributed.
We have sales table like this.
Let’s learn how to find Z score.
LinkedIn [Link]/in/dhruvik-detroja/
05
Data Cleaning
IDENTIFYING OUTLIERS
LinkedIn [Link]/in/dhruvik-detroja/
06
Data Cleaning
IDENTIFYING OUTLIERS
SELECT
SaleID,
CustomerID,
SaleAmount,
(SaleAmount - (SELECT AVG(SaleAmount) FROM Sales)) /
(SELECT STDDEV(SaleAmount) FROM Sales) AS ZScore
FROM
Sales
HAVING
ABS(ZScore) > 3;
-- Assuming a threshold of 3 for identifying outliers
Mostly if Z score is above +-3 we
consider it as outlier.
LinkedIn [Link]/in/dhruvik-detroja/
07
Data Cleaning
IDENTIFYING OUTLIERS
2. IQR Method:
Interquartile Range (IQR) is used to
identify outliers mostly used in skewed
distributions.
IQR is range between Q1 (first quartile)
and Q3 (third quartile)
IQR= Q3-Q1
Outliers are below Q1-1.5*IQR
OR
Outliers are above Q3+1.5*IQR
LinkedIn [Link]/in/dhruvik-detroja/
08
Data Cleaning
IDENTIFYING OUTLIERS
We have the Products table like this.
Let’s find outliers
LinkedIn [Link]/in/dhruvik-detroja/
09
Data Cleaning
IDENTIFYING OUTLIERS
-- Calculate the total number of rows
SELECT COUNT(*) INTO @row_count FROM Products;
-- Calculate Q1
WITH OrderedPrices AS (
SELECT
Price,
ROW_NUMBER() OVER (ORDER BY Price) AS RowNum
FROM Products
),
Quartiles AS (
SELECT
Price AS Q1
FROM OrderedPrices
WHERE RowNum = FLOOR(@row_count * 0.25) + 1
SELECT Q1 INTO @Q1 FROM Quartiles;
LinkedIn [Link]/in/dhruvik-detroja/
10
Data Cleaning
IDENTIFYING OUTLIERS
-- Calculate Q3
WITH OrderedPrices AS (
SELECT
Price,
ROW_NUMBER() OVER (ORDER BY Price) AS RowNum
FROM Products
),
Quartiles AS (
SELECT
Price AS Q3
FROM OrderedPrices
WHERE RowNum = FLOOR(@row_count * 0.75) + 1
SELECT Q3 INTO @Q3 FROM Quartiles;
LinkedIn [Link]/in/dhruvik-detroja/
11
Data Cleaning
IDENTIFYING OUTLIERS
SET @IQR = @Q3 - @Q1;
-- Calculate the lower and upper bounds for outliers
SET @LowerBound = @Q1 - 1.5 * @IQR;
SET @UpperBound = @Q3 + 1.5 * @IQR;
-- Identify outliers
SELECT
ProductID,
ProductName,
Price
FROM
Products
WHERE
Price < @LowerBound OR Price > @UpperBound;
LinkedIn [Link]/in/dhruvik-detroja/
12
Data Cleaning
IDENTIFYING OUTLIERS
Here is the result and we have successfully
found the outlier.
LinkedIn [Link]/in/dhruvik-detroja/
13
Data Cleaning
ANALYZING OUTLIERS
This was just a sample dataset.
But Real world don’t works always on
formula there might be cases where you
find a data point which is very deviated
from dataset but it might not be outlier.
But for your analysis purpose you might
remove it or might not it completely
depends on what data we have in hand.
So Analyzing Outliers is very important
before taking any next step so check data
entry errors, understand business context
or identify rare but valid cases.
LinkedIn [Link]/in/dhruvik-detroja/
14
Data Cleaning
ANALYZING OUTLIERS
Take Example of rooms booked in a hotel
dataset is like:
Rooms Booked
Here 12 is outlier but still
it’s possible in real world
1
to book 12 rooms at a
2
time.
2
So might not be any
12
data entry error here.
3 Now to remove or not
completely depends on
analysis you want to
perform.
LinkedIn [Link]/in/dhruvik-detroja/
15
Data Cleaning
DECIDING HANDLING STRATEGY
We can do following things to handle
outlier after detecting
Remove Outlier
Replacing Outliers with some other
value.
You can replace it with mean, maybe with
median or with some other value it
completely depends on what type of
analysis you doing or will replacement
value will make sense for those outliers.
LinkedIn [Link]/in/dhruvik-detroja/
16
Data Cleaning
REMOVING OUTLIERS
Simply delete the data after identifying
outliers if you want to remove outliers.
We have identified lower bound and upper
bound using IQR so let’s remove data
which outside of this bound.
We have Contacts Table like this
DELETE FROM Products
WHERE Price < @LowerBound OR Price > @UpperBound
For Replacing Value simply use Update
statement with your replacement value.
LinkedIn [Link]/in/dhruvik-detroja/
For More Education Content
FOLLOW
ME ON
LINKEDIN
Please Like and
Repost if you find
this helpful
LinkedIn [Link]/in/dhruvik-detroja/
PART -6
Data
cleaning
IN sql
Email [Link]@[Link]
02
For this tutorial I have used
different Databases and tables.
This data is only for learning
purpose.
Let’s learn how to do Data
Cleaning in SQL.
Check Part 5 in Comment
Section.
LinkedIn [Link]/in/dhruvik-detroja/
03
Data Cleaning
ENSURING CONSTRAINTS
During Data Cleaning Process its
important that we should add
meaningful constraints to our columns.
So we can minimize our data insertion
errors.
Consider a simple employees table from
the next page.
LinkedIn [Link]/in/dhruvik-detroja/
04
Data Cleaning
ENSURING CONSTRAINTS
CREATE TABLE Employees (
EmployeeID INT,
FirstName VARCHAR(50),
LastName VARCHAR(50),
Email VARCHAR(100),
PhoneNumber VARCHAR(20),
HireDate DATETIME,
Salary DECIMAL(10, 2),
DepartmentID INT
);
Now Applying Constraints in these columns
will help in future preventing data insertion
errors.
LinkedIn [Link]/in/dhruvik-detroja/
05
Data Cleaning
ENSURING CONSTRAINTS
ALTER TABLE Employees
ADD CONSTRAINT uq_email UNIQUE (Email);
ALTER TABLE Employees
ADD CONSTRAINT chk_salary CHECK (Salary >= 0);
ALTER TABLE Employees
MODIFY COLUMN HireDate DATETIME DEFAULT NOW();
Similarly we can also apply primary key,
not null foreign key constraints.
What constraints you need to apply
completely depends on your data.
LinkedIn [Link]/in/dhruvik-detroja/
06
Data Cleaning
REMOVING UNNECESSARY COLUMNS
Sometimes during data cleaning we might
want to remove some unnecessary
columns which might not be very useful to
our data analysis so we should drop those
columns.
LinkedIn [Link]/in/dhruvik-detroja/
07
Data Cleaning
REMOVING UNNECESSARY COLUMNS
ALTER TABLE employees
DROP COLUMN MiddleName;
As we are working in temporary tables for
our data cleaning so its fine to remove
those unnecessary columns.
The original data will be still present in our
main data table.
LinkedIn [Link]/in/dhruvik-detroja/
08
Data Cleaning
DERIVING NEW COLUMN
When we are performing data analysis we
might want to derive some new column
using some calculations in our existing
table.
Take similar example of our employees
table.
Let’s derive new columns for our table
let’s see what types of new columns we
can create.
LinkedIn [Link]/in/dhruvik-detroja/
09
Data Cleaning
DERIVING NEW COLUMN
We have birthdate column we need to
derive new age column.
LinkedIn [Link]/in/dhruvik-detroja/
10
Data Cleaning
DERIVING NEW COLUMN
Add column to the table
We have birthdate column we need to
derive newEmployees
ALTER TABLE age column.
ADD COLUMN Age INT;
Adding data to New Column
We have birthdate column we need to
derive
UPDATE new age column.
Employees
SET Age = YEAR(CURDATE()) - YEAR(BirthDate)
- (DATE_FORMAT(CURDATE(), '%m-%d') <
DATE_FORMAT(BirthDate, '%m-%d'));
LinkedIn [Link]/in/dhruvik-detroja/
11
Data Cleaning
DERIVING NEW COLUMN
We have salary column we need to derive
bonus for our employees.
LinkedIn [Link]/in/dhruvik-detroja/
12
Data Cleaning
DERIVING NEW COLUMN
Add column to the table
We have birthdate column we need to
derive newEmployees
ALTER TABLE age column.
ADD COLUMN YearlyBonus DECIMAL(10, 2);
We assuming bonus to be 15% of salary
We have birthdate column we need to
derive
UPDATE new age column.
Employees
SET YearlyBonus = Salary * 0.15;
LinkedIn [Link]/in/dhruvik-detroja/
13
Data Cleaning
DERIVING NEW COLUMN
Here we go we have our new derived
columns which can be useful for our data
analysis.
So this is how we can derive new columns
during our data cleaning , it’s actually a
part of preparing data for our future data
analysis.
LinkedIn [Link]/in/dhruvik-detroja/
For More Education Content
FOLLOW
ME ON
LINKEDIN
Please Like and
Repost if you find
this helpful
LinkedIn [Link]/in/dhruvik-detroja/