Excel Data Cleaning
Made Easy: Tips,
Tools and Examples
Pooja Pawar
Data cleaning is an essential step in preparing data for analysis. It
involves identifying and correcting errors, ensuring consistency, and
formatting data for usability. Excel offers numerous tools and
functions to make this process efficient. Here’s an in-depth guide to
cleaning data in Excel with detailed explanations and fresh examples.
1. Remove Duplicates
Duplicate data can lead to inaccurate analysis. Removing duplicates
ensures each record is unique.
Steps to Remove Duplicates:
1. Select the dataset.
2. Go to the Data tab and click on Remove Duplicates.
3. Choose the columns where duplicates might exist (e.g.,
"Email" and "Order ID").
4. Click OK, and Excel will retain the first occurrence and
remove duplicates.
Pooja Pawar
Example:
Customer Name Email Order ID
After removing duplicates, the duplicate record for Lisa Ray is
removed.
2. Handle Blank Cells
Blank cells can disrupt calculations or cause errors during analysis.
Depending on your needs, you can either fill, delete, or replace them.
Find Blank Cells:
o Press Ctrl + G > Special > Blanks to highlight empty cells.
Fill Blank Cells:
o Replace blanks with default values using Ctrl + H (Find and
Replace).
Pooja Pawar
o Use formulas like =IF(A1="", "No Data", A1) to handle
blanks dynamically.
Example:
Product Name Sales
Phone 500
Laptop
Tablet 300
Replace blanks in the "Sales" column with 0, so the table
becomes:
Product Name Sales
Phone 500
Laptop 0
Tablet 300
3. Trim Extra Spaces
Extra spaces in data can cause issues with sorting, filtering, and
matching records. The TRIM function removes unnecessary spaces.
Pooja Pawar
Formula: =TRIM(A1)
What It Does: Removes all leading, trailing, and extra spaces,
leaving only single spaces between words.
Example: Input: " Sarah Johnson " Output: "Sarah Johnson"
4. Standardize Data Types
Inconsistent data types, such as numbers stored as text or improperly
formatted dates, can cause errors in calculations.
Convert Text to Numbers:
o Select the range > Click on the error flag (⚠️) > Select
Convert to Number.
o Alternatively, use =VALUE(A1).
Convert Text to Date:
o Use Text to Columns:
1. Go to Data > Text to Columns.
2. Select the format (e.g., MDY, DMY) to convert text to
a proper date.
Pooja Pawar
Example: Input: "01-15-2025" (as text)
Output: 01/15/2025 (as a date).
5. Use Find and Replace
The Find and Replace tool is an efficient way to standardize or
correct data.
Steps:
1. Press Ctrl + H.
2. Enter the value to find and the replacement value.
3. Click Replace All.
Example: Replace all instances of "NY" with "New York" in the
"City" column.
Name City
Michael Ross NY
Donna Paulsen LA
Pooja Pawar
Output:
Name City
Michael Ross New York
Donna Paulsen LA
6. Standardize Text Formatting
To ensure consistency in text formatting, use Excel's text functions.
Functions:
o =UPPER(A1) – Converts text to uppercase.
o =LOWER(A1) – Converts text to lowercase.
o =PROPER(A1) – Capitalizes the first letter of each word.
Example: Input: "john doe" Output: "John Doe"
7. Split and Merge Data
Split Data into Columns:
o Use the Text to Columns tool to split data by a delimiter
(e.g., commas, spaces, or tabs).
Pooja Pawar
o Example: Split "Alice, New York, 25" into three columns:
Name, City, Age.
Merge Data into One Column:
o Use =A1 & " " & B1 or =CONCATENATE(A1, " ", B1) to
combine data.
Example:
First Name Last Name
Alex Carter
Formula: =A1 & " " & B1
Output: "Alex Carter"
8. Handle Errors
Errors in data can interrupt analysis. Excel offers functions to manage
errors effectively.
Formula: =IFERROR(A1/B1, "Error")
What It Does: Replaces errors (e.g., division by zero) with a
custom message.
Example: If cell B1 is 0, =A1/B1 will result in #DIV/0!. Using
=IFERROR(A1/B1, "Invalid") will return "Invalid" instead.
Pooja Pawar
9. Detect Outliers
Outliers can distort your analysis and should be identified and
addressed.
Steps:
o Use the IQR Method:
1. Calculate Q1: =QUARTILE(A1:A100, 1)
2. Calculate Q3: =QUARTILE(A1:A100, 3)
3. Find the IQR: =Q3-Q1
4. Flag outliers: =IF(A1<(Q1-1.5*IQR) OR
A1>(Q3+1.5*IQR), "Outlier", "Normal").
10. Automate Cleaning with Power Query
Power Query simplifies repetitive cleaning tasks.
Steps:
1. Go to Data > Get Data > From Table/Range.
2. Use options like Remove Duplicates, Filter Rows, Replace
Values, and Split Columns.
3. Load the cleaned data back into Excel.
Pooja Pawar
Example: Combine monthly sales data from multiple sheets
into one consolidated table with duplicate removal.
11. Check for Consistency
Use Data Validation to restrict invalid inputs.
o Go to Data > Data Validation > Set rules (e.g., numbers
only between 1-100).
Example: Ensure only valid country codes (e.g., US, UK, CA) are
entered in the "Country" column.
12. Remove Unnecessary Data
Delete Unused Rows/Columns:
o Highlight unused rows/columns > Right-click > Delete.
Filter Irrelevant Data:
o Use filters to hide irrelevant data and delete it.
Pooja Pawar
13. Check Spelling
Misspellings can lead to inconsistent data. Use Spell Check (F7) to
find and correct errors.
Example: Correct "Calfornia" to "California."
14. Document Changes
Maintain a log of all changes, either in a separate sheet or a
comments column.
Pooja Pawar
Practical Examples: Real-World Data Cleaning in
Excel
1. Cleaning Sales Data for a Retail Company
Customer Name Purchase Date Email Sales Amount Region
Emily Carter 01/05/2023 emily@gmail,com 500 East
Cleaning Steps:
1. Remove Extra Spaces: Use =TRIM(A2) to clean "Emily Carter."
2. Fix Date Format: Standardize "Purchase Date" to DD-MM-YYYY
using Text to Columns or formatting tools.
3. Correct Email Errors: Replace invalid characters in emails using
Find and Replace (e.g., ,com to .com).
4. Handle Missing Data: Replace "NA" in "Purchase Date" with a
placeholder like "01-01-1900" for incomplete records.
5. Correct Negative Values: Replace the negative sales amount (-
200) with 0 using =IF(A1<0, 0, A1).
Pooja Pawar
2. Cleaning Patient Records in a Healthcare System
Patient Name Age Appointment Date Gender Diagnosis
Sarah Smith 28 2023/12/01 Female Diabetes
Jake Thomas NA 01-Dec-2023 Male Hypertension
Mia Brown 35 2023-12-01 F Diabetes
Liam Davis 41 December 1, 2023 M Hypertension
Cleaning Steps:
1. Fill Missing Age Values: Replace "NA" in "Age" with the average
age using =AVERAGE(B2:B100) or a default value.
2. Standardize Gender Entries: Replace "F" and "M" with
"Female" and "Male" using Find and Replace.
3. Format Dates: Ensure all "Appointment Dates" are in the DD-
MM-YYYY format.
4. Check for Consistency: Ensure only allowed diagnoses
("Diabetes," "Hypertension") are present using Data Validation.
Pooja Pawar
3. Cleaning Employee Data for HR Reports
Employee Name Department Salary Date of Joining Email
John Williams Marketing 45000 15-03-2022 johnw@gmail,com
Ava Thompson Sales NA 2022-03-15 ava.thompson@domain
David Johnson Engineering -50000 March 15, 2022 [email protected]
Cleaning Steps:
1. Correct Negative Salary: Replace -50000 with "Error" or 0 using
=IF(C2<0, "Error", C2).
2. Handle Missing Data: Replace "NA" in "Salary" with the median
salary value.
3. Standardize Date Format: Format all "Date of Joining" to DD-
MM-YYYY.
4. Correct Email Errors: Fix invalid email entries by replacing
",com" with ".com" and adding ".com" where missing.
5. Validate Departments: Ensure only valid departments (e.g.,
Marketing, Sales, HR) are present using Data Validation.
Pooja Pawar
4. Cleaning Product Inventory Data for an E-Commerce Store
Product Name Category Price Stock SKU
iPhone 13 Electronics 799 100 IP13-2023
Running Shoes Sports NA 200 RS123
Coffee Mug Home & Kitchen -15 500 CMUG_100
Laptop - Dell XPS Electronics 1200 50 DELLXPS,23
Cleaning Steps:
1. Handle Missing Price: Replace "NA" with the average price of
the same category using Pivot Tables or a formula.
2. Correct Negative Values: Replace negative prices (-15) with 0.
3. Fix SKU Format: Replace commas in SKUs with dashes using
Find and Replace.
4. Check Consistency in Categories: Ensure category names match
predefined values using Conditional Formatting.
5. Remove Extra Spaces: Clean up "Product Name" with
=TRIM(A2).
Pooja Pawar
5. Cleaning Customer Feedback Data for Marketing
Customer Name Rating Feedback Comments Feedback Date
Lily Anderson 5 Excellent service! 2023/11/15
Ethan Taylor NA Great product, slow delivery 15-Nov-2023
Emma Martinez 2 Needs improvement. November 15, 2023
Noah Harris -3 Poor packaging. 15/11/2023
Cleaning Steps:
1. Fill Missing Ratings: Replace "NA" with the average rating using
=AVERAGE(B2:B100).
2. Correct Negative Ratings: Replace negative ratings (-3) with 0
using =IF(B2<0, 0, B2).
3. Standardize Feedback Date: Convert all dates to DD-MM-YYYY
format using Text to Columns.
4. Clean Feedback Comments: Remove extra spaces or accidental
line breaks using =CLEAN(C2).
5. Validate Ratings: Ensure all ratings are within 1 to 5 using
Conditional Formatting or a validation rule.
Pooja Pawar