UNIT IV
Data Cleaning Techniques
Data Munging - Getting, cleaning, and shaping data - Cleanse data - Merge, shape, and filter data -
Group and aggregate data - Insert calculated columns
Power BI Data Model
Creating data Model - Explain what a data model is - Create relationships between tables in the
model - Create and use a star schema - Understand when and how to de-normalize the data –
Create and use linked tables
Data Cleaning Techniques
Data Munging
Data munging, also known as data wrangling, is the process of cleaning and transforming data in
Power BI to make it suitable for analysis and visualization.
This involves tasks like
• Handling missing values
• Removing duplicates
• Fixing inconsistencies
• Formatting data into the correct types.
Data Cleaning:
This involves identifying and addressing issues like missing data, duplicate rows, inconsistent
formatting, and incorrect data types.
Data Transformation:
This involves reshaping data (e.g., pivoting or unpivoting columns), combining data from
multiple sources, and creating new columns based on existing ones.
Data Formatting:
This ensures that data is in a consistent and usable format for analysis, including data type
conversion and standardization of text and numbers.
Power Query Editor:
Power BI's primary tool for data munging, allowing you to connect to various data sources,
perform transformations, and clean data before loading it into the data model.
Common data munging tasks in Power BI:
• Handling Missing Values:
You can remove rows with missing data, replace missing values with a specific value, or
use more advanced techniques like interpolation.
• Removing Duplicates:
You can easily remove duplicate rows from your dataset using the Power Query Editor.
• Fixing Inconsistencies:
This might involve standardizing text formats, correcting spelling errors, or converting
values to the correct data type.
• Transforming Data:
You can use Power Query's features to pivot, unpivot, aggregate, and split columns.
• Creating Calculated Columns:
You can create new columns that are derived from existing data, using DAX (Data
Analysis Expressions) formulas.
• Aggregating Data:
You can group data by specific columns and perform calculations like sum, average, or
count.
• Data Blending:
You can combine data from multiple sources into a single dataset for analysis.
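The tasks listed above can be sketched in plain Python to show the logic involved. Power Query performs each of them through ribbon actions rather than code, and the product names and prices below are made-up sample data:

```python
# Plain-Python sketches of common munging tasks: handling missing values,
# removing duplicates, and aggregating. All sample values are invented.
rows = [
    {"Product": "iPhone 8", "Price": 49900},
    {"Product": "iPhone 8", "Price": 49900},   # duplicate row
    {"Product": "iPhone X", "Price": None},    # missing value
]

# Handling missing values: replace None with a default value
filled = [dict(r, Price=r["Price"] if r["Price"] is not None else 0) for r in rows]

# Removing duplicates: keep the first occurrence of each identical row
seen, deduped = set(), []
for r in filled:
    key = tuple(sorted(r.items()))
    if key not in seen:
        seen.add(key)
        deduped.append(r)

# Aggregating: group by Product and sum Price
totals = {}
for r in deduped:
    totals[r["Product"]] = totals.get(r["Product"], 0) + r["Price"]
```

In Power Query the same results come from the Replace Values, Remove Duplicates, and Group By commands, with each action recorded as an applied step.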
Benefits of Data Munging:
• Improved Data Quality:
Data munging ensures that your data is accurate, consistent, and ready for analysis, leading
to more reliable insights.
• Efficient Analysis:
Cleaned and transformed data is easier to analyze and visualize, saving time and effort.
• Accurate Reports and Dashboards:
Data munging ensures that your reports and dashboards are based on accurate and reliable
data.
Clean, transform, and load data in Power BI
Power Query has an incredible number of features that are dedicated to helping you clean and
prepare your data for analysis. With the Power Query Editor, you will learn:
• How to simplify a complicated model
• How to change data types
• How to rename objects and pivot data
• How to profile columns so that you know which columns have the valuable data for deeper
analytics
Learning objectives: By the end of this module, you'll be able to:
• Resolve inconsistencies, unexpected or null values, and data quality issues.
• Apply user-friendly value replacements.
• Profile data so you can learn more about a specific column before using it.
• Evaluate and transform column data types.
• Apply data shape transformations to table structures.
• Combine queries.
• Apply user-friendly naming conventions to columns and queries.
• Edit M code in the Advanced Editor.
Introduction
Consider the scenario where you have imported data into Power BI from several different sources
and, when you examine the data, it is not prepared for analysis. What could make the data unprepared
for analysis?
When examining the data, you discover several issues, including:
• A column called Employment status only contains numerals.
• Several columns contain errors.
• Some columns contain null values.
• The customer ID in some columns appears as if it was duplicated repeatedly.
• A single address column has combined street address, city, state, and zip code.
You start working with the data, but every time you create visuals on reports, you get bad data, incorrect
results, and simple reports about sales totals are wrong.
Dirty data can be overwhelming and, though you might feel frustrated, you decide to get to work and
figure out how to make this semantic model as pristine as possible.
Fortunately, Power BI and Power Query offer you a powerful environment to clean and prepare the
data. Clean data has the following advantages:
• Measures and columns produce more accurate results when they perform aggregations and
calculations.
• Tables are organized, where users can find the data in an intuitive manner.
• Duplicates are removed, making data navigation simpler. It will also produce columns that can be
used in slicers and filters.
• A complicated column can be split into two simpler columns, and multiple columns can be
combined into one column for readability.
• Codes and integers can be replaced with human-readable values.
Shape the initial data
Power Query Editor in Power BI Desktop allows you to shape (transform) your imported data. Power
Query can perform actions such as:
• renaming columns or tables
• changing text to numbers
• removing rows
• setting the first row as headers, and much more
Note: It is important to shape data to ensure that it is suitable for use in reports.
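As a rough sketch of what these shaping actions do, here is the same logic in plain Python. Power Query performs them through menu commands, and the sample rows below are invented for illustration:

```python
# Shaping raw rows as they might arrive from a CSV: set the first row as
# headers, remove an unwanted row, rename a column, and change text to numbers.
raw = [
    ["product", "units"],    # header row mixed in with the data
    ["iPhone 8", "12"],
    ["iPhone X", "n/a"],     # row we want to remove
]

headers = raw[0]                                       # set first row as headers
records = [dict(zip(headers, row)) for row in raw[1:]]
records = [r for r in records if r["units"] != "n/a"]  # remove rows
records = [{"Product": r["product"],                   # rename columns
            "Units": int(r["units"])}                  # change text to numbers
           for r in records]
```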
You have loaded raw sales data from two sources into a Power BI model. Some of the data came from
a .csv file that was created manually in Microsoft Excel by the Sales team. The other data was loaded
through a connection to your organization's Enterprise Resource Planning (ERP) system. Now, when you
look at the data in Power BI Desktop, you notice that it's in disarray; some data that you don't need and
some data that you do need are in the wrong format.
You need to use Power Query Editor to clean up and shape this data before you can start building reports.
Get started with Power Query Editor
To start shaping your data, open Power Query Editor by selecting the Transform data option on
the Home tab of Power BI Desktop.
In Power Query Editor, the data in your selected query displays in the middle of the screen and, on
the left side, the Queries pane lists the available queries (tables).
An untitled Power Query Editor window appears.
Select New Source
Select New Source > Excel Workbook.
Browse to the Excel workbook and select the file apple_products.xlsx.
The Excel data appears in the middle of the screen; then click OK.
Now we can observe the dataset as above.
Identify column headers and names
Step 1:
Identify the column headers and names within the data.
Then check whether they are located in the right place; if not, make them appear in the right place.
Case #1
In some cases, where the column headers and names are jumbled, reorganize them so the data is
organized correctly.
In our case we use apple_products.xlsx.
Observe the data.
Here we have multiple columns; we need to observe them carefully to see which ones we require for
analysis and which we do not.
In our case, you can observe that the second column (Product_URL) is not essential for us, so we can
remove the column Product_URL.
Click on the column Product_URL.
Now right-click and select the Remove option.
The column Product_URL is removed.
Now it is time to split the first column (Product Name) as per our requirement.
We split values such as:
APPLE iPhone 8 Plus (Gold, 64 GB)
Now select the Product Name column, right-click, go to Split Column, and select By Delimiter.
A new screen appears.
Now we select the space as the delimiter, choose the option Split at the left-most delimiter, and click OK.
Now select the rest of the product name, go to Split Column, enter ( as a custom delimiter, and select
Split at the left-most delimiter.
Similarly, select the next column, set the delimiter to ) and split at the right-most delimiter.
Now we have Gold, 64 GB. Similarly, select the column, set the delimiter to comma, and split at the
left-most (or right-most) delimiter.
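The sequence of delimiter splits above can be sketched in plain Python. The splits mirror Power Query's Split Column > By Delimiter options, and the sample value is the one from the walkthrough:

```python
# Split "APPLE iPhone 8 Plus (Gold, 64 GB)" using the same delimiter choices
# as the walkthrough: space (left-most), "(" (left-most), ")" (right-most),
# then "," to separate color from storage.
def split_product_name(name):
    brand, rest = name.split(" ", 1)          # split at left-most space
    model, rest = rest.split("(", 1)          # split at left-most "("
    specs = rest.rsplit(")", 1)[0]            # drop text after right-most ")"
    color, storage = [s.strip() for s in specs.split(",", 1)]
    return {"Brand": brand, "Model": model.strip(),
            "Color": color, "Storage": storage}

row = split_product_name("APPLE iPhone 8 Plus (Gold, 64 GB)")
```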
Now you can observe this column: we have accessories along with the phone.
It is time to separate the accessories, EarPods and Power Adapter, into separate columns.
Now observe this column, which contains the word Includes, which we need to remove.
Select space as the delimiter and split at the right-most delimiter instead of the left-most delimiter.
Remove the column that contains (Includes.
Now we move the Brand column to be the first column: click at the top of the column and drag it to
the left until Brand becomes the first column.
Now the first and second columns contain the same data, so we can remove the second column.
Now it's time to rename the columns. First go to the second column, Product Name 2.1, which we need
to rename.
Product Name 2.1 is renamed as Model.
Similarly, select the next column, Product Name.2.2.1.1, and rename it as Color.
Change all the columns that were split earlier as per our requirement.
The above is an empty column, which we need to remove.
Now we can observe that one cell has (PRODUCT)RED, which is to be replaced with Red.
Go to that cell and right-click.
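A minimal sketch of what this replacement does, in plain Python. Power Query offers this as the Replace Values command; the surrounding color values here are invented:

```python
# Replace every occurrence of "(PRODUCT)RED" in the Color column with "Red".
colors = ["Gold", "(PRODUCT)RED", "Silver"]
colors = ["Red" if c == "(PRODUCT)RED" else c for c in colors]
```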
This is all about splitting and renaming columns. Now go to File and select Save As.
Give the file an appropriate name; it is saved as a Power BI file with the extension .pbix.
Now it is time to find the missing values
When you work in Power Query Editor, all steps that you take to shape your data are recorded.
Then, each time the query connects to the data source, it automatically applies your steps, so your
data is always shaped the way that you specified. Power Query Editor only makes changes to a
particular view of your data, so you can feel confident about changes that are being made to your
original data source. You can see a list of your steps on the right side of the screen, in the Query
Settings pane, along with the query's properties.
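A rough sketch of this applied-steps model in plain Python, with hypothetical step and column names, showing how recorded steps re-shape a fresh copy of the source on each refresh while leaving the original data untouched:

```python
# Each shaping action is recorded as a step; a refresh re-applies all steps,
# in order, to a copy of the source rows. Step and column names are invented.
def drop_null_sales(rows):
    return [r for r in rows if r["Sales"] is not None]

def tidy_region(rows):
    return [dict(r, Region=r["Region"].strip().title()) for r in rows]

steps = [drop_null_sales, tidy_region]      # the recorded "Applied Steps" list

def refresh(source_rows):
    rows = [dict(r) for r in source_rows]   # work on a copy of the source
    for step in steps:
        rows = step(rows)
    return rows

source = [{"Region": " north ", "Sales": 100},
          {"Region": "south", "Sales": None}]
shaped = refresh(source)
```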
The Power Query Editor ribbon contains many buttons you can use to select, view, and shape
your data.
To learn more about the available features and functions, see The query ribbon.
With Power BI Desktop, you can connect to many different types of data sources, then shape the
data to meet your needs, enabling you to create visual reports to share with others. Shaping data means
transforming the data: renaming columns or tables, changing text to numbers, removing rows, setting the
first row as headers, and so on.
Combining data means connecting to two or more data sources, shaping them as needed,
then consolidating them into a single query.
What is data quality?
First, understand some key basics of what will affect your data quality.
Data quality is essential for accurate analysis, informed decision-making, and successful business
outcomes.
Understanding factors that affect data quality and recognizing the importance of data cleaning are crucial
steps in the data preparation process.
In general, several factors describe and make up the quality of a dataset for analysis:
Data accuracy:
Data accuracy means the extent to which data represents the true values and attributes it is
intended to capture, indicating the degree to which it aligns with the true, real-world information it
seeks to represent.
Factors such as human errors during data entry, system glitches, or outdated information can
compromise data accuracy.
Data completeness:
This describes the degree to which all required data elements are present in a dataset.
Missing or incomplete data can occur due to data collection errors, system limitations, or data
integration challenges.
Data consistency:
The uniformity and coherence of data across different sources or datasets.
Inconsistencies may arise from variations in data formats, naming conventions, or conflicting data
definitions.
Data validity:
Refers to the extent to which data conforms to defined rules, constraints, or standards. Invalid data
can result from data entry errors, data integration issues, or changes in business rules.
Data timeliness:
The relevance and currency of data in relation to the analysis or reporting time frame.
Outdated or stale data can lead to inaccurate insights and hinder decision-making.
Where do data quality issues come from?
Data quality issues can arise from various sources throughout the data life cycle.
Some common origins of data quality issues:
Data entry errors:
Mistakes made during manual data entry processes can introduce errors such as typos,
misspellings, or incorrect values.
Human error, lack of training, or inadequate validation mechanisms can contribute to data entry
issues.
Incomplete or missing data:
Data may be incomplete or have missing values due to various reasons, such as data collection
processes that fail to capture all required information, data entry omissions, or system
limitations that prevent data collection.
Data integration challenges:
When combining data from multiple sources or systems, inconsistencies can arise due to
differences in data formats, naming conventions, or data structures. Mismatched or
incompatible data elements can lead to data quality issues.
Data transformation and manipulation:
Data transformations, such as aggregations, calculations, or data conversions, can introduce errors
if not implemented correctly.
Issues can arise from improper formulas, incorrect assumptions, or errors in the data manipulation
process.
Data storage and transfer:
Unreliable storage systems may lead to data corruption, loss, or unauthorized access, impacting
data quality. Events such as hardware failures or system crashes can result in data loss, affecting
completeness and accuracy. Delays in data transfer may lead to latency issues, with outdated or
stale data impacting the accuracy of analyses. Lastly, incompatibility between systems during data
transfer can cause format mismatches and structural issues, introducing inaccuracies.
Data governance and documentation:
Inadequate data governance practices, including a lack of data standards, data definitions, or
metadata documentation, can lead to misunderstandings or misinterpretations of data, resulting in
data quality problems.
Poor documentation of data lineage makes it challenging to trace the origin of quality issues and
prevent future problems.
Data changes and updates:
As data evolves over time, changes in business rules, system updates, or modifications to data
sources can impact data quality. Data may become outdated, inconsistent, or no longer aligned
with the intended use.
External data sources:
When incorporating data from external sources, such as third-party providers or open datasets,
data quality issues may arise.
Inaccurate or unreliable data from external sources can affect the overall data quality.
The role of data cleaning in improving data quality
While data cleaning is often handled by data professionals or analysts, the responsibility for ensuring
clean data extends beyond a specific team or department. Data cleaning should be considered a shared
responsibility within a company, involving stakeholders from all levels and functions.
Data integrity and accuracy
Data cleaning plays a vital role in maintaining data integrity and accuracy.
Inaccurate or inconsistent data can lead to flawed analysis, flawed decision-making, and potential
business risks.
By recognizing data cleaning as a shared responsibility, all individuals working with data can
contribute to maintaining the integrity of the data they generate, use, or interact with.
Decision-making and business outcomes
Data serves as the foundation for informed decision-making.
When data quality is compromised, the decisions made based on that data are also compromised.
By acknowledging the impact of data quality on business outcomes, individuals across the
organization can understand the importance of data cleaning in facilitating accurate insights and
driving successful outcomes.
Data ownership and accountability
Every individual who interacts with data, regardless of their role, possesses a level of data
ownership and accountability.
By considering data cleaning as part of this ownership, employees become active participants in
maintaining data quality.
When employees take responsibility for the accuracy and cleanliness of the data they work with, a
culture of data stewardship is fostered within the organization.
A holistic view of the data ecosystem
Data flows across departments, systems, and processes within an organization.
Each touchpoint introduces the potential for data quality issues.
Recognizing data cleaning as everyone’s responsibility encourages individuals to consider the
broader data ecosystem and how their actions impact the quality of data used by others.
This holistic view promotes collaboration and communication to address data quality concerns.
Early detection of issues
Data cleaning is not just about rectifying existing issues;
it also involves proactively identifying and addressing data quality issues.
Employees on the front lines of data collection, entry, and analysis are often the first to notice
anomalies or inconsistencies.
By encouraging a culture where data issues are shared and addressed promptly, organizations can
mitigate the impact of poor data quality before it cascades into larger problems.
Continuous improvement and learning
Data cleaning presents opportunities for continuous improvement and learning.
When individuals actively participate in data cleaning, they gain insights into data patterns,
common errors, and areas for improvement.
This knowledge can be shared, leading to enhanced data collection processes, better data entry
practices, and improved data quality over time.
Empowerment and collaboration
Recognizing the shared responsibility of data cleaning empowers employees to take ownership of
the data they work with.
It fosters a sense of collaboration and accountability, as individuals understand that their actions
impact the overall data quality and, consequently, the success of the organization.
By leveraging the collective expertise and commitment of employees, organizations can
effectively address data quality challenges.
Best practices for data quality overall
Now consider how you can actually clean your data with Power BI.
Dirty data can have a significant impact on business operations, decision-making, and overall success.
To combat the challenges posed by dirty data, organizations must establish robust data cleaning practices.
The following are best practices that businesses can implement to effectively tackle dirty data and
ensure data quality throughout their operations.
Establishing data quality standards
Define clear data quality standards that align with your organization’s goals and objectives.
These standards should include criteria for accuracy, completeness, consistency, validity, and
timeliness.
Developing a data governance framework:
Develop a comprehensive data governance framework that outlines roles, responsibilities,
and processes for data management.
Establish data stewardship roles to oversee data quality initiatives and enforce data
governance policies.
This framework will help create a structured approach to data cleaning and ensure
accountability across the organization.
Implementing data validation and verification techniques:
Apply data validation and verification techniques to identify and resolve inconsistencies,
errors, and outliers.
Use automated validation rules, data profiling tools, and statistical analysis to check data
against predefined rules and validate its accuracy.
Implement regular data audits to identify data quality issues and take corrective actions.
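As an illustration, automated validation rules that check data against predefined constraints can be sketched in plain Python; the rule definitions and sample rows below are hypothetical:

```python
# Hypothetical validation rules: each column maps to a predicate that returns
# True when a value satisfies the rule.
rules = {
    "Age":   lambda v: isinstance(v, int) and 0 <= v <= 120,
    "Email": lambda v: isinstance(v, str) and "@" in v,
}

def audit(rows):
    """Return (row_index, column) pairs that violate a rule."""
    issues = []
    for i, row in enumerate(rows):
        for col, is_valid in rules.items():
            if not is_valid(row.get(col)):
                issues.append((i, col))
    return issues

problems = audit([
    {"Age": 34,  "Email": "a@b.com"},
    {"Age": 999, "Email": "not-an-email"},   # fails both rules
])
```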
Standardizing data entry processes:
Standardize data entry processes to minimize human errors and ensure consistent data
formats.
Implement data entry controls, such as drop-down menus, data validation lists, and
input masks, to guide data entry and prevent incorrect or inconsistent data.
Provide training and guidelines to employees on proper data entry practices.
Leveraging data cleaning tools and technologies:
Utilize data cleaning tools and technologies, such as Power Query in Power BI, to
automate the data cleaning process.
These tools can help identify and correct data errors, remove duplicates, handle missing
values, and perform various data transformations.
Invest in appropriate data cleaning solutions based on your specific requirements.
Collaborating across departments:
Data cleaning is a collaborative effort that involves various departments within an
organization.
Foster collaboration and communication between departments to address data quality
issues holistically.
Encourage cross-functional teams to share insights, exchange knowledge, and work
collectively toward data cleaning goals.
Continuous data monitoring:
Implement mechanisms for ongoing data monitoring to proactively identify and resolve
data quality issues.
Establish data quality metrics, set up alerts and notifications for data anomalies, and
regularly monitor data quality dashboards.
Continuously monitor data sources, data pipelines, and data integration processes to
maintain high data quality standards.
Data education and training:
Provide data education and training programs to equip employees with the necessary skills
and knowledge to understand and address data quality issues.
Offer training on data cleaning techniques, data entry best practices, data validation
methods, and the importance of data quality.
This education will empower employees to take ownership of data quality and contribute
to the battle against dirty data.
Ensure that all stakeholders understand and adhere to these standards, fostering a shared commitment to
data quality.
Combine data in Power BI Desktop
Open Get Data > From Web: https://www.fool.com/research/best-states-to-retire
Click on Web.
Paste the URL https://www.fool.com/research/best-states-to-retire and select OK to fetch the data
from the web.
Select Anonymous and then Connect.
What Does it Mean to Merge Tables in Power BI?
Merging tables in Power BI means combining two or more tables into a single table.
One of Power BI's unique abilities is to combine data from multiple sources into one unifying platform.
Whether your data comes from a warehouse, Excel files, third-party applications, or even a webpage,
Power BI can access it and merge it.
The beauty of Power BI is the simplicity of this process. You don't need to learn any complex syntax or
take a course in programming just to merge your data. The experience on Power BI is very intuitive and
easy to follow.
For example, you might have one table with customer details and another table with their purchase
history. By merging these tables, you can create a unified view that shows both customer information and
their purchases in one place.
In Power BI, you do this by using the "Merge Queries" feature. It allows you to join tables based on
common columns, like customer ID or product ID, so you can see all the related data together.
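A plain-Python sketch of what Merge Queries produces when joining on a common column; the customer and purchase rows below are made-up data matching the example in the paragraph above:

```python
# Join customer details to purchase history on the common CustomerID column,
# producing one row per purchase with the customer's details attached.
customers = [{"CustomerID": 1, "Name": "Asha"},
             {"CustomerID": 2, "Name": "Ben"}]
purchases = [{"CustomerID": 1, "Item": "Phone"},
             {"CustomerID": 1, "Item": "Case"},
             {"CustomerID": 2, "Item": "Laptop"}]

by_id = {c["CustomerID"]: c for c in customers}   # index customers by key
merged = [{**by_id[p["CustomerID"]], **p} for p in purchases]
```

In Power BI this corresponds to an inner join in the Merge Queries dialog, with the matched customer columns expanded afterward.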