-{ONE}-
INTRODUCTION TO DATA
ANALYTICS
DATA
Data refers to any collection of facts, statistics, information, or records that are either raw or processed. It can take various
forms, including text files, images, videos, audio recordings, spreadsheets, databases, and more.
Data is typically used to derive insights, support decision-making, and facilitate analysis or research in various fields.
Types of Data
There are two main categories of data:
1. Numerical Data: This type of data represents quantitative values and can be further divided into two subcategories:
[email protected]
a. Discrete Data: Refers to data with finite or countable values, such as the number of cars in a parking lot.
b. Continuous Data: Represents data that can take any value within a range, such as temperature or time.
2. Categorical Data: Also known as qualitative data, it represents data in non-numeric form and is further
divided into two subcategories:
a. Nominal Data: Represents data without any specific order or hierarchy, such as colors or names of cities.
b. Ordinal Data: Refers to data with a specific order or hierarchy, such as ratings or rankings.
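The four types above can be sketched in code. This is a minimal illustration (assuming pandas is installed; the column names and values are made up):

```python
import pandas as pd

df = pd.DataFrame({
    "cars_in_lot": [12, 7, 30],             # discrete: finite, countable values
    "temperature_c": [21.4, 19.8, 25.1],    # continuous: any value within a range
    "city": ["Lagos", "Accra", "Nairobi"],  # nominal: no inherent order
    "rating": ["low", "high", "medium"],    # ordinal: ordered categories
})

# Encoding the ordinal column so comparisons respect the ranking.
df["rating"] = pd.Categorical(
    df["rating"], categories=["low", "medium", "high"], ordered=True
)

print(df.dtypes)
print(df["rating"].max())  # the highest-ranked category: high
```

Note how only the ordinal column carries an order: asking for the maximum city would be meaningless, but the maximum rating is well defined.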
What is analytics?
Analytics is derived from analysis, which is the detailed examination of the elements or structure of something.
It involves breaking a complex topic or substance into smaller parts in order to gain a better understanding of it.
What is Data Analytics?
Data analytics is the process of examining and analyzing raw data to extract meaningful insights, patterns, and trends. It involves
applying various techniques, tools, and algorithms to transform data into actionable information that can support decision-
making, problem-solving, and strategic planning.
The process of analyzing data typically moves through the following phases:
1. Define the problem or question
2. Data collection
3. Data preprocessing
4. Exploratory Data Analysis (EDA)
5. Data modeling
6. Interpretation and evaluation
7. Data visualization and reporting
8. Action and implementation
Types of Data Analytics
1. Descriptive Analytics: Descriptive analytics focuses on summarizing and understanding historical data to provide insights into
what has happened in the past. It involves aggregating and visualizing data to identify patterns, trends, and key metrics.
Descriptive analytics answers questions such as "What happened?" and "How did it happen?"
2. Diagnostic Analytics: Diagnostic analytics delves deeper into understanding the causes and factors that contributed to specific
outcomes or events. It involves analyzing past data to identify patterns and relationships that help answer the question "Why did
it happen?" Diagnostic analytics helps organizations uncover insights about the root causes of successes or failures.
3. Predictive Analytics: Predictive analytics uses historical data, statistical algorithms, and machine learning techniques to make
informed predictions or forecasts about future outcomes. By analyzing patterns and relationships in data, predictive analytics enables
organizations to answer questions like "What is likely to happen?" or "What will happen if certain conditions are met?"
This type of analytics is valuable in making proactive decisions and planning for potential scenarios.
4. Prescriptive Analytics: Prescriptive analytics takes data analysis a step further by suggesting possible courses of action to
optimize outcomes or solve problems. It uses a combination of historical data, predictive modeling, and optimization techniques to
provide recommendations on what actions to take. Prescriptive analytics helps answer questions like "What should we do?" or
"How can we achieve the best possible outcome?".
Data Analyst Tasks And Responsibilities
A data analyst is a person whose job is to gather and interpret data in order to solve a specific problem. The role includes
plenty of time spent with data but entails communicating findings too.
Here’s what many data analysts do on a day-to-day basis:
Gather data: Analysts often collect data themselves. This could include conducting surveys, tracking visitor characteristics
on a company website, or buying datasets from data collection specialists.
Clean data: Raw data might contain duplicates, errors, or outliers. Cleaning the data means maintaining the quality of data
in a spreadsheet or through a programming language so that your interpretations won’t be wrong or skewed.
Model data: This entails creating and designing the structures of a database. You might choose what types of data to
store and collect, establish how data categories are related to each other, and work through how the data actually appears.
Interpret data: Interpreting data will involve finding patterns or trends in data that could answer the question at hand.
Present data: Communicating the results of your findings will be a key part of your job. You do this by putting
together visualizations like charts and graphs, writing reports, and presenting information to interested
parties.
What Tools Do Data Analysts Use?
During the process of data analysis, analysts often use a wide variety of tools to make their work more accurate and
efficient. Some of the most common tools in the data analytics industry include:
Microsoft Excel
Microsoft Power BI
SQL
Tableau
Google Sheets
R or Python
Jupyter Notebooks
Types Of Data Analyst
As advancing technology has rapidly expanded the types and amount of information we can collect, knowing how to gather,
sort, and analyze data has become a crucial part of almost any industry. You’ll find data analysts in the criminal justice, fashion,
food, technology, business, environment, and public sectors—among many others.
People who perform data analysis might have other titles, such as:
Medical and health care analyst
Market research analyst
Business analyst
Business intelligence analyst
Operations research analyst
Intelligence analyst
Data Analyst Technical Skills
Database tools: Microsoft Excel and SQL should be mainstays in any data analyst’s toolbox. While Excel is ubiquitous across
industries, SQL can handle larger sets of data and is widely regarded as a necessity for data analysis.
Programming languages: Learning a statistical programming language like Python or R will let you handle large sets of data
and perform complex equations. Though Python and R are among the most common, it’s a good idea to look at several job
descriptions of a position you’re interested in to determine which language will be most useful to your industry.
Data visualization: Presenting your findings in a clear and compelling way is crucial to being a successful data analyst.
Knowing how best to present information through charts and graphs will make sure colleagues, employers, and stakeholders
will understand your work. Tableau, Jupyter Notebook, and Excel are among the many tools used to create visuals.
Statistics and math: Knowing the concepts behind what data tools are actually doing will help you tremendously in your
work. Having a solid grasp of statistics and math will help you determine which tools are best to use to solve a particular
problem, help you catch errors in your data, and have a better understanding of the results.
Data Analyst Workplace Skills
Problem solving: A data analyst needs to have a good understanding of the question being asked and the
problem that needs to be solved. They also should be able to find patterns or trends that might reveal a story.
Having critical thinking skills will allow you to focus on the right types of data, recognize the most revealing
methods of analysis, and catch gaps in your work.
Communication: Being able to get your ideas across to other people will be crucial to your work as a data
analyst. Strong written and speaking skills to communicate with colleagues and other stakeholders are good
assets to have as a data analyst.
Industry knowledge: Knowing about the industry you work in—health care, business, finance, or otherwise—
will give you an advantage in your work and in job applications. If you’re trying to break into a specific industry,
take some time to pay attention to the news in your industry or read a book on the subject. This can familiarize
you with the industry’s main issues and trends.
-{TWO}-
THE DATA ANALYTICS PROCESS
The data analytics process
The data analytics process involves a systematic approach to collecting, organizing, analyzing, and interpreting data
to gain insights and make informed decisions.
While the specific steps may vary depending on the context and goals, here is a general overview of the data
analytics process:
1. Define the problem or question: Clearly articulate the problem or question you want to address through data
analysis. This step helps set the direction for the entire process.
2. Data collection: Identify and gather relevant data from various sources, such as databases, APIs, surveys, or
other data repositories. Ensure that the data collected is accurate, complete, and representative of the problem or
question at hand.
3. Data Preprocessing: Cleanse and prepare the data for analysis. This involves tasks like handling missing values,
removing duplicates, standardizing formats, and transforming variables as required. Data preprocessing sets the
foundation for accurate and meaningful analysis.
4. Exploratory Data Analysis (EDA): Perform initial exploratory analysis to understand the data better. This may
involve techniques like summarizing data through descriptive statistics, visualizing data through charts or graphs,
and identifying patterns or outliers.
5. Data modeling: Apply appropriate analytical techniques and models to extract insights from the data. This step
depends on the nature of the problem and may involve statistical analysis, machine learning algorithms, predictive
modeling, or other methods.
6. Interpretation and evaluation: Analyze the results from the data modeling step and interpret the findings.
Assess the quality and reliability of the insights obtained and evaluate them against the original problem or question.
Consider the limitations of the analysis and potential biases.
7. Data visualization and reporting: Communicate the results effectively through visualizations, reports, or
dashboards. Presenting data visually can enhance understanding and facilitate decision-making for stakeholders
who may not have technical expertise.
8. Action and implementation: Based on the insights and recommendations derived from the analysis, take
appropriate actions or make informed decisions. Implement changes, strategies, or interventions as necessary to
address the original problem or question.
Dashboard
A visual interface that presents key data points, metrics, and trends in a clear, concise, and visually engaging way, designed to facilitate
understanding, analysis, and decision-making.
In essence, dashboards in data analytics serve as:
•Visual storytellers: They bring data to life, making it easier to comprehend and act upon.
•Performance monitors: They keep a pulse on key metrics, enabling timely intervention and optimization.
•Insight generators: They facilitate exploration and discovery, leading to new understanding and opportunities.
•Decision catalysts: They empower evidence-based decision-making, driving better outcomes.
•Collaboration hubs: They foster shared understanding and data-driven conversations across teams.
Presentation
A presentation in data analytics is the culmination of your analysis, transformed into a compelling narrative delivered to an audience. It's
more than just displaying charts and graphs; it's about bringing your data to life, sparking understanding, and influencing action.
Here's what makes a presentation in data analytics unique:
Purpose:
•Communicate insights: To effectively share the key findings and takeaways from your data analysis with a particular audience.
•Engage and persuade: To capture the audience's attention, guide them through the analysis, and convince them of the significance of
your results.
•Drive action: To motivate the audience to take specific actions based on your insights.
Portfolio Setup
In data analytics, portfolio setup refers to the process of curating and presenting a collection of your most impactful and relevant
data projects to showcase your skills, experience, and expertise to potential employers or clients. It serves as a visual resume
that demonstrates your problem-solving abilities, technical proficiency, and ability to create value from data.
Here's a breakdown of its key elements and purposes:
Purposes:
Showcasing Expertise:
Highlight your knowledge of various data analysis techniques, tools, and methodologies.
Demonstrate your ability to apply these skills to solve real-world problems.
Communicating Impact:
Clearly articulate the outcomes and business value you've generated through your projects.
Provide tangible evidence of your ability to deliver results.
Differentiating Yourself:
Stand out in a competitive job market by showcasing your unique skills and experiences.
Demonstrate your passion for data analysis and problem-solving.
Aligning with Opportunities:
Tailor your portfolio to specific roles or projects you're targeting to demonstrate relevant experience.
Show potential employers you have the skills needed for their specific roles.
-{THREE}-
DATA PREPARATION (ETL PROCESS)
Defining your audience and their expectations in data preparation as a data analyst is crucial for ensuring your work is relevant,
efficient, and impactful.
Here are some steps to help you achieve this:
1. Identify the Stakeholders:
•Who will be using the data you prepare? This could include internal teams like BI analysts, data scientists, marketers, or
executives.
2. Understand their Needs and Expectations:
•For each stakeholder: What are their specific goals and objectives in using the data?
•What level of data literacy do they have? This will influence the complexity of your documentation and communication.
•What are their data quality expectations? Do they need high accuracy, timely updates, or specific data formats?
•Are there any regulatory or compliance requirements to consider?
3. Define and Document Expectations:
•Create a clear understanding of the target audience and their data preparation expectations.
•Document this information in a readily accessible format like a data dictionary, glossary, or project charter.
•Maintain clear and consistent communication throughout the data preparation process.
Hypothesis
In data analytics, a hypothesis is an educated guess about a relationship between two or more variables within
your data. It's the question you want to answer with your analysis, the starting point that drives your entire
investigation.
Here's what makes a hypothesis so crucial in data analytics:
Purpose:
•Guides your analysis: It defines the focus of your investigation, directing your exploration and analysis towards
specific variables and relationships.
•Provides clarity: It helps you avoid aimlessly wandering through data without a clear direction or goal.
•Enables testing: Your hypothesis can be transformed into a testable statement, allowing you to use statistical
methods to confirm or reject it based on evidence.
•Drives insights: By testing your hypothesis, you uncover patterns, trends, and relationships within your
data, leading to valuable insights and conclusions.
•Encourages exploration: Even when rejected, a strong hypothesis can spark new questions and avenues for
further exploration, leading to unexpected discoveries.
Key factors of a good hypothesis:
Specificity: It should clearly define the relationship between specific variables you want to investigate.
Testability: It should be formulated in a way that allows you to gather data and apply statistical tests to assess
its validity.
Falsifiability: It should be possible to disprove your hypothesis based on contrary evidence, leading to new
knowledge even if it's rejected.
Predictive power: It should offer a clear prediction about what you expect to find if your hypothesis is true.
Examples of good data analytics hypotheses:
"There is a positive correlation between the number of online reviews a product receives and its sales volume."
"Customers who engage in social media discussions about a brand are more likely to make repeat purchases."
"Employees who receive regular performance feedback have higher job satisfaction and lower turnover rates."
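The first example hypothesis can be tested with a correlation coefficient. Below is a minimal, self-contained sketch using made-up review and sales figures; a Pearson r close to +1 would support the hypothesized positive correlation:

```python
import math

reviews = [5, 12, 30, 44, 60, 85]   # hypothetical review counts per product
sales = [20, 35, 70, 90, 130, 160]  # hypothetical units sold per product

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r = pearson_r(reviews, sales)
print(f"r = {r:.3f}")  # a value near +1 supports the hypothesis
```

Note that correlation alone does not prove causation; a full test would also check statistical significance on a real sample.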
DATA PREPARATION (ETL PROCESS)
This is the process of setting and agreeing on the objectives of the project, then extracting, transforming, and loading the data.
The 4 steps in data preparation (OMG-C)
1) SET OBJECTIVES: To set objectives means defining clear, specific, and measurable goals that you intend to achieve
within a certain timeframe. It entails asking the right questions in order to understand the needs of the client: what problem to
solve, what questions to answer, what decisions to take, or what trends or insights to discover.
How to set objectives in data analytics
1. Define the Problem or Opportunity:
Start by focusing on the business context: What problem are you trying to solve, or what opportunity are you trying to
capitalize on?
Be specific: Avoid vague statements like "Improve customer experience" or "Boost sales." Instead, define a specific problem
area or target metric.
2. Translate into Data Objectives:
Break down the problem or opportunity into measurable data objectives: These should be concrete goals you can
achieve through data analysis.
Use the SMART framework: Ensure your objectives are Specific, Measurable, Achievable, Relevant, and Time-bound.
3. Focus on Impact:
Align your objectives with the overall business goals: Ensure your data analysis drives actionable insights
that contribute to strategic objectives.
Don't just focus on numbers: Consider the qualitative impact of your analysis, like improving customer
satisfaction or enhancing decision-making capabilities.
4. Examples of Data Analytics Objectives:
Increase website conversion rate by 5% within 3 months by identifying and optimizing key user touchpoints.
Reduce customer churn by 2% by analyzing customer behavior patterns and implementing targeted retention
strategies.
Develop a customer segmentation model to personalize marketing campaigns and improve campaign
effectiveness.
5. Flexibility and Iteration:
Remember, data analysis is iterative: Be prepared to refine your objectives as you learn more about the
data and gain insights.
Encourage feedback and collaboration: Discuss your objectives with stakeholders and colleagues to ensure
alignment and gather valuable perspectives.
2) CREATE MEASURES: Measures are the datasets needed to meet the objectives: what data to measure to extract
information, what entities to investigate, and what their characteristics are. A Measure is made up of
objects (tables), variables (columns), relationships between the objects (database diagram), and the dictionary
(details about each column).
Primary Key
Primary key is a column or set of columns in a table that uniquely identifies each row (record) in that table. It
serves as a way to ensure data integrity, enforce uniqueness, and establish relationships between tables.
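A quick way to see what the primary key rule means in practice is to check a key column for duplicates. This is a small sketch over hypothetical employee records:

```python
from collections import Counter

# Hypothetical rows; EmployeeID is meant to be the primary key.
rows = [
    {"EmployeeID": "E001", "Name": "Ada"},
    {"EmployeeID": "E002", "Name": "Bayo"},
    {"EmployeeID": "E002", "Name": "Chika"},  # duplicate key: violates uniqueness
]

counts = Counter(row["EmployeeID"] for row in rows)
duplicates = [key for key, n in counts.items() if n > 1]
print("duplicate keys:", duplicates)  # prints: duplicate keys: ['E002']
```

In a real database the engine enforces this constraint for you; a check like this is useful when validating raw files before loading them.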
CASE STUDY 1
a) A company wants to give a performance award to the employees. To qualify, an employee must attain at least 90%
punctuality score during the period under consideration. Punctuality means being on time for duty on every working day.
If the employee is absent for any day, the employee is disqualified. Prepare the data.
b) The owner also wants to know the punctuality with respect to gender. Create the Measures.
Objectives
• Performance
• Punctuality
• Absenteeism
• Based on gender
Measures
Object (Tables)               Variables (Columns)
Admin (Attendance Register)   Time In, Employee Name, Date, EmployeeID
HR (Employee Records)         Name, Gender, EmployeeID
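One possible way to work Case Study 1 end to end, assuming pandas, a hypothetical 09:00 on-time cutoff, and a small made-up sample with no absences (so the disqualification rule does not fire here):

```python
import pandas as pd

# Admin attendance register (Object) with its Variables (Columns).
attendance = pd.DataFrame({
    "EmployeeID": ["E1", "E1", "E2", "E2"],
    "Date": ["2024-01-01", "2024-01-02", "2024-01-01", "2024-01-02"],
    "TimeIn": ["08:55", "09:10", "08:45", "08:50"],
})
# HR employee records (Object) with its Variables.
hr = pd.DataFrame({
    "EmployeeID": ["E1", "E2"],
    "Name": ["Ada", "Bayo"],
    "Gender": ["F", "M"],
})

# Assumed rule: arriving at or before 09:00 counts as punctual.
attendance["Punctual"] = attendance["TimeIn"] <= "09:00"  # HH:MM strings compare correctly

# Punctuality score = punctual days / working days, as a percentage.
scores = (attendance.groupby("EmployeeID")["Punctual"].mean() * 100).reset_index(name="Score")
merged = hr.merge(scores, on="EmployeeID")
merged["Qualifies"] = merged["Score"] >= 90  # the 90% award threshold

print(merged[["Name", "Gender", "Score", "Qualifies"]])
print(merged.groupby("Gender")["Score"].mean())  # punctuality by gender
```

The merge on EmployeeID is exactly the relationship between the two Objects in the Measures table above, with EmployeeID acting as the key.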
-{FOUR}-
ETL (EXTRACT, TRANSFORM, LOAD)
3) EXTRACT/GET DATA: Get means to acquire the datasets from, or for, the client. Bring the data from the client, or from
a variety of sources, into the application or tools that will be used to clean it. Data sources can be existing,
third-party, or new.
Exploratory Data Analysis (EDA)
Exploratory Data Analysis (EDA) is the initial investigation of a dataset to uncover patterns, trends, anomalies, and
relationships. It's like embarking on a detective adventure, seeking clues within your data to guide further analysis
and unlock hidden insights. Here's a breakdown of the key steps and techniques involved:
Examine data structure (variables, data types, etc.).
Determine data quality (missing values, inconsistencies, etc.).
Document any initial observations or questions.
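A first EDA pass along the three steps above might look like this in pandas, using a small hypothetical dataset:

```python
import pandas as pd

# A hypothetical sales extract with one missing value and one duplicated row.
df = pd.DataFrame({
    "order_id": [101, 102, 102, 104],
    "amount": [250.0, 120.0, 120.0, None],
    "region": ["East", "West", "West", "East"],
})

print(df.dtypes)              # examine data structure: variables and types
print(df.describe())          # summary statistics for numeric columns
print(df.isna().sum())        # missing values per column
print(df.duplicated().sum())  # fully duplicated rows
```

The observations this surfaces (one missing amount, one duplicate order) would be documented and fed into the cleaning step that follows.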
Data Quality Dimensions: The Pillars of Reliable Insights
Data Quality Dimensions are the essential characteristics that define the reliability and usefulness of data. The most
common dimensions include accuracy, completeness, consistency, timeliness, validity, and uniqueness.
4) CLEAN AND TRANSFORM THE DATA: Cleaning and transformation refer to the combined process of
identifying and rectifying errors, inconsistencies, and inaccuracies in datasets, as well as converting or
reshaping the data to make it suitable for analysis and modeling.
Some basic cleaning and transformation steps
-{FIVE}-
DATA CLEANING AND DATA
TRANSFORMATION
DATA CLEANING AND DATA TRANSFORMATION
Data cleaning and data transformation are two essential steps in the data preparation process, often performed as part of data
preprocessing before analysis or modeling. While they are related, they serve distinct purposes:
Data Cleaning:
Data cleaning involves identifying and correcting errors, inconsistencies, and inaccuracies in a dataset to ensure that the data is
accurate, complete, and reliable. The goal of data cleaning is to improve data quality by addressing issues that might negatively
impact analysis or modeling.
Common data cleaning tasks include:
1. Handling Missing Values
2. Removing Duplicates
3. Dealing with Outliers (outliers are data points that significantly differ from the majority of the data in a dataset)
4. Addressing Inconsistent Formats
5. Correcting Errors
6. Standardizing Units: Converting data to consistent units of measurement to enable meaningful analysis.
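Several of these cleaning tasks can be sketched in pandas. The dataset below is hypothetical; the outlier step uses the common 1.5 × IQR rule as one possible choice of threshold:

```python
import pandas as pd

# Hypothetical messy input: a duplicate row, inconsistent name formats,
# a missing salary, and one extreme value.
df = pd.DataFrame({
    "name": ["ada", "Bayo ", "Bayo ", "Chika", "Dele"],
    "salary": [52000.0, 48000.0, 48000.0, None, 900000.0],
})

df = df.drop_duplicates()                                  # removing duplicates
df["name"] = df["name"].str.strip().str.title()            # standardizing formats
df["salary"] = df["salary"].fillna(df["salary"].median())  # handling missing values

# Flagging outliers with the 1.5 * IQR rule.
q1, q3 = df["salary"].quantile([0.25, 0.75])
iqr = q3 - q1
df["outlier"] = (df["salary"] < q1 - 1.5 * iqr) | (df["salary"] > q3 + 1.5 * iqr)
print(df)
```

Filling missing values with the median and flagging (rather than deleting) outliers are judgment calls; the right choice depends on the objective set for the analysis.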
Data Transformation:
Data transformation involves modifying the structure, format, or values of data to make it suitable for analysis, modeling, or other
specific purposes. Transformations are applied to enhance the usability of the data, create new features, or simplify complex
relationships.
Common data transformation tasks include:
1. Aggregation: Summarizing data at a higher level (e.g., daily to monthly) for analysis or reporting.
2. Normalization/Standardization: Scaling numerical features to a common scale, such as z-score normalization.
3. Datetime Operations: Extracting specific components (e.g., year, month) from datetime columns for analysis.
4. Calculating Derivatives: Calculating rate of change, differences, or percentages to capture trends or patterns.
5. Reshaping Data: Pivoting or melting data to change its structure to better suit analysis requirements.
6. Combining Data: Merging or joining multiple datasets to enrich or consolidate information.
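A few of these transformations, sketched in pandas on a hypothetical sales table:

```python
import pandas as pd

sales = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-03"]),
    "store": ["A", "A", "B"],
    "revenue": [100.0, 150.0, 80.0],
})

# Datetime operations: extract the month from the date column.
sales["month"] = sales["date"].dt.to_period("M")

# Aggregation: summarize daily rows at the monthly level.
monthly = sales.groupby("month", as_index=False)["revenue"].sum()

# Normalization/standardization: z-score the revenue column.
sales["revenue_z"] = (sales["revenue"] - sales["revenue"].mean()) / sales["revenue"].std()

# Combining data: join store details onto the sales rows.
stores = pd.DataFrame({"store": ["A", "B"], "city": ["Lagos", "Abuja"]})
sales = sales.merge(stores, on="store", how="left")

print(monthly)
print(sales)
```

Unlike cleaning, none of these steps fix errors; they reshape already-correct data into a form that better suits the analysis.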
POWER QUERY
Power Query is a data transformation and data preparation tool developed by Microsoft. It is a feature available
in Microsoft Excel, Power BI, and other Microsoft products. Power Query allows users to connect to various data
sources, then transform, reshape, and load it into their preferred data analysis or visualization tool.
With Power Query, you can perform the following tasks:
1. Data Source Connectivity
2. Data Transformation
3. Data Cleaning
4. Data Enrichment
5. Custom Calculations
6. Automation, etc.
Data Quality Dimensions
Data quality dimensions are categories or criteria used to assess the quality of data. They provide a structured framework for
evaluating the accuracy, completeness, reliability, and overall fitness of data for its intended use. There are several commonly
recognized data quality dimensions that organizations use to measure and improve the quality of their data. These dimensions
help identify issues and areas that need attention. Here are some key data quality dimensions:
1. Accuracy: Accuracy refers to how closely data values match the true or intended values.
2. Completeness: Completeness measures whether all necessary data is present.
3. Consistency: Consistency assesses whether data values are uniform and coherent across different sources and systems.
4. Timeliness: Timeliness evaluates how current the data is and whether it is up to date for the intended use.
5. Validity: Validity checks if data values adhere to defined rules, formats, and constraints.
6. Uniqueness: Uniqueness ensures that each data record is distinct and not duplicated.
7. Integrity: Data integrity ensures that data relationships and dependencies are maintained accurately.
8. Reliability: Reliability measures the trustworthiness and credibility of the data source.
9. Relevance: Relevance assesses whether the data is applicable and useful for the intended purpose.
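Two of these dimensions, completeness and uniqueness, are easy to score directly. A minimal stdlib-only sketch over hypothetical records:

```python
# Hypothetical records; "id" is meant to be unique and every field filled.
records = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},             # incomplete record
    {"id": 2, "email": "b@example.com"},  # duplicated id
]

total_cells = sum(len(r) for r in records)
filled = sum(1 for r in records for v in r.values() if v is not None)
completeness = filled / total_cells    # share of non-missing values

ids = [r["id"] for r in records]
uniqueness = len(set(ids)) / len(ids)  # share of distinct key values

print(f"completeness: {completeness:.2%}")  # prints: completeness: 83.33%
print(f"uniqueness: {uniqueness:.2%}")      # prints: uniqueness: 66.67%
```

Dimensions like accuracy or relevance cannot be computed this mechanically; they require comparison against a trusted source or the stated objective.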