Data Analytics Fundamentals
Abdirahman Awale
Learning Objectives
Explain Big Data and its Application
Explain what makes a good question and evaluate
questions relative to the SMART framework.
Describe the extract, transform, and load (ETL) process
and key components of each step of the process
Explain the differences between descriptive, diagnostic,
predictive, and prescriptive analytics.
Understand the situations for which each type
of analytic is appropriate.
List the principles that lead to high-quality data
visualizations.
Describe how automation interacts with the
analytics mindset and when data analytics is
not the right tool for making a decision.
Introduction
• Data is proliferating at an exponential rate. This data
proliferation is caused by increasing computer processing
power, increasing storage capacity, and increasing
bandwidth.
• At present, none of these accelerators shows any sign
of slowing. In the Internet of Things (IoT) world,
coupled with 5G bandwidth, more and more data will
come from more and more sources.
• Experts estimate that 90% of the world’s data was
created in just the last 2 years, and the total amount of
digital data doubles roughly every 2 years.
• Data analytics are needed to turn this mountain of data
into useful information.
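The doubling estimate above implies simple exponential growth. A minimal sketch of that arithmetic, assuming an exact 2-year doubling period (the function name and the baseline of 1 unit are illustrative, not from the source):

```python
def projected_volume(current_volume: float, years: float,
                     doubling_period: float = 2.0) -> float:
    """Project data volume, assuming it doubles every `doubling_period` years."""
    return current_volume * 2 ** (years / doubling_period)


# Start from an arbitrary baseline of 1 unit of data today.
for years in (2, 4, 10):
    multiple = projected_volume(1.0, years)
    print(f"After {years} years: {multiple:g}x today's volume")
```

Under this assumption, ten years of doubling yields 32 times today's volume, which is why storage and analysis capacity must grow just as quickly.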
Big Data
• Big data is the term companies use to describe the
massive amounts of data they now capture, store, and
analyze.
• Although Big Data does not refer to any specific
quantity, the term is often used when speaking about
petabytes and exabytes of data.
• It can be defined as extremely large collections of data
(data sets) that may be analyzed to reveal patterns,
trends, and associations, especially relating to human
behavior and interactions.
Big Data-The Four V's
The Four V’s represent the defining characteristics of Big
Data:
1. Data volume refers to the amount of data created
and stored by an organization.
2. Data velocity refers to the speed at which data is
created and stored.
3. Data variety refers to the different forms data can
take.
4. Data veracity refers to the quality or
trustworthiness of data.
Facebook Analytics Case
• Facebook was launched in February 2004 and now has
over 3 billion active monthly users. This includes over
691 million fake profiles.
• Every day 500,000 new users are added as the social
media site’s growth continues.
• Every 60 seconds Facebook’s users upload over 100,000
photographs, post over 250,000 status updates and
generate over 4 million “likes”.
• In total, this creates 4 petabytes of new data every day.
• This allows their advertisers to target specific audiences
using precise segmentation tools.
• Facebook tracks user activity beyond its platform using
cookies, IP addresses, and mobile tracking to identify
interests and locations.
• As a result of this focused advertising strategy, almost 1%
of users who view an advert click on the links
provided and visit the corresponding website.
• Approximately 20% of all website visits in the USA now come
via Facebook.
• This success generated over $120 billion of advertising
revenue for the company in 2024.
• Facebook is now one of the ten most valuable companies
in the world.
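The per-minute figures in the case can be scaled to daily totals with simple arithmetic. A rough sketch, treating the quoted "over ..." figures as lower bounds (the variable names are illustrative):

```python
MINUTES_PER_DAY = 24 * 60  # 1,440 minutes

# Per-60-second figures quoted in the case, treated as lower bounds.
per_minute = {
    "photos": 100_000,
    "status_updates": 250_000,
    "likes": 4_000_000,
}

# Scale each per-minute figure to a full day.
per_day = {activity: count * MINUTES_PER_DAY
           for activity, count in per_minute.items()}

for activity, count in per_day.items():
    print(f"{activity}: at least {count:,} per day")
```

Even at these lower bounds, users upload more than 144 million photographs a day, which gives a sense of the scale behind the velocity and volume challenges.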
Facebook Analytics Case
What challenges does Big Data create for
Facebook?
Big Data Challenges-Facebook
• Volume
• Facebook has over 3 billion active monthly users,
creating 4 petabytes (4,000 TB) of data every day. It is
therefore crucial that the company has expansive
data warehouse facilities available to cope with the
colossal volume of data created.
• Velocity
• Every 60 seconds Facebook’s users upload over
100,000 photographs and post over 250,000 status
updates. Therefore, Facebook needs the facilities
to capture this data, which arrives at immense
speed.
• Variety
• Internal data includes photographs, status updates
and “liked” pages.
• In addition to this, Facebook captures external data
from users’ devices and location tracking
technology. Therefore, it needs the capability
to integrate all of this information, which
comes from a variety of sources.
• Veracity
• Capturing external data may make it difficult to
verify the trustworthiness of the information
collected. There are also 80 million fake profiles,
which create data that may not have a legitimate use.
Other examples of the use of big data
Airlines
Track customer preferences, booking behavior, and profitability
to personalize offers and prioritize service (e.g., in flight
cancellations).
Netflix
Uses viewing behavior (e.g., pauses, replays, ratings) to
recommend content and even decide what shows to produce.
Disease Tracking (Google Flu Trends)
Predicted flu outbreaks by analyzing popular search terms like
"flu symptoms" before official health data was available.
Target (Retail)
Analyzed shopping patterns (e.g., bland foods, scent-free
products) to predict pregnancy.
An Analytics Mindset
• A mindset is a mental attitude, a way of thinking, or a
frame of mind.
• An analytics mindset is a way of thinking that centers
on the correct use of data and analysis for decision-
making.
• According to EY, an analytics mindset is the ability to
1. Ask the right questions.
2. Extract, transform, and load relevant data.
3. Apply appropriate data analytic techniques.
4. Interpret and share the results with stakeholders.
Ask the Right Questions
• Asking the right question is the first step of the analytics
mindset.
• To define “right” or “good” questions in the context of
data analytics, start by establishing objectives that are
SMART. A good data analytic question is:
• Specific
• Measurable
• Achievable
• Relevant
• Timely.
Ask the Right Questions-Examples
1. Are we losing money or inventory due to fraud and
error and, if so, where, how much exactly, and why
are we losing it?
2. Which products or customers are the most
profitable and why?
3. How can I predict my sales if the weather or other
factors change?
4. What is the cheapest way of distributing our goods
from the warehouses to the stores?
5. At what price should I sell my new product to
maximize my profit?
Extract, Transform, and Load
Relevant Data
Extract
Pull data from various sources (databases, files, APIs).
➤ Understand what data is needed ➤ Perform extraction
➤ Verify quality and document the source
Transform
Clean and structure the data for analysis.
➤ Standardize formats ➤ Remove duplicates or errors
➤ Ensure the data fits analysis goals
Load
Move the prepared data into the analysis tool (Excel, BI tool, etc.)
➤ Ready for analysis and visualization
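The three ETL steps above can be sketched end to end using only the Python standard library. This is a minimal illustration, not a production pipeline: the sample data, the column names, and the `orders` table are all invented for the example, and an in-memory SQLite database stands in for the analysis tool.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this comes from a database, file, or API.
RAW_CSV = """order_id,customer,amount,order_date
1001, acme corp ,250.00,2024-01-15
1002,beta ltd,99.5,2024-01-16
1002,beta ltd,99.5,2024-01-16
1003,Gamma Inc,,2024-01-17
"""


def extract(raw_text):
    """Extract: read rows from the source into dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: standardize formats, drop duplicates and rows with errors."""
    seen, clean = set(), []
    for row in rows:
        order_id = row["order_id"].strip()
        if order_id in seen or not row["amount"].strip():
            continue  # skip duplicate orders and rows missing an amount
        seen.add(order_id)
        clean.append((int(order_id), row["customer"].strip().title(),
                      float(row["amount"]), row["order_date"].strip()))
    return clean


def load(rows, connection):
    """Load: move the prepared rows into the analysis store."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, order_date TEXT)"
    )
    connection.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    connection.commit()


connection = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), connection)
loaded = connection.execute("SELECT * FROM orders ORDER BY order_id").fetchall()
print(loaded)
```

Of the four raw rows, only two survive the transform step: the duplicate order 1002 and the order with a missing amount are dropped before loading.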
Apply Appropriate Data Analytic
Techniques
• Data analytics fall into four categories:
1. Descriptive analytics,
2. Diagnostic analytics,
3. Predictive analytics,
4. Prescriptive analytics
Importance and Contribution of Different Analytics
Descriptive Analytics
• Descriptive analytics are computations that address
the basic question of “what happened?”
• Business analysts use many descriptive analytics,
including computing profit margins and leverage ratios
to examine if business risk changed significantly during a
period and to identify possible fraud.
• Corporate accountants compute descriptive analytics
to understand how the business is performing.
• Metrics such as cost-per-unit, inventory turnover ratios,
customer acquisition costs, and budget-to-actual
variances in expenses and revenues are examples of
descriptive analytics.
• When examining new data, seek to understand:
1. The central tendency of the data,
2. The spread of the data,
3. The distribution of the data, and
4. Correlations in the data.
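The four properties listed above can each be computed in a line or two. A minimal sketch using Python's standard `statistics` module; the monthly figures are invented purely for illustration, and the `pearson_r` helper is written out from the definition of Pearson's correlation coefficient:

```python
import statistics

# Hypothetical monthly figures, invented purely for illustration.
ad_spend = [10, 12, 15, 11, 18, 20, 22, 19]
sales = [110, 118, 135, 112, 160, 172, 185, 170]

# 1. Central tendency
mean_sales = statistics.mean(sales)
median_sales = statistics.median(sales)

# 2. Spread
stdev_sales = statistics.stdev(sales)  # sample standard deviation
sales_range = max(sales) - min(sales)

# 3. Distribution: the three quartile cut points
quartiles = statistics.quantiles(sales, n=4)


# 4. Correlation: Pearson's r computed from its definition
def pearson_r(xs, ys):
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return covariance / (spread_x * spread_y)


r = pearson_r(ad_spend, sales)

print(f"mean={mean_sales}, median={median_sales}")
print(f"stdev={stdev_sales:.1f}, range={sales_range}")
print(f"quartiles={quartiles}")
print(f"correlation(ad_spend, sales)={r:.3f}")
```

A correlation near 1 in made-up data like this only shows that the two series move together; it does not by itself explain why, which is the job of diagnostic analytics.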
Diagnostic Analytics
• Diagnostic analysis goes beyond examining what
happened to try to answer the question, “why did this
happen?”
• Within diagnostic analysis, both informal and formal
analyses can be conducted.
• Informal diagnostic analysis builds on descriptive
analytics.
• For example, if a company observes that the overall gross
margin fell in the last quarter (a descriptive analytic), it
might examine the mix of products sold.
• The analysis can be as simple as creating a table that
shows all products sold in the last quarter versus the
previous quarter, then computing the difference to see
whether more or fewer high- or low-margin products were sold.
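The product-mix table described above could be built along these lines. This is a sketch with made-up products, margins, and unit counts; none of the figures come from the source.

```python
# Hypothetical gross margin and units sold per product, invented for illustration.
gross_margin = {"Premium": 0.60, "Standard": 0.35, "Budget": 0.10}
previous_quarter = {"Premium": 500, "Standard": 800, "Budget": 300}
last_quarter = {"Premium": 350, "Standard": 820, "Budget": 600}

# Difference in units sold per product, last quarter versus the previous one.
change = {product: last_quarter[product] - previous_quarter[product]
          for product in gross_margin}

# Print the comparison table, highest-margin product first.
print(f"{'Product':<10}{'Margin':>8}{'Prev Q':>8}{'Last Q':>8}{'Change':>8}")
for product in sorted(gross_margin, key=gross_margin.get, reverse=True):
    print(f"{product:<10}{gross_margin[product]:>8.0%}"
          f"{previous_quarter[product]:>8}{last_quarter[product]:>8}"
          f"{change[product]:>+8}")
```

In this invented data, sales shifted away from the high-margin product toward the low-margin one, which would be one plausible explanation for a falling overall gross margin.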
Predictive Analytics
• Predictive analytics go a step further than diagnostic
analytics to answer the question “what is likely to
happen in the future?”
• Successful predictive analytics can be transformative in
an organization.
• Amazon.com uses customer purchasing and search
patterns to predict other products the customer might
be interested in purchasing.
• Predictive analytics use historical data to find patterns
likely to manifest themselves in the future—the more
data, the better chance of finding patterns.
• The dramatic increases in computing power and in
available historical data allow computers to find
relations that humans cannot.
• However, to be successful, predictive analytics require
that future events are predictable based on past data and
that the organization has collected the necessary data
for prediction.
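One of the simplest ways to find a pattern in historical data is to fit a straight line and extend it. The sketch below uses ordinary least squares on invented temperature and sales figures; it is one illustrative technique among many, and the data, variable names, and the 33 °C scenario are assumptions rather than anything from the source.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: average daily temperature (°C) and units sold.
temperature = [15, 18, 21, 24, 27, 30]
units_sold = [120, 150, 180, 210, 240, 270]

slope, intercept = fit_line(temperature, units_sold)

# Predict sales for a day hotter than any in the history.
forecast = intercept + slope * 33
print(f"Each extra degree adds about {slope:.1f} units; "
      f"forecast at 33 °C is {forecast:.0f} units")
```

The forecast is only as good as the two conditions stated above: the past relationship must continue to hold, and the organization must have collected the relevant history in the first place.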
Prescriptive Analytics
• Prescriptive analytics answers the question
“what should be done?”
• Prescriptive analytics can be either
recommended actions for people to take or
programmed actions a system can take based on
predictive analytics results.
• While only a small percentage of companies
are using prescriptive analytics, many are
working toward this goal.
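A prescriptive step typically takes a prediction as input and returns an action. A minimal sketch of that idea, assuming a hypothetical inventory reorder rule; the function name, the safety stock of 20 units, and the example quantities are all invented for illustration:

```python
import math


def recommend_order(forecast_demand: float, on_hand: int,
                    safety_stock: int = 20) -> int:
    """Return the units to order so stock covers forecast demand plus a buffer."""
    shortfall = forecast_demand + safety_stock - on_hand
    return max(0, math.ceil(shortfall))


# A predictive model (not shown) is assumed to have forecast demand of 300 units.
print(recommend_order(forecast_demand=300, on_hand=250))  # order to cover the gap
print(recommend_order(forecast_demand=100, on_hand=250))  # enough stock, no order
```

Whether the output is shown to a manager as a recommendation or sent straight to a purchasing system is what separates the two forms of prescriptive analytics described above.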
Test Your Understanding
A CFO created an algorithm that would recommend
zip codes for mailing coupons for a particular
product. The CFO likely computed what type of
analytic?
A) Descriptive.
B) Diagnostic.
C) Predictive.
D) Prescriptive.
Test Your Understanding
A controller examined how long it took to close
the accounting books for the last 5 years. The
controller likely computed what type of
analytic?
A) Descriptive.
B) Diagnostic.
C) Predictive.
D) Prescriptive.
Test Your Understanding
A CIO designed a model to predict when employees would
have IT problems and call the IT department for help. The
CIO then used the model to automatically schedule more
employees to work during peak times. The CIO likely
computed what type of analytics for scheduling?
A) Descriptive.
B) Diagnostic.
C) Predictive.
D) Prescriptive.
Test Your Understanding
A bank manager created a model to forecast customer
payoffs of loans. The bank manager likely computed
what type of analytics?
A) Descriptive.
B) Diagnostic.
C) Predictive.
D) Prescriptive.
Data Presentation
• The common expression, “a picture is worth a thousand
words” accurately conveys what researchers have found
about the human brain being programmed to process
visual information better than written information.
• Indeed, researchers have identified several benefits
including:
1. Visualized data is processed faster than written or
tabular information.
2. Visualizations are easier to use. Users need less
guidance to find information with visualized data.
3. Visualization supports the dominant learning
style of the population because most learners are
visual learners.
Choosing The Right Visualization
• Different visualizations are designed to convey
different messages.
• Choosing the right type of visualization strengthens the
ability of the viz to communicate effectively.
• The five main purposes for visualization are:
1. Comparison
2. Correlation
3. Distribution
4. Trend evaluation
5. Part-to-whole
Comparison
• Comparing data across categories or groups
represents the most common reason to
create a visualization in business.
Comparison visualizations require both
numeric and categorical data values.
• The most common type of visualization
used in making comparisons is a bar chart
(also called a bar graph, bar plot, or, if
rotated, a column chart or rotated bar chart).
Correlation
• Another common visualization compares
how two numeric variables fluctuate with
each other. The most common correlation
viz is a scatterplot.
Distribution
• Distribution visualizations show the spread
of numeric data values.
• Showing distribution can help develop a
deeper understanding of data than by just
examining simple descriptive statistics like
the minimum, maximum, and average.
• The two most common distribution
visualizations are histograms and boxplots.
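A histogram is just a count of values per bin, so a text version takes only a few lines. The sketch below uses invented invoice amounts and an assumed bin width of 20 to show how a distribution can reveal what an average hides:

```python
from collections import Counter

# Hypothetical invoice amounts, invented for illustration.
amounts = [12, 18, 25, 27, 31, 33, 34, 38, 41, 45, 47, 52, 58, 95]
BIN_WIDTH = 20  # assumed bin size

# Count how many values fall into each bin [start, start + BIN_WIDTH).
bins = Counter((value // BIN_WIDTH) * BIN_WIDTH for value in amounts)

# Draw one row per bin, including empty bins, as a text histogram.
for start in range(0, max(amounts) + 1, BIN_WIDTH):
    label = f"{start}-{start + BIN_WIDTH - 1}"
    print(f"{label:<7}{'#' * bins[start]}")
```

The empty 60-79 bin followed by a single value in the 80-99 bin makes the outlier obvious, whereas the minimum, maximum, and average alone would not show that most invoices cluster between 20 and 59.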
Trend Evaluation
• Trend evaluation visualizations show
changes over an ordered variable, most
often a measurement of time.
• The difference between visualizations
showing trends and correlation is that the
axis in a trend viz is ordered.
• The line chart is the most typical viz used to
show trends.
• In a line chart, the x-axis is
an ordered unit such as days,
months, or years.
Part-to-whole
• Part-to-whole visualizations, such as pie
charts, show which items make up the
parts of a total.
• Pie charts are often misused to
communicate comparisons or trends where
bar charts or line charts are better choices.
• Pie charts are most appropriate when
showing percentages that sum up to 100%
and the data only has a few categories.
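The five purposes and their typical charts, as named in the sections above, can be summarized as a simple lookup. This is an illustrative sketch: the function name is hypothetical, and the cutoff of five categories for pie charts and the fallback to a bar chart are assumptions, since the source says only "a few categories".

```python
# Purpose -> typical chart, as named in the sections above.
TYPICAL_CHART = {
    "comparison": "bar chart",
    "correlation": "scatterplot",
    "distribution": "histogram or boxplot",
    "trend evaluation": "line chart",
    "part-to-whole": "pie chart",
}

MAX_PIE_CATEGORIES = 5  # assumed cutoff for "a few categories"


def suggest_chart(purpose: str, category_count: int = 0) -> str:
    """Suggest a typical chart type for a visualization purpose."""
    chart = TYPICAL_CHART[purpose.lower()]
    # Assumption: with many categories, fall back from a pie chart to a bar chart.
    if chart == "pie chart" and category_count > MAX_PIE_CATEGORIES:
        return "bar chart"
    return chart


print(suggest_chart("Comparison"))
print(suggest_chart("part-to-whole", category_count=3))
print(suggest_chart("part-to-whole", category_count=12))
```

A lookup like this is only a starting point; the message the visualization needs to convey should still drive the final choice.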
Designing High-quality Visualizations
• High-quality visualizations follow three important design
principles:
• Simplification (quantity, distance, and orientation)
• Emphasis (highlighting, weighting, and ordering).
• Ethical Presentation (avoid data deception).
Summary and Conclusion
• The presentation covered descriptive, diagnostic,
predictive, and prescriptive analytics.
• The common expression, “a picture is worth a thousand
words” accurately conveys what researchers have found
about the human brain being programmed to process
visual information better than written information.
• To be helpful, data needs to be presented using the right
visualization, and the visualization needs to be designed
correctly.
Ernst & Young Foundation Recommended Data Analytics
Skills
Best Resources to Learn Data Analytics
Online Courses & Learning Platforms
• Google Data Analytics Certificate (Coursera)
• Microsoft Learn – Data Analyst Path
Tools to Explore
• Google Sheets / Excel – Essential for beginners.
• Tableau Public – Free version of Tableau for creating visualizations.
• Power BI – Microsoft's business intelligence tool.
Books (Beginner-Friendly)
• “Data Science for Business” by Provost & Fawcett – Covers the
why behind analytics.
Thank You