Data Analysis
Week 8 Lecture Note
Facilitator: Obayagbo Oluwafemi
TOPIC
Title: Exploratory Data Analysis (EDA)
Subtitle: Visualizing Data Distributions, Relationships, and Trends
What is EDA?
Definition: Exploratory Data Analysis (EDA) is the process of analyzing data sets to summarize their main characteristics, often using visual methods.
Purpose: Understand the data's structure, detect outliers, and find patterns or trends.
Example: Before launching a new product, a company might analyze sales data from a similar product to understand customer behavior.
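A first EDA pass often starts with a few numeric summaries before any chart is drawn. The sketch below uses only Python's standard `statistics` module; `summarize` is a hypothetical helper name and `daily_units_sold` is made-up data standing in for the "similar product" in the example.

```python
import statistics


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


# Hypothetical daily unit sales for a similar, already-launched product.
daily_units_sold = [12, 15, 11, 14, 40, 13, 16]

summary = summarize(daily_units_sold)
for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```

Here the mean (about 17.29) sits well above the median (14), a hint that one unusually busy day (40 units) pulls the average up; a chart is the natural next step to confirm it.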
Why Does Visualization Matter in EDA?
Definition: Visualization lets us see patterns, relationships, and trends that might not be obvious from raw data.
Importance: It makes insights easier to communicate and complex datasets easier to understand.
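Anscombe's quartet (F. J. Anscombe, 1973) is a classic illustration of this point: four small datasets with almost identical means and x-y correlation whose scatter plots look completely different (a noisy straight line, a curve, a tight line with one outlier, and a vertical cluster with one extreme point). This standard-library sketch computes those summaries to show that the numbers alone cannot tell the four sets apart; `pearson` is a hypothetical helper implementing the usual Pearson correlation formula.

```python
import math
from statistics import fmean

# Anscombe's quartet: sets I-III share the same x values.
X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


results = {}
for name, (xs, ys) in QUARTET.items():
    results[name] = (fmean(xs), fmean(ys), pearson(xs, ys))
    mean_x, mean_y, r = results[name]
    print(f"{name:>3}: mean x = {mean_x:.2f}, mean y = {mean_y:.2f}, r = {r:.3f}")
```

All four rows report a mean x of 9, a mean y of about 7.50, and r of about 0.82; only plotting each set as a scatter plot reveals how different they really are.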
Visualizing Data Distributions
Definition: A distribution describes how data points are spread out, revealing patterns such as skewness or symmetry.
Common Charts: Histogram, Boxplot.
Real-Life Example: Analyzing students' exam scores: a histogram shows whether most scores cluster at the high end or the low end of the range.
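A histogram is just a count of values per equal-width bin, drawn as bars. The sketch below builds those counts with the standard library and prints a sideways text histogram; in practice a plotting library such as matplotlib (`plt.hist`) would draw the bars. `histogram_counts` is a hypothetical helper, and the scores are made up.

```python
from statistics import fmean, median


def histogram_counts(values, start, stop, width):
    """Count values in equal-width bins covering [start, stop].

    Bins are [start, start + width), [start + width, start + 2*width), ...
    A value equal to `stop` is placed in the last bin.
    """
    n_bins = (stop - start) // width
    counts = {start + i * width: 0 for i in range(n_bins)}
    for v in values:
        if start <= v <= stop:
            index = min(int((v - start) // width), n_bins - 1)
            counts[start + index * width] += 1
    return counts


# Hypothetical exam scores (out of 100) for one class.
scores = [45, 52, 58, 61, 64, 67, 68, 70, 72, 73, 75, 76, 78, 81, 85, 92]

counts = histogram_counts(scores, start=40, stop=100, width=10)
for edge, count in counts.items():
    # Label = lower edge of the bin; one '#' per student.
    print(f"{edge:>3} | {'#' * count}")

print(f"mean = {fmean(scores):.2f}, median = {median(scores):.2f}")
```

The tallest bar is the 70s bin, and the mean (about 69.8) sits slightly below the median (71), a mild hint of a tail toward lower scores.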
Understanding Relationships with Scatter Plots
Definition: Scatter plots show the relationship between two variables.
Purpose: Identify a correlation, or the absence of any relationship.
Real-Life Example: The relationship between advertising spend and sales revenue. Are the two positively correlated?
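A scatter plot places one point per observation, and the Pearson correlation coefficient r summarizes how close those points are to a straight line (+1 perfectly rising, -1 perfectly falling, near 0 no linear relationship). This sketch draws a text-only scatter plot and computes r with the standard library; matplotlib's `plt.scatter` would be the usual choice for a real chart. `ascii_scatter` and `pearson` are hypothetical helpers, and the spend and revenue figures are invented.

```python
import math
from statistics import fmean


def pearson(xs, ys):
    """Pearson correlation coefficient: +1 perfect positive, -1 perfect negative."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ascii_scatter(xs, ys, width=30, height=10):
    """Render points on a character grid; the top row holds the largest y values."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    x_span = (x_max - x_min) or 1  # avoid dividing by zero if all x are equal
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        grid[height - 1 - row][col] = "*"
    return ["".join(cells) for cells in grid]


# Hypothetical monthly figures, in thousands of dollars.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
revenue = [10.0, 14.0, 15.0, 21.0, 24.0, 27.0]

for line in ascii_scatter(ad_spend, revenue):
    print("|" + line)
print("+" + "-" * 30)
print(f"r = {pearson(ad_spend, revenue):.2f}")
```

The points climb from bottom-left to top-right and r comes out close to +1 (about 0.99), so in this made-up data the two variables are strongly positively correlated. Correlation alone does not show that spending causes the revenue.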
Trends in Line Charts
Definition: Line charts show changes over time, making trends and patterns easy to detect.
Common Usage: Time-series data.
Real-Life Example: Monitoring a stock's price over a year to understand its trend.
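Day-to-day or month-to-month noise can hide the trend in a line chart, so a smoothed series such as a simple moving average is often drawn as a second line. The sketch below computes one with plain Python; `moving_average` is a hypothetical helper and the prices are invented.

```python
def moving_average(values, window):
    """Simple moving average: the mean of each run of `window` consecutive values."""
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


# Hypothetical month-end closing prices for one stock over a year.
closing_prices = [50.0, 52.5, 51.0, 53.5, 55.0, 54.0, 56.5, 58.0, 57.0, 59.5, 61.0, 60.0]

smoothed = moving_average(closing_prices, window=3)
direction = "upward" if smoothed[-1] > smoothed[0] else "flat or downward"
print("3-month moving average:", [round(v, 2) for v in smoothed])
print("Overall trend:", direction)
```

A larger window gives a smoother line but reacts more slowly to real changes; the window size is a judgment call, not a fixed rule.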
Detecting Outliers with Boxplots
Definition: Boxplots summarize data with quartiles and highlight outliers.
Why It Matters: Outliers can indicate data errors or important insights.
Real-Life Example: Analyzing the incomes of a city's residents: a boxplot can show whether any residents have extreme incomes.
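Boxplots commonly flag a point as an outlier when it lies more than 1.5 times the interquartile range (IQR = Q3 - Q1) below Q1 or above Q3 (Tukey's rule). This sketch computes the quartiles with `statistics.quantiles` and applies that rule; `boxplot_summary` is a hypothetical helper, the incomes are invented, and quartile conventions vary between tools, so a plotting library may draw slightly different fences.

```python
import statistics


def boxplot_summary(values, k=1.5):
    """Quartiles, Tukey fences, and the points a boxplot would draw as outliers."""
    q1, median, q3 = statistics.quantiles(values, n=4)  # default method="exclusive"
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"q1": q1, "median": median, "q3": q3,
            "low_fence": low_fence, "high_fence": high_fence, "outliers": outliers}


# Hypothetical annual incomes (in dollars) for ten residents.
incomes = [28_000, 31_000, 35_000, 38_000, 40_000, 42_000, 45_000, 47_000, 52_000, 250_000]

s = boxplot_summary(incomes)
print(f"Q1 = {s['q1']:,.0f}, median = {s['median']:,.0f}, Q3 = {s['q3']:,.0f}")
print(f"fences: {s['low_fence']:,.0f} to {s['high_fence']:,.0f}")
print("outliers:", s["outliers"])
```

Only the 250,000 income falls outside the fences. Whether it is a data-entry error or a genuinely high earner is a question the chart raises but cannot answer on its own.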
Using Heatmaps for Relationships
Definition: Heatmaps display values in matrix form, using color to represent the magnitude of each value.
Purpose: Well suited to showing relationships among many variables at once.
Real-Life Example: Visualizing a correlation matrix to understand the relationships between several stock prices.
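A correlation heatmap is a matrix of pairwise correlations with each cell shaded by its value. With pandas and seaborn this is typically `sns.heatmap(df.corr())`; the dependency-free sketch below computes the matrix itself and uses characters instead of colors. `correlation_matrix`, `shade`, and the three price series are hypothetical, chosen so that B moves exactly with A and C exactly against it.

```python
import math
from statistics import fmean

SHADES = ".:-=+*#%@"  # from r = -1 (".") through 0 ("+") to r = +1 ("@")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(series):
    """Return (names, matrix) where matrix[i][j] is the correlation of series i and j."""
    names = list(series)
    return names, [[pearson(series[a], series[b]) for b in names] for a in names]


def shade(r):
    """Map a correlation in [-1, 1] to a character from SHADES."""
    index = round((r + 1) / 2 * (len(SHADES) - 1))
    return SHADES[max(0, min(index, len(SHADES) - 1))]


# Hypothetical daily closing prices for three stocks.
prices = {
    "A": [10.0, 11.0, 12.0, 13.0, 14.0],
    "B": [20.0, 22.0, 24.0, 26.0, 28.0],  # moves exactly with A
    "C": [30.0, 29.0, 28.0, 27.0, 26.0],  # moves exactly against A
}

names, matrix = correlation_matrix(prices)
print("   " + "  ".join(names))
for name, row in zip(names, matrix):
    print(f"{name}  " + "  ".join(shade(r) for r in row))
```

The diagonal is always 1 (each stock against itself), and the matrix is symmetric, which is why many heatmaps show only one triangle.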
Combining Visualizations
Definition: Multiple charts (such as scatter plots and histograms) are often combined to tell a fuller story.
Purpose: Supports better decision-making by giving a more complete view of the data.
Real-Life Example: Combining scatter plots and boxplots to explore customers' spending behavior and spot outliers.
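Combining views matters because each one can mislead on its own. This sketch pairs the boxplot view (Tukey's 1.5 x IQR rule on spend) with the scatter-plot view (Pearson correlation of visits against spend), computing r with and without the flagged customers. The helpers `pearson` and `iqr_outlier_mask` and all customer figures are hypothetical.

```python
import math
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def iqr_outlier_mask(values, k=1.5):
    """True for each value outside Tukey's fences (the boxplot outliers)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [v < q1 - k * iqr or v > q3 + k * iqr for v in values]


# Hypothetical customers: store visits per month and total monthly spend ($).
visits = [2, 3, 4, 5, 6, 7, 8, 9, 10, 4]
spend = [40, 55, 70, 90, 105, 120, 140, 160, 175, 900]

# Boxplot view: which spend values are outliers?
is_outlier = iqr_outlier_mask(spend)

# Scatter-plot view: visits vs. spend, with and without the outliers.
r_all = pearson(visits, spend)
kept = [(v, s) for v, s, out in zip(visits, spend, is_outlier) if not out]
r_clean = pearson([v for v, _ in kept], [s for _, s in kept])

print("Outlier spend values:", [s for s, out in zip(spend, is_outlier) if out])
print(f"r with all customers:   {r_all:.2f}")
print(f"r without the outliers: {r_clean:.2f}")
```

With these made-up numbers, the single big spender drags r slightly below zero, while the remaining customers show a strong positive link between visits and spend; neither view alone tells the whole story.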
Conclusion & Next Steps
Summary: Visualizing data helps uncover hidden trends, relationships, and anomalies.
Next Steps: Practice by exploring public datasets (e.g., sales, weather, or sports data) with visualizations.
Real-Life Example: Analyze a supermarket's sales data to find seasonal buying patterns.
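One way to start on the supermarket exercise is to total sales per calendar month in each year, then average each month across years. A line chart of the result, one point per month, makes seasonal peaks easy to spot. This is a standard-library sketch; `average_sales_by_month` and the transactions are hypothetical, and months with no sales in a given year are simply absent rather than counted as zero.

```python
from collections import defaultdict
from datetime import date


def average_sales_by_month(records):
    """Average each calendar month's total sales across the years in `records`.

    `records` is an iterable of (ISO date string, amount) pairs.
    """
    totals = defaultdict(float)  # (year, month) -> total sales
    for day, amount in records:
        d = date.fromisoformat(day)
        totals[(d.year, d.month)] += amount

    by_month = defaultdict(list)  # month -> [total for each year seen]
    for (_, month), total in totals.items():
        by_month[month].append(total)
    return {month: sum(t) / len(t) for month, t in sorted(by_month.items())}


# Hypothetical supermarket transactions.
records = [
    ("2023-01-15", 120.0), ("2023-01-20", 80.0), ("2023-06-03", 200.0),
    ("2023-12-10", 500.0), ("2024-01-05", 100.0), ("2024-12-22", 700.0),
]

monthly = average_sales_by_month(records)
peak = max(monthly, key=monthly.get)
for month, avg in monthly.items():
    print(f"month {month:>2}: {avg:8.2f}")
print("Peak month:", peak)
```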