Data Visualization

Uploaded by

neerav206

INDUSTRY 4.0 TECHNOLOGIES
(EX-20001)

SLIDE NO. - 1 Date: August-2023


1. Data Visualization
❑ Data visualization is the representation of data through the use of
common graphics, such as charts, plots, infographics, and even
animations.
❑ Data presented through visual elements is easier to understand and
analyze than raw figures, which helps in extracting actionable insights
from the data.
❑ The main goal of data visualization is to make it easier to identify
patterns, trends and outliers in large data sets.

❖ General Types of Visualizations: -


❑ Chart: Information presented in a tabular, graphical form
with data displayed along two axes. It can be in the form of a
graph, diagram, or map.
❑ Table: A set of figures displayed in rows and columns.
❑ Graph: A diagram of points, lines, segments, curves, or areas
that represents certain variables in comparison to each other,
usually along two axes at a right angle.
❑ Geospatial: A visualization that shows data in map form
using different shapes and colors to show the relationship
between pieces of data and specific locations.
❖ Seven common types of data visualization: -
❑ Scatterplots: Scatterplots (or scatter graphs) visualize the
relationship between two variables. One variable is shown on
the x-axis, and the other on the y-axis, with each data point
depicted as a single “dot” or item on the graph.
❑ A scatterplot shows at a glance whether, and how strongly, the
two variables are correlated.
❑ For example, you could use a scatterplot to visualize the
relationship between a person’s height and weight.
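The correlation that a scatterplot suggests visually can also be expressed as a number. The following is a minimal sketch, assuming the Pearson correlation coefficient as the measure. The function name `pearson_r` and the height and weight values are made up for illustration and do not come from the slides.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (numerator) and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical heights (cm) and weights (kg), for illustration only.
heights = [150, 160, 165, 170, 180]
weights = [50, 56, 61, 66, 77]

r = pearson_r(heights, weights)
print(f"r = {r:.3f}")
```

A value of r close to +1 corresponds to dots rising along a straight line, a value close to -1 to dots falling along a line, and a value near 0 to no linear pattern.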

Figure 4: Scatterplot example.


❑ Bar charts: Bar charts are used to plot categorical data
against discrete values. Categorical data refers to data that is
not numeric, and it’s often used to describe certain traits or
characteristics.
❑ Some examples of categorical data include education level
(e.g. high school, undergrad, or post-grad) and age group
(e.g. under 30, 30 to 39, 40 to 49, or 50 and over).

Figure 5: Bar chart example.


❑ Pie charts: Just like bar charts, pie charts are used to
visualize categorical data.
❑ However, while bar charts represent multiple categories of
data, pie charts are used to visualize just one single variable
broken down into percentages or proportions.
❑ A pie chart is essentially a circle divided into different
“slices,” with each slice representing the percentage it
contributes to the whole. Thus, the size of each pie slice is
proportional to how much it contributes to the whole “pie.”

Figure 6: Pie chart example.


❑ Geographical maps: Geo maps are used to visualize the
distribution of data in relation to a physical, geographical
area.
❑ For example, you could use a color-coded map to see how
natural oil reserves are distributed across the world, or to
visualize how different states voted in a political election.
Maps are an extremely versatile form of data visualization,
and are an excellent way of communicating all kinds of
location-related data.
❑ Some other types of maps used in data visualization include
dot distribution maps (think scatterplots combined with a
map), and cartograms which distort the size of geographical
areas to proportionally represent a given variable
(population density, for example).

Figure 6: Geographical map example.


❑ Line graph: A line graph is preferred when time-dependent data
has to be presented. It is best suited to analysing trends.
❑ Histogram: A histogram is a frequency chart that records the
number of occurrences of values in a data set, usually grouped
into intervals (bins). It is useful when you want to understand
the distribution of a series.

Figure 7: Histogram versus Bar chart.


❑ Box Plot: Box plots are effective in summarizing the spread of
large data sets. They use percentiles (quartiles) to divide the data
range, which shows how much of the data falls below or above a
given value and helps to identify outliers.
❑ A box plot summarizes the data with three elements: -
✔ Median value – it divides the data into two equal halves.
✔ Interquartile Range (IQR) – it spans the values between Q1
(25th percentile) and Q3 (75th percentile). IQR = Q3 − Q1
✔ Outliers – these data points differ significantly from the
rest and lie outside the whiskers. The limits are calculated
with the following formulas: -
1. Q1 − 1.5 × IQR (lower limit; values below it are outliers)
2. Q3 + 1.5 × IQR (upper limit; values above it are outliers)
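The two limit formulas can be sketched in a few lines of Python. This is a minimal sketch, not a definitive implementation: the function names `outlier_fences` and `find_outliers` are illustrative, Q1 = 2.5 and Q3 = 12.5 are the quartiles from the restaurant-distance exercise in these slides, and the extra 30-mile value is hypothetical.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower_limit, upper_limit) for the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, q1, q3, k=1.5):
    """Return the values that lie strictly outside the two limits."""
    lower, upper = outlier_fences(q1, q3, k)
    return [v for v in values if v < lower or v > upper]


# Distances (miles) from the restaurant exercise; its quartiles are Q1 = 2.5, Q3 = 12.5.
distances = [14, 6, 3, 2, 4, 15, 11, 8, 1, 7, 2, 1, 3, 4, 10, 22, 20]

lower, upper = outlier_fences(2.5, 12.5)
print(lower, upper)                                 # -12.5 27.5
print(find_outliers(distances, 2.5, 12.5))          # []
# A hypothetical 30-mile trip (not in the original data) would be flagged.
print(find_outliers(distances + [30], 2.5, 12.5))   # [30]
```

The multiplier 1.5 is kept as a parameter `k` because it is a convention rather than a strict rule.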

Figure 8: Different terms of a Box plot.


Question: - The owner of a restaurant wants to find out more about
where his patrons are coming from. One day he decided to gather
data about the distance (in miles) that people travelled to get to his
restaurant. People reported the following distances: -
14, 6, 3, 2, 4, 15, 11, 8, 1, 7, 2, 1, 3, 4, 10, 22, 20
He wants to create a graph that helps him understand the spread of
distances (and the median distance) that people travel.
Answer:
Step 1: Order the Data
The distances in increasing order are: 1, 1, 2, 2, 3, 3, 4, 4, 6, 7, 8, 10, 11, 14, 15, 20, 22
Step 2: Find Key Values
Minimum: The smallest value in the data is 1.
Maximum: The largest value in the data is 22.
Median: The median is the middle value. Since there are 17 data points (an odd number),
the median is the 9th value in the ordered list. The 9th value is 6.
First Quartile (Q1): This is the median of the lower half of the data. The first 8 numbers
are: 1, 1, 2, 2, 3, 3, 4, 4. The median of this subset is the average of the 4th and 5th values:
(2 + 3)/2 = 2.5.
Third Quartile (Q3): This is the median of the upper half of the data. The last 8 numbers
are: 7, 8, 10, 11, 14, 15, 20, 22. The median of this subset is the average of the 4th and 5th
values: (11 + 14)/2 = 12.5.
Step 3: Calculate the Interquartile Range (IQR)
The IQR is the difference between the third quartile (Q3) and the first quartile (Q1):
IQR=Q3−Q1=12.5−2.5=10
Step 4: Construct the Box Plot
The box plot would show: Minimum: 1, Q1: 2.5, Median: 6, Q3: 12.5,
Maximum: 22
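The steps of this answer can be sketched in Python. This is a minimal sketch assuming the same convention as the worked answer: Q1 and Q3 are the medians of the lower and upper halves, and the overall median is left out of both halves when the number of values is odd. Other tools may use different quartile conventions and return slightly different Q1 and Q3. The function names are illustrative.

```python
def median(sorted_values):
    """Median of an already sorted, non-empty list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(values):
    """Minimum, Q1, median, Q3 and maximum using the median-of-halves method."""
    if len(values) < 2:
        raise ValueError("need at least 2 values")
    data = sorted(values)                  # Step 1: order the data
    n = len(data)
    half = n // 2
    lower = data[:half]                    # values below the median position
    upper = data[half + 1:] if n % 2 == 1 else data[half:]
    return {
        "min": data[0],
        "q1": median(lower),
        "median": median(data),
        "q3": median(upper),
        "max": data[-1],
    }


distances = [14, 6, 3, 2, 4, 15, 11, 8, 1, 7, 2, 1, 3, 4, 10, 22, 20]
summary = five_number_summary(distances)
iqr = summary["q3"] - summary["q1"]        # Step 3: interquartile range
print(summary)  # {'min': 1, 'q1': 2.5, 'median': 6, 'q3': 12.5, 'max': 22}
print(iqr)      # 10.0
```

The same function can be applied to any of the other data sets in these exercises by passing a different list.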
• Question:
• A teacher collected the following data for the
number of books read by 25 students in a year:
• Data:
5, 7, 3, 8, 6, 2, 10, 4, 9, 6, 11, 5, 7, 6, 8, 10, 7, 3,
4, 6, 7, 5, 9, 12, 11
• The teacher wants to create a box plot to better
understand the distribution of books read by
the students.
Answer:
Step 1: Order the Data First, let's arrange the data in increasing order:
Ordered Data: 2, 3, 3, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12
Step 2: Find Key Values for the Box Plot
Minimum: The smallest value in the data is 2.
Maximum: The largest value in the data is 12.
Median: The middle value of the data. Since there are 25 data points (an odd number), the
median is the 13th value: The 13th value is 7.
First Quartile (Q1): The median of the lower half of the data (the first 12 numbers):
2, 3, 3, 4, 4, 5, 5, 5, 6, 6, 6, 6
The median of this subset is the average of the 6th and 7th values: (5 + 5)/2 = 5
Third Quartile (Q3): The median of the upper half of the data (the last 12 numbers):
7, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12
The median of this subset is the average of the 6th and 7th values: (9 + 9)/2 = 9
Step 3: Calculate the Interquartile Range (IQR)
The IQR is the difference between the third quartile (Q3) and the first quartile (Q1):
IQR=Q3−Q1=9−5=4
Step 4: Draw the Box Plot
Minimum = 2, First Quartile (Q1) = 5, Median = 7,
Third Quartile (Q3) = 9, Maximum = 12, Interquartile Range (IQR) = 4
Question: - The following data are the heights of 40 students in a
statistics class.
59; 60; 61; 62; 62; 63; 63; 64; 64; 64; 65; 65; 65; 65; 65; 65; 65; 65;
65; 66; 66; 67; 67; 68; 68; 69; 70; 70; 70; 70; 70; 71; 71; 72; 72; 73;
74; 74; 75; 77
Construct a box plot with the following properties: the minimum
and maximum values as well as the quartiles. Calculate the
Interquartile Range (IQR).

Figure 4: Data analysis process.

❑ Outliers:
❑ Outliers are data points that are far from other data points. In
other words, they’re unusual values in a dataset. Outliers are
problematic for many statistical analyses because they can
cause tests to either miss significant findings or distort real
results.
❑ Unfortunately, there are no strict statistical rules for
definitively identifying outliers. Finding outliers depends on
subject-area knowledge and an understanding of the data
collection process. While there is no solid mathematical
definition, there are guidelines and statistical tests you can use
to find outlier candidates.
❑ Sorting your data sheet is a simple but effective way to
highlight unusual values. Sort the data sheet by each variable
in turn and then look for unusually high or low values.
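The sorting check can be sketched as follows. This is only an illustration: the helper name `extremes` is hypothetical, and the values are the restaurant distances from the box plot exercise in these slides. The code only surfaces the smallest and largest values; deciding whether they are truly outliers still needs subject-area knowledge.

```python
def extremes(values, k=3):
    """Return the k smallest and the k largest values after sorting."""
    if k < 1:
        raise ValueError("k must be at least 1")
    ordered = sorted(values)
    return ordered[:k], ordered[-k:]


distances = [14, 6, 3, 2, 4, 15, 11, 8, 1, 7, 2, 1, 3, 4, 10, 22, 20]
lowest, highest = extremes(distances)
print("lowest:", lowest)     # lowest: [1, 1, 2]
print("highest:", highest)   # highest: [15, 20, 22]
```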

❑ Define the question: What problem are you trying to solve?


❑ Collect the data: Determine what kind of data you need and
where you’ll find it.
❑ Clean the data: Remove errors, duplicates, outliers, and
unwanted data points.
❑ Analyze the data: Determine the type of data analysis you need to
carry out in order to find the insights you’re looking for.
❑ Visualize the data and share your findings: Translate your key
insights into visual format (e.g. graphs, charts, etc.) and present
them to the relevant audience(s).
Question
A local university tracks the distribution of students across
various departments for a semester. The number of students
enrolled in each department is as follows:
Computer Science: 320 students
Mechanical Engineering: 250 students
Electrical Engineering: 180 students
Civil Engineering: 150 students
Biology: 120 students
Business Administration: 200 students
Mathematics: 100 students
Physics: 80 students
Chemistry: 70 students
The total student enrollment in the university is 1,470
students. Construct a pie chart based on the above information.
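Answer
• Percentage for each department = (students in the department ÷ 1,470) × 100,
rounded to one decimal place:
• Computer Science: 21.8%
• Mechanical Engineering: 17.0%
• Electrical Engineering: 12.2%
• Civil Engineering: 10.2%
• Biology: 8.2%
• Business Administration: 13.6%
• Mathematics: 6.8%
• Physics: 5.4%
• Chemistry: 4.8%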
• Step 3: Creating the Pie Chart
• Now that we have the percentage distribution, we can
create a pie chart to visually represent the data. The pie
chart will show how each department contributes to the
total student population.
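The percentage calculation behind this answer can be sketched in Python. This is a minimal sketch: the function name `pie_slices` is illustrative, and the central angle of each slice (share × 360°) is an addition not given in the slides, included because it is what is needed to draw the slices by hand.

```python
enrollment = {
    "Computer Science": 320,
    "Mechanical Engineering": 250,
    "Electrical Engineering": 180,
    "Civil Engineering": 150,
    "Biology": 120,
    "Business Administration": 200,
    "Mathematics": 100,
    "Physics": 80,
    "Chemistry": 70,
}


def pie_slices(counts):
    """Map each category to (percentage, central angle in degrees), rounded to 1 decimal."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("total must be positive")
    return {
        name: (round(100 * n / total, 1), round(360 * n / total, 1))
        for name, n in counts.items()
    }


slices = pie_slices(enrollment)
for name, (percent, angle) in slices.items():
    print(f"{name:<24} {percent:>5.1f}%  {angle:>5.1f} deg")
```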
Question
A local bookstore tracks its sales data for five different categories of books over
the first six months of the year 2024.
The sales figures (in dollars) for each category of books are given in the table below.
Construct a bar chart based on this information.
Science
Month Fiction Non-fiction History Comedy
Fiction
January 8,000 5,000 4,000 3,500 6,500
February 8,500 5,500 4,200 4,000 7,000
March 9,000 6,000 4,500 4,200 7,500
April 9,500 6,500 4,800 4,500 8,000
May 10,000 7,000 5,000 5,000 8,500
June 10,500 7,500 5,200 5,200 9,000
Question
A school tracks the scores of 100 students on their final exam.
The scores are distributed across several ranges, and the school wants to
visualize how many students fall into each range.
The following table gives the number of students who scored within each
range. Construct a histogram based on this information.
Score Range Number of Students
0 - 10 5
11 - 20 8
21 - 30 12
31 - 40 15
41 - 50 18
51 - 60 20
61 - 70 10
71 - 80 7
81 - 90 3
91 - 100 3
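Because the scores are already grouped into ranges, a histogram can be sketched directly from the frequency table. The following is a minimal text-based sketch: the function name `text_histogram` is illustrative, and each `*` stands for one student. Note that the frequencies as listed add up to 101, one more than the 100 students stated in the question.

```python
# (score range, number of students) pairs copied from the table.
score_counts = [
    ("0 - 10", 5),
    ("11 - 20", 8),
    ("21 - 30", 12),
    ("31 - 40", 15),
    ("41 - 50", 18),
    ("51 - 60", 20),
    ("61 - 70", 10),
    ("71 - 80", 7),
    ("81 - 90", 3),
    ("91 - 100", 3),
]


def text_histogram(bins, symbol="*"):
    """Render one bar per bin; the bar length equals the bin's frequency."""
    width = max(len(label) for label, _ in bins)
    return "\n".join(
        f"{label:>{width}} | {symbol * count} ({count})" for label, count in bins
    )


total = sum(count for _, count in score_counts)
modal_range, modal_count = max(score_counts, key=lambda b: b[1])

print(text_histogram(score_counts))
print("total students:", total)
print("most common range:", modal_range)
```

Unlike a bar chart of unrelated categories, the ranges here are consecutive intervals of one numeric variable, so the bars must stay in this order.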
Thank you
