
Unit III Dev Data Exploration and Visualization

The document discusses univariate and bivariate data, focusing on descriptive statistics such as measures of central tendency (mean, median, mode) and measures of variability (range, variance, standard deviation). It provides examples and Python code for calculating these statistics. Additionally, it emphasizes the importance of visual representations like histograms and pie charts in analyzing single-variable data.

Uploaded by kumaresan7751
Copyright © All Rights Reserved

UNIT III UNIVARIATE ANALYSIS

Introduction to Single Variable: Distributions and Variables – Numerical Summaries of Level and Spread – Scaling and Standardizing – Inequality – Smoothing Time Series.
1. Univariate data (Introduction to Single Variable)

This type of data consists of only one variable.

Suppose that the heights of seven students in a class are recorded (figure). There is only one variable, height, and the data do not deal with any cause or relationship.

Patterns in this type of data can be described using measures of central tendency (mean, median, and mode), measures of dispersion or spread (range, minimum, maximum, quartiles, variance, and standard deviation), and frequency distribution tables, histograms, pie charts, frequency polygons, and bar charts.
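A minimal sketch of summarizing one variable, assuming a hypothetical list of seven student heights in centimetres (the original figure's values are not available). It uses only the Python standard library: statistics.quantiles (Python 3.8+, default "exclusive" method) for the quartiles and collections.Counter to build a frequency distribution table with a hypothetical class width of 5 cm.

```python
import statistics
from collections import Counter

# Hypothetical heights (cm) of seven students: a single variable, no relationship studied
heights = [152, 158, 160, 161, 165, 170, 173]

print("Minimum:", min(heights))
print("Maximum:", max(heights))
print("Mean:", statistics.mean(heights))
print("Median:", statistics.median(heights))

# Quartiles Q1, Q2, Q3 (statistics.quantiles uses the 'exclusive' method by default)
quartiles = statistics.quantiles(heights, n=4)
print("Quartiles:", quartiles)

# Frequency distribution table with class intervals of width 5 cm
width = 5
freq = Counter((h // width) * width for h in heights)
for lower in sorted(freq):
    print(f"{lower}-{lower + width - 1} cm: {freq[lower]}")
```

Other quartile conventions (for example the "inclusive" method) can give slightly different Q1 and Q3 values for small data sets.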
Bivariate data

This type of data involves two different variables.

The analysis of this type of data deals with causes and relationships; it is carried out to find the relationship between the two variables.

An example of bivariate data is temperature and ice cream sales in the summer season.
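To contrast with the univariate case, here is a small sketch of a bivariate analysis, assuming made-up temperature and sales figures (not from the source). It measures the strength of the linear relationship between the two variables with the Pearson correlation coefficient from numpy.corrcoef; a value close to +1 indicates that sales tend to rise with temperature.

```python
import numpy as np

# Hypothetical summer data: daily temperature (°C) and ice cream sales (units)
temperature = [22, 25, 27, 30, 32, 35]
sales = [120, 150, 160, 200, 210, 250]

# Pearson correlation coefficient between the two variables
# (np.corrcoef returns a 2x2 matrix; the off-diagonal entry is r)
r = np.corrcoef(temperature, sales)[0, 1]
print("Correlation:", round(r, 3))
```

Correlation describes association only; it does not by itself establish that temperature causes the change in sales.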
2. Distributions and Variables

Types of Descriptive Statistics


● Measures of Central Tendency
● Measures of Variability
● Measures of Frequency Distribution

2.1 Measures of Central Tendency

A measure of central tendency represents the whole data set by a single value that locates the center of the data. There are three main measures of central tendency:

● Mean
● Mode
● Median
Example Program (Mean, Median, Mode)
Mean (to calculate the mean, find the sum of all values and divide the sum by the number of values):
import numpy
speed = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]
x = numpy.mean(speed)  # sum of all values / number of values
print(x)

Median (the median is the middle value after all the values have been sorted):
import numpy
speed = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]
x = numpy.median(speed)  # middle value of the sorted data
print(x)

Mode (the mode is the value that appears most often):
from scipy import stats
speed = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]
x = stats.mode(speed)  # ModeResult holding the most common value and its count
print(x)
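The three definitions above can also be written out directly, without NumPy or SciPy. This is a sketch for illustration; the helper names mean, median, and mode are hypothetical, and the tie-breaking rule for the mode (first value seen wins) is a choice made here, not something the slides specify.

```python
from collections import Counter

# Same speed data as the example programs above
speed = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

def mean(values):
    # Sum of all values divided by the number of values
    return sum(values) / len(values)

def median(values):
    # Middle value of the sorted data; average the two middle values if the count is even
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def mode(values):
    # Most frequent value; on a tie, Counter.most_common returns the one seen first
    return Counter(values).most_common(1)[0][0]

print(mean(speed))
print(median(speed))  # 87
print(mode(speed))    # 86
```

Sorting the 13 speeds gives 77, 78, 85, 86, 86, 86, 87, ...; the 7th value is 87 (the median), and 86 occurs three times (the mode).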

2.2 Measures of Variability


● Range
● Variance
● Standard deviation
2.2.1 Range
The range is the difference between the largest and smallest data points in the data set. The bigger the range, the greater the spread of the data, and vice versa.
Range = Largest data value – smallest data value

Example Program:

arr = [1, 2, 3, 4, 5]
maximum = max(arr)   # largest data value
minimum = min(arr)   # smallest data value
data_range = maximum - minimum
print('Range of your data:', data_range)
Output:
Range of your data: 4
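To illustrate "the bigger the range, the more the spread", here is a sketch using two hypothetical data sets that share the same mean (3) but differ in spread. It uses numpy.ptp ("peak to peak"), which returns the maximum minus the minimum, i.e. the range.

```python
import numpy as np

# Two hypothetical data sets with the same mean (3) but different spread
narrow = [2, 3, 3, 3, 4]
wide = [0, 1, 3, 5, 6]

for name, data in [("narrow", narrow), ("wide", wide)]:
    # np.ptp returns max(data) - min(data), i.e. the range
    print(name, "mean =", np.mean(data), "range =", np.ptp(data))
```

Both sets are centred at 3, but the range (2 versus 6) shows that the second set is more spread out. Because the range uses only the two extreme values, a single outlier can change it a lot.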
2.2.2 Variance

Variance is the average squared deviation from the mean. It is calculated by finding the difference between each data point and the mean, squaring these differences, adding them up, and dividing by the number of data points in the data set. Dividing by n gives the population variance; the sample variance divides by n − 1 instead.
Example Program:
import statistics
arr = [1, 2, 3, 4, 5]
# statistics.variance() returns the sample variance (divides by n - 1)
print("Var =", statistics.variance(arr))
Output:
Var = 2.5
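The output 2.5 is the sample variance. For arr = [1, 2, 3, 4, 5] the squared deviations from the mean (3) add up to 10, so dividing by n − 1 = 4 gives 2.5, while dividing by n = 5 gives the population variance 2. The sketch below compares the standard-library functions with numpy.var, whose default ddof=0 computes the population variance.

```python
import statistics
import numpy as np

arr = [1, 2, 3, 4, 5]

# Population variance: sum of squared deviations divided by n
print("pvariance =", statistics.pvariance(arr))
# Sample variance: divided by n - 1 (what statistics.variance returns)
print("variance =", statistics.variance(arr))
# NumPy's var defaults to ddof=0 (population); ddof=1 gives the sample variance
print("np.var ddof=0 =", np.var(arr))
print("np.var ddof=1 =", np.var(arr, ddof=1))
```

Mixing the two libraries without setting ddof is a common source of mismatched results.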

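2.2.3 Standard Deviation

The standard deviation is the square root of the variance, so it is expressed in the same units as the data.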
Example Program
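import statistics
arr = [1, 2, 3, 4, 5]
# statistics.stdev() returns the sample standard deviation; rounded to 2 decimals here
print("Std =", round(statistics.stdev(arr), 2))
Output:
Std = 1.58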
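Since the standard deviation is the square root of the variance, 1.58 is simply √2.5 (sample) rounded; the population value is √2 ≈ 1.41. This sketch checks that relationship and shows that numpy.std, like numpy.var, defaults to the population formula (ddof=0).

```python
import math
import statistics
import numpy as np

arr = [1, 2, 3, 4, 5]

# Standard deviation = square root of the variance
sample_std = math.sqrt(statistics.variance(arr))       # sqrt(2.5)
population_std = math.sqrt(statistics.pvariance(arr))  # sqrt(2)

print("Sample std =", round(sample_std, 2))
print("statistics.stdev =", round(statistics.stdev(arr), 2))
print("Population std =", round(population_std, 2))
# np.std defaults to ddof=0 (population); pass ddof=1 for the sample value
print("np.std ddof=0 =", round(float(np.std(arr)), 2))
print("np.std ddof=1 =", round(float(np.std(arr, ddof=1)), 2))
```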

How to calculate Variance:
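The steps from the variance definition in 2.2.2 can be traced one at a time. This is a plain-Python sketch using the same arr = [1, 2, 3, 4, 5] as the example programs; the intermediate variable names are illustrative only.

```python
arr = [1, 2, 3, 4, 5]

# Step 1: find the mean
mean = sum(arr) / len(arr)                 # 3.0
# Step 2: find each data point's deviation from the mean
deviations = [x - mean for x in arr]       # [-2.0, -1.0, 0.0, 1.0, 2.0]
# Step 3: square each deviation
squared = [d ** 2 for d in deviations]     # [4.0, 1.0, 0.0, 1.0, 4.0]
# Step 4: add the squared deviations
total = sum(squared)                       # 10.0
# Step 5: divide by n (population) or by n - 1 (sample)
population_variance = total / len(arr)         # 2.0
sample_variance = total / (len(arr) - 1)       # 2.5

print("Population variance:", population_variance)
print("Sample variance:", sample_variance)
```

The sample value, 2.5, matches the output of statistics.variance in the example program for 2.2.2.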
