UNIT III UNIVARIATE ANALYSIS
Introduction to Single variable: Distributions and Variables –
Numerical Summaries of Level and Spread – Scaling and
Standardizing – Inequality – Smoothing Time Series.
1. Univariate data (Introduction to Single Variable)
This type of data consists of only one variable.
Suppose that the heights of seven students in a class are recorded (see figure). There is only
one variable, height, and the data do not deal with any cause or relationship.
Patterns in this type of data can be described using measures of central tendency
(mean, median and mode), measures of dispersion or spread (range, minimum, maximum,
quartiles, variance and standard deviation), and frequency distribution tables,
histograms, pie charts, frequency polygons and bar charts.
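As a rough illustration, the sketch below computes several level and spread summaries for a single variable using only Python's standard `statistics` module. The seven heights (in cm) are hypothetical values chosen for illustration, not the data from the figure. The helper name `summarize` is likewise an assumption.

```python
import statistics

def summarize(values):
    """Return basic level and spread summaries for one numeric variable (hypothetical helper)."""
    data = sorted(values)
    # Quartile cut points; statistics.quantiles uses the 'exclusive' method by default
    q1, q2, q3 = statistics.quantiles(data, n=4)
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "minimum": data[0],
        "maximum": data[-1],
        "range": data[-1] - data[0],
        "quartiles": (q1, q2, q3),
    }

# Hypothetical heights (cm) of seven students
heights = [158, 152, 165, 150, 160, 155, 162]
for name, value in summarize(heights).items():
    print(name, value)
```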
Bivariate data
This type of data involves two different variables.
The analysis of this type of data deals with causes and relationships. It is done to find
out the relationship between the two variables.
An example of bivariate data is temperature and ice cream sales in the
summer season.
2. Distributions and Variables
Types of Descriptive Statistics
● Measures of Central Tendency
● Measures of Variability
● Measures of Frequency Distribution
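A frequency distribution records how often each value occurs. The sketch below builds a frequency table from the `speed` values used in the example programs of this unit. It uses `collections.Counter` from the standard library, and the table layout is an arbitrary choice.

```python
from collections import Counter

speed = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

counts = Counter(speed)  # maps each distinct value to the number of times it occurs
n = len(speed)

print("Value  Frequency  Relative frequency")
for value in sorted(counts):
    freq = counts[value]
    # Relative frequency = frequency / total number of observations
    print(f"{value:>5}  {freq:>9}  {freq / n:>18.3f}")
```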
2.1 Measures of Central Tendency
A measure of central tendency represents a whole data set by a single value, giving the location of its central point. There
are three main measures of central tendency:
● Mean
● Mode
● Median
Example Program (Mean, Median, Mode)
Mean: (To calculate the mean, find the sum of all values, and divide the sum by the number of values)
import numpy
speed = [99,86,87,88,111,86,103,87,94,78,77,85,86]
x = numpy.mean(speed)   # arithmetic mean: sum of the values divided by their count
print(x)
Median: (The median is the middle value after all the values have been sorted.)
import numpy
speed = [99,86,87,88,111,86,103,87,94,78,77,85,86]
x = numpy.median(speed)   # middle value of the sorted data (13 values, so the 7th)
print(x)
Mode: (The mode is the value that appears most often.)
from scipy import stats
speed = [99,86,87,88,111,86,103,87,94,78,77,85,86]
x = stats.mode(speed)   # ModeResult holding the most frequent value and its count
print(x)
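The library calls above hide the arithmetic. The sketch below computes the same three measures directly from their definitions, which can serve as a cross-check. The function names are hypothetical helpers. A tie for the mode is resolved by returning the first most common value encountered, which is only one of several possible conventions.

```python
from collections import Counter

def mean(values):
    # Sum of all values divided by the number of values
    return sum(values) / len(values)

def median(values):
    # Middle value after sorting; average of the two middle values if the count is even
    data = sorted(values)
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2

def mode(values):
    # Most frequently occurring value (the first one encountered if several tie)
    return Counter(values).most_common(1)[0][0]

speed = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]
print(mean(speed))    # about 89.77
print(median(speed))  # 87
print(mode(speed))    # 86
```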
2.2 Measure of Variability
● Range
● Variance
● Standard deviation
2.2.1 Range
The range is the difference between the largest and smallest data points in the data set. The
larger the range, the greater the spread of the data, and vice versa.
Range = Largest data value – smallest data value
Example Program:
arr = [1, 2, 3, 4, 5]
maximum = max(arr)                # largest data value
minimum = min(arr)                # smallest data value
data_range = maximum - minimum    # range = largest - smallest
print('Range of your data:', data_range)
2.2.2 Variance
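The variance is the average squared deviation from the mean. It is calculated by finding the
difference between each data point and the mean (average), squaring these differences, adding
them, and dividing by the number of data points in the data set. Dividing by n gives the
population variance. For a sample, the sum is usually divided by n - 1 instead. This is what
Python's statistics.variance() does, which is why the example program's output is 2.5 rather than 2.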
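In symbols, take data values $x_1, \dots, x_n$ with mean $\bar{x}$. The population variance $\sigma^2$ divides the sum of squared deviations by $n$, and the sample variance $s^2$ divides it by $n - 1$. A worked example for the data 1, 2, 3, 4, 5 is included.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2

% Worked example: x = 1, 2, 3, 4, 5, so n = 5 and \bar{x} = 15/5 = 3
\sum_{i=1}^{5} (x_i - 3)^2 = 4 + 1 + 0 + 1 + 4 = 10,
\qquad \sigma^2 = \frac{10}{5} = 2,
\qquad s^2 = \frac{10}{4} = 2.5
```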
Example Program:
import statistics
arr = [1, 2, 3, 4, 5]
# statistics.variance() returns the sample variance (divides by n - 1)
print("Var =", statistics.variance(arr))
Output:
Var = 2.5
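The calculation steps in the variance definition can be traced in plain Python. This sketch is not how the `statistics` module is implemented internally. The helper name `variance_steps` and its `sample` flag are hypothetical.

```python
def variance_steps(values, sample=True):
    """Variance computed step by step: deviations, squares, sum, divide (hypothetical helper)."""
    n = len(values)
    mean = sum(values) / n                    # 1. the mean
    deviations = [x - mean for x in values]   # 2. difference of each point from the mean
    squares = [d * d for d in deviations]     # 3. square each difference
    total = sum(squares)                      # 4. add them up
    divisor = n - 1 if sample else n          # 5. n - 1 for a sample, n for a population
    return total / divisor

arr = [1, 2, 3, 4, 5]
print("Sample variance     =", variance_steps(arr))                # 2.5, as statistics.variance gives
print("Population variance =", variance_steps(arr, sample=False))  # 2.0, as statistics.pvariance gives
```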
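2.2.3 Standard Deviation
The standard deviation is the square root of the variance, so it measures spread in the same units as the data.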
Example Program
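import statistics
arr = [1, 2, 3, 4, 5]
# statistics.stdev() returns the sample standard deviation, rounded here to 2 decimal places
print("Std =", round(statistics.stdev(arr), 2))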
Output:
Std = 1.58
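`statistics.stdev` follows the sample (n - 1) definition, while `statistics.pstdev` follows the population (n) definition. The sketch below checks each one against `math.sqrt` of the matching variance.

```python
import math
import statistics

arr = [1, 2, 3, 4, 5]

sample_sd = statistics.stdev(arr)       # square root of the sample variance (divides by n - 1)
population_sd = statistics.pstdev(arr)  # square root of the population variance (divides by n)

print(round(sample_sd, 2), round(math.sqrt(statistics.variance(arr)), 2))       # 1.58 1.58
print(round(population_sd, 2), round(math.sqrt(statistics.pvariance(arr)), 2))  # 1.41 1.41
```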
How to calculate Variance: