01 Statistics With Python

for mca

Uploaded by

Saravanan Chidambaram

Statistics with Python

Statistics, in general, is the practice of collecting, tabulating, and interpreting numerical
data. It is an area of applied mathematics concerned with data collection, analysis,
interpretation, and presentation. With statistics, we can see how data can be used to solve complex
problems.

Understanding the Descriptive Statistics

Descriptive statistics is about describing and summarizing data. It uses two main approaches:

1. The quantitative approach describes and summarizes data numerically.

2. The visual approach illustrates data with charts, plots, histograms, and other graphs.

You can apply descriptive statistics to one or many datasets or variables. When you describe
and summarize a single variable, you’re performing univariate analysis. When you search for
statistical relationships among a pair of variables, you’re doing a bivariate analysis. Similarly,
a multivariate analysis is concerned with multiple variables at once.

Types of Measures

In this tutorial, you’ll learn about the following types of measures in descriptive statistics:

• Central tendency tells you about the centers of the data. Useful measures include the mean,
median, and mode.
• Variability tells you about the spread of the data. Useful measures include variance and
standard deviation.
• Correlation or joint variability tells you about the relation between a pair of variables in a
dataset. Useful measures include covariance and the correlation coefficient.
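
As a rough illustration of the third kind of measure, the sketch below computes the sample covariance and the Pearson correlation coefficient by hand for two short lists. The data and the helper names (covariance, correlation) are made up for this example and are not part of the original notes.

```python
from math import sqrt
from statistics import mean


def covariance(xs, ys):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


hours = [1, 2, 3, 4, 5]     # hypothetical data
marks = [2, 4, 6, 8, 10]    # exactly 2 * hours, so perfectly correlated

cov_value = covariance(hours, marks)
corr_value = correlation(hours, marks)
print("Covariance:", cov_value)
print("Correlation:", corr_value)
```

Python 3.10 and later also provide statistics.covariance() and statistics.correlation(); the hand-written versions are used here only to make the arithmetic visible.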

This document covers the first two types of descriptive statistics:

• Measure of central tendency
• Measure of variability

Types of Descriptive Statistics

1. Measure of Central Tendency

The measure of central tendency is a single value that attempts to describe the whole set of
data. The measures of central tendency covered here are:

• Mean
• Median
• Median Low
• Median High
• Mode
The measure of Central Tendency

Mean

The mean is the sum of the observations divided by the total number of observations; it is
also called the average. The mean() function of the statistics module returns the mean of
the data passed as its argument. If the passed argument is empty, StatisticsError is raised.

Example: Python code to calculate mean

# Python code to demonstrate the working of mean()

# importing statistics to handle statistical operations
import statistics

# initializing list
li = [1, 2, 3, 3, 2, 2, 2, 1]

# using mean() to calculate average of list elements
print("The average of list values is : ", end="")
print(statistics.mean(li))

Output:

The average of list values is : 2
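
The definition above can also be checked by hand. The following is a minimal sketch, reusing the list from the example, that compares sum divided by count with statistics.mean() and shows what happens with empty input; the variable names are illustrative only.

```python
import statistics

values = [1, 2, 3, 3, 2, 2, 2, 1]

# Mean by definition: sum of observations divided by their count.
# True division always yields a float, so this is 2.0 rather than 2.
manual_mean = sum(values) / len(values)
print("Manual mean:", manual_mean)
print("statistics.mean:", statistics.mean(values))

# An empty input raises StatisticsError instead of returning a value.
try:
    statistics.mean([])
    raised = False
except statistics.StatisticsError as exc:
    raised = True
    print("StatisticsError:", exc)
```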

Median

The median is the middle value of the sorted data set; it splits the data into two halves. If
the number of elements in the data set is odd, the center element is the median; if it is even,
the median is the average of the two central elements. The median() function sorts the data
first and then returns the median, i.e. the middle element of the data. If the passed argument
is empty, StatisticsError is raised.

Example: Python code to calculate Median

# Python code to demonstrate the working of median()
# on various ranges of data sets

# importing median() from the statistics module
from statistics import median

# importing the Fraction class as fr
from fractions import Fraction as fr

# tuple of positive integer numbers
data1 = (2, 3, 4, 5, 7, 9, 11)

# tuple of floating point values
data2 = (2.4, 5.1, 6.7, 8.9)

# tuple of fractional numbers
data3 = (fr(1, 2), fr(44, 12),
         fr(10, 3), fr(2, 3))

# tuple of negative integers
data4 = (-5, -1, -12, -19, -3)

# tuple of positive and negative integers
data5 = (-1, -2, -3, -4, 4, 3, 2, 1)

# printing the median of the above data sets
print("Median of data-set 1 is %s" % (median(data1)))
print("Median of data-set 2 is %s" % (median(data2)))
print("Median of data-set 3 is %s" % (median(data3)))
print("Median of data-set 4 is %s" % (median(data4)))
print("Median of data-set 5 is %s" % (median(data5)))

Output:

Median of data-set 1 is 5

Median of data-set 2 is 5.9

Median of data-set 3 is 2

Median of data-set 4 is -5

Median of data-set 5 is 0.0
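
The sort-then-pick-the-middle rule described above can be sketched as a small function. This is only an illustration of the rule, not how the statistics module is implemented internally; manual_median is a hypothetical helper name.

```python
from statistics import median


def manual_median(data):
    """Median by sorting: the middle item, or the mean of the two middle items."""
    ordered = sorted(data)
    n = len(ordered)
    if n == 0:
        raise ValueError("no median for empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average the pair


odd_data = (2, 3, 4, 5, 7, 9, 11)
even_data = (-1, -2, -3, -4, 4, 3, 2, 1)

print(manual_median(odd_data), median(odd_data))
print(manual_median(even_data), median(even_data))
```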

Median Low

The median_low() function returns the median when the number of elements is odd; when the
number of elements is even, it returns the lower of the two middle elements. The result is
therefore always a member of the data set. If the passed argument is empty, StatisticsError
is raised.

Example: Python code to calculate Median Low

# Python code to demonstrate the working of median_low()

# importing the statistics module
import statistics

# simple list of integers
set1 = [1, 3, 3, 4, 5, 7]

# print the median of the data set
# (the median value may or may not lie within the data set)
print("Median of the set is %s"
      % (statistics.median(set1)))

# print the low median of the data set
print("Low Median of the set is %s"
      % (statistics.median_low(set1)))

Output:
Median of the set is 3.5
Low Median of the set is 3

Median High

The median_high() function returns the median when the number of elements is odd; when the
number of elements is even, it returns the higher of the two middle elements. The result is
therefore always a member of the data set. If the passed argument is empty, StatisticsError
is raised.

Example: Python code to calculate Median High

# Working of median_high() and median() to
# demonstrate the difference between them

# importing the statistics module
import statistics

# simple list of integers
set1 = [1, 3, 3, 4, 5, 7]

# print the median of the data set
# (the median value may or may not lie within the data set)
print("Median of the set is %s"
      % (statistics.median(set1)))

# print the high median of the data set
print("High Median of the set is %s"
      % (statistics.median_high(set1)))

Output:

Median of the set is 3.5

High Median of the set is 4
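
To see the three median functions side by side, the sketch below runs them on one even-length and one odd-length list. The even-length list is the one used in the examples; the odd-length list is an assumed variation of it with the last value dropped.

```python
from statistics import median, median_high, median_low

even_data = [1, 3, 3, 4, 5, 7]   # six values: the middle pair is 3 and 4
odd_data = [1, 3, 3, 4, 5]       # five values: the single middle value is 3

for label, data in (("even", even_data), ("odd", odd_data)):
    print(label, median_low(data), median(data), median_high(data))
```

With an even number of elements the three functions differ; with an odd number they all return the same middle element.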

Mode

The mode is the value that has the highest frequency in the given data set. If every data
point occurs equally often, no single value stands out as the mode. If two or more values share
the highest frequency, the data set has more than one mode.

The mode() function returns the value with the maximum number of occurrences. Since Python 3.8,
when several values tie for the highest frequency, it returns the first one encountered in the
data. If the passed argument is empty, StatisticsError is raised.

Example: Python code to calculate Mode

# Python code to demonstrate the working of mode()
# on various data types

# importing mode() from the statistics module
from statistics import mode

# importing the Fraction class as fr
# (used to build the fractional data set below)
from fractions import Fraction as fr

# tuple of positive integer numbers
data1 = (2, 3, 3, 4, 5, 5, 5, 5, 6, 6, 6, 7)

# tuple of floating point values
data2 = (2.4, 1.3, 1.3, 1.3, 2.4, 4.6)

# tuple of fractional numbers
data3 = (fr(1, 2), fr(1, 2), fr(10, 3), fr(2, 3))

# tuple of negative integers
data4 = (-1, -2, -2, -2, -7, -7, -9)

# tuple of strings
data5 = ("red", "blue", "black", "blue", "black", "black", "brown")

# printing the mode of the above data sets
print("Mode of data set 1 is %s" % (mode(data1)))
print("Mode of data set 2 is %s" % (mode(data2)))
print("Mode of data set 3 is %s" % (mode(data3)))
print("Mode of data set 4 is %s" % (mode(data4)))
print("Mode of data set 5 is %s" % (mode(data5)))

Output:
Mode of data set 1 is 5
Mode of data set 2 is 1.3
Mode of data set 3 is 1/2
Mode of data set 4 is -2
Mode of data set 5 is black
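
The case of several values sharing the highest frequency can be explored with multimode(), which returns all of them. The sketch below assumes Python 3.8 or later (where multimode() exists and mode() no longer raises on ties); the data is made up for illustration.

```python
from collections import Counter
from statistics import mode, multimode

tied = ["red", "blue", "blue", "red", "green"]   # "red" and "blue" both occur twice

# Frequency of each value, to make the tie visible
counts = Counter(tied)
print(counts)

# multimode() lists every most-common value in first-seen order
print(multimode(tied))

# mode() returns only the first of the tied values (Python 3.8+)
print(mode(tied))

# When every value occurs once, all of them tie
all_unique = [10, 20, 30]
print(multimode(all_unique))
```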

2. Measure of Variability

So far we have studied measures of central tendency, but these alone are not sufficient to
describe the data. We also need a measure of variability, which describes the spread of the
data, that is, how far the values lie from one another and from the center.

The most common variability measures are:

• Range
• Variance
• Standard deviation

Range

The difference between the largest and smallest data points in the data set is known as the
range. The bigger the range, the more spread out the data, and vice versa.

Range = Largest data value – smallest data value

We can find the maximum and minimum values using the built-in max() and min() functions
respectively.

Example: Python code to calculate Range

# sample data
arr = [1, 2, 3, 4, 5]

# finding the maximum
maximum = max(arr)

# finding the minimum
minimum = min(arr)

# difference of maximum and minimum
data_range = maximum - minimum
print("Maximum = {}, Minimum = {} and Range = {}".format(
    maximum, minimum, data_range))

Output:

Maximum = 5, Minimum = 1 and Range = 4
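
Because the range depends only on the two extreme values, a single unusual data point can change it a lot. The sketch below wraps the calculation in a hypothetical helper, data_range(), and compares three made-up lists that share the same middle values.

```python
def data_range(values):
    """Return largest minus smallest value of a non-empty collection of numbers."""
    items = list(values)
    if not items:
        raise ValueError("range of empty data is undefined")
    return max(items) - min(items)


tight = [4, 5, 5, 6]      # values close together
wide = [1, 5, 5, 9]       # same middle values, wider extremes
outlier = [4, 5, 5, 60]   # one extreme value dominates the range

for name, data in (("tight", tight), ("wide", wide), ("outlier", outlier)):
    print(name, data_range(data))
```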

Variance

Variance is defined as the average squared deviation from the mean. It is calculated by finding
the difference between every data point and the mean, squaring the differences, adding them all
up, and then dividing by the number of data points present in the data set:

$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$$

where $N$ = number of terms

$\mu$ = Mean

$x_i$ = the i-th data point

The formula above is the population variance, which the statistics module provides as
pvariance(). The variance() method used in the example below computes the sample variance,
which divides by N - 1 instead of N. If the passed argument has fewer than two values,
StatisticsError is raised.

Example: Python code to calculate Variance

# Python code to demonstrate the variance()
# function on various data types

# importing variance() from the statistics module
from statistics import variance

# importing the Fraction class as fr
from fractions import Fraction as fr

# tuple of positive integers
# (numbers are spread apart but not very much)
sample1 = (1, 2, 5, 4, 8, 9, 12)

# tuple of negative integers
sample2 = (-2, -4, -3, -1, -5, -6)

# tuple of positive and negative numbers
# (data points are spread apart considerably)
sample3 = (-9, -1, 0, 2, 1, 3, 4, 19)

# tuple of fractional numbers
sample4 = (fr(1, 2), fr(2, 3), fr(3, 4),
           fr(5, 6), fr(7, 8))

# tuple of floating point values
sample5 = (1.23, 1.45, 2.1, 2.2, 1.9)

# print the variance of each sample
print("Variance of Sample1 is %s" % (variance(sample1)))
print("Variance of Sample2 is %s" % (variance(sample2)))
print("Variance of Sample3 is %s" % (variance(sample3)))
print("Variance of Sample4 is %s" % (variance(sample4)))
print("Variance of Sample5 is %s" % (variance(sample5)))

Output:

Variance of Sample1 is 15.80952380952381

Variance of Sample2 is 3.5

Variance of Sample3 is 61.125

Variance of Sample4 is 1/45

Variance of Sample5 is 0.17613000000000006
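
The printed values can be reproduced step by step. The sketch below recomputes the variance of Sample1 by hand, once dividing by N (population variance) and once by N - 1 (sample variance), and compares the results with pvariance() and variance(). Variable names are illustrative only.

```python
from statistics import mean, pvariance, variance

sample = (1, 2, 5, 4, 8, 9, 12)
n = len(sample)
mu = mean(sample)

# Sum of squared deviations from the mean
squared_deviations = sum((x - mu) ** 2 for x in sample)

population_var = squared_deviations / n      # divide by N
sample_var = squared_deviations / (n - 1)    # divide by N - 1

print("Population variance:", population_var, pvariance(sample))
print("Sample variance:    ", sample_var, variance(sample))
```

The sample variance is the one that matches the 15.8095... shown in the output above, which is how to tell that variance() divides by N - 1.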

Standard Deviation

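The standard deviation is defined as the square root of the variance. It is calculated by
finding the mean, subtracting the mean from each number, squaring each result, adding the
squares, dividing by the number of terms, and finally taking the square root:

$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$$

where $N$ = number of terms

$\mu$ = Mean

$x_i$ = the i-th data point

The formula above is the population standard deviation, which the statistics module provides
as pstdev(). The stdev() method used in the example below returns the sample standard
deviation, the square root of the sample variance, which divides by N - 1 instead of N. If the
passed argument has fewer than two values, StatisticsError is raised.

Example: Python code to calculate Standard Deviation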

# Python code to demonstrate the stdev()
# function on various data sets

# importing stdev() from the statistics module
from statistics import stdev

# tuple of positive integers
# (numbers are spread apart but not very much)
sample1 = (1, 2, 5, 4, 8, 9, 12)

# tuple of negative integers
sample2 = (-2, -4, -3, -1, -5, -6)

# tuple of positive and negative numbers
# (data points are spread apart considerably)
sample3 = (-9, -1, 0, 2, 1, 3, 4, 19)

# tuple of floating point values
sample4 = (1.23, 1.45, 2.1, 2.2, 1.9)

# print the standard deviation of each sample
print("The Standard Deviation of Sample1 is %s"
      % (stdev(sample1)))
print("The Standard Deviation of Sample2 is %s"
      % (stdev(sample2)))
print("The Standard Deviation of Sample3 is %s"
      % (stdev(sample3)))
print("The Standard Deviation of Sample4 is %s"
      % (stdev(sample4)))

Output:

The Standard Deviation of Sample1 is 3.9761191895520196

The Standard Deviation of Sample2 is 1.8708286933869707

The Standard Deviation of Sample3 is 7.8182478855559445

The Standard Deviation of Sample4 is 0.41967844833872525
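
The relationship between standard deviation and variance can be checked directly. This is a minimal sketch using Sample1 from the example; it takes the square root of variance() and compares it with stdev(), then prints pstdev() for contrast.

```python
from math import sqrt
from statistics import pstdev, stdev, variance

sample = (1, 2, 5, 4, 8, 9, 12)

# Standard deviation as the square root of the (sample) variance
root_of_variance = sqrt(variance(sample))
print("sqrt(variance):", root_of_variance)
print("stdev:         ", stdev(sample))

# Population standard deviation divides by N, so it is slightly smaller
print("pstdev:        ", pstdev(sample))
```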
