
Python Statistics

This document outlines the computation of basic statistical parameters using Python libraries Pandas, NumPy, and SciPy, including measures of central tendency, dispersion, and distribution shape. It provides formulas and code examples for calculating mean, median, mode, variance, standard deviation, range, skewness, and kurtosis. Additionally, it includes a real-life use case example analyzing student exam scores.

Uploaded by

themanhector24


Assignment 7: Study and Computation of Basic Statistical Parameters of Variables in Python using Pandas, NumPy, and SciPy

Objective:

The goal of this topic is to help you understand how to compute basic statistical measures for variables, such as:

• Central Tendency: Mean, Median, Mode

• Dispersion: Variance, Standard Deviation, Range

• Shape of the Distribution: Skewness, Kurtosis

We will use Pandas, NumPy, and SciPy to calculate and interpret these parameters.

✅ Key Statistical Parameters & Computation

1. Mean (Average)

The mean is the sum of all values divided by the number of values.

Formula:

$$\text{Mean} = \frac{\sum X}{n}$$

Pandas Example:

df['mean'] = df['column_name'].mean()
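
As a quick check on the formula, the sketch below computes the mean by hand (sum divided by count) and compares it with the Pandas and NumPy results. The `Age` values are borrowed from the sample DataFrame used later in this assignment; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Illustrative data: the 'Age' values from the sample DataFrame used later in this assignment
df = pd.DataFrame({'Age': [22, 34, 25, 33, 22, 45, 33, 29, 41, 36]})

# Mean by the formula: sum of all values divided by the number of values
manual_mean = df['Age'].sum() / len(df)

# The same quantity from Pandas and from NumPy
pandas_mean = df['Age'].mean()
numpy_mean = np.mean(df['Age'].to_numpy())

print(manual_mean, pandas_mean, numpy_mean)
```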

2. Median

The median is the middle value when the data is sorted in order. For an even number of values, it is the average of the two middle values.

Pandas Example:

df['median'] = df['column_name'].median()
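
The odd and even cases can be illustrated with a minimal sketch. The two small series below are hypothetical values chosen only to make the middle positions easy to see.

```python
import pandas as pd

# Hypothetical values chosen only to show the odd and even cases
odd_values = pd.Series([7, 1, 5])        # sorted: 1, 5, 7
even_values = pd.Series([7, 1, 5, 3])    # sorted: 1, 3, 5, 7

median_odd = odd_values.median()    # the single middle value
median_even = even_values.median()  # average of the two middle values

print(median_odd, median_even)
```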

3. Mode

The mode is the value that appears most frequently in the data.

Pandas Example:

mode_value = df['column_name'].mode()[0]  # mode() returns a Series; [0] takes the first mode
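
A dataset can have more than one mode, which is why `mode()` returns a Series rather than a single number. The sketch below shows this with the `Age` values from the sample data used later in this assignment, where two values tie for most frequent.

```python
import pandas as pd

# 'Age' values from the sample DataFrame used later in this assignment
df = pd.DataFrame({'Age': [22, 34, 25, 33, 22, 45, 33, 29, 41, 36]})

# mode() returns a Series because several values can tie for most frequent
modes = df['Age'].mode()

# Indexing with [0] keeps only the first (smallest) of the tied values
first_mode = modes[0]

print(modes.tolist(), first_mode)
```

If all tied values matter for the analysis, keeping the whole Series is probably safer than taking only `[0]`.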

4. Variance

Variance measures how far the data points are spread out from the mean.

Formula:

$$\text{Variance} = \frac{\sum (X - \mu)^2}{n}$$

Pandas Example:

df['variance'] = df['column_name'].var()  # sample variance (divides by n - 1); use .var(ddof=0) to divide by n
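
The formula divides by n (population variance), while Pandas divides by n - 1 by default (sample variance, `ddof=1`). The sketch below, using the `Age` values from the sample data later in this assignment, shows the two results side by side; the variable names are illustrative.

```python
import pandas as pd

ages = pd.Series([22, 34, 25, 33, 22, 45, 33, 29, 41, 36])

# Population variance: summed squared deviations divided by n (the formula above)
population_var = ((ages - ages.mean()) ** 2).sum() / len(ages)

# Pandas defaults to ddof=1, i.e. it divides by n - 1 (sample variance)
sample_var = ages.var()

# Passing ddof=0 makes Pandas match the population formula
pandas_population_var = ages.var(ddof=0)

print(population_var, sample_var, pandas_population_var)
```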

5. Standard Deviation

Standard deviation is the square root of variance and gives the spread of data in the same units as
the data.

Formula:

$$\text{Standard Deviation} = \sqrt{\text{Variance}}$$

Pandas Example:

df['std_dev'] = df['column_name'].std()  # sample standard deviation (ddof=1 by default)
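
One detail worth checking when mixing libraries: Pandas `std()` uses `ddof=1` by default, whereas NumPy's `np.std` on a plain array uses `ddof=0`. A minimal sketch, again assuming the `Age` values from the sample data later in this assignment:

```python
import numpy as np
import pandas as pd

ages = pd.Series([22, 34, 25, 33, 22, 45, 33, 29, 41, 36])

# Standard deviation is the square root of the variance (same ddof on both sides)
std_from_var = np.sqrt(ages.var())
std_pandas = ages.std()                 # ddof=1 by default

# NumPy's np.std on a plain array uses ddof=0, so it gives a slightly smaller value
std_numpy = np.std(ages.to_numpy())

print(std_from_var, std_pandas, std_numpy)
```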

6. Range

The range is the difference between the maximum and minimum values in the data.

Formula:

$$\text{Range} = \text{Max} - \text{Min}$$

Pandas Example:

df['range'] = df['column_name'].max() - df['column_name'].min()
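
The same range can also be obtained with NumPy's `np.ptp` ("peak to peak"). The sketch below compares both approaches on the `Age` values from the sample data used later in this assignment.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Age': [22, 34, 25, 33, 22, 45, 33, 29, 41, 36]})

# Range as maximum minus minimum, using Pandas
range_pandas = df['Age'].max() - df['Age'].min()

# The same value from NumPy's peak-to-peak function on the underlying array
range_numpy = np.ptp(df['Age'].to_numpy())

print(range_pandas, range_numpy)
```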

7. Skewness

Skewness measures the asymmetry of the data around its mean.

Formula:

$$\text{Skewness} = \frac{\sum (X - \mu)^3}{n \cdot \sigma^3}$$

SciPy Example:

from scipy.stats import skew

skew_value = skew(df['column_name'])
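
To connect the formula with the SciPy call, the sketch below evaluates the formula directly with NumPy and compares it with `scipy.stats.skew`. It assumes σ is the population standard deviation (`ddof=0`), which is what SciPy's default `bias=True` setting uses; the data are the `Age` values from the sample later in this assignment.

```python
import numpy as np
from scipy.stats import skew

x = np.array([22, 34, 25, 33, 22, 45, 33, 29, 41, 36], dtype=float)

mu = x.mean()
sigma = x.std()  # population standard deviation (ddof=0)

# Skewness by the formula: summed cubed deviations over n * sigma^3
manual_skew = ((x - mu) ** 3).sum() / (len(x) * sigma ** 3)

# SciPy's default (bias=True) is based on the same population moments
scipy_skew = skew(x)

print(manual_skew, scipy_skew)
```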

8. Kurtosis

Kurtosis measures the "tailedness" of the data distribution, that is, how much of the data lies in extreme values (outliers).

Formula:

$$\text{Kurtosis} = \frac{\sum (X - \mu)^4}{n \cdot \sigma^4} - 3$$

SciPy Example:

from scipy.stats import kurtosis

kurt_value = kurtosis(df['column_name'])
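
The "- 3" in the formula makes this the excess kurtosis, which is what `scipy.stats.kurtosis` returns by default (`fisher=True`). The sketch below checks the formula against SciPy and shows the `fisher=False` variant, which omits the subtraction. It assumes σ is the population standard deviation; the data are the `Age` values from the sample that follows.

```python
import numpy as np
from scipy.stats import kurtosis

x = np.array([22, 34, 25, 33, 22, 45, 33, 29, 41, 36], dtype=float)

mu = x.mean()
sigma = x.std()  # population standard deviation (ddof=0)

# Excess kurtosis by the formula: summed fourth-power deviations over n * sigma^4, minus 3
manual_kurt = ((x - mu) ** 4).sum() / (len(x) * sigma ** 4) - 3

scipy_excess = kurtosis(x)              # fisher=True by default: excess kurtosis
scipy_raw = kurtosis(x, fisher=False)   # Pearson definition, without the "- 3"

print(manual_kurt, scipy_excess, scipy_raw)
```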
✅ Using Pandas, NumPy, and SciPy to Compute Statistical Parameters

import pandas as pd
import numpy as np
from scipy.stats import skew, kurtosis

# Sample DataFrame
data = {'Age': [22, 34, 25, 33, 22, 45, 33, 29, 41, 36]}
df = pd.DataFrame(data)

# Mean
mean_age = df['Age'].mean()

# Median
median_age = df['Age'].median()

# Mode
mode_age = df['Age'].mode()[0]

# Variance
variance_age = df['Age'].var()

# Standard Deviation
std_dev_age = df['Age'].std()

# Range
range_age = df['Age'].max() - df['Age'].min()

# Skewness
skew_value = skew(df['Age'])

# Kurtosis
kurt_value = kurtosis(df['Age'])

print(f"Mean: {mean_age}")
print(f"Median: {median_age}")
print(f"Mode: {mode_age}")
print(f"Variance: {variance_age}")
print(f"Standard Deviation: {std_dev_age}")
print(f"Range: {range_age}")
print(f"Skewness: {skew_value}")
print(f"Kurtosis: {kurt_value}")

Questions and Answers

Q1: What is the difference between mean and median?

Answer:

• Mean is the average of all values, which can be strongly affected by outliers.

• Median is the middle value when the data is sorted, and it is much less sensitive to outliers.
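
The difference is easy to see on a small example. In the sketch below, the values are hypothetical; the second series replaces one value with an outlier, which moves the mean a long way while the median stays put.

```python
import pandas as pd

# Hypothetical values; the second series replaces 18 with an outlier of 180
typical = pd.Series([10, 12, 14, 16, 18])
with_outlier = pd.Series([10, 12, 14, 16, 180])

mean_typical, median_typical = typical.mean(), typical.median()
mean_outlier, median_outlier = with_outlier.mean(), with_outlier.median()

print(mean_typical, median_typical)   # mean and median agree
print(mean_outlier, median_outlier)   # mean is pulled up, median is unchanged
```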

Q2: How do you compute variance and why is it important?

Answer:
Variance is computed by averaging the squared deviations of each data point from the mean. It
measures how spread out the data is. Higher variance means the data points are more dispersed.

Q3: What is the formula for standard deviation, and how is it different from variance?

Answer:

• Standard Deviation = $\sqrt{\text{Variance}}$

• It is the square root of variance and provides a measure of spread in the same units as the data.

Q4: How would you calculate skewness and what does it indicate about data?
Answer:
Skewness measures the asymmetry of data around its mean:

• Positive skew means the tail is on the right.

• Negative skew means the tail is on the left.

• A value of 0 indicates a symmetric distribution.

Q5: What is kurtosis, and why is it used?

Answer:
Kurtosis measures the "tailedness" of the data distribution. A higher kurtosis indicates more extreme
values (outliers), while a lower kurtosis indicates fewer extreme values.

Q6: How do you interpret the skewness value?

Answer:

• Skewness > 0: Right-skewed distribution (long tail on the right).

• Skewness < 0: Left-skewed distribution (long tail on the left).

• Skewness ≈ 0: Symmetric distribution.
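
These three cases can be reproduced with small made-up samples. The lists below are hypothetical and were chosen only so that the direction of the tail is obvious.

```python
from scipy.stats import skew

# Hypothetical samples chosen so the tail direction is obvious
right_tailed = [1, 1, 2, 2, 3, 10]            # one large value stretches the right tail
left_tailed = [-v for v in right_tailed]      # mirror image: tail on the left
symmetric = [1, 2, 3, 4, 5]                   # evenly balanced around the mean

skew_right = skew(right_tailed)
skew_left = skew(left_tailed)
skew_sym = skew(symmetric)

print(skew_right, skew_left, skew_sym)
```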

Q7: How can you find the range of a dataset in Python?

Answer:
Range is the difference between the maximum and minimum values of a dataset:

range_value = df['column_name'].max() - df['column_name'].min()

Q8: What does a negative kurtosis indicate?

Answer:
A negative (excess) kurtosis indicates a platykurtic distribution, one with lighter tails than the normal distribution, meaning fewer and less extreme outliers.
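
As a rough illustration, the sketch below compares an evenly spread sample (no extreme values) with a sample where most values sit at the center and two lie far out. Both samples are hypothetical and only meant to show the sign of the result.

```python
import numpy as np
from scipy.stats import kurtosis

# Hypothetical samples: one evenly spread, one with most mass at the center plus two extremes
evenly_spread = np.arange(1, 11)                 # 1, 2, ..., 10
heavy_tailed = np.array([0] * 18 + [-10, 10])    # eighteen zeros and two far-out values

kurt_flat = kurtosis(evenly_spread)    # negative: lighter tails than a normal distribution
kurt_heavy = kurtosis(heavy_tailed)    # positive: extreme values dominate

print(kurt_flat, kurt_heavy)
```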

Q9: How can you calculate the mode of a dataset?

Answer:
You can use Pandas' mode() function, which returns the most frequent value(s):

mode_value = df['column_name'].mode()[0]

Q10: What does it mean if the standard deviation is very high?

Answer:
A high standard deviation indicates that the data points are spread out widely from the mean,
meaning there's high variability in the data.

📋 Summary Table:

Statistic        Code Example
Mean             df['col'].mean()
Median           df['col'].median()
Mode             df['col'].mode()
Variance         df['col'].var()
Standard Dev.    df['col'].std()
Range            df['col'].max() - df['col'].min()
Skewness         skew(df['col'])
Kurtosis         kurtosis(df['col'])

🏆 Real-Life Use Case Example:

Imagine you're working with a dataset of student scores in an exam. You want to analyze the scores
by calculating the mean, median, standard deviation, and skewness to understand the distribution of
the scores.

import pandas as pd
import numpy as np
from scipy.stats import skew, kurtosis

# Sample Data
data = {'Scores': [95, 84, 72, 88, 91, 78, 85, 92, 76, 79]}
df = pd.DataFrame(data)

# Mean
mean_score = df['Scores'].mean()

# Median
median_score = df['Scores'].median()

# Standard Deviation
std_dev_score = df['Scores'].std()

# Skewness
skew_value = skew(df['Scores'])

print(f"Mean Score: {mean_score}")
print(f"Median Score: {median_score}")
print(f"Standard Deviation: {std_dev_score}")
print(f"Skewness: {skew_value}")
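
One possible way to turn these numbers into a statement about the distribution is sketched below. The cutoff of 0.5 for calling the data "approximately symmetric" is a common rule of thumb and an assumption here, not something specified by this assignment; the `shape` label is likewise illustrative.

```python
import pandas as pd
from scipy.stats import skew

scores = pd.Series([95, 84, 72, 88, 91, 78, 85, 92, 76, 79])

mean_score = scores.mean()
median_score = scores.median()
skew_value = skew(scores.to_numpy())

# Assumed rule of thumb: |skewness| below 0.5 is treated as approximately symmetric
tolerance = 0.5
if abs(skew_value) < tolerance:
    shape = "approximately symmetric"
elif skew_value > 0:
    shape = "right-skewed"
else:
    shape = "left-skewed"

print(f"Mean: {mean_score}, Median: {median_score}, Shape: {shape}")
```

With these scores the mean and median are close together and the skewness is small, so the label reflects a distribution without a pronounced tail on either side.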
