Assignment 7: Study and Computation of Basic Statistical Parameters of Variables in Python using Pandas, NumPy, and SciPy
Objective:
The goal of this topic is to help you understand how to compute basic statistical measures for variables, such as:
Central Tendency: Mean, Median, Mode
Dispersion: Variance, Standard Deviation, Range
Shape of the Distribution: Skewness, Kurtosis
We will use Pandas, NumPy, and SciPy to calculate and interpret these parameters.
✅ Key Statistical Parameters & Computation
1. Mean (Average)
The mean is the sum of all values divided by the number of values.
Formula:
$$\text{Mean} = \frac{\sum X}{n}$$
Pandas Example:
mean_value = df['column_name'].mean()
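A minimal sketch that checks the formula against the library calls. The column name and its values are hypothetical; np.mean is NumPy's equivalent of Pandas' mean().

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; 'column_name' stands in for any numeric column
df = pd.DataFrame({'column_name': [4, 8, 15, 16, 23, 42]})
values = df['column_name']

# Mean straight from the formula: sum of the values divided by their count
manual_mean = values.sum() / len(values)

# The same statistic via Pandas and NumPy
pandas_mean = values.mean()
numpy_mean = np.mean(values.to_numpy())

print(manual_mean, pandas_mean, numpy_mean)  # 18.0 18.0 18.0
```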
2. Median
The median is the middle value when the data is sorted in order.
Pandas Example:
median_value = df['column_name'].median()
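The "middle value" rule has two cases: with an odd count it is the single middle element, and with an even count it is the average of the two middle elements. A short sketch on hypothetical values:

```python
import numpy as np
import pandas as pd

# Odd number of values: the median is the single middle value after sorting
odd = pd.Series([7, 1, 5, 3, 9])        # sorted: 1, 3, 5, 7, 9
# Even number of values: the median is the mean of the two middle values
even = pd.Series([7, 1, 5, 3, 9, 11])   # sorted: 1, 3, 5, 7, 9, 11

print(odd.median())                # 5.0
print(even.median())               # (5 + 7) / 2 = 6.0
print(np.median(even.to_numpy()))  # NumPy gives the same result: 6.0
```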
3. Mode
The mode is the value that appears most frequently in the data.
Pandas Example:
mode_value = df['column_name'].mode()[0]  # mode() returns a Series of all modes; [0] takes the first
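Unlike mean() and median(), mode() returns a Series rather than a single number, because several values can tie for the highest frequency. A sketch with hypothetical data containing such a tie:

```python
import pandas as pd

# Hypothetical data: 2 and 5 both appear twice, so there are two modes
s = pd.Series([5, 2, 2, 8, 5, 1])

modes = s.mode()       # a Series holding every mode, sorted ascending
print(modes.tolist())  # [2, 5]

first_mode = modes.iloc[0]  # pick one value when a single scalar is needed
print(first_mode)           # 2
```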
4. Variance
Variance measures how far the data points are spread out from the mean.
Formula:
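$$\text{Variance} = \frac{\sum (X - \mu)^2}{n}$$
where μ is the mean. This is the population variance; the sample variance divides by n - 1 instead of n.
Pandas Example:
variance_value = df['column_name'].var()  # sample variance (ddof=1); var(ddof=0) matches the formula above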
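The n versus n - 1 distinction matters in practice because the libraries use different defaults: np.var divides by n (ddof=0), while Pandas' var() divides by n - 1 (ddof=1). A small sketch on hypothetical data:

```python
import numpy as np
import pandas as pd

s = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])  # hypothetical data, mean = 5

# Sum of squared deviations from the mean
squared_dev = ((s - s.mean()) ** 2).sum()  # 32.0

population_var = squared_dev / len(s)      # divide by n
sample_var = squared_dev / (len(s) - 1)    # divide by n - 1

print(population_var, s.var(ddof=0), np.var(s.to_numpy()))  # all 4.0
print(sample_var, s.var())                                   # both about 4.571
```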
5. Standard Deviation
Standard deviation is the square root of variance and gives the spread of data in the same units as the data.
Formula:
$$\text{Standard Deviation} = \sqrt{\text{Variance}}$$
Pandas Example:
std_dev_value = df['column_name'].std()  # sample standard deviation (ddof=1) by default
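A sketch confirming the square-root relationship on the same kind of hypothetical data. The ddof setting must match on both sides, and NumPy's np.std again defaults to ddof=0, so pass ddof explicitly when comparing it with Pandas:

```python
import math
import numpy as np
import pandas as pd

s = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])  # hypothetical data

# Standard deviation is the square root of the variance (same ddof in both)
print(s.std(ddof=0), math.sqrt(s.var(ddof=0)))  # population: 2.0 2.0
print(s.std(), math.sqrt(s.var()))              # sample: both about 2.138

# NumPy defaults to ddof=0; ddof=1 reproduces Pandas' default
print(np.std(s.to_numpy(), ddof=1))
```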
6. Range
The range is the difference between the maximum and minimum values in the data.
Formula:
$$\text{Range} = \text{Max} - \text{Min}$$
Pandas Example:
range_value = df['column_name'].max() - df['column_name'].min()
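NumPy also provides np.ptp ("peak to peak"), which returns the same maximum minus minimum. A sketch on hypothetical values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'column_name': [12, 7, 3, 18, 9]})  # hypothetical data

range_value = df['column_name'].max() - df['column_name'].min()  # 18 - 3
numpy_range = np.ptp(df['column_name'].to_numpy())               # peak to peak

print(range_value, numpy_range)  # 15 15
```

Because the range depends only on the two most extreme values, a single outlier can change it drastically.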
7. Skewness
Skewness measures the asymmetry of the data around its mean.
Formula:
$$\text{Skewness} = \frac{\sum (X - \mu)^3}{n \cdot \sigma^3}$$
where σ is the population standard deviation. This is what scipy.stats.skew computes by default (bias=True).
SciPy Example:
from scipy.stats import skew
skew_value = skew(df['column_name'])
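To connect the formula to the library call, this sketch evaluates the skewness formula by hand on hypothetical right-tailed data and compares it with scipy.stats.skew. It also shows that Pandas' Series.skew() applies a small-sample correction, so it agrees with skew(..., bias=False) rather than with SciPy's default:

```python
import pandas as pd
from scipy.stats import skew

# Hypothetical data with one large value, giving a long right tail
x = pd.Series([1, 2, 2, 3, 3, 3, 4, 10])

# Skewness formula: mu is the mean, sigma the population std (ddof=0)
mu = x.mean()
sigma = x.std(ddof=0)
manual = ((x - mu) ** 3).sum() / (len(x) * sigma ** 3)

print(manual, skew(x))  # same value; positive, so the data is right-skewed

# Pandas' bias-corrected skewness matches SciPy with bias=False
print(x.skew(), skew(x, bias=False))
```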
8. Kurtosis
Kurtosis measures the "tailedness" of the data distribution: how heavy its tails are, and therefore how prone it is to extreme values (outliers).
Formula:
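$$\text{Kurtosis} = \frac{\sum (X - \mu)^4}{n \cdot \sigma^4} - 3$$
Subtracting 3 gives the excess (Fisher) kurtosis, which is 0 for a normal distribution; this is what scipy.stats.kurtosis returns by default (fisher=True, bias=True).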
SciPy Example:
from scipy.stats import kurtosis
kurt_value = kurtosis(df['column_name'])
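A sketch that evaluates the kurtosis formula by hand on hypothetical data (mostly identical values plus two extremes) and compares it with SciPy. Passing fisher=False returns Pearson's kurtosis, i.e. the same quantity without the "- 3":

```python
import pandas as pd
from scipy.stats import kurtosis

# Hypothetical data: mostly 5s plus two extreme values (0 and 10)
x = pd.Series([5, 5, 5, 5, 5, 5, 5, 5, 0, 10])

mu = x.mean()
sigma = x.std(ddof=0)  # population standard deviation
manual = ((x - mu) ** 4).sum() / (len(x) * sigma ** 4) - 3

print(manual, kurtosis(x))        # both approximately 2.0 (excess kurtosis)
print(kurtosis(x, fisher=False))  # approximately 5.0 (Pearson's definition)
```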
✅ Using Pandas, NumPy, and SciPy to Compute Statistical Parameters
import pandas as pd
import numpy as np
from scipy.stats import skew, kurtosis
# Sample DataFrame
data = {'Age': [22, 34, 25, 33, 22, 45, 33, 29, 41, 36]}
df = pd.DataFrame(data)
# Mean
mean_age = df['Age'].mean()
# Median
median_age = df['Age'].median()
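# Mode (this sample is bimodal: 22 and 33 each appear twice; [0] keeps only the first, 22)
mode_age = df['Age'].mode()[0]
# Variance (sample variance: Pandas uses ddof=1 by default)
variance_age = df['Age'].var()
# Standard Deviation (sample standard deviation: ddof=1 by default)
std_dev_age = df['Age'].std()
# Range
range_age = df['Age'].max() - df['Age'].min()
# Skewness (SciPy's default bias=True uses the population formula above)
skew_value = skew(df['Age'])
# Kurtosis (SciPy returns excess kurtosis by default, i.e. minus 3)
kurt_value = kurtosis(df['Age'])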
print(f"Mean: {mean_age}")
print(f"Median: {median_age}")
print(f"Mode: {mode_age}")
print(f"Variance: {variance_age}")
print(f"Standard Deviation: {std_dev_age}")
print(f"Range: {range_age}")
print(f"Skewness: {skew_value}")
print(f"Kurtosis: {kurt_value}")
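Pandas can also compute several of these statistics in one call with Series.agg. One caveat this sketch makes visible: Pandas' own skew() and kurt() apply small-sample bias corrections, so they differ slightly from the SciPy defaults used above and line up with SciPy only when bias=False is passed:

```python
import pandas as pd
from scipy.stats import skew, kurtosis

df = pd.DataFrame({'Age': [22, 34, 25, 33, 22, 45, 33, 29, 41, 36]})

# Several statistics at once; 'skew' and 'kurt' are Pandas' own Series methods
summary = df['Age'].agg(['mean', 'median', 'var', 'std', 'skew', 'kurt'])
print(summary)

# Pandas' skew/kurt are bias-corrected, matching SciPy only with bias=False
print(summary['skew'], skew(df['Age'], bias=False))
print(summary['kurt'], kurtosis(df['Age'], bias=False))
```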
Questions and Answers
Q1: What is the difference between mean and median?
Answer:
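Mean is the average of all values, so a single extreme value (outlier) can pull it far from the bulk of the data.
Median is the middle value when the data is sorted; it depends only on the order of the values, so it is much less affected by outliers (it is a robust measure).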
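A quick sketch of the difference, using hypothetical scores where one value is replaced by an outlier:

```python
import pandas as pd

scores = pd.Series([50, 52, 55, 58, 60])
with_outlier = pd.Series([50, 52, 55, 58, 600])  # last value replaced by an outlier

print(scores.mean(), scores.median())              # 55.0 55.0
print(with_outlier.mean(), with_outlier.median())  # 163.0 55.0
```

The single outlier triples the mean, while the median does not move at all.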
Q2: How do you compute variance and why is it important?
Answer:
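Variance is computed by averaging the squared deviations of each data point from the mean (dividing by n for a population, or by n - 1 for a sample, which is what Pandas' var() does by default). It measures how spread out the data is: higher variance means the data points are more dispersed around the mean.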
Q3: What is the formula for standard deviation, and how is it different from variance?
Answer:
$$\text{Standard Deviation} = \sqrt{\text{Variance}}$$
It is the square root of variance, so it measures spread in the same units as the data, whereas variance is expressed in squared units (for example, years² for ages).
Q4: How would you calculate skewness and what does it indicate about data?
Answer:
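Skewness can be computed with scipy.stats.skew(df['column_name']); Pandas' df['column_name'].skew() gives a bias-corrected version. It measures the asymmetry of data around its mean: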
Positive skew means the tail is on the right.
Negative skew means the tail is on the left.
A value of 0 (or close to it) is consistent with a symmetric distribution.
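A sketch showing how the sign follows the tail, on hypothetical arrays; mirroring data (negating it) flips the sign of its skewness:

```python
import numpy as np
from scipy.stats import skew

right_tailed = np.array([1, 1, 2, 2, 3, 9])  # one large value stretches the right tail
left_tailed = -right_tailed                  # mirror image: tail now on the left
symmetric = np.array([1, 2, 3, 4, 5])

print(skew(right_tailed) > 0)  # True
print(skew(left_tailed) < 0)   # True
print(skew(symmetric))         # 0.0
```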
Q5: What is kurtosis, and why is it used?
Answer:
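Kurtosis measures the "tailedness" of the data distribution. A higher kurtosis indicates heavier tails, i.e. more extreme values (outliers), while a lower kurtosis indicates lighter tails and fewer extreme values. With SciPy's default (excess) kurtosis, a normal distribution scores 0.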
Q6: How do you interpret the skewness value?
Answer:
Skewness > 0: Right-skewed distribution (long tail on the right).
Skewness < 0: Left-skewed distribution (long tail on the left).
Skewness ≈ 0: Roughly symmetric distribution.
Q7: How can you find the range of a dataset in Python?
Answer:
Range is the difference between the maximum and minimum values of a dataset:
range_value = df['column_name'].max() - df['column_name'].min()
Q8: What does a negative kurtosis indicate?
Answer:
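A negative (excess) kurtosis indicates a platykurtic distribution: its tails are lighter than those of a normal distribution, meaning fewer and less extreme outliers. Such distributions often look flatter than the normal curve, like a uniform distribution.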
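Evenly spread data with no outliers is a simple platykurtic example. A sketch using the integers 1 to 10 as hypothetical data:

```python
import numpy as np
from scipy.stats import kurtosis

flat = np.arange(1, 11)  # 1..10, evenly spread with no outliers

# Excess kurtosis (SciPy default): negative, so the data is platykurtic
print(kurtosis(flat))    # about -1.224
```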
Q9: How can you calculate the mode of a dataset?
Answer:
You can use Pandas' mode() method, which returns a Series of the most frequent value(s); [0] selects the first one:
mode_value = df['column_name'].mode()[0]
Q10: What does it mean if the standard deviation is very high?
Answer:
A high standard deviation indicates that the data points are spread widely around the mean, meaning there is high variability in the data.
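A sketch with two hypothetical datasets that share the same mean but differ sharply in spread:

```python
import pandas as pd

tight = pd.Series([48, 49, 50, 51, 52])
spread = pd.Series([10, 30, 50, 70, 90])

# Same mean, very different variability
print(tight.mean(), spread.mean())  # 50.0 50.0
print(tight.std(), spread.std())    # about 1.58 vs about 31.62
```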
📋 Summary Table:

| Statistic | Code Example |
|---|---|
| Mean | df['col'].mean() |
| Median | df['col'].median() |
| Mode | df['col'].mode() |
| Variance | df['col'].var() |
| Standard Deviation | df['col'].std() |
| Range | df['col'].max() - df['col'].min() |
| Skewness | skew(df['col']) |
| Kurtosis | kurtosis(df['col']) |
🏆 Real-Life Use Case Example:
Imagine you're working with a dataset of student scores in an exam. You want to analyze the scores
by calculating the mean, median, standard deviation, and skewness to understand the distribution of
the scores.
import pandas as pd
import numpy as np
from scipy.stats import skew, kurtosis
# Sample Data
data = {'Scores': [95, 84, 72, 88, 91, 78, 85, 92, 76, 79]}
df = pd.DataFrame(data)
# Mean
mean_score = df['Scores'].mean()
# Median
median_score = df['Scores'].median()
# Standard Deviation
std_dev_score = df['Scores'].std()
# Skewness
skew_value = skew(df['Scores'])
print(f"Mean Score: {mean_score}")
print(f"Median Score: {median_score}")
print(f"Standard Deviation: {std_dev_score}")
print(f"Skewness: {skew_value}")