
Python Statistics

This document outlines the computation of basic statistical parameters using Python libraries Pandas, NumPy, and SciPy, including measures of central tendency, dispersion, and distribution shape. It provides formulas and code examples for calculating mean, median, mode, variance, standard deviation, range, skewness, and kurtosis. Additionally, it includes a real-life use case example analyzing student exam scores.

Uploaded by

themanhector24
Copyright
© All Rights Reserved

Assignment 7: Study and Computation of Basic Statistical Parameters of Variables in Python using Pandas, NumPy, and SciPy

Objective:

The goal of this topic is to help you understand how to compute basic statistical measures for
variables, such as:

• Central Tendency: Mean, Median, Mode
• Dispersion: Variance, Standard Deviation, Range
• Shape of the Distribution: Skewness, Kurtosis

We will use Pandas, NumPy, and SciPy to calculate and interpret these parameters.

✅ Key Statistical Parameters & Computation

1. Mean (Average)

The mean is the sum of all values divided by the number of values.

Formula:

$$\text{Mean} = \frac{\sum X}{n}$$

Pandas Example:

df['mean'] = df['column_name'].mean()
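
As a quick check of the formula, the sketch below computes the mean by hand and with Pandas. The `Age` values are borrowed from the sample data used later in this assignment, and `manual_mean` and `pandas_mean` are illustrative names only.

```python
import pandas as pd

# Sample column (same values as the Age example later in this document)
df = pd.DataFrame({'Age': [22, 34, 25, 33, 22, 45, 33, 29, 41, 36]})

# Formula: sum of the values divided by the number of values
manual_mean = df['Age'].sum() / len(df['Age'])

# Pandas shortcut for the same quantity
pandas_mean = df['Age'].mean()

print(manual_mean, pandas_mean)
```

Both lines should agree, since `mean()` implements the same sum-over-count definition.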

2. Median

The median is the middle value when the data is sorted in order.

Pandas Example:

df['median'] = df['column_name'].median()
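
The "middle value" depends on whether the number of observations is odd or even. A small sketch with made-up values (the series names are hypothetical):

```python
import pandas as pd

odd = pd.Series([7, 1, 5])       # sorted: 1, 5, 7
even = pd.Series([7, 1, 5, 3])   # sorted: 1, 3, 5, 7

median_odd = odd.median()    # the single middle value
median_even = even.median()  # average of the two middle values (3 and 5)

print(median_odd, median_even)
```

With an even count there is no single middle element, so Pandas averages the two central values.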

3. Mode

The mode is the value that appears most frequently in the data.

Pandas Example:

# mode() returns a Series because several values can be tied; [0] takes the first one
mode_value = df['column_name'].mode()[0]
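
A dataset can have more than one mode. The sketch below, using the `Age` values from the sample data later in this assignment, shows what `mode()` returns in that case; `modes` and `first_mode` are illustrative names.

```python
import pandas as pd

ages = pd.Series([22, 34, 25, 33, 22, 45, 33, 29, 41, 36])

# mode() returns a Series holding every most-frequent value, in sorted order
modes = ages.mode()

# Take the first entry when a single number is needed
first_mode = modes.iloc[0]

print(modes.tolist(), first_mode)
```

Here 22 and 33 both appear twice, so reporting only the first mode hides the tie.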

4. Variance

Variance measures how far the data points are spread out from the mean.

Formula:

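$$\text{Variance} = \frac{\sum (X - \mu)^2}{n}$$

This is the population variance. Pandas' var() divides by n − 1 by default (sample variance, ddof=1); pass ddof=0 to match the formula exactly.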


Pandas Example:

df['variance'] = df['column_name'].var()
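
To see how the divisor changes the result, here is a minimal sketch that computes the variance by hand both ways and compares it with Pandas. The data is the `Age` sample used later in this assignment; the variable names are illustrative.

```python
import pandas as pd

ages = pd.Series([22, 34, 25, 33, 22, 45, 33, 29, 41, 36])

# Sum of squared deviations from the mean
squared_dev = ((ages - ages.mean()) ** 2).sum()

population_var = squared_dev / len(ages)      # divide by n
sample_var = squared_dev / (len(ages) - 1)    # divide by n - 1

print(population_var, ages.var(ddof=0))  # same quantity
print(sample_var, ages.var())            # pandas default is ddof=1
```

The two versions differ noticeably on small samples and converge as n grows.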

5. Standard Deviation

Standard deviation is the square root of variance and gives the spread of data in the same units as
the data.

Formula:

$$\text{Standard Deviation} = \sqrt{\text{Variance}}$$

Pandas Example:

df['std_dev'] = df['column_name'].std()
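
One thing worth checking is which variance a given library takes the square root of. The sketch below assumes the `Age` sample from later in this assignment and compares the Pandas default (ddof=1) with the NumPy default (ddof=0).

```python
import math

import numpy as np
import pandas as pd

ages = pd.Series([22, 34, 25, 33, 22, 45, 33, 29, 41, 36])

sample_std = ages.std()                   # pandas default: ddof=1
population_std = np.std(ages.to_numpy())  # NumPy default: ddof=0

# Each standard deviation is the square root of the matching variance
print(math.isclose(sample_std, math.sqrt(ages.var())))
print(math.isclose(population_std, math.sqrt(ages.var(ddof=0))))
```

Because the defaults differ, the same column can give slightly different numbers in Pandas and NumPy unless `ddof` is set explicitly.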

6. Range

The range is the difference between the maximum and minimum values in the data.

Formula:

$$\text{Range} = \text{Max} - \text{Min}$$

Pandas Example:

df['range'] = df['column_name'].max() - df['column_name'].min()
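
NumPy offers the same calculation as a single call, `np.ptp` ("peak to peak"). A short sketch on the `Age` sample from later in this assignment, with illustrative variable names:

```python
import numpy as np
import pandas as pd

ages = pd.Series([22, 34, 25, 33, 22, 45, 33, 29, 41, 36])

range_pandas = ages.max() - ages.min()
range_numpy = np.ptp(ages.to_numpy())  # max - min of the array

print(range_pandas, range_numpy)
```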

7. Skewness

Skewness measures the asymmetry of the data around its mean.

Formula:

$$\text{Skewness} = \frac{\sum (X - \mu)^3}{n \cdot \sigma^3}$$

SciPy Example:

from scipy.stats import skew

skew_value = skew(df['column_name'])
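
The formula can be checked against SciPy directly. This sketch assumes σ is the population standard deviation (ddof=0), which is what `skew` uses with its default `bias=True`; the data is the `Age` sample from later in this assignment.

```python
import numpy as np
from scipy.stats import skew

x = np.array([22, 34, 25, 33, 22, 45, 33, 29, 41, 36], dtype=float)

mu = x.mean()
sigma = x.std()  # NumPy default ddof=0, i.e. the population standard deviation

# Formula: sum((X - mu)^3) / (n * sigma^3)
manual_skew = ((x - mu) ** 3).sum() / (len(x) * sigma ** 3)

print(manual_skew, skew(x))
```

Note that Pandas' own `Series.skew()` applies a small-sample bias correction, so it can differ slightly from this value.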

8. Kurtosis

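Kurtosis measures the "tailedness" of the data distribution, that is, how heavy its tails are and therefore how prone it is to extreme values (outliers). The formula below subtracts 3 so that a normal distribution scores 0 (excess kurtosis).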

Formula:

$$\text{Kurtosis} = \frac{\sum (X - \mu)^4}{n \cdot \sigma^4} - 3$$

SciPy Example:

from scipy.stats import kurtosis

kurt_value = kurtosis(df['column_name'])
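
As with skewness, the formula can be reproduced by hand. The sketch below assumes the population standard deviation (ddof=0) and SciPy's defaults (`fisher=True`, `bias=True`); the data is the `Age` sample from later in this assignment.

```python
import numpy as np
from scipy.stats import kurtosis

x = np.array([22, 34, 25, 33, 22, 45, 33, 29, 41, 36], dtype=float)

mu = x.mean()
sigma = x.std()  # population standard deviation (ddof=0)

# Formula: sum((X - mu)^4) / (n * sigma^4) - 3
manual_excess = ((x - mu) ** 4).sum() / (len(x) * sigma ** 4) - 3

print(manual_excess, kurtosis(x))                    # excess kurtosis
print(manual_excess + 3, kurtosis(x, fisher=False))  # without the -3 offset
```

Passing `fisher=False` returns the value without the subtraction of 3, which is useful when comparing against sources that use that convention.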
✅ Using Pandas, NumPy, and SciPy to Compute Statistical Parameters

import pandas as pd
import numpy as np
from scipy.stats import skew, kurtosis

# Sample DataFrame
data = {'Age': [22, 34, 25, 33, 22, 45, 33, 29, 41, 36]}
df = pd.DataFrame(data)

# Mean
mean_age = df['Age'].mean()

# Median
median_age = df['Age'].median()

# Mode (first value if several are tied)
mode_age = df['Age'].mode()[0]

# Variance (sample variance, ddof=1 by default)
variance_age = df['Age'].var()

# Standard Deviation (sample, ddof=1 by default)
std_dev_age = df['Age'].std()

# Range
range_age = df['Age'].max() - df['Age'].min()

# Skewness
skew_value = skew(df['Age'])

# Kurtosis (excess kurtosis; 0 for a normal distribution)
kurt_value = kurtosis(df['Age'])

print(f"Mean: {mean_age}")
print(f"Median: {median_age}")
print(f"Mode: {mode_age}")
print(f"Variance: {variance_age}")
print(f"Standard Deviation: {std_dev_age}")
print(f"Range: {range_age}")
print(f"Skewness: {skew_value}")
print(f"Kurtosis: {kurt_value}")
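
If the same set of statistics is needed for several columns, the calls can be wrapped in a small helper. The function name `summarize` and the dictionary keys below are hypothetical choices for illustration, not part of any library.

```python
import pandas as pd
from scipy.stats import skew, kurtosis


def summarize(series):
    """Return the basic statistics from this assignment as a dictionary."""
    return {
        'mean': series.mean(),
        'median': series.median(),
        'modes': series.mode().tolist(),   # every tied mode, not just the first
        'variance': series.var(),          # sample variance (ddof=1)
        'std_dev': series.std(),           # sample standard deviation (ddof=1)
        'range': series.max() - series.min(),
        'skewness': skew(series),
        'kurtosis': kurtosis(series),      # excess kurtosis
    }


df = pd.DataFrame({'Age': [22, 34, 25, 33, 22, 45, 33, 29, 41, 36]})
summary = summarize(df['Age'])

for name, value in summary.items():
    print(f"{name}: {value}")
```

Returning all modes as a list avoids silently dropping ties, which the `[0]` shortcut would do.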

Questions and Answers

Q1: What is the difference between mean and median?

Answer:

• Mean is the average of all values, so a single extreme value (outlier) can pull it noticeably.
• Median is the middle value when the data is sorted, so it is far less sensitive to outliers.
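
The difference is easy to see on a small made-up sample. The values and series names below are hypothetical; the second series simply adds one extreme value to the first.

```python
import pandas as pd

values = pd.Series([30, 32, 35, 31, 33])
with_outlier = pd.Series([30, 32, 35, 31, 33, 300])  # one extreme value added

print(values.mean(), values.median())
print(with_outlier.mean(), with_outlier.median())
```

The mean jumps from about 32 to well over 70, while the median only moves from 32 to 32.5.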

Q2: How do you compute variance and why is it important?

Answer:
Variance is computed by averaging the squared deviations of each data point from the mean. It
measures how spread out the data is. Higher variance means the data points are more dispersed.

Q3: What is the formula for standard deviation, and how is it different from variance?

Answer:

• Standard Deviation = $\sqrt{\text{Variance}}$
• It is the square root of variance, so it measures spread in the same units as the data, whereas variance is expressed in squared units.

Q4: How would you calculate skewness and what does it indicate about data?
Answer:
Skewness measures the asymmetry of data around its mean:

• Positive skew means the longer tail is on the right.
• Negative skew means the longer tail is on the left.
• A value close to 0 suggests a roughly symmetric distribution.

Q5: What is kurtosis, and why is it used?

Answer:
Kurtosis measures the "tailedness" of the data distribution. A higher kurtosis indicates more extreme
values (outliers), while a lower kurtosis indicates fewer extreme values.

Q6: How do you interpret the skewness value?

Answer:

• Skewness > 0: Right-skewed distribution (long tail on the right).
• Skewness < 0: Left-skewed distribution (long tail on the left).
• Skewness ≈ 0: Symmetric distribution.
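
These sign rules can be checked on small made-up samples. The three lists below are hypothetical and chosen only so that the direction of the tail is obvious.

```python
from scipy.stats import skew

right_tailed = [1, 2, 2, 3, 3, 3, 20]        # one large value stretches the right tail
left_tailed = [-v for v in right_tailed]     # mirror image: long tail on the left
symmetric = [1, 2, 3, 4, 5]                  # evenly balanced around the mean

skew_right = skew(right_tailed)
skew_left = skew(left_tailed)
skew_sym = skew(symmetric)

print(skew_right, skew_left, skew_sym)
```

Mirroring the data flips the sign of the skewness without changing its magnitude.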

Q7: How can you find the range of a dataset in Python?

Answer:
Range is the difference between the maximum and minimum values of a dataset:

range_value = df['column_name'].max() - df['column_name'].min()

Q8: What does a negative kurtosis indicate?

Answer:
A negative (excess) kurtosis indicates a platykurtic distribution: its tails are lighter than those of a normal distribution, meaning fewer and less extreme outliers.
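
A rough illustration with two made-up samples: one spread evenly with no extreme points, and one where most values are identical and two are far out. Both lists are hypothetical.

```python
from scipy.stats import kurtosis

flat = list(range(1, 11))           # evenly spread values, no extreme points
heavy_tails = [0] * 8 + [-10, 10]   # mostly identical values plus two extremes

kurt_flat = kurtosis(flat)          # excess kurtosis (SciPy default)
kurt_heavy = kurtosis(heavy_tails)

print(kurt_flat, kurt_heavy)
```

The evenly spread sample gives a negative value, while the sample dominated by two extremes gives a positive one.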

Q9: How can you calculate the mode of a dataset?

Answer:
You can use Pandas' mode() function, which returns the most frequent value(s):

mode_value = df['column_name'].mode()[0]

Q10: What does it mean if the standard deviation is very high?


Answer:
A high standard deviation indicates that the data points are spread out widely from the mean,
meaning there's high variability in the data.
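
Two made-up samples with the same mean but very different spread make the point concrete. The series names and values below are hypothetical.

```python
import pandas as pd

tight = pd.Series([49, 50, 51])    # values close to the mean
spread = pd.Series([0, 50, 100])   # same mean, values far from it

print(tight.mean(), tight.std())
print(spread.mean(), spread.std())
```

Both means are 50, but the sample standard deviation is 1 for the first series and 50 for the second.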

📋 Summary Table:

Statistic        Code Example

Mean             df['col'].mean()
Median           df['col'].median()
Mode             df['col'].mode()
Variance         df['col'].var()
Standard Dev.    df['col'].std()
Range            df['col'].max() - df['col'].min()
Skewness         skew(df['col'])
Kurtosis         kurtosis(df['col'])
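
Several of the Pandas entries can also be requested in one call with `Series.agg`. The sketch below uses a hypothetical column named `col` filled with the `Age` sample values; `stats` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({'col': [22, 34, 25, 33, 22, 45, 33, 29, 41, 36]})

# Compute several statistics at once; the result is a Series indexed by name
stats = df['col'].agg(['mean', 'median', 'var', 'std', 'min', 'max'])

# Range is not a built-in aggregation, so derive it from min and max
stats['range'] = stats['max'] - stats['min']

print(stats)
```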

🏆 Real-Life Use Case Example:

Imagine you're working with a dataset of student scores in an exam. You want to analyze the scores
by calculating the mean, median, standard deviation, and skewness to understand the distribution of
the scores.


import pandas as pd
import numpy as np
from scipy.stats import skew, kurtosis

# Sample Data
data = {'Scores': [95, 84, 72, 88, 91, 78, 85, 92, 76, 79]}
df = pd.DataFrame(data)

# Mean
mean_score = df['Scores'].mean()

# Median
median_score = df['Scores'].median()

# Standard Deviation
std_dev_score = df['Scores'].std()

# Skewness
skew_value = skew(df['Scores'])

print(f"Mean Score: {mean_score}")
print(f"Median Score: {median_score}")
print(f"Standard Deviation: {std_dev_score}")
print(f"Skewness: {skew_value}")
