Statistics Practice Guide

Uploaded by katasanipandu
Copyright © All Rights Reserved

Comprehensive Practice Guide for Revising Statistics and Implementation in Python


Day 1: Descriptive Statistics
Theory:
• Topics to Revise:
  ◦ Mean, Median, Mode
  ◦ Variance, Standard Deviation
  ◦ Range, Quartiles, Percentiles
  ◦ Skewness and Kurtosis

Python Practice:
1. Calculating Descriptive Statistics:

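import numpy as np
import scipy.stats as stats

data = [15, 20, 35, 40, 50]

print("Mean:", np.mean(data))
print("Median:", np.median(data))
# Every value occurs once here, so SciPy reports the smallest value (15) as the mode
print("Mode:", stats.mode(data).mode)
# np.var and np.std use the population formulas by default (ddof=0)
print("Variance:", np.var(data))
print("Standard Deviation:", np.std(data))
# Pass ddof=1 for the sample (n - 1) versions
print("Sample Variance:", np.var(data, ddof=1))
print("Sample Standard Deviation:", np.std(data, ddof=1))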

2. Box Plot and Quartiles:

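import numpy as np
import matplotlib.pyplot as plt

data = [15, 20, 35, 40, 50]
# The box spans Q1 to Q3, with the median (Q2) drawn inside it
q1, q2, q3 = np.percentile(data, [25, 50, 75])
print("Q1, Q2, Q3:", q1, q2, q3)
plt.boxplot(data)
plt.title("Box Plot")
plt.show()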
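The Day 1 topic list also names range, percentiles, skewness and kurtosis, which the snippets above do not compute. The following sketch works on the same data list. It assumes the default conventions of NumPy and SciPy: linear-interpolation percentiles, the biased skewness estimator, and Fisher's excess kurtosis, where a normal distribution scores 0.

```python
import numpy as np
from scipy import stats

data = [15, 20, 35, 40, 50]

data_range = max(data) - min(data)      # spread between the extremes
p90 = np.percentile(data, 90)           # value below which 90% of the data falls
skewness = stats.skew(data)             # negative: the tail is longer on the low side
excess_kurtosis = stats.kurtosis(data)  # Fisher definition: a normal distribution gives 0

print("Range:", data_range)
print("90th percentile:", p90)
print("Skewness:", skewness)
print("Excess kurtosis:", excess_kurtosis)
```

For this small, spread-out sample, the skewness and the excess kurtosis both come out negative.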

Day 2: Probability
Theory:
• Topics to Revise:
  ◦ Basic Probability Rules
  ◦ Conditional Probability
  ◦ Bayes’ Theorem
  ◦ Random Variables

Python Practice:
1. Simulating Coin Tosses:

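import random

random.seed(42)  # fix the seed so repeated runs give the same counts

# 1000 fair coin tosses; each count should be close to 500
results = [random.choice(["Heads", "Tails"]) for _ in range(1000)]
print("Heads:", results.count("Heads"))
print("Tails:", results.count("Tails"))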

2. Conditional Probability Using Pandas:

import pandas as pd

data = pd.DataFrame({
    "Event": ["A", "A", "B", "B"],
    "Condition": ["X", "Y", "X", "Y"],
    "Frequency": [30, 20, 50, 10]
})

# P(A|X) = count(A and X) / count(X)
a_and_x = data[(data["Event"] == "A") & (data["Condition"] == "X")]["Frequency"].sum()
x_total = data[data["Condition"] == "X"]["Frequency"].sum()
prob_a_given_x = a_and_x / x_total
print("P(A|X):", prob_a_given_x)  # 30 / 80 = 0.375
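Bayes' theorem, also on the Day 2 list, reverses a conditional probability: P(X|A) = P(A|X) · P(X) / P(A). The sketch below reuses the frequency table above, stored in a plain dictionary. It checks the Bayes result against a direct count within the A rows, which gives 30 / 50.

```python
import math

# Joint frequencies from the Event/Condition table above
freq = {("A", "X"): 30, ("A", "Y"): 20, ("B", "X"): 50, ("B", "Y"): 10}
total = sum(freq.values())  # 110 observations

p_a = (freq[("A", "X")] + freq[("A", "Y")]) / total                     # P(A)
p_x = (freq[("A", "X")] + freq[("B", "X")]) / total                     # P(X)
p_a_given_x = freq[("A", "X")] / (freq[("A", "X")] + freq[("B", "X")])  # P(A|X)

# Bayes' theorem: P(X|A) = P(A|X) * P(X) / P(A)
p_x_given_a = p_a_given_x * p_x / p_a

# Cross-check by counting directly inside the A rows
direct = freq[("A", "X")] / (freq[("A", "X")] + freq[("A", "Y")])
print("P(X|A) via Bayes:", p_x_given_a)
print("P(X|A) by counting:", direct)
print("Agree:", math.isclose(p_x_given_a, direct))
```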

Day 3: Probability Distributions


Theory:
• Topics to Revise:
  ◦ Binomial Distribution
  ◦ Normal Distribution
  ◦ Poisson Distribution

Python Practice:
1. Binomial Distribution:

from scipy.stats import binom

n, p = 10, 0.5
k = 5
print("P(X=5):", binom.pmf(k, n, p))
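binom.pmf evaluates the binomial probability mass function, P(X = k) = C(n, k) · p^k · (1 − p)^(n − k). The following short check uses the standard library with the same n = 10, p = 0.5 and k = 5. It also prints the binomial mean n·p and variance n·p·(1 − p).

```python
import math
from scipy.stats import binom

n, p, k = 10, 0.5, 5

# Binomial PMF from the formula: C(n, k) * p^k * (1 - p)^(n - k)
manual = math.comb(n, k) * p**k * (1 - p) ** (n - k)
print("Formula:", manual)  # 252 / 1024
print("SciPy:", binom.pmf(k, n, p))

# Mean and variance of a binomial variable: n*p and n*p*(1-p)
print("Mean:", binom.mean(n, p), "Variance:", binom.var(n, p))
```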

2. Normal Distribution Visualization:

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

# Standard normal density (mean 0, standard deviation 1) over [-3, 3]
x = np.linspace(-3, 3, 100)
y = stats.norm.pdf(x, loc=0, scale=1)
plt.plot(x, y)
plt.title("Normal Distribution")
plt.show()
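The Poisson distribution, the third Day 3 topic, has no snippet above. Its PMF is P(X = k) = e^(−μ) · μ^k / k!, where μ is the average number of events per interval. The following sketch assumes an illustrative rate of μ = 3. It compares the formula with scipy.stats.poisson.

```python
import math
from scipy.stats import poisson

mu = 3  # assumed average rate, e.g. 3 events per interval (illustrative value)
k = 2

# Poisson PMF: P(X = k) = exp(-mu) * mu**k / k!
manual = math.exp(-mu) * mu**k / math.factorial(k)
print("P(X=2) formula:", manual)
print("P(X=2) SciPy:", poisson.pmf(k, mu))
print("P(X<=2):", poisson.cdf(k, mu))
```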

Day 4: Hypothesis Testing


Theory:
• Topics to Revise:
  ◦ Null and Alternative Hypotheses
  ◦ Types of Errors (Type I and Type II)
  ◦ p-value
  ◦ t-test, z-test, ANOVA

Python Practice:
1. t-test:

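from scipy.stats import ttest_ind

group1 = [20, 21, 19, 22, 20]
group2 = [30, 31, 29, 32, 30]
# Student's two-sample t-test (equal variances assumed by default);
# pass equal_var=False for Welch's test
t_stat, p_val = ttest_ind(group1, group2)
print("t-statistic:", t_stat, "p-value:", p_val)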
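Day 4 also lists the z-test. When the population standard deviation σ is known, the one-sample statistic is z = (x̄ − μ0) / (σ / √n), and the p-value comes from the standard normal distribution. The following sketch applies it to group1. The hypothesized mean μ0 = 20 and the "known" σ = 1 are assumptions chosen only for illustration.

```python
import math
from scipy.stats import norm

group1 = [20, 21, 19, 22, 20]
mu0 = 20.0   # hypothesized population mean (assumed for illustration)
sigma = 1.0  # population standard deviation, assumed known

n = len(group1)
sample_mean = sum(group1) / n
z = (sample_mean - mu0) / (sigma / math.sqrt(n))

# Two-sided p-value: probability of |Z| at least this large under H0
p_value = 2 * norm.sf(abs(z))
print("z-statistic:", z, "p-value:", p_value)
```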

2. ANOVA:

from scipy.stats import f_oneway

group1 = [20, 21, 19, 22, 20]
group2 = [30, 31, 29, 32, 30]
group3 = [25, 26, 24, 27, 25]
# One-way ANOVA: H0 states that all three group means are equal
f_stat, p_val = f_oneway(group1, group2, group3)
print("F-statistic:", f_stat, "p-value:", p_val)
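A Type I error means rejecting a null hypothesis that is actually true, and the significance level α is its long-run rate. The following simulation sketch draws both samples from the same normal distribution, so H0 holds. It runs many t-tests at once through the axis argument of ttest_ind and counts how often p < α. The seed, sample sizes and α = 0.05 are arbitrary choices.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
alpha = 0.05
n_trials, n_per_group = 2000, 20

# Both groups come from the same N(0, 1) distribution, so H0 (equal means) is true
a = rng.normal(0.0, 1.0, size=(n_trials, n_per_group))
b = rng.normal(0.0, 1.0, size=(n_trials, n_per_group))

# One t-test per row; p_values has shape (n_trials,)
_, p_values = ttest_ind(a, b, axis=1)

type_i_rate = np.mean(p_values < alpha)
print("Observed Type I error rate:", type_i_rate)  # should be close to alpha
```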

Day 5: Regression Analysis


Theory:
• Topics to Revise:
  ◦ Linear Regression
  ◦ Multiple Linear Regression
  ◦ Assumptions of Regression

Python Practice:
1. Simple Linear Regression:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

x = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([2, 4, 5, 4, 5])

model = LinearRegression()
model.fit(x, y)
plt.scatter(x, y, color='blue')
plt.plot(x, model.predict(x), color='red')
plt.title("Linear Regression")
plt.show()
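Simple linear regression has a closed-form least-squares solution. The slope is b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², and the intercept is b0 = ȳ − b1·x̄. The following NumPy check uses the same five points and compares the result with a scikit-learn fit.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

# Least-squares estimates from the centered sums
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()
print("Closed form:", slope, intercept)

model = LinearRegression().fit(x.reshape(-1, 1), y)
print("scikit-learn:", model.coef_[0], model.intercept_)
```

Here Σ(x − x̄)(y − ȳ) = 6 and Σ(x − x̄)² = 10, so the slope is 0.6 and the intercept is 4 − 0.6 · 3 = 2.2.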

2. Multiple Linear Regression:

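from sklearn.linear_model import LinearRegression
import pandas as pd

data = pd.DataFrame({
    "X1": [1, 2, 3, 4, 5],
    "X2": [5, 4, 3, 2, 1],
    "Y": [2, 4, 5, 4, 5]
})

X = data[["X1", "X2"]]
y = data["Y"]
model = LinearRegression()
model.fit(X, y)
# Caution: X2 = 6 - X1 here, so the predictors are perfectly collinear
# and the individual coefficients are not uniquely determined
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)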
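The sketch below runs two quick checks related to the regression assumptions. First, the residuals of an OLS fit with an intercept average to zero. Second, a correlation of ±1 between predictors flags perfect multicollinearity, which is the case for X1 and X2 above. The Shapiro-Wilk normality test is included only to show the call; with five residuals it has almost no power.

```python
import numpy as np
from scipy import stats

x1 = np.array([1, 2, 3, 4, 5], dtype=float)
x2 = np.array([5, 4, 3, 2, 1], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

# Simple regression of y on x1; np.polyfit returns [slope, intercept] for degree 1
slope, intercept = np.polyfit(x1, y, 1)
residuals = y - (slope * x1 + intercept)
print("Residuals:", residuals)
print("Mean residual:", residuals.mean())  # ~0 for OLS with an intercept

# Normality of errors (illustrative only at n = 5)
print("Shapiro-Wilk p-value:", stats.shapiro(residuals).pvalue)

# Multicollinearity: |r| = 1 means one predictor is a linear function of the other
r = np.corrcoef(x1, x2)[0, 1]
print("corr(X1, X2):", r)
```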

Day 6: Clustering and Visualization


Theory:
• Topics to Revise:
  ◦ k-Means Clustering
  ◦ Hierarchical Clustering

Python Practice:
1. k-Means Clustering:

from sklearn.cluster import KMeans
import numpy as np
import matplotlib.pyplot as plt

data = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
# n_init is set explicitly because its default changed across scikit-learn versions
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)

# Color points by assigned cluster and mark the centroids with a red X
plt.scatter(data[:, 0], data[:, 1], c=kmeans.labels_)
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            s=200, c='red', marker='X')
plt.title("k-Means Clustering")
plt.show()
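To show what happens inside k-means, the following minimal sketch implements Lloyd's iteration. Each step assigns every point to its nearest centroid, then moves each centroid to the mean of its points, and the loop stops when the centroids no longer move. The helper name lloyd_kmeans is hypothetical. The deterministic initialization (the first and fourth points) is an assumption suited to this toy data; scikit-learn instead seeds with k-means++ and runs several restarts.

```python
import numpy as np

def lloyd_kmeans(points, init_centers, max_iter=100):
    """Hypothetical helper: plain Lloyd iterations from given initial centroids."""
    centers = np.asarray(init_centers, dtype=float).copy()
    for _ in range(max_iter):
        # Distance from every point to every centroid, shape (n_points, k)
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Each centroid moves to the mean of its assigned points
        # (this sketch does not handle clusters that end up empty)
        new_centers = np.array([points[labels == j].mean(axis=0)
                                for j in range(len(centers))])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

data = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]], dtype=float)
labels, centers = lloyd_kmeans(data, init_centers=data[[0, 3]])
print("Labels:", labels)
print("Centroids:", centers)
```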

2. Hierarchical Clustering:

from scipy.cluster.hierarchy import dendrogram, linkage
import numpy as np
import matplotlib.pyplot as plt

data = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
# Ward linkage merges the pair of clusters that least increases within-cluster variance
linked = linkage(data, method='ward')

dendrogram(linked)
plt.title("Hierarchical Clustering")
plt.show()
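The dendrogram shows the order in which clusters merge. To turn it into flat cluster labels, SciPy's fcluster can cut the tree. The following sketch asks for at most two clusters with criterion='maxclust'.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

data = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
linked = linkage(data, method='ward')

# Cut the tree so that at most 2 flat clusters remain; labels start at 1
labels = fcluster(linked, t=2, criterion='maxclust')
print("Cluster labels:", labels)
```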

Day 7: Time Series Analysis


Theory:
• Topics to Revise:
  ◦ Moving Averages
  ◦ ARIMA Model

Python Practice:
1. Simple Moving Average:

import pandas as pd
data = pd.Series([1, 2, 3, 4, 5, 6, 7])
print(data.rolling(window=3).mean())
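rolling(window=3).mean() averages each value with the two values before it, so the first two positions are NaN. The same simple moving average can be written as a convolution with equal weights of 1/3. In the sketch below, mode='valid' keeps only the positions where the full window fits.

```python
import numpy as np
import pandas as pd

values = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
window = 3

# Equal-weight kernel: each output is the mean of `window` consecutive values
sma = np.convolve(values, np.ones(window) / window, mode='valid')
print("NumPy SMA:", sma)

# pandas gives the same numbers, padded with NaN where the window is incomplete
rolled = pd.Series(values).rolling(window=window).mean()
print("pandas SMA:", rolled.to_numpy())
```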

2. ARIMA Model:

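from statsmodels.tsa.arima.model import ARIMA

series = [1, 2, 3, 4, 5, 6, 7]
# order=(p, d, q): (1, 0, 0) is an AR(1) model with no differencing and no MA term.
# Seven trending points are only enough to demonstrate the API; statsmodels may warn.
model = ARIMA(series, order=(1, 0, 0))
model_fit = model.fit()
print(model_fit.summary())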
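ARIMA with order (1, 0, 0) is an AR(1) model, x_t = c + φ·x_(t−1) + ε_t. The following sketch simulates such a series, assuming φ = 0.7 and c = 0. It then recovers the coefficients by ordinary least squares on the lagged values. This is a simpler estimator than the maximum-likelihood fit that statsmodels performs, shown here only to make the model concrete.

```python
import numpy as np

rng = np.random.default_rng(42)
phi_true, n = 0.7, 2000  # assumed AR coefficient and series length

# Simulate x_t = phi * x_{t-1} + e_t with standard normal noise
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi_true * x[t - 1] + rng.normal()

# Regress x_t on [1, x_{t-1}] by least squares to estimate (c, phi)
X_lag = np.column_stack([np.ones(n - 1), x[:-1]])
(c_hat, phi_hat), *_ = np.linalg.lstsq(X_lag, x[1:], rcond=None)
print("Estimated c:", c_hat, "Estimated phi:", phi_hat)
```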

Day 8: Data Visualization Techniques


Theory:
• Topics to Revise:
  ◦ Effective Chart Selection
  ◦ Misleading Visualizations

Python Practice:
1. Comparative Bar Chart:

import numpy as np
import matplotlib.pyplot as plt

categories = ["A", "B", "C"]
values1 = [3, 7, 8]
values2 = [2, 6, 9]

x = np.arange(len(categories))
width = 0.4
# Shift each group by half a bar width so the bars sit side by side instead of overlapping
plt.bar(x - width / 2, values1, width=width, label='Group 1')
plt.bar(x + width / 2, values2, width=width, label='Group 2')
plt.xticks(x, categories)
plt.legend()
plt.show()

2. Heatmap with Seaborn:

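import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)  # reproducible random matrix
data = np.random.rand(10, 10)

# annot=True writes each cell's value; cmap sets the color scale
sns.heatmap(data, annot=True, cmap="coolwarm")
plt.show()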
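A common misleading visualization is a bar chart whose value axis does not start at zero, so that bar lengths no longer scale with the values. The following sketch uses the Group 1 values for B and C (7 and 8) from the bar chart above. It draws them once with a zero baseline and once with the axis starting at 6, an assumed cut-off. The helper apparent_ratio is hypothetical; it computes how many times taller the second bar looks.

```python
import matplotlib.pyplot as plt

def apparent_ratio(a, b, baseline):
    """Hypothetical helper: how many times taller bar b looks than bar a
    when the value axis starts at `baseline` instead of 0."""
    return (b - baseline) / (a - baseline)

labels = ["B", "C"]
values = [7, 8]  # Group 1 values for B and C

print("True ratio:", values[1] / values[0])                               # about 1.14
print("Apparent ratio, axis from 6:", apparent_ratio(7, 8, baseline=6))  # 2.0

fig, (honest, truncated) = plt.subplots(1, 2, figsize=(8, 3))
honest.bar(labels, values)
honest.set_ylim(0, 9)
honest.set_title("Axis starts at 0")
truncated.bar(labels, values)
truncated.set_ylim(6, 9)
truncated.set_title("Axis starts at 6 (misleading)")
plt.tight_layout()
plt.show()
```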

Day 9: Advanced Probability Models


Theory:
• Topics to Revise:
  ◦ Markov Chains
  ◦ Hidden Markov Models

Python Practice:
1. Markov Chain Transition Matrix:

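import numpy as np

# Row i holds the probabilities of moving from state i to each state (rows sum to 1)
P = np.array([[0.7, 0.3], [0.4, 0.6]])
state = np.array([1, 0])  # start in state 0 with certainty
# Distribution after one step: row vector times transition matrix
print("Next State:", np.dot(state, P))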
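Repeating the multiplication gives the distribution after n steps, state · P^n. For this matrix the distribution settles to a stationary distribution π that satisfies π = πP. Solving 0.3·π0 = 0.4·π1 with π0 + π1 = 1 gives π = (4/7, 3/7). The following sketch computes both the 10-step distribution and π.

```python
import numpy as np

P = np.array([[0.7, 0.3], [0.4, 0.6]])
state = np.array([1.0, 0.0])

# Distribution after 10 steps: state @ P^10
after_10 = state @ np.linalg.matrix_power(P, 10)
print("After 10 steps:", after_10)

# Stationary distribution: left eigenvector of P for eigenvalue 1,
# i.e. an eigenvector of P.T, rescaled so its entries sum to 1
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1))])
pi = pi / pi.sum()
print("Stationary distribution:", pi)  # approximately [4/7, 3/7]
```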

2. HMM with hmmlearn:

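from hmmlearn import hmm
import numpy as np

# Two hidden states with diagonal-covariance Gaussian emissions
model = hmm.GaussianHMM(n_components=2, covariance_type="diag",
                        n_iter=100, random_state=0)

np.random.seed(0)
data = np.random.rand(100, 1)  # 100 observations of one feature (random demo data)
model.fit(data)
print("Transition Matrix:", model.transmat_)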
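Fitting an HMM with EM relies on forward-backward computations. The forward half computes the probability of an observation sequence by summing over hidden states one step at a time. The following self-contained sketch implements that recursion for a small discrete HMM whose probabilities are all assumed for illustration. It checks the result against brute-force summation over every hidden-state path.

```python
import itertools
import numpy as np

# Assumed toy HMM: 2 hidden states, 2 observable symbols (0 and 1)
start = np.array([0.6, 0.4])                # P(first hidden state)
trans = np.array([[0.7, 0.3], [0.4, 0.6]])  # P(next state | current state)
emit = np.array([[0.9, 0.1], [0.2, 0.8]])   # P(symbol | state)
obs = [0, 1, 1, 0]

def forward_likelihood(obs, start, trans, emit):
    # alpha[i] = P(observations so far, current hidden state = i)
    alpha = start * emit[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ trans) * emit[:, o]
    return alpha.sum()

def brute_force_likelihood(obs, start, trans, emit):
    # Sum the joint probability over every possible hidden path (exponential cost)
    total = 0.0
    for path in itertools.product(range(len(start)), repeat=len(obs)):
        p = start[path[0]] * emit[path[0], obs[0]]
        for t in range(1, len(obs)):
            p *= trans[path[t - 1], path[t]] * emit[path[t], obs[t]]
        total += p
    return total

print("Forward:", forward_likelihood(obs, start, trans, emit))
print("Brute force:", brute_force_likelihood(obs, start, trans, emit))
```

The forward pass costs O(T · N²) for T observations and N states, whereas the brute-force sum grows as N^T.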

Day 10: Optimization Techniques


Theory:
• Topics to Revise:
  ◦ Linear Programming
  ◦ Gradient Descent

Python Practice:
1. Linear Programming:

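from scipy.optimize import linprog

# linprog minimizes, so maximize x + 2y by minimizing -x - 2y
c = [-1, -2]
A = [[2, 1], [1, 1]]  # 2x + y <= 20 and x + y <= 16
b = [20, 16]
bounds = [(0, None), (0, None)]  # x >= 0, y >= 0

res = linprog(c, A_ub=A, b_ub=b, bounds=bounds, method='highs')
print("Optimal Solution:", res.x)
print("Maximum Objective:", -res.fun)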
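For a two-variable problem, the solver's answer can be checked by hand, because a linear objective over a bounded feasible polygon reaches its optimum at a vertex. The following sketch intersects every pair of constraint lines and keeps the feasible points. It then evaluates x + 2y at each one. The bounds x ≥ 0 and y ≥ 0 are rewritten as −x ≤ 0 and −y ≤ 0 so that all constraints share one form.

```python
import itertools
import numpy as np

# All constraints in the form A_all @ [x, y] <= b_all
A_all = np.array([[2, 1], [1, 1], [-1, 0], [0, -1]], dtype=float)
b_all = np.array([20, 16, 0, 0], dtype=float)
objective = np.array([1, 2], dtype=float)  # maximize x + 2y

best_point, best_value = None, -np.inf
for i, j in itertools.combinations(range(len(A_all)), 2):
    pair = A_all[[i, j]]
    if abs(np.linalg.det(pair)) < 1e-12:
        continue  # parallel constraint lines never intersect
    vertex = np.linalg.solve(pair, b_all[[i, j]])
    if np.all(A_all @ vertex <= b_all + 1e-9):  # keep only feasible vertices
        value = objective @ vertex
        if value > best_value:
            best_point, best_value = vertex, value

print("Best vertex:", best_point, "objective:", best_value)
```

The feasible vertices are (0, 0), (10, 0), (4, 12) and (0, 16), and (0, 16) gives the largest objective value, 32.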

2. Gradient Descent:

def gradient_descent(x0, lr, num_iter):
    # Minimizes f(x) = x**2, whose gradient is f'(x) = 2x
    x = x0
    for _ in range(num_iter):
        grad = 2 * x
        x -= lr * grad  # step against the gradient
    return x

print("Minimum Point:", gradient_descent(10, 0.1, 100))  # approaches 0
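The same update rule can fit a model when the gradient comes from a loss function. The following sketch minimizes the mean squared error of y ≈ w·x + b on the Day 5 regression data. The learning rate of 0.05 and the 5000 iterations are assumed values; for this data, rates above roughly 0.085 make the iterates diverge.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

w, b = 0.0, 0.0  # slope and intercept, starting from zero
lr = 0.05        # assumed learning rate
n = len(x)

for _ in range(5000):
    error = w * x + b - y
    # Gradients of the mean squared error (1/n) * sum(error**2)
    grad_w = 2.0 / n * np.sum(error * x)
    grad_b = 2.0 / n * np.sum(error)
    w -= lr * grad_w
    b -= lr * grad_b

print("Slope:", w, "Intercept:", b)  # should approach the least-squares values 0.6 and 2.2
```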
