Comprehensive Practice Guide for Revising Statistics, with Implementation in Python
Day 1: Descriptive Statistics
Theory:
Topics to Revise:
o Mean, Median, Mode
o Variance, Standard Deviation
o Range, Quartiles, Percentiles
o Skewness and Kurtosis
Python Practice:
1. Calculating Descriptive Statistics:
import numpy as np
import scipy.stats as stats
data = [15, 20, 35, 40, 50]
print("Mean:", np.mean(data))
print("Median:", np.median(data))
print("Mode:", stats.mode(data))
print("Variance:", np.var(data))
print("Standard Deviation:", np.std(data))
2. Box Plot and Quartiles:
import matplotlib.pyplot as plt
plt.boxplot(data)
plt.title("Box Plot")
plt.show()
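3. Skewness and Kurtosis:
The topic list above also includes skewness and kurtosis, which the first two items do not cover; a minimal sketch with scipy.stats (by default stats.kurtosis reports excess kurtosis, i.e. 0 for a normal distribution):
import scipy.stats as stats
data = [15, 20, 35, 40, 50]
print("Skewness:", stats.skew(data))      # > 0 indicates a longer right tail
print("Kurtosis:", stats.kurtosis(data))  # excess kurtosis (fisher=True)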
Day 2: Probability
Theory:
Topics to Revise:
o Basic Probability Rules
o Conditional Probability
o Bayes’ Theorem
o Random Variables
Python Practice:
1. Simulating Coin Tosses:
import random
results = [random.choice(["Heads", "Tails"]) for _ in range(1000)]  # simulate 1000 fair coin tosses
print("Heads:", results.count("Heads"))
print("Tails:", results.count("Tails"))
2. Conditional Probability Using Pandas:
import pandas as pd
data = pd.DataFrame({
    "Event": ["A", "A", "B", "B"],
    "Condition": ["X", "Y", "X", "Y"],
    "Frequency": [30, 20, 50, 10]
})
# P(A|X) = P(A and X) / P(X)
prob_a_given_x = (
    data[(data["Event"] == "A") & (data["Condition"] == "X")]["Frequency"].sum()
    / data[data["Condition"] == "X"]["Frequency"].sum()
)
print("P(A|X):", prob_a_given_x)
Day 3: Probability Distributions
Theory:
Topics to Revise:
o Binomial Distribution
o Normal Distribution
o Poisson Distribution
Python Practice:
1. Binomial Distribution:
from scipy.stats import binom
n, p = 10, 0.5   # 10 trials, success probability 0.5
k = 5
print("P(X=5):", binom.pmf(k, n, p))   # probability of exactly 5 successes
2. Normal Distribution Visualization:
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats   # required for stats.norm below
x = np.linspace(-3, 3, 100)
y = stats.norm.pdf(x, loc=0, scale=1)  # standard normal density
plt.plot(x, y)
plt.title("Normal Distribution")
plt.show()
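3. Poisson Distribution:
The Poisson distribution from the topic list can be revised the same way as the binomial; a minimal sketch in which the rate mu = 3 is an illustrative assumption:
from scipy.stats import poisson
mu = 3   # assumed average number of events per interval
k = 2
print("P(X=2):", poisson.pmf(k, mu))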
Day 4: Hypothesis Testing
Theory:
Topics to Revise:
o Null and Alternative Hypotheses
o Types of Errors (Type I and II)
o p-value
o t-test, z-test, ANOVA
Python Practice:
1. t-test:
from scipy.stats import ttest_ind
group1 = [20, 21, 19, 22, 20]
group2 = [30, 31, 29, 32, 30]
t_stat, p_val = ttest_ind(group1, group2)
print("t-statistic:", t_stat, "p-value:", p_val)
2. ANOVA:
from scipy.stats import f_oneway
group1 = [20, 21, 19, 22, 20]
group2 = [30, 31, 29, 32, 30]
group3 = [25, 26, 24, 27, 25]
f_stat, p_val = f_oneway(group1, group2, group3)
print("F-statistic:", f_stat, "p-value:", p_val)
Day 5: Regression Analysis
Theory:
Topics to Revise:
o Linear Regression
o Multiple Linear Regression
o Assumptions of Regression
Python Practice:
1. Simple Linear Regression:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
x = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([2, 4, 5, 4, 5])
model = LinearRegression()
model.fit(x, y)
print("Slope:", model.coef_[0], "Intercept:", model.intercept_)
plt.scatter(x, y, color='blue')
plt.plot(x, model.predict(x), color='red')
plt.title("Linear Regression")
plt.show()
2. Multiple Linear Regression:
from sklearn.linear_model import LinearRegression
import pandas as pd
data = pd.DataFrame({
    "X1": [1, 2, 3, 4, 5],
    "X2": [5, 4, 3, 2, 1],   # note: X2 = 6 - X1, so the predictors are perfectly collinear
    "Y": [2, 4, 5, 4, 5]
})
X = data[["X1", "X2"]]
y = data["Y"]
model = LinearRegression()
model.fit(X, y)
print("Coefficients:", model.coef_)
Day 6: Clustering and Visualization
Theory:
Topics to Revise:
o k-Means Clustering
o Hierarchical Clustering
Python Practice:
1. k-Means Clustering:
from sklearn.cluster import KMeans
import numpy as np
import matplotlib.pyplot as plt
data = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)   # n_init set explicitly; its default changed in recent scikit-learn
plt.scatter(data[:, 0], data[:, 1], c=kmeans.labels_)
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=200, c='red', marker='X')
plt.title("k-Means Clustering")
plt.show()
2. Hierarchical Clustering:
from scipy.cluster.hierarchy import dendrogram, linkage
import numpy as np
import matplotlib.pyplot as plt
data = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
linked = linkage(data, method='ward')   # Ward linkage merges the pair of clusters that least increases within-cluster variance
dendrogram(linked)
plt.title("Hierarchical Clustering")
plt.show()
Day 7: Time Series Analysis
Theory:
Topics to Revise:
o Moving Averages
o ARIMA Model
Python Practice:
1. Simple Moving Average:
import pandas as pd
data = pd.Series([1, 2, 3, 4, 5, 6, 7])
print(data.rolling(window=3).mean())   # the first two entries are NaN until the 3-value window fills
2. ARIMA Model:
from statsmodels.tsa.arima.model import ARIMA
series = [1, 2, 3, 4, 5, 6, 7]
model = ARIMA(series, order=(1, 0, 0))   # (p, d, q): an AR(1) model with no differencing or MA terms
model_fit = model.fit()
print(model_fit.summary())
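The fitted results object can also produce out-of-sample predictions; for example, forecasting the next three values:
print("Forecast:", model_fit.forecast(steps=3))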
Day 8: Data Visualization Techniques
Theory:
Topics to Revise:
o Effective Chart Selection
o Misleading Visualizations
Python Practice:
1. Comparative Bar Chart:
import numpy as np
import matplotlib.pyplot as plt
categories = ["A", "B", "C"]
values1 = [3, 7, 8]
values2 = [2, 6, 9]
x = np.arange(len(categories))
width = 0.4
plt.bar(x - width / 2, values1, width=width, label='Group 1')   # offset the bars so the groups sit side by side without overlapping
plt.bar(x + width / 2, values2, width=width, label='Group 2')
plt.xticks(x, categories)
plt.legend()
plt.show()
2. Heatmap with Seaborn:
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
data = np.random.rand(10, 10)
sns.heatmap(data, annot=True, cmap="coolwarm")
plt.show()
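3. Misleading Visualization Example:
The topic list mentions misleading visualizations; a minimal sketch of the classic truncated-axis effect, where starting the y-axis near the smallest value makes a roughly 4% difference look dramatic (the values are illustrative):
import matplotlib.pyplot as plt
values = [100, 102, 104]
plt.bar(["A", "B", "C"], values)
plt.ylim(99, 105)   # truncating the axis exaggerates small differences
plt.title("Misleading: y-axis does not start at 0")
plt.show()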
Day 9: Advanced Probability Models
Theory:
Topics to Revise:
o Markov Chains
o Hidden Markov Models
Python Practice:
1. Markov Chain Transition Matrix:
import numpy as np
# Each row of P holds the transition probabilities out of one state, so rows sum to 1
P = np.array([[0.7, 0.3], [0.4, 0.6]])
state = np.array([1, 0])   # start with probability 1 in state 0
print("Next State:", np.dot(state, P))
2. HMM with hmmlearn:
from hmmlearn import hmm
import numpy as np
model = hmm.GaussianHMM(n_components=2, covariance_type="diag")
data = np.random.rand(100, 1)   # random data purely as a smoke test; real sequences carry temporal structure
model.fit(data)
print("Transition Matrix:", model.transmat_)
Day 10: Optimization Techniques
Theory:
Topics to Revise:
o Linear Programming
o Gradient Descent
Python Practice:
1. Linear Programming:
from scipy.optimize import linprog
# linprog minimizes, so maximize x + 2y by negating the objective
c = [-1, -2]
A = [[2, 1], [1, 1]]   # coefficients of the <= constraints
b = [20, 16]
bounds = [(0, None), (0, None)]   # x >= 0, y >= 0
res = linprog(c, A_ub=A, b_ub=b, bounds=bounds, method='highs')
print("Optimal Solution:", res.x)
2. Gradient Descent:
def gradient_descent(x0, lr, num_iter):
    # Minimize f(x) = x**2 by repeatedly stepping against its gradient f'(x) = 2x
    x = x0
    for _ in range(num_iter):
        grad = 2 * x      # gradient of x**2 at the current point
        x -= lr * grad    # step downhill, scaled by the learning rate
    return x
print("Minimum Point:", gradient_descent(10, 0.1, 100))