Prerequisites
Python
NumPy & Pandas for data manipulation
Matplotlib & Seaborn for visualization
Module 1: Data Preprocessing & Feature Engineering
Data Cleaning (nulls, outliers, duplicates)
Encoding (Label, One-Hot)
Feature Scaling (Standardization, Normalization)
Feature Selection (Univariate, Multivariate)
Feature Engineering (Binning, Polynomial, Interaction Terms)
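As a rough illustration of the encoding and scaling topics above, the following sketch implements standardization, min-max normalization, and one-hot encoding by hand. It uses only the Python standard library so that it runs without extra packages. The helper names (standardize, min_max_normalize, one_hot) and the sample data are hypothetical and are not part of the course material.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def min_max_normalize(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(labels):
    """Return the sorted category list and one 0/1 row per input label."""
    categories = sorted(set(labels))
    rows = [[1 if label == c else 0 for c in categories] for label in labels]
    return categories, rows


# Hypothetical sample data, for illustration only.
ages = [20, 30, 40, 50]
colors = ["red", "green", "red", "blue"]

scaled = standardize(ages)
normalized = min_max_normalize(ages)
categories, encoded = one_hot(colors)
```

Standardization suits features with no natural bounds, while min-max normalization keeps values in a fixed range. Both depend on statistics of the data they are fitted on, so those statistics are normally computed on the training split only.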
Module 2: Model Development - Supervised Learning
Regression:
Linear Regression
Regularization: Ridge, Lasso
Polynomial Regression
Classification:
Logistic Regression
K-Nearest Neighbors (KNN)
Decision Trees
Random Forest
Support Vector Machine (SVM)
Naive Bayes
Gradient Boosting (XGBoost, LightGBM)
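To make the regression topics above more concrete, here is a minimal sketch of single-feature linear regression with an optional ridge penalty, written in plain Python. It assumes one input feature and an unpenalized intercept. The function name fit_line, the parameter name ridge_lambda, and the data are hypothetical choices for illustration.

```python
from statistics import mean


def fit_line(xs, ys, ridge_lambda=0.0):
    """Fit y = intercept + slope * x by least squares.

    With ridge_lambda > 0 the slope is shrunk toward zero (ridge penalty on
    the slope only); with ridge_lambda == 0 this is ordinary least squares.
    """
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / (s_xx + ridge_lambda)
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data lying exactly on the line y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

ols_slope, ols_intercept = fit_line(xs, ys)
ridge_slope, ridge_intercept = fit_line(xs, ys, ridge_lambda=5.0)
```

On this data the unpenalized fit recovers the slope exactly, while the penalized fit returns a smaller slope. That shrinkage is the effect regularization is meant to have on coefficients.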
Module 3: Model Development - Unsupervised Learning
K-Means Clustering
Hierarchical Clustering
DBSCAN
Principal Component Analysis (PCA)
t-SNE (introductory)
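As one possible illustration of the clustering topics above, the sketch below runs the basic K-Means loop (assign each point to its nearest centroid, then recompute centroids) in plain Python. Using the first k points as initial centroids is a simplification made here to keep the example deterministic; practical implementations usually use random or k-means++ initialization. The function name k_means and the sample points are hypothetical.

```python
from math import dist


def k_means(points, k, iterations=10):
    """Cluster points (tuples of floats) into k groups with Lloyd's algorithm."""
    # Simplifying assumption: take the first k points as starting centroids.
    centroids = list(points[:k])
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for each point.
        labels = [
            min(range(k), key=lambda j: dist(p, centroids[j])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Hypothetical 2-D data with two well-separated groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (8.0, 9.0), (2.0, 1.0), (9.0, 8.0)]
centroids, labels = k_means(points, k=2)
```

The fixed iteration count keeps the sketch short. A fuller version would stop once the assignments no longer change between iterations.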
Module 4: Model Evaluation & Validation
Train-Test Split
Cross-Validation
Metrics:
Regression: MSE, RMSE, R²
Classification: Accuracy, Precision, Recall, F1-Score, ROC-AUC
Confusion Matrix
Bias-Variance Tradeoff
Underfitting vs. Overfitting
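The metrics listed above can be computed directly from predictions. The following sketch does so in plain Python for a binary classifier and for a regressor. The function names and the sample labels are hypothetical, and ROC-AUC is left out because it needs predicted scores rather than hard labels.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 derived from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def regression_metrics(actual, predicted):
    """MSE, RMSE and R² for paired actual and predicted values."""
    n = len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    mse = ss_res / n
    return {"mse": mse, "rmse": sqrt(mse), "r2": 1 - ss_res / ss_tot}


# Hypothetical labels and predictions, for illustration only.
clf = classification_metrics([1, 0, 1, 1, 0, 1, 0, 0], [1, 0, 0, 1, 0, 1, 1, 0])
reg = regression_metrics([3, 5, 7, 9], [2, 5, 8, 9])
```

Accuracy alone can mislead on imbalanced classes, which is why precision, recall, and F1 are reported alongside it.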
Module 5: Capstone Project (Choose any one)
Movie Recommendation
House Price Prediction
Customer Churn Prediction
Inventory Management