Machine Learning Algorithms Explained
KNN (K-Nearest Neighbors)
- Type: Supervised, non-parametric
- How it works (sketch below):
  - Stores all training data; there is no explicit training phase (a "lazy" learner).
  - For a new point, finds the K closest training points under a distance metric (commonly Euclidean).
  - Classification: majority vote among the K neighbors.
  - Regression: average of the neighbors' values.
- Pros: Simple to understand and implement; no training step.
- Cons: Slow predictions on large datasets; sensitive to irrelevant features and to feature scaling.
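A minimal sketch of the idea using scikit-learn; the iris dataset and K=5 are illustrative assumptions, not choices made in these notes. Scaling is included because KNN's distances are otherwise dominated by large-range features.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# K=5 is an illustrative choice; "fit" only stores the training data.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # accuracy from majority vote of 5 neighbors
```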
Random Forest
- Type: Ensemble of decision trees (bagging)
- How it works (sketch below):
  - Builds many decision trees, each on a bootstrap sample of the rows and a random subset of features at each split.
  - Classification: majority vote across trees.
  - Regression: average of the trees' predictions.
- Pros: High accuracy; robust to noise; provides feature importances (support for missing values depends on the implementation).
- Cons: Slower to train and harder to interpret than a single tree.
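A minimal Random Forest sketch with scikit-learn; the dataset and the 200-tree setting are illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each tree is trained on a bootstrap sample of rows, considering a random
# subset of features at every split; prediction is a majority vote.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))
print(forest.feature_importances_[:5])  # built-in per-feature importance scores
```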
SVR (Support Vector Regression)
- Type: Regression (based on SVM)
- How it works (sketch below):
  - Fits a function so that as many training points as possible lie inside an epsilon-wide tube around it.
  - Points inside the tube incur no loss; points outside it are penalized (with strength controlled by C).
  - Uses kernels (e.g. RBF) to handle non-linear data.
- Pros: Works well in high dimensions; kernels add flexibility.
- Cons: Hard to tune (C, epsilon, kernel parameters); slow on large datasets.
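A minimal SVR sketch on synthetic data; the RBF kernel and the C and epsilon values are illustrative assumptions. Features are scaled because kernel methods are sensitive to feature ranges.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(200, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(0, 0.1, size=200)  # noisy sine curve

# epsilon sets the width of the penalty-free tube around the fitted curve;
# C scales the penalty for points that fall outside it.
svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
svr.fit(X, y)
print(svr.predict([[2.5]]))  # should land near sin(2.5) ≈ 0.60
```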
Gradient Boosting
- Type: Ensemble (boosting)
- How it works (sketch below):
  - Builds trees sequentially; each new tree fits the residual errors (the negative gradient of the loss) of the ensemble so far.
  - Later trees therefore concentrate on the cases earlier trees got wrong.
- Pros: High accuracy; captures complex patterns.
- Cons: Slow to train (trees cannot be built in parallel); prone to overfitting if not tuned.
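A minimal gradient boosting sketch with scikit-learn on synthetic regression data; the learning rate, tree depth, and tree count are illustrative and would normally be tuned together.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=10, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Trees are added one at a time; each fits the residuals (negative gradient
# of the squared-error loss) of the ensemble built so far.
gbr = GradientBoostingRegressor(n_estimators=300, learning_rate=0.05,
                                max_depth=3, random_state=0)
gbr.fit(X_train, y_train)
print(gbr.score(X_test, y_test))  # R^2 on held-out data
```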
XGBoost (Extreme Gradient Boosting)
- Type: Optimized gradient boosting
- How it works (sketch below):
  - Same core idea as gradient boosting, but with a faster, regularized implementation.
  - Adds tree pruning, L1/L2 regularization, and parallelized training.
- Pros: Fast, accurate, widely used.
- Cons: Many hyperparameters make it complex to tune.
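A minimal XGBoost sketch; this assumes the separate xgboost package is installed (pip install xgboost), and the parameter values are illustrative only.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# reg_lambda adds L2 regularization and max_depth limits tree size; these,
# plus parallel training (n_jobs), are part of what sets XGBoost apart
# from plain gradient boosting.
clf = XGBClassifier(n_estimators=300, learning_rate=0.05, max_depth=4,
                    reg_lambda=1.0, n_jobs=-1, random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```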
AdaBoost (Adaptive Boosting)
- Type: Ensemble (boosting)
- How it works (sketch below):
  - Trains weak learners (typically depth-1 decision "stumps") sequentially.
  - After each round, misclassified samples get higher weights, so the next learner focuses on them.
  - The final prediction is a weighted vote, with each learner weighted by its accuracy.
- Pros: Simple; effective on clean data.
- Cons: Sensitive to noise and outliers, since mislabeled points keep gaining weight.
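A minimal AdaBoost sketch; the wine dataset and the 100-round setting are illustrative assumptions. In scikit-learn, the default weak learner is already a decision stump.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each boosting round reweights the training samples so misclassified ones
# count more; the default weak learner is a depth-1 decision tree (a stump).
ada = AdaBoostClassifier(n_estimators=100, random_state=0)
ada.fit(X_train, y_train)
print(ada.score(X_test, y_test))
```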
Extra Trees (Extremely Randomized Trees)
- Type: Ensemble of decision trees
- How it works (sketch below):
  - Like Random Forest, but with more randomness: split thresholds are drawn at random instead of searched for exhaustively.
  - In the original formulation, each tree is grown on the whole training set rather than a bootstrap sample.
- Pros: Fast to train; the extra randomness reduces variance and overfitting.
- Cons: Can be slightly less accurate than Random Forest on some problems.
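A minimal Extra Trees sketch; the dataset and tree count are illustrative. The scikit-learn API mirrors RandomForestClassifier, which makes side-by-side comparison straightforward.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Split thresholds are chosen at random rather than optimized, trading a
# little bias for lower variance and faster training than Random Forest.
extra = ExtraTreesClassifier(n_estimators=200, random_state=0)
extra.fit(X_train, y_train)
print(extra.score(X_test, y_test))
```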
Comparison Table
| Feature | KNN | Random Forest | SVR | Gradient Boosting | XGBoost | AdaBoost | Extra Trees |
|---|---|---|---|---|---|---|---|
| Type | Lazy, instance-based | Ensemble (bagging) | Margin-based regression | Ensemble (boosting) | Gradient boosting | Boosting | Ensemble (bagging) |
| Training time | Fast | Medium | Slow | Slow | Fast | Medium | Fast |
| Prediction time | Slow | Medium | Medium | Slow | Fast | Medium | Fast |
| Accuracy | Medium | High | Medium | High | Very high | High | High |
| Overfitting risk | High | Low | Medium | High (if not tuned) | Low (with tuning) | High (with noise) | Low |
| Handles non-linearity | Yes (distance-based) | Yes | Yes (with kernels) | Yes | Yes | Yes | Yes |
| Feature importance | No | Yes | No | Yes | Yes | Yes | Yes |
| Robust to outliers | No | Yes | No | No | Yes | No | Yes |