Machine Learning Algorithms - Detailed Explanation
Supervised Learning Models
1. Linear Regression
- Models a linear relationship between inputs and a continuous output.
- Use Cases: Stock prices, housing, lifetime value.
- Pros: Simple, fast, interpretable.
- Cons: Assumes linearity, sensitive to outliers.
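As a minimal sketch of the idea, a linear fit with scikit-learn on synthetic data (the slope 3 and intercept 5 are illustrative assumptions, not values from the text):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: y = 3x + 5 plus a little noise (illustrative, not real prices)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + 5.0 + rng.normal(0, 0.1, size=200)

# Fit recovers the underlying slope and intercept, which is what makes
# the model interpretable
model = LinearRegression().fit(X, y)
print(model.coef_[0], model.intercept_)
```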
2. Ridge Regression
- Linear regression with an L2 penalty on the coefficients, shrinking them to reduce overfitting.
- Use Cases: Predictive maintenance, sales revenue.
- Pros: Handles multicollinearity, keeps all features.
- Cons: No feature elimination (coefficients shrink toward zero but never reach exactly zero).
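A hedged illustration of the multicollinearity point, using two nearly identical features (the data and alpha=1.0 are illustrative assumptions): ordinary least squares can assign wildly unbalanced weights to the collinear pair, while ridge splits the signal evenly.

```python
import numpy as np
from sklearn.linear_model import Ridge, LinearRegression

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.01, size=100)  # nearly collinear with x1
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(scale=0.1, size=100)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)
# OLS coefficients are unstable under multicollinearity;
# the L2 penalty pulls ridge's pair toward a balanced (1, 1)
print(ols.coef_, ridge.coef_)
```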
3. Lasso Regression
- Linear regression with an L1 penalty, which can drive some coefficients exactly to zero.
- Use Cases: Housing prices, clinical outcomes.
- Pros: Feature selection, handles high-dimensional data.
- Cons: Picks one feature arbitrarily from a correlated group, making selection unstable.
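A sketch of L1-driven feature selection (the ten-feature setup and alpha=0.1 are illustrative assumptions): only two of the ten features carry signal, and lasso zeroes out the rest.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 10))
# Only the first two features matter; the other eight are pure noise
y = 4.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
# Coefficients of the noise features are driven exactly to zero
print(lasso.coef_)
```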
4. Logistic Regression
- Predicts the probability of a binary outcome (0 or 1) via the logistic (sigmoid) function.
- Use Cases: Churn, credit risk.
- Pros: Simple, interpretable.
- Cons: Assumes linearity in the log-odds; can overfit in high dimensions without regularization.
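A minimal sketch of binary classification with scikit-learn (the two-cluster toy data stands in for a churn-style problem and is an illustrative assumption):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: class 0 centered at -2, class 1 centered at +2
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(loc=-2, size=(100, 1)),
               rng.normal(loc=2, size=(100, 1))])
y = np.array([0] * 100 + [1] * 100)

clf = LogisticRegression().fit(X, y)
print(clf.predict([[-3.0], [3.0]]))   # hard class labels
print(clf.predict_proba([[0.0]]))     # probabilities near [0.5, 0.5] at the boundary
```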
5. Decision Tree
- Recursively splits the data on feature thresholds, forming a tree of if/else rules that predicts the outcome.
- Use Cases: Churn, credit scoring, disease prediction.
- Pros: Interpretable, handles missing data.
- Cons: Overfitting, sensitive to outliers.
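A short sketch of the interpretability claim, using scikit-learn's bundled iris dataset (an illustrative stand-in for the use cases above): the learned tree prints as plain if/else rules.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
# Capping depth is a common way to limit the overfitting noted above
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# The splits read as human-interpretable if/else rules
print(export_text(tree))
print(tree.score(X, y))
```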
6. Random Forest
- Ensemble of decision trees.
- Use Cases: Housing prices, credit scoring.
- Pros: Reduces overfitting, high accuracy.
- Cons: Less interpretable, complex.
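A hedged sketch of the ensemble idea on synthetic regression data (the dataset shape and hyperparameters are illustrative assumptions): many trees trained on bootstrap samples are averaged, which damps the single-tree overfitting noted above.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic task standing in for, e.g., housing-price prediction
X, y = make_regression(n_samples=500, n_features=8, n_informative=4,
                       noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 100 bootstrapped trees, predictions averaged
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print(forest.score(X_te, y_te))  # R^2 on held-out data
```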
7. XGBoost
- High-performance, regularized implementation of gradient boosting.
- Use Cases: Churn, insurance claims.
- Pros: Accurate, captures non-linearity.
- Cons: Many hyperparameters to tune; can overfit small or noisy datasets.
8. LightGBM Regressor
- Fast, efficient gradient boosting.
- Use Cases: Flight time, cholesterol prediction.
- Pros: Fast, low memory.
- Cons: Overfitting risk, complex tuning.
9. Gradient Boosting Regression
- Builds an ensemble sequentially, each new weak learner fitting the errors of the ensemble so far.
- Use Cases: Emissions, fare prediction.
- Pros: Handles non-linearity and multicollinearity.
- Cons: Sensitive to outliers, slow.
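A minimal sketch of the sequential-boosting idea with scikit-learn; XGBoost and LightGBM implement the same principle with additional optimizations. The non-linear benchmark function and hyperparameters are illustrative assumptions.

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Friedman #1: a standard non-linear benchmark target
X, y = make_friedman1(n_samples=600, noise=0.5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 200 shallow trees, each fitted to the residuals of the previous ensemble
gbr = GradientBoostingRegressor(
    n_estimators=200, learning_rate=0.1, max_depth=3, random_state=0
).fit(X_tr, y_tr)
print(gbr.score(X_te, y_te))  # R^2 on held-out data
```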
Unsupervised Learning Models
1. K-Means Clustering
- Partitions data into k clusters.
- Use Cases: Customer segmentation, recommendations.
- Pros: Scalable, simple.
- Cons: Need to specify k, struggles with irregular clusters.
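A sketch of k-means on three synthetic blobs (the blob centers stand in for customer segments and are illustrative assumptions); note that k = 3 must be supplied up front, as the cons above say.

```python
import numpy as np
from sklearn.cluster import KMeans

# Three well-separated blobs of 50 points each
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [10, 0], [0, 10]])
X = np.vstack([c + rng.normal(size=(50, 2)) for c in centers])

# k must be chosen in advance
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)  # recovered centroids, close to the true centers
```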
2. Hierarchical Clustering
- Agglomerative (bottom-up) clustering that successively merges the closest clusters, producing a dendrogram.
- Use Cases: Fraud detection, document grouping.
- Pros: No need to specify clusters, visual dendrogram.
- Cons: Quadratic time and memory make it impractical for large datasets; greedy merges cannot be undone.
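A minimal sketch using SciPy's agglomerative clustering (two synthetic groups; the data and Ward linkage choice are illustrative assumptions): the full merge tree is built first, then cut at the desired number of clusters.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
# Two well-separated groups of 20 points each
X = np.vstack([rng.normal(loc=0, size=(20, 2)),
               rng.normal(loc=8, size=(20, 2))])

# Ward linkage merges the closest clusters bottom-up into a dendrogram
Z = linkage(X, method="ward")
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into 2 clusters
print(labels)
```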
3. Gaussian Mixture Models (GMM)
- Probabilistic assignment to Gaussian-distributed clusters.
- Use Cases: Customer segmentation, recommendations.
- Pros: Overlapping clusters, probabilistic output.
- Cons: Needs number of clusters, complex tuning.
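A sketch of the probabilistic-assignment point with scikit-learn (the two 1-D Gaussian components at ±3 are illustrative assumptions): unlike k-means, each point gets soft membership probabilities.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
# Mixture of two 1-D Gaussians centered at -3 and +3
X = np.vstack([rng.normal(loc=-3, size=(150, 1)),
               rng.normal(loc=3, size=(150, 1))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
probs = gmm.predict_proba([[0.0]])  # soft assignment, roughly 50/50 at the midpoint
print(gmm.means_.ravel(), probs)
```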
Association Rule Learning
1. Apriori Algorithm
- Finds frequent itemsets and derives rules.
- Use Cases: Product placement, promotions.
- Pros: Intuitive, exhaustive rule finding.
- Cons: Can generate a combinatorial explosion of candidate itemsets and rules; repeated support counting is resource intensive.
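A self-contained sketch of the frequent-itemset step in plain Python (the market-basket data is invented for illustration, and the classical subset-pruning optimization is omitted for brevity):

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return frequent itemsets (frozenset -> support).
    Minimal sketch: grow candidates level by level, keep those that
    meet min_support; omits the classical subset-pruning optimization."""
    n = len(transactions)
    tx = [frozenset(t) for t in transactions]
    current = {frozenset([i]) for t in tx for i in t}  # all 1-itemsets
    frequent = {}
    while current:
        counts = {c: sum(1 for t in tx if c <= t) for c in current}
        survivors = {c: k / n for c, k in counts.items() if k / n >= min_support}
        frequent.update(survivors)
        # join step: combine surviving k-itemsets into (k+1)-candidates
        current = {a | b for a, b in combinations(list(survivors), 2)
                   if len(a | b) == len(a) + 1}
    return frequent

# Invented toy baskets for illustration
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
freq = apriori(baskets, min_support=0.6)
print(freq)
```

Rules such as "bread => milk" would then be derived from the surviving itemsets by comparing supports.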