R23 III B.Tech I Semester
Department of Artificial Intelligence and Machine Learning
Subject: Advanced Machine Learning 23A03351T
Unit-I
Bias and Variance:
 Bias refers to the error that occurs when we try to fit a statistical model to real-world data that does not perfectly follow any simple mathematical form. If we use too simplistic a model, we are likely to face high bias (underfitting): the model is unable to learn the patterns in the data at hand and performs poorly.
 Variance refers to the error that occurs when we make predictions on data the model has not seen before. High variance (overfitting) occurs when the model learns the noise that is present in the training data.
Finding a proper balance between the two, known as the Bias-Variance Tradeoff, helps us design an accurate model.
Bias-Variance Tradeoff
The Bias-Variance Tradeoff refers to the balance between bias and variance which affect
predictive model performance. Finding the right tradeoff is important for creating models that
generalize well to new data.
 The bias-variance tradeoff shows the inverse relationship between bias and variance. When
one decreases, the other tends to increase and vice versa.
 Finding the right balance is important. An overly simple model with high bias won't capture
the underlying patterns while an overly complex model with high variance will fit the noise
in the data.

Overfitting and Underfitting:


Overfitting and underfitting are terms used to describe the performance of machine learning
models in relation to their ability to generalize from the training data to unseen data.


Overfitting happens when a machine learning model learns the training data too well, including the noise and random details. This makes the model perform poorly on new, unseen data because it memorizes the training data instead of understanding the general patterns.
For example, if we study only last week’s weather to predict tomorrow’s, the model might focus on one-time events like a sudden rainstorm, which won’t help with future predictions.

Underfitting is the opposite problem: it happens when the model is too simple to learn even the basic patterns in the data. An underfitted model performs poorly on both training and new data. To fix this we need to make the model more complex or add more features.
For example, if we use only the yearly average temperature to predict tomorrow’s weather, the model misses important details like seasonal changes, which results in bad predictions.
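
The following is an illustrative sketch (not from the notes) of underfitting versus overfitting using polynomial regression on synthetic data; the sine-curve data, noise level and polynomial degrees are assumptions chosen for demonstration. A degree-1 fit shows high bias (train and test errors both high), while a very high degree shows high variance (low train error, high test error).

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Noisy samples from a sine curve (assumed toy dataset)
rng = np.random.RandomState(42)
X = np.sort(rng.uniform(0, 2 * np.pi, 80)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=80)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Degree 1 tends to underfit (high bias); a very high degree tends to overfit (high variance)
for degree in [1, 4, 15]:
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print("degree =", degree,
          "train MSE =", round(mean_squared_error(y_train, model.predict(X_train)), 3),
          "test MSE =", round(mean_squared_error(y_test, model.predict(X_test)), 3))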

Ensemble Learning
Ensemble learning is a method where we use many small models instead of just one. Each of
these models may not be very strong on its own, but when we put their results together, we get a
better and more accurate answer. It's like asking a group of people for advice instead of just one
person—each one might be a little wrong, but together, they usually give a better answer.
Types of Ensemble Learning in Machine Learning
There are three main types of ensemble methods:
1. Bagging (Bootstrap Aggregating):
Models are trained independently on different random subsets of the training data. Their
results are then combined—usually by averaging (for regression) or voting (for
classification). This helps reduce variance and prevents overfitting.
2. Boosting:
Models are trained one after another. Each new model focuses on fixing the errors made by
the previous ones. The final prediction is a weighted combination of all models, which helps
reduce bias and improve accuracy.
3. Stacking (Stacked Generalization):
Multiple different models (often of different types) are trained, and their predictions are used
as inputs to a final model, called a meta-model. The meta-model learns how to best combine
the predictions of the base models, aiming for better performance than any individual model.
1. Bagging Algorithm
A Bagging classifier can be used for both regression and classification tasks. Here is an overview of the Bagging algorithm:
 Bootstrap Sampling: Creates ‘N’ subsets of the original training data by randomly sampling rows with replacement. This step ensures that the base models are trained on diverse subsets of the data.
 Base Model Training: For each bootstrapped sample we train a base model independently on that subset of data. These weak models are trained in parallel to increase computational efficiency and reduce training time. We can use different base learners, i.e. different ML models, to bring variety and robustness.
 Prediction Aggregation: To make a prediction on test data, the predictions of all base models are combined. For classification tasks this can be majority voting or weighted majority voting, while for regression it involves averaging the predictions.
 Out-of-Bag (OOB) Evaluation: Some samples are excluded from the training subset of
particular base models during the bootstrapping method. These “out-of-bag” samples can be
used to estimate the model’s performance without the need for cross-validation.
 Final Prediction: After aggregating the predictions from all the base models, Bagging
produces a final prediction for each instance.
Python code for a Bagging estimator, using the following libraries:
1. Importing Libraries and Loading Data
 BaggingClassifier: for creating an ensemble of classifiers trained on different subsets of
data.
 DecisionTreeClassifier: the base classifier used in the bagging ensemble.
 load_iris: to load the Iris dataset for classification.
 train_test_split: to split the dataset into training and testing subsets.
 accuracy_score: to evaluate the model’s prediction accuracy.


from sklearn.ensemble import BaggingClassifier


from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
2. Loading and Splitting the Iris Dataset
 data = load_iris(): loads the Iris dataset, which includes features and target labels.
 X = data.data: extracts the feature matrix (input variables).
 y = data.target: extracts the target vector (class labels).
 train_test_split(...): splits the data into training (80%) and testing (20%) sets, with
random_state=42 to ensure reproducibility.

data = load_iris()
X = data.data
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
3. Creating a Base Classifier
A decision tree is chosen as the base model. Decision trees are prone to overfitting when trained on small datasets, making them good candidates for bagging.
 base_classifier = DecisionTreeClassifier(): initializes a Decision Tree classifier, which will
serve as the base estimator in the Bagging ensemble.
base_classifier = DecisionTreeClassifier()
4. Creating and Training the Bagging Classifier
 A BaggingClassifier is created using the decision tree as the base classifier.
 n_estimators = 10 specifies that 10 decision trees will be trained on different bootstrapped
subsets of the training data.

bagging_classifier = BaggingClassifier(base_classifier, n_estimators=10, random_state=42)


bagging_classifier.fit(X_train, y_train)
5. Making Predictions and Evaluating Accuracy
 The trained bagging model predicts labels for test data.
 The accuracy of the predictions is calculated by comparing the predicted labels (y_pred) to
the actual labels (y_test).

y_pred = bagging_classifier.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Output:
Accuracy: 1.0

2. Boosting Algorithm
Boosting is an ensemble technique that combines multiple weak learners to create a strong
learner. Weak models are trained in series such that each next model tries to correct errors of the
previous model until the entire training dataset is predicted correctly. One of the most well-known boosting algorithms is AdaBoost (Adaptive Boosting). Here is an overview of the boosting algorithm:
 Initialize Model Weights: Begin with a single weak learner and assign equal weights to all
training examples.
 Train Weak Learner: Train a weak learner on this weighted dataset.
 Sequential Learning: Boosting works by training models sequentially where each model
focuses on correcting the errors of its predecessor. Boosting typically uses a single type of
weak learner like decision trees.
 Weight Adjustment: Boosting assigns weights to training datapoints. Misclassified
examples receive higher weights in the next iteration so that next models pay more attention
to them.
Python code for a Boosting estimator, using the following libraries:
1. Importing Libraries and Modules
 AdaBoostClassifier from sklearn.ensemble: for building the AdaBoost ensemble model.
 DecisionTreeClassifier from sklearn.tree: as the base weak learner for AdaBoost.
 load_iris from sklearn.datasets: to load the Iris dataset.
 train_test_split from sklearn.model_selection: to split the dataset into training and testing
sets.
 accuracy_score from sklearn.metrics: to evaluate the model’s accuracy.

from sklearn.ensemble import AdaBoostClassifier


from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
2. Loading and Splitting the Dataset
 data = load_iris(): loads the Iris dataset, which includes features and target labels.
 X = data.data: extracts the feature matrix (input variables).
 y = data.target: extracts the target vector (class labels).
 train_test_split(...): splits the data into training (80%) and testing (20%) sets, with
random_state=42 to ensure reproducibility.

data = load_iris()
X = data.data
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

3. Defining the Weak Learner


We are creating the base classifier as a decision tree with maximum depth 1 (a decision stump).
This simple tree will act as a weak learner for the AdaBoost algorithm, which iteratively
improves by combining many such weak learners.

base_classifier = DecisionTreeClassifier(max_depth=1)

4. Creating and Training the AdaBoost Classifier

 base_classifier: The weak learner used in boosting.
 n_estimators = 50: Number of weak learners to train sequentially.
 learning_rate = 1.0: Controls the contribution of each weak learner to the final model.
 random_state = 42: Ensures reproducibility.

adaboost_classifier = AdaBoostClassifier(
base_classifier, n_estimators=50, learning_rate=1.0, random_state=42
)
adaboost_classifier.fit(X_train, y_train)

5. Making Predictions and Calculating Accuracy


We make predictions on the test set with the trained AdaBoost model, then calculate the accuracy by comparing the true labels y_test with the predicted labels y_pred. The accuracy_score function returns the proportion of correctly predicted samples. Then, we print the accuracy value.

y_pred = adaboost_classifier.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)


print("Accuracy:", accuracy)
Output:
Accuracy: 1.0

Benefits of Ensemble Learning in Machine Learning


Ensemble learning is a versatile approach that can be applied to machine learning models for:
 Reduction in Overfitting: By aggregating the predictions of multiple models, ensembles can reduce the overfitting that individual complex models might exhibit.
 Improved Generalization: It generalizes better to unseen data by minimizing variance and
bias.
 Increased Accuracy: Combining multiple models gives higher predictive accuracy.
 Robustness to Noise: It mitigates the effect of noisy or incorrect data points by averaging
out predictions from diverse models.
 Flexibility: It can work with diverse models including decision trees, neural networks and
support vector machines making them highly adaptable.
 Bias-Variance Tradeoff: Techniques like bagging reduce variance, while boosting reduces
bias leading to better overall performance.
There are various ensemble learning techniques we can use, each with its own pros and cons.
Ensemble Learning Techniques
Technique Category Description
Random Forest Bagging Constructs multiple decision trees on bootstrapped subsets of the data and aggregates their predictions for the final output, reducing overfitting and variance.
Random Subspace Method Bagging Trains models on random subsets of the input features to enhance diversity and improve generalization while reducing overfitting.
Gradient Boosting Machines (GBM) Boosting Sequentially builds decision trees, with each tree correcting the errors of the previous ones, enhancing predictive accuracy iteratively.
Extreme Gradient Boosting (XGBoost) Boosting Adds optimizations such as tree pruning, regularization and parallel processing for robust and efficient predictive models.
AdaBoost (Adaptive Boosting) Boosting Focuses on challenging examples by assigning weights to data points and combines weak classifiers with weighted voting for the final prediction.
CatBoost Boosting Handles categorical features natively without extensive preprocessing, with high predictive accuracy and automatic overfitting handling.

Bagging
Bagging (Bootstrap Aggregating) is an ensemble learning technique in machine learning that
improves the accuracy and stability of models by reducing variance and avoiding overfitting,
especially in high-variance models like decision trees.

Definition:
Bagging stands for Bootstrap Aggregating. It involves:
 Generating multiple versions of a training dataset using bootstrap sampling (random
sampling with replacement).
 Training separate models (often the same type, like decision trees) on each of these
datasets.
 Aggregating their predictions (averaging for regression, majority vote for
classification).
Workflow of Bagging Algorithm (Step-by-Step):

1. Bootstrap Sampling: Create multiple datasets (say, 𝑘 datasets) from the original training
data using sampling with replacement.
2. Model Training: Train a base learner (e.g., decision tree) on each dataset independently.
3. Aggregation:
o Classification: Use majority voting to decide the final output.
o Regression: Use the average of all predictions as the final output (see the sketch below).
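
To illustrate the regression case just described, here is a minimal sketch using scikit-learn's BaggingRegressor; the synthetic dataset and the number of estimators are illustrative assumptions, not part of the notes.

from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split

# Synthetic regression data (assumed for demonstration)
X, y = make_regression(n_samples=400, n_features=10, noise=15.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# 20 decision trees, each trained on a bootstrap sample; their outputs are averaged
bagging_reg = BaggingRegressor(DecisionTreeRegressor(), n_estimators=20,
                               oob_score=True, random_state=42)
bagging_reg.fit(X_train, y_train)

print("OOB R^2:", bagging_reg.oob_score_)      # out-of-bag estimate, no separate validation set needed
print("Test R^2:", bagging_reg.score(X_test, y_test))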

Uses of Bagging:
 Reduces overfitting by averaging out predictions.
 Decreases model variance (good for unstable models).
 Improves generalization.

Common Algorithms That Use Bagging:


 Random Forest is a prime example: it’s a bagging method using decision trees with
added randomness in feature selection.


Advantages of Bagging:
 Reduces variance, thus improving model stability.
 Works well with high-variance, low-bias models.
 Easy to implement and parallelize.

Limitations:
 Doesn’t help much if the base model is already low in variance (like linear regression).
 May not reduce bias.
 Can be computationally expensive.

Boosting
Boosting is an ensemble learning method that combines multiple weak learners to form a
strong learner. It builds models sequentially, where each model learns from the errors of the
previous ones, improving overall performance.

Definition:
Boosting refers to a family of algorithms that convert weak models (like shallow decision
trees) into a strong model by focusing more on misclassified data points during each iteration.

Working Steps of Boosting:

1. Initialize the model by training a weak learner on the original dataset.


2. Compute Errors: Measure the performance of the model.
3. Update Weights: Increase weights of incorrectly predicted samples.
4. Train Next Learner: The next model focuses more on the harder examples.
5. Combine Models: Final prediction is a weighted sum of all weak learners (see the sketch below).
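
To make the sequential error-correction concrete, here is a small illustrative sketch (an assumption, not part of the notes) using scikit-learn's GradientBoostingRegressor, whose staged_predict method exposes the prediction after each added learner:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=400, n_features=10, noise=20.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# 100 shallow trees added one after another, each fitted to the remaining error
gbr = GradientBoostingRegressor(n_estimators=100, max_depth=2, learning_rate=0.1,
                                random_state=42)
gbr.fit(X_train, y_train)

# Test error after 1, 10, 50 and 100 sequential learners
errors = [mean_squared_error(y_test, pred) for pred in gbr.staged_predict(X_test)]
for n in [1, 10, 50, 100]:
    print("learners:", n, "test MSE:", round(errors[n - 1], 1))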

Key Concepts:
 Sequential training
 Focus on difficult samples
 Reduces both bias and variance
 Final prediction is based on the weighted majority vote (classification) or weighted
average (regression)

Popular Boosting Algorithms:


Algorithm Key Feature
AdaBoost Adjusts weights of samples
Gradient Boosting Optimizes loss function via gradients
XGBoost Optimized, fast version of gradient boosting
LightGBM Faster training, uses histogram-based techniques
CatBoost Handles categorical features efficiently

Advantages of Boosting:
 High accuracy
 Handles both bias and variance
 Performs well on imbalanced data

Limitations:
 Prone to overfitting if not regularized
 Sequential → difficult to parallelize
 Slower than bagging

Random Forest Algorithm
 Random Forest is a supervised ensemble learning algorithm.
 It is used for both classification and regression tasks.
 It builds multiple decision trees and merges them together to get a more accurate and
stable prediction.
A Random Forest is a collection (ensemble) of Decision Trees where:
 Each tree is trained on a different subset of the data using bootstrap sampling (bagging).
 At each node, only a random subset of features is considered for splitting.
 Final output is based on majority voting (classification) or averaging (regression).
Workflow of Random Forest (Step-by-Step)

Step 1: Bootstrap Sampling


 Create N different subsets (with replacement) from the training data.
 Each subset is used to train one decision tree.
Step 2: Build Decision Trees
 For each tree:
o Choose a random subset of features at each split (feature bagging).
o Grow trees fully without pruning.
Step 3: Aggregate Results
 For Classification: Each tree votes → final class = majority vote.
 For Regression: Average the outputs from all trees.
Key Terms
Term Description
Bootstrap Sampling Sampling with replacement from the dataset
Feature Bagging Randomly selecting a subset of features at each split
Ensemble Learning Combining multiple models for better performance
Majority Voting Used in classification
Averaging Used in regression

Advantages
 Reduces overfitting compared to individual decision trees.
 Works well with both categorical and numerical features.
 Can handle missing values and maintain accuracy.
 Robust to outliers and noise.
 Can give feature importance scores (a short demonstration follows the code example below).
Disadvantages
 Computationally intensive (training many trees).
 Less interpretable than a single decision tree.
 Slower in real-time predictions (due to ensemble size).
Applications of Random Forest:
 Medical diagnosis (e.g., cancer prediction)
 Financial risk analysis
 Credit scoring
 Image classification
 Fraud detection

Python Code Example


from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Build model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Accuracy
from sklearn.metrics import accuracy_score
print("Accuracy:", accuracy_score(y_test, y_pred))

Parameters of Random Forest (Sklearn)


Parameter Description
n_estimators Number of trees
max_features Number of features to consider at each split
max_depth Maximum depth of the tree
min_samples_split Minimum samples required to split an internal node
bootstrap Whether bootstrap samples are used

Comparison with Other Algorithms


Feature Decision Tree Bagging Random Forest Boosting
Overfitting Risk High Low Low Medium
Interpretability High Low Medium Low
Accuracy Medium High High Very High
Training Speed Fast Moderate Slow Slow

AdaBoost Algorithm
AdaBoost (Adaptive Boosting) is a Boosting ensemble technique that combines multiple weak
learners (usually decision stumps — trees with one split) to form a strong classifier.
 It focuses on instances that were previously misclassified.
 Learners are added sequentially, and each one tries to correct the mistakes of the
previous ones.
Key Idea:
Increase the weights of incorrectly classified data points so that subsequent models focus more
on those “hard” cases.
Workflow of AdaBoost:

Step-by-Step:
1. Initialize Weights:
o Assign equal weights to all training samples.
2. Train a Weak Learner:
o Train a classifier (e.g., a decision stump) on the weighted data.
3. Calculate Error:
o Compute the weighted error of the learner:
$\epsilon_t = \dfrac{\sum_{i=1}^{N} w_i \, I\big(h_t(x_i) \neq y_i\big)}{\sum_{i=1}^{N} w_i}$
where $I$ is an indicator function (1 if the prediction is wrong, 0 otherwise).
4. Compute Learner's Weight:
o A classifier with lower error gets higher importance:
$\alpha_t = \tfrac{1}{2} \ln\!\left(\dfrac{1 - \epsilon_t}{\epsilon_t}\right)$
5. Update Weights of Samples:
o Increase weights of misclassified samples and decrease weights of correctly classified samples:
$w_i \leftarrow w_i \, e^{-\alpha_t \, y_i \, h_t(x_i)}$ (with labels $y_i \in \{-1, +1\}$, the weight grows when $h_t(x_i) \neq y_i$)
o Normalize the weights so that they sum to 1.
6. Repeat:
o Train next learner on updated weights.
o Repeat steps for T rounds (number of estimators).
7. Final Prediction:
o Combine all classifiers using their weights:
$H(x) = \operatorname{sign}\!\left(\sum_{t=1}^{T} \alpha_t \, h_t(x)\right)$

Key Notations: $w_i$ is the weight of training sample $i$, $\epsilon_t$ the weighted error of learner $t$, $\alpha_t$ the importance of learner $t$, $h_t(x)$ its prediction, and $T$ the number of boosting rounds.

AdaBoost Code Example (Python)


from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Base weak learner: Decision stump


base = DecisionTreeClassifier(max_depth=1)

# AdaBoost model (the first positional argument is the base estimator;
# it is named 'estimator' in recent scikit-learn and 'base_estimator' in older versions)
model = AdaBoostClassifier(base, n_estimators=50, learning_rate=1.0)
model.fit(X_train, y_train)

# Accuracy
print("Accuracy:", model.score(X_test, y_test))
Advantages of AdaBoost
Feature Benefit
Improves weak learners Combines simple models to perform well
Versatile Works for binary and multi-class classification
Feature importance Can give feature significance
Minimal pre-processing needed Tree-based weak learners work directly on unscaled features

Disadvantages
 Sensitive to noisy data and outliers
 Not suitable for large datasets with many irrelevant features
 Harder to interpret compared to individual trees

Applications
 Face detection (e.g., Viola-Jones algorithm)
 Fraud detection
 Text classification
 Bioinformatics

Comparison: AdaBoost vs Bagging vs Random Forest


Feature AdaBoost Bagging Random Forest
Base Learners Sequential Parallel Parallel
Focus Hard samples Variance reduction Random features & samples
Output Weighted vote Majority vote Majority vote

Gradient Boosting Algorithm
Gradient Boosting is an ensemble learning technique that builds a strong predictive model by
combining multiple weak learners (typically decision trees), trained sequentially to correct the
errors made by previous models.
It uses the idea of minimizing a loss function by applying gradient descent.
Key Idea:
Each new learner is trained to predict the residuals (errors) of the previous learners, thereby
improving the model step by step.
Workflow of Gradient Boosting (Step-by-Step):

Step 1: Initialize the Model
 Use a constant value that minimizes the loss function.
 For regression with MSE this is simply the mean of the targets:
$F_0(x) = \arg\min_{\gamma} \sum_{i=1}^{N} L(y_i, \gamma) = \bar{y}$

Step 2: Iterate for T steps (number of trees)
For each step $m = 1, \dots, T$:
 Compute the negative gradient of the loss with respect to the current predictions:
$r_{im} = -\left[\dfrac{\partial L\big(y_i, F(x_i)\big)}{\partial F(x_i)}\right]_{F = F_{m-1}}$
These are the pseudo-residuals.
 Fit a weak learner $h_m(x)$ (a shallow decision tree) to the pseudo-residuals.
 Update the model with learning rate $\eta$: $F_m(x) = F_{m-1}(x) + \eta \, h_m(x)$


Key Terms
Term Description
Weak Learner Typically a decision tree (shallow)
Loss Function Measures error (MSE, Log Loss, etc.)
Learning Rate (η) Shrinks the contribution of each tree
Residuals Errors the model tries to fix
Additive Model Combines learners in a stage-wise manner
Loss Functions
 Regression: squared error, $L\big(y, F(x)\big) = \tfrac{1}{2}\big(y - F(x)\big)^2$
 Classification: log loss, $L(y, p) = -\big[\,y \log p + (1 - y) \log(1 - p)\,\big]$
Gradient Boosting Code in Python


from sklearn.ensemble import GradientBoostingClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Model
gb_model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
gb_model.fit(X_train, y_train)

# Accuracy
print("Accuracy:", gb_model.score(X_test, y_test))
Advantages of Gradient Boosting
 High prediction accuracy
 Handles both regression and classification

 Works with many types of loss functions
 Feature importance ranking
Disadvantages
 Can overfit if not tuned properly
 Training is slower due to sequential nature
 Requires careful parameter tuning (learning rate, depth, etc.)
Comparison: AdaBoost and Gradient Boosting
Feature AdaBoost Gradient Boosting
Loss Optimization Based on exponential loss Any differentiable loss
Weighting Adjusts sample weights Fits to residuals
Robustness to Outliers Lower Higher
Tuning Needed Less More (learning rate, depth)

XGBoost Algorithm
XGBoost (Extreme Gradient Boosting) is an advanced implementation of the Gradient
Boosting algorithm. It is designed to be highly efficient, flexible, and portable, with state-of-
the-art performance.

XGBoost = Gradient Boosting + Regularization + Speed + Flexibility

It is robust, scalable, and tunable, and often outperforms other models in structured/tabular data
tasks.

Why use XGBoost:


 Fast and parallelizable
 Handles missing values
 Includes regularization (to prevent overfitting)
 Excellent performance in Kaggle competitions
 Scales well to large datasets
Core Idea
Like Gradient Boosting, XGBoost builds trees sequentially, where each new tree corrects the
errors of the previous ensemble by minimizing a loss function using gradient descent.
XGBoost enhances this process with:
 Second-order optimization (using both gradient and hessian)
 Regularization
 Tree pruning
 Cache-aware computing

Workflow of XGBoost (Step-by-Step)

Step 1: Objective Function


XGBoost minimizes a regularized objective function:
$\text{Obj} = \sum_{i} l\big(y_i, \hat{y}_i\big) + \sum_{k} \Omega(f_k), \qquad \Omega(f) = \gamma T + \tfrac{1}{2} \lambda \sum_{j=1}^{T} w_j^2$
where $T$ is the number of leaves in a tree and $w_j$ are the leaf weights.
Step 2: Second-Order Taylor Approximation
The loss is approximated with gradients ($g_i$) and Hessians ($h_i$) of the loss with respect to the previous prediction:
$\text{Obj}^{(t)} \approx \sum_{i} \Big[ g_i f_t(x_i) + \tfrac{1}{2} h_i f_t(x_i)^2 \Big] + \Omega(f_t)$

Step 3: Structure Score for Splits
For a node with instance set $I$ split into left and right sets $I_L$ and $I_R$ (with $G = \sum_{i \in I} g_i$ and $H = \sum_{i \in I} h_i$), the gain of the split is:
$\text{Gain} = \tfrac{1}{2} \left[ \dfrac{G_L^2}{H_L + \lambda} + \dfrac{G_R^2}{H_R + \lambda} - \dfrac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma$
Choose the split with the highest gain.


Step 4: Tree Building
 Add trees greedily to minimize loss.
 Trees are built level-wise (depth by depth), not leaf-wise like LightGBM.
 Stop growing when score improvement < threshold.
Step 5: Prediction Update
Update the prediction after adding tree $t$:
$\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + \eta \, f_t(x_i)$
 η: Learning rate
Advantages of XGBoost
Advantage Description
Speed Parallel and fast due to efficient CPU use
Accuracy Often better than other ML models
Regularization Controls overfitting via λ and γ
Handles Missing Values Smart split-finding for missing data
Built-in Cross-Validation Available in API

Disadvantages
 Complex to tune (many hyperparameters)
 Can overfit on small data if not regularized
 Not ideal for image or sequential data (use CNNs or RNNs instead)

XGBoost Code Example (Python)


import xgboost as xgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load data
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = xgb.XGBClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
model.fit(X_train, y_train)

# Predict and evaluate


y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
Common Parameters
Parameter Meaning
n_estimators Number of boosting rounds
max_depth Maximum tree depth
learning_rate Shrinks contribution of each tree
subsample Fraction of training data per tree
colsample_bytree Feature sampling per tree
lambda L2 regularization
gamma Minimum loss reduction to make a split

Stacking

Stacking (Stacked Generalization) is an ensemble learning technique that combines multiple


different models (called base learners) and trains a meta-model to make the final prediction.

Unlike bagging or boosting (which use the same type of learners), stacking uses diverse models
(e.g., decision trees, SVMs, neural networks).

Workflow of Stacking:

Step-by-Step Process:
1. Train Base Learners
o Train several different machine learning models on the training dataset.
o These models can be of different types (e.g., logistic regression, random forest,
SVM).
2. Generate Base Predictions
o Each base learner makes predictions on:
 Either the validation set (during cross-validation),
 Or directly on the test set.
3. Train Meta-Learner
o A new model (called a meta-model or blender) is trained using the predictions
of base models as features.
o Its goal is to learn how to best combine the outputs of base models.
4. Final Prediction
o The meta-model takes the predictions from base learners and makes the final
decision.

Illustration (Simple Example)


Assume you have 3 base learners:
 Model 1: Logistic Regression
 Model 2: Decision Tree
 Model 3: K-Nearest Neighbors
Let the predictions from these models for a data point be:
Model 1: 0.6
Model 2: 0.8
Model 3: 0.7
These become the features for the meta-model, which might output a final prediction of 0.75.

Use of Stacking:
 Combines strengths of multiple models
 Can reduce generalization error
 Works well when base models are diverse and not highly correlated


Mathematically:
If $h_1, h_2, \dots, h_M$ are the trained base learners and $g$ is the meta-model, the stacked prediction for an input $x$ is $\hat{y} = g\big(h_1(x), h_2(x), \dots, h_M(x)\big)$, where $g$ is fit on the base-model predictions paired with the true targets.

Example in Python (with scikit-learn)


from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load data
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Define base learners


base_learners = [
('dt', DecisionTreeClassifier()),
('svc', SVC(probability=True))
]

# Define meta-learner
meta_model = LogisticRegression()

# Build stacking model


stacked_model = StackingClassifier(estimators=base_learners, final_estimator=meta_model)
stacked_model.fit(X_train, y_train)

# Predict and evaluate


y_pred = stacked_model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))

Advantages of Stacking
Benefit Description
Combines model strengths Leverages diversity to improve performance
Reduces generalization error Less likely to overfit than a single model
Flexible Works with any combination of models
Disadvantages
Limitation Description
More complex Requires training multiple models
Risk of overfitting If meta-model is too complex or base models are similar
Slower to train Compared to single-model methods

Blending in Machine Learning

Blending is an ensemble technique used to combine the predictions of multiple machine learning
models using a validation dataset and a meta-model (usually a simple one like logistic
regression or linear regression).
It’s very similar to stacking, but with a few key differences in how data is split and how the
meta-model is trained.

How Does Blending Work?


Steps:
1. Split the dataset into 3 parts:
o Training set: For training base models
o Validation set: For generating predictions from base models
o Test set: For final evaluation
2. Train Base Models:
o Use the training set to train multiple models (e.g., SVM, Random Forest,
XGBoost)
3. Predict on Validation Set:
o Use base models to make predictions on the validation set
o These predictions become input features for the meta-model
4. Train Meta-Model:
o Train a simple model (e.g., logistic regression) using:
 Inputs: Predictions of base models on the validation set
 Targets: True values from the validation set
5. Final Prediction:
o Use base models to predict on the test set
o Meta-model uses these to make final predictions

How It Differs from Stacking


Feature Blending Stacking
Data Split Train/Validation/Test split Usually uses cross-validation
Meta-model trained on Validation-set predictions Out-of-fold predictions from cross-validation
Simplicity Easier to implement More robust but complex
Risk of Overfitting Higher (due to smaller validation set) Lower (thanks to cross-validation)

Why Use Blending?


 Simpler implementation
 Useful when you're in a time crunch (e.g., in competitions)
 Easy to apply when you want to combine different models quickly

Blending Illustration Example


Advantages of Blending
Benefit Description
Simple to implement No need for complex cross-validation setups
Fast to train Meta-model trained on small dataset
Good for competitions Useful in last-minute model improvement

Disadvantages
Drawback Description
High risk of overfitting Meta-model trained on small validation set
Not as robust Compared to stacking with cross-validation
Wastes data Validation data not used in base model training

Small Python Example (Pseudo-code Style)


import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# Step 1: Split data into training, validation and test sets
# (the breast cancer dataset is used here purely for illustration)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.4, random_state=42)
X_valid, X_test, y_valid, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)

# Step 2: Train base models on the training set
model1 = LogisticRegression(max_iter=5000).fit(X_train, y_train)
model2 = RandomForestClassifier(random_state=42).fit(X_train, y_train)

# Step 3: Predict on the validation set
pred1 = model1.predict_proba(X_valid)[:, 1]
pred2 = model2.predict_proba(X_valid)[:, 1]

# Step 4: Stack the validation predictions and train the meta-model
meta_X = np.column_stack((pred1, pred2))
meta_model = LogisticRegression().fit(meta_X, y_valid)

# Step 5: Base models predict on the test set; the meta-model combines them
final_pred = meta_model.predict(np.column_stack((
    model1.predict_proba(X_test)[:, 1],
    model2.predict_proba(X_test)[:, 1]
)))
print("Blended test accuracy:", (final_pred == y_test).mean())

Mathematical Formulation of Blending

Let $h_1, \dots, h_M$ be base models trained on the training set. For each validation sample $x_i$, form the prediction vector $z_i = \big(h_1(x_i), \dots, h_M(x_i)\big)$. The meta-model $g$ is fit on the pairs $(z_i, y_i)$ from the validation set, and the final prediction for a new input $x$ is $\hat{y} = g\big(h_1(x), \dots, h_M(x)\big)$.

Example with 3 Models

Suppose three base models output probabilities $p_1 = 0.6$, $p_2 = 0.8$ and $p_3 = 0.7$ for a validation sample. A linear meta-model with learned weights $w_1, w_2, w_3$ and bias $b$ blends them as $\hat{y} = w_1 p_1 + w_2 p_2 + w_3 p_3 + b$.

Regularization Methods in Machine Learning

Regularization is a technique used to reduce overfitting by adding a penalty term to the loss
function of a machine learning model. This discourages the model from becoming too complex
or sensitive to noise in the training data.

Need to Use Regularization:


 Prevents overfitting
 Improves generalization to unseen data
 Controls the complexity of the model

Benefits of Regularization
The main benefits of regularization are as follows:
1. Prevents Overfitting: Helps models focus on underlying patterns instead of memorizing noise in the training data.
2. Improves Interpretability: L1 (Lasso) regularization simplifies models by reducing less important feature coefficients to zero.
3. Enhances Performance: Prevents excessive weighting of outliers or irrelevant features, which helps improve overall model accuracy.
4. Stabilizes Models: Reduces sensitivity to minor data changes, ensuring consistency across different data subsets.
5. Prevents Complexity: Keeps the model from becoming too complex, which is important with limited or noisy data.
6. Handles Multicollinearity: Reduces the magnitudes of correlated coefficients, improving model stability.
7. Allows Fine-Tuning: Hyperparameters like alpha and lambda control regularization strength, helping balance bias and variance.
8. Promotes Consistency: Ensures reliable performance across different datasets, reducing the risk of large performance shifts.

Common Regularization Methods


1. L1 Regularization (Lasso)
 Adds the absolute values of the coefficients to the loss function: $\text{Loss} = \text{MSE} + \lambda \sum_j |w_j|$
 Encourages sparsity (sets some weights to zero), leading to feature selection. (A short scikit-learn sketch of methods 1-3 follows this list.)

2. L2 Regularization (Ridge)
 Adds the squares of the coefficients to the loss function: $\text{Loss} = \text{MSE} + \lambda \sum_j w_j^2$
 Keeps all features but shrinks their weights.

3. Elastic Net Regularization
 Combines both L1 and L2 penalties: $\text{Loss} = \text{MSE} + \lambda_1 \sum_j |w_j| + \lambda_2 \sum_j w_j^2$
 Useful when there are many correlated features.

4. Dropout (in Neural Networks)


 Randomly sets a fraction of neurons to 0 during training.
 Reduces co-adaptation of neurons.
Intuition:
During each training iteration:
 Drop units with a probability p
 Forces the network to not rely too much on specific paths
5. Early Stopping
 Stop training when the model’s performance on the validation set starts to degrade.
 Prevents overfitting without modifying the loss function.
6. Data Augmentation & Noise Injection
 Add noise to input data or intermediate layers to make the model more robust.
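
The following is a minimal scikit-learn sketch of the first three methods (L1, L2, Elastic Net) on a synthetic regression problem, as mentioned above; the dataset and the alpha values are illustrative assumptions, not prescribed by the notes.

from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge, ElasticNet
from sklearn.model_selection import train_test_split

# Synthetic data: only 10 of the 50 features are actually informative (assumed toy problem)
X, y = make_regression(n_samples=500, n_features=50, n_informative=10,
                       noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

models = {
    "L1 (Lasso)": Lasso(alpha=1.0),                        # drives some coefficients exactly to zero
    "L2 (Ridge)": Ridge(alpha=1.0),                        # shrinks all coefficients, none become zero
    "Elastic Net": ElasticNet(alpha=1.0, l1_ratio=0.5),    # mixes the L1 and L2 penalties
}

for name, reg in models.items():
    reg.fit(X_train, y_train)
    zero_coefs = int((reg.coef_ == 0).sum())
    print(name, "- test R^2:", round(reg.score(X_test, y_test), 3),
          "- zero coefficients:", zero_coefs)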

Cross-Validation Strategies

Cross-validation (CV) is a statistical method used to estimate the performance of machine


learning models. It helps detect overfitting and ensures that the model generalizes well to unseen
data.

Use of Cross-Validation:
 To assess model stability and robustness
 To detect overfitting or underfitting
 To choose the best model hyperparameters

Common Cross-Validation Strategies


1. Hold-Out Validation
 Split dataset into:
o Training set: to train the model
o Test set: to evaluate the model
Limitation:
 High variance depending on how data is split
2. K-Fold Cross-Validation
 Divide data into K equal parts (folds)
 Train the model on K−1 folds, validate on the remaining fold
 Repeat K times, each fold used once as validation
 Final performance = mean of the K results (a code sketch follows the strategies below)
Example:
For K=5
Fold Train On Validate On
1 2,3,4,5 1
2 1,3,4,5 2
3 1,2,4,5 3
4 1,2,3,5 4
5 1,2,3,4 5
3. Stratified K-Fold Cross-Validation
 Like K-Fold but preserves the percentage of samples for each class in every fold.
 Useful for imbalanced datasets.
4. Leave-One-Out Cross-Validation (LOOCV)
 Special case of K-Fold with K=n (number of samples)
 Train on all data except one sample, test on that one
 Repeat for all samples
Limitation:
 Computationally expensive for large datasets
5. Repeated K-Fold Cross-Validation
 Repeats K-Fold CV multiple times with different random splits
 Reduces variance in performance estimation

6. Group K-Fold Cross-Validation
 Ensures that the same group (e.g., from the same patient or user) does not appear in
both training and validation sets.
 Ideal for grouped or clustered data
7. Time Series Split (Rolling Forecast Origin)
 For time series data where order matters
 Avoids data leakage by ensuring that future data is not used to predict the past
Example:
Fold Train On Validate On
1 1, 2 3
2 1, 2, 3 4
3 1, 2, 3, 4 5
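
The code sketch referenced above: a brief scikit-learn illustration of a few of these strategies; the Iris dataset and the Random Forest estimator are assumptions chosen for demonstration.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import (KFold, StratifiedKFold,
                                     TimeSeriesSplit, cross_val_score)

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=50, random_state=42)

# K-Fold: 5 folds, shuffled once before splitting
kfold = KFold(n_splits=5, shuffle=True, random_state=42)
print("K-Fold mean accuracy:", cross_val_score(model, X, y, cv=kfold).mean())

# Stratified K-Fold: preserves class proportions in every fold
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
print("Stratified K-Fold mean accuracy:", cross_val_score(model, X, y, cv=skf).mean())

# Time Series Split: the training window always precedes the validation fold
tss = TimeSeriesSplit(n_splits=3)
for fold, (train_idx, val_idx) in enumerate(tss.split(X), start=1):
    print("Fold", fold, "- train size:", len(train_idx), "- validation size:", len(val_idx))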

Advantages and Disadvantages of Cross-Validation Strategies


Strategy Best For Advantages Disadvantages
Hold-Out Quick checks Simple and fast High variance
K-Fold General-purpose Balanced, less bias Can be slow for large K
Stratified K-Fold Imbalanced classification Maintains class distribution More complex
LOOCV Small datasets Uses almost all data to train Very slow for large datasets
Repeated K-Fold Stability checking Reduces random bias Slower than standard K-Fold
Group K-Fold Grouped data (e.g., patients) Prevents data leakage Requires group identifiers
Time Series Split Time-based data Respects time order Needs careful setup
