
Module 4

Advanced Machine Learning Techniques:


Ensemble Methods:
Ensemble learning is a technique where multiple models (called base learners or weak
learners) are combined to produce a stronger and more accurate model.
Goal of Ensemble Learning:
To improve:
• Accuracy
• Stability
• Generalization (reduce overfitting/underfitting)
Why Use Ensemble?
• A single model (e.g., one decision tree) might not be strong enough.
• Combining many weak models can produce a strong predictive model.
• Helps balance bias and variance.
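
As an illustration (not part of the original module), here is a minimal sketch using scikit-learn's VotingClassifier that combines several base learners by majority voting; the synthetic dataset and the chosen estimators are only examples:

from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic data standing in for a real dataset
X, y = make_classification(n_samples=200, n_features=5, random_state=42)

# Three different base learners combined by hard (majority) voting
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(max_depth=3)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ],
    voting="hard",
)
ensemble.fit(X, y)
print(ensemble.predict(X[:5]))   # combined prediction of all three models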

Bagging and Boosting


Bagging
Bagging is a machine learning ensemble meta-algorithm designed to improve the stability
and accuracy of machine learning algorithms used in statistical classification and regression.
It decreases the variance and helps to avoid overfitting. It is usually applied to decision tree
methods.
Bagging is a special case of the model averaging approach.
Implementation Steps of Bagging
• Step 1: Multiple subsets of equal size are created from the original dataset by selecting observations with replacement (bootstrap sampling).
• Step 2: A base model is created on each of these subsets.
• Step 3: Each model is learned in parallel with each training set and independent of
each other.
• Step 4: The final predictions are determined by combining the predictions from all
the models.
Example of Bagging:
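A minimal sketch of the four steps above, using scikit-learn's BaggingClassifier on a synthetic dataset (an illustrative example, not the module's original one; by default the base model is a decision tree):

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Steps 1-3: 10 bootstrap subsets (sampling with replacement); one decision
# tree is trained independently on each subset (n_jobs=-1 trains in parallel).
bagging = BaggingClassifier(n_estimators=10, bootstrap=True, n_jobs=-1, random_state=0)
bagging.fit(X, y)

# Step 4: the final prediction combines all models by majority vote
print(bagging.predict(X[:5]))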
Boosting
Boosting is an ensemble learning technique that sequentially combines multiple weak classifiers to create a strong classifier. A first model is trained on the training data and then evaluated. The next model is built to correct the errors made by the first model. This procedure continues, adding models until either the complete training data set is predicted correctly or a predefined number of iterations is reached.

Types of Boosting
1. AdaBoost (Adaptive Boosting)
2. Gradient Boosting
3. XGBoost (Extreme Gradient Boosting)
AdaBoost (Adaptive Boosting)

AdaBoost (Adaptive Boosting) is a type of Boosting algorithm that combines multiple weak
learners (typically shallow decision trees) to create a strong classifier.

It works by assigning weights to training samples and increasing the weights of incorrectly
classified points, so that the next model focuses more on these "hard" samples.

Steps:
1. Initialize: Start with equal weights for all training samples.
2. Train First Model: Fit a decision tree to the data.
3. Calculate Errors: Increase the weights for wrongly predicted samples.
4. Train Next Model: Fit another decision tree on the updated data (focus more on hard
samples).
5. Repeat: Keep adding new models to correct the previous one’s mistakes.
6. Final Prediction: Combine all models using weighted majority voting (for
classification) or weighted average (for regression).
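
A hedged sketch of these steps with scikit-learn's AdaBoostClassifier (synthetic data; the default base learner is a 1-level decision stump and the parameter values are illustrative):

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=1)

# 50 weak learners trained sequentially on re-weighted samples;
# learning_rate scales each learner's contribution to the weighted vote.
ada = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=1)
print("Mean CV accuracy:", cross_val_score(ada, X, y, cv=5).mean())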
Advantages of AdaBoost:

1. Simple and easy to implement


2. Reduces both bias and variance
3. Improves accuracy of weak models
4. Works well with small datasets
5. Focuses on difficult cases

Disadvantages:
1. Sensitive to outliers and noisy data
2. Can overfit if too many weak learners are added
3. Requires clean and balanced data
• AdaBoost is a powerful boosting algorithm that turns weak learners into a strong
classifier by focusing on errors.
• It is used in many real-world applications like face detection, fraud detection, and
spam filtering.

Gradient Boosting:
Gradient Boosting is an ensemble learning method used for classification and regression tasks. It is a boosting algorithm that combines multiple weak learners to create a strong predictive model.
It works by sequentially training models where each new model tries to correct the errors
made by its predecessor.
Steps:

Step 1: Initialize the Model
• Start with an initial prediction, typically the mean of the target variable (for regression).

Step 2: Compute the Errors
• Calculate the residuals, i.e., the differences between the actual values and the current predictions.

Step 3: Train a Tree on the Errors
• Fit a small decision tree to the residuals rather than to the original targets.

Step 4: Update the Model
• Add the new tree's predictions, scaled by a learning rate, to the current model.

Step 5: Repeat
• Repeat Steps 2–4 for many iterations.
• Each new tree focuses on correcting the previous errors.

Step 6: Final Model


• The final prediction is the sum of all trees.
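
A minimal sketch of this procedure with scikit-learn's GradientBoostingRegressor (toy data; values and parameters are illustrative only):

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Toy regression data: y is a noisy non-linear function of x
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

# 100 small trees; each new tree is fitted to the residuals of the current
# model (Steps 2-4), and its contribution is scaled by the learning rate.
gbr = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3)
gbr.fit(X, y)
print(gbr.predict([[5.0]]))   # final prediction = sum of all trees' contributions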
Advantages:

1) High prediction accuracy


2) Works well with both classification and regression
3) Handles missing values and outliers
4) Can use custom loss functions
5) Used in many real-world applications (e.g., XGBoost, LightGBM)

Disadvantages:

1) Slow training due to sequential nature


2) Risk of overfitting if not regularized
3) Requires careful hyperparameter tuning
4) Not ideal for small datasets
5) Difficult to interpret compared to simple models

Gradient Boosting is a powerful and widely-used technique for both classification and
regression tasks. It builds models sequentially by minimizing loss using gradients, making
it one of the most accurate machine learning methods available today.
XGBoost

It is an optimized implementation of Gradient Boosting and is a type of ensemble learning method that combines multiple weak models to form a stronger model.
• XGBoost uses decision trees as its base learners and combines them sequentially to
improve the model’s performance. Each new tree is trained to correct the errors made
by the previous tree and this process is called boosting.
• It has built-in parallel processing to train models on large datasets quickly. XGBoost
also supports customizations allowing users to adjust model parameters to optimize
performance based on the specific problem.

How XGBoost Works?


It builds decision trees sequentially with each tree attempting to correct the mistakes made by
the previous one. The process can be broken down as follows:
1. Start with a base learner: The first model, a decision tree, is trained on the data. In regression tasks this base model simply predicts the average of the target variable.
2. Calculate the errors: After training the first tree the errors between the predicted and
actual values are calculated.
3. Train the next tree: The next tree is trained on the errors of the previous tree. This
step attempts to correct the errors made by the first tree.
4. Repeat the process: This process continues with each new tree trying to correct the
errors of the previous trees until a stopping criterion is met.
5. Combine the predictions: The final prediction is the sum of the predictions from all
the trees.
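
A brief sketch of this process using the xgboost Python package (assumed to be installed separately; the dataset and parameter values are illustrative):

from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7)

# Boosted trees built sequentially, with shrinkage (learning_rate) and
# L2 regularization (reg_lambda) to control overfitting.
model = XGBClassifier(n_estimators=200, learning_rate=0.1, max_depth=4, reg_lambda=1.0)
model.fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))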

Advantages of XGBoost
• Scalable and efficient for large datasets with millions of records
• Supports parallel processing and GPU acceleration for faster training
• Offers customizable parameters and regularization for fine-tuning
• Includes feature importance analysis for better insights and selection
• Trusted by data scientists across multiple programming languages
Disadvantages of XGBoost
• XGBoost can be computationally intensive, making it less ideal for resource-
constrained systems.
• It may be sensitive to noise or outliers, requiring careful data preprocessing.
• Prone to overfitting, especially on small datasets or with too many trees.
• Offers feature importance, but overall model interpretability is limited compared to
simpler methods which is an issue in fields like healthcare or finance.
Difference between Bagging and Boosting
• Bagging trains its base models in parallel on bootstrap samples and combines their predictions by voting or averaging; it mainly reduces variance and helps avoid overfitting.
• Boosting trains its models sequentially, with each new model correcting the errors of the previous ones; it mainly reduces errors (bias) by turning weak learners into a strong classifier.

Difference Between AdaBoost, Gradient Boosting and XGBoost

Aspect: AdaBoost | Gradient Boosting | XGBoost
• Full Name: Adaptive Boosting | Gradient Boosted Machines | Extreme Gradient Boosting
• Goal: Reduce errors by adjusting weights | Minimize loss using gradients | Faster and regularized GBM
• Focus: Adjusts sample weights after each model | Uses gradients to fit residuals | Adds regularization and handles overfitting
• How it works: Misclassified points get more weight | New tree fits the residuals | Uses both gradient and Hessian (2nd order)
• Base Learner: Mostly decision stumps (1-level) | Decision trees (shallow or deep) | Decision trees (optimized)
• Model Building: Sequential, focuses on errors | Sequential, uses loss gradients | Sequential, with system-level improvements
• Overfitting Risk: Lower (simpler trees) | Medium | Controlled via regularization
• Speed: Slower | Moderate | Faster due to parallelism and pruning
• Missing Value Handling: Not handled automatically | Not handled automatically | Handled automatically
• Common Use Cases: Simple binary classification | Classification and regression | Kaggle, large datasets, high-performance ML

Support Vector Machine (SVM)

Support Vector Machine (SVM) is a supervised learning algorithm used for classification and
regression tasks. It is mostly used for binary classification problems.
SVM tries to find the best boundary (hyperplane) that separates the data points of different classes with maximum margin.
• Hyperplane: A decision boundary that separates classes.
• Margin: The distance between the hyperplane and the nearest data points from both
classes.
• Support Vectors: Data points that are closest to the hyperplane and influence its
position.
Working:
• SVM takes the labeled training data.
• It finds the optimal hyperplane that separates the classes.
• If the data is not linearly separable, it uses:
o Kernel Trick: Transforms data into higher dimensions to make it separable.
▪ Common kernels: Linear, Polynomial, Radial Basis Function (RBF).
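
A minimal sketch of this workflow with scikit-learn's SVC on the iris dataset (the dataset and parameter values are only illustrative):

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Feature scaling helps SVMs; the RBF kernel handles non-linearly separable data.
scaler = StandardScaler().fit(X_train)
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(scaler.transform(X_train), y_train)

print("Test accuracy:", clf.score(scaler.transform(X_test), y_test))
print("Support vectors per class:", clf.n_support_)   # points defining the margin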
Applications:
• Image classification
• Text categorization
• Handwriting recognition
• Bioinformatics (e.g., cancer classification)

Types of SVM:
• Linear SVM: Used when data is linearly separable.
• Non-linear SVM: Used when data is not linearly separable (uses kernel functions).

Linear and non-linear SVM


Linear SVM
A Linear Support Vector Machine (SVM) is a supervised machine learning algorithm used for
classification, specifically when data can be separated by a straight line (or hyperplane in
higher dimensions).
or
Linear SVM is a supervised machine learning algorithm used for classification. It finds the
best hyperplane that separates data points of different classes with the maximum margin.
Key Concepts:
1. Hyperplane: A decision boundary that separates two classes.
2. Support Vectors: Data points closest to the hyperplane. They define the margin.
3. Margin: Distance between the hyperplane and the support vectors. SVM maximizes
this margin.
4. Linear: Assumes data is linearly separable (i.e., can be divided by a straight line or
plane).
Non-linear SVM
Non-Linear SVM is a type of Support Vector Machine used when data is not linearly
separable (i.e., cannot be divided by a straight line).
Why Use Non-Linear SVM?
• When classes are mixed in curves, not lines
• Non-linear SVM handles complex shapes like circles, spirals, etc.
Difference Between Linear and Non-Linear SVM

Kernel trick
The kernel trick is a technique used in machine learning—mainly in Support Vector
Machines (SVMs)—to transform non-linearly separable data into a higher-dimensional space,
where it becomes linearly separable without explicitly computing the transformation.
Types of Kernel
1. Linear Kernel
2. Polynomial Kernel
3. RBF (Radial Basis Function) / Gaussian Kernel
4. Sigmoid Kernel

Linear Kernel:
A Linear Kernel is the simplest kernel function used in SVM.
It is used when the data can be separated by a straight line (or plane).

Polynomial Kernel
The Polynomial Kernel is used when the relationship between the features and the target is
non-linear — meaning you can’t separate classes with a straight line.
It creates curved decision boundaries that can capture more complex patterns in data.

RBF Kernel (Radial Basis Function)


The RBF kernel is a type of kernel that can handle complex, non-linear data by drawing
smooth, curved boundaries.
It separates classes by forming round, oval, or wavy regions — perfect for situations where
data cannot be separated with a straight or even polynomial curve.
It’s also called the Gaussian Kernel.
Sigmoid Kernel:
The Sigmoid Kernel is a kernel function that comes from neural networks.
It behaves like the activation function (sigmoid) used in a single-layer neural network.

Note:

Kernel | Output Pattern | Interpretation
Linear | A straight line separating the classes | Works only if data is linearly separable
Polynomial | Curved decision boundary | Captures interactions between features
RBF | Circular and flexible boundary | Excellent for complex non-linear shapes
Sigmoid | Less smooth, sometimes unpredictable | Mimics neural network behavior
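
To make the comparison concrete, here is a small hedged sketch that tries each kernel on a dataset that is not linearly separable (make_moons is just a convenient synthetic example):

from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Two interleaving half-circles: cannot be separated by a straight line
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    clf = SVC(kernel=kernel, degree=3, gamma="scale")
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(kernel, "kernel: mean CV accuracy =", round(acc, 2))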


Model Evaluation and Tuning:
Model Evaluation:
• Purpose: To assess how well the trained SVM model generalizes to new, unseen
data.
• Metrics:
o Accuracy: Overall correctness of predictions.
o Precision: Ability of the model to avoid false positives.
o Recall: Ability of the model to find all relevant instances.
o F1-score: Harmonic mean of precision and recall, providing a balance
between the two.
• Techniques:
o Cross-validation: Dividing the data into multiple folds, training on some and
testing on others, to get a more robust estimate of performance.
Hyperparameter Tuning:
• Purpose:
To find the optimal values for SVM hyperparameters that lead to the best performance.
• Hyperparameters to tune:
• Kernel: Determines the type of decision boundary (linear, RBF, polynomial,
etc.).
• C (Regularization parameter): Controls the trade-off between maximizing
the margin and minimizing the classification error.
• Gamma: Controls how far the influence of a single training example reaches.
• Tuning techniques:
• Grid Search: Exhaustively searches through a predefined set of
hyperparameter values, evaluating the model for each combination.
• Cross-validation: Used during hyperparameter tuning to evaluate the
performance of different hyperparameter combinations on different data folds.
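
A hedged sketch of grid search with cross-validation for an SVM, using scikit-learn's GridSearchCV (the dataset, parameter grid and scoring metric are illustrative choices):

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Search over kernel, C and gamma; each combination is evaluated with 5-fold CV.
pipe = make_pipeline(StandardScaler(), SVC())
param_grid = {
    "svc__kernel": ["linear", "rbf"],
    "svc__C": [0.1, 1, 10],
    "svc__gamma": ["scale", 0.01, 0.1],
}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="f1")
search.fit(X, y)
print("Best parameters:", search.best_params_)
print("Best cross-validated F1:", search.best_score_)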
Neural Networks and Deep Learning
Introduction to neural networks

Neural network:
A neural network is a computer system modeled after the human brain. It is
designed to recognize patterns, learn from data, and make decisions.

Learning in neural networks follows a structured, three-stage process:


1. Input Computation: Data is fed into the network.
2. Output Generation: Based on the current parameters, the network
generates an output.
3. Iterative Refinement: The network refines its output by adjusting weights
and biases, gradually improving its performance on diverse tasks.
Layers in Neural Network Architecture
1. Input Layer: This is where the network receives its input data. Each input
neuron in the layer corresponds to a feature in the input data.
2. Hidden Layers: These layers perform most of the computational heavy
lifting. A neural network can have one or multiple hidden layers. Each
layer consists of units (neurons) that transform the inputs into something
that the output layer can use.
3. Output Layer: The final layer produces the output of the model. The
format of these outputs varies depending on the specific task like
classification, regression.

Working of Neural Networks


1. Forward Propagation
When data is input into the network, it passes through the network in the
forward direction, from the input layer through the hidden layers to the output
layer. This process is known as forward propagation. Here’s what happens
during this phase:
1. Linear Transformation: Each neuron in a layer receives inputs which are multiplied by the weights associated with the connections. These products are summed together and a bias is added to the sum. This can be represented mathematically as: z = w1·x1 + w2·x2 + ... + wn·xn + b
2. Activation: The result of the linear transformation (denoted as z) is then passed through an activation function. The activation function is crucial because it introduces non-linearity into the system, enabling the network to learn more complex patterns. Popular activation functions include ReLU, sigmoid and tanh.
2. Backpropagation
After forward propagation, the network evaluates its performance using a loss
function which measures the difference between the actual output and the
predicted output. The goal of training is to minimize this loss. This is where
backpropagation comes into play:
1. Loss Calculation: The network calculates the loss which provides a
measure of error in the predictions. The loss function could vary;
common choices are mean squared error for regression tasks or cross-
entropy loss for classification.
2. Gradient Calculation: The network computes the gradients of the loss
function with respect to each weight and bias in the network. This
involves applying the chain rule of calculus to find out how much each
part of the output error can be attributed to each weight and bias.
3. Weight Update: Once the gradients are calculated, the weights and
biases are updated using an optimization algorithm like stochastic
gradient descent (SGD). The weights are adjusted in the opposite
direction of the gradient to minimize the loss. The size of the step taken
in each update is determined by the learning rate.

3. Iteration
This process of forward propagation, loss calculation, backpropagation and
weight update is repeated for many iterations over the dataset. Over time, this
iterative process reduces the loss and the network's predictions become more
accurate.
Through these steps, neural networks can adapt their parameters to better
approximate the relationships in the data, thereby improving their performance
on tasks such as classification, regression or any other predictive modeling.

Working of a Neural Network (Step-by-Step)


Step 1: Input Layer – Receive Data
• The network gets input data (like numbers, features, images, etc.).
• Example: For movie prediction, inputs could be:
[Genre=Action, Duration=Short, Rating=8]
Step 2: Weighted Sum + Bias
• Each input is multiplied by a weight.
• A bias is added to the result.
• Formula at each neuron:
• z = (input1 × weight1) + (input2 × weight2) + ... + bias
Step 3: Activation Function – Decide
• The neuron passes the result (z) through an activation function like
ReLU, Sigmoid, or Tanh.
• This decides the neuron’s output (e.g., how "active" it is).
Step 4: Hidden Layers – Process
• The output from the first layer becomes input for the next layer.
• Multiple hidden layers allow the network to learn complex patterns.
Step 5: Output Layer – Final Prediction
• The final layer produces the result.
• Example:
o Output = 1 → Likes the movie
o Output = 0 → Doesn’t like the movie
Step 6: Compare with Actual Answer
• The network compares its prediction with the real answer using a loss
function.
• If it’s wrong, the error is calculated.
Step 7: Learn from Mistakes (Backpropagation)
• The network uses an algorithm called backpropagation to adjust weights
and bias:
o It reduces the error by updating parameters using Gradient
Descent.
Step 8: Repeat (Training)
• Steps 1–7 are repeated many times (called epochs) to improve accuracy.
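
The steps above can be traced by hand for a single neuron. The following sketch (NumPy, with made-up numbers loosely based on the movie example) performs one forward pass, computes the loss and applies one gradient-descent update; it is illustrative, not the module's code:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 0.0, 8.0])       # inputs (e.g., genre flag, duration flag, rating)
w = np.array([0.2, -0.1, 0.05])     # weights
b = 0.1                             # bias
y_true = 1.0                        # actual answer ("likes the movie")
lr = 0.1                            # learning rate

# Steps 2-3: weighted sum plus bias, then activation
z = np.dot(x, w) + b
y_pred = sigmoid(z)

# Step 6: binary cross-entropy loss
loss = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Step 7: backpropagation (for sigmoid + cross-entropy, dL/dz = y_pred - y_true)
dz = y_pred - y_true
w = w - lr * dz * x
b = b - lr * dz
print("prediction:", round(float(y_pred), 3), " loss:", round(float(loss), 3))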

Types of Neural Networks


Several commonly used types of neural networks are described below.
1. Feedforward Neural Network (FNN):
Data moves in one direction — from input to output — without going
backward.
It is the most basic type of neural network.
2. Single-layer Perceptron: Has just one layer of neurons. It takes input,
multiplies with weights, adds bias, and gives one output using an activation
function.
Good for solving simple problems like AND, OR.

3. Multilayer Perceptron (MLP):


It has multiple layers — input, hidden, and output. It uses nonlinear activation
functions to learn complex patterns.
Useful in speech recognition, image classification, etc.
4. Convolutional Neural Network (CNN):
This network is best for image processing. It automatically finds features like
edges, shapes, and colors using filters.
Used in face recognition, self-driving cars, etc.
5. Recurrent Neural Network (RNN):
This network works well with sequences like text or time data. It remembers
previous steps using loops.
Used in language translation or stock price prediction.
6. Long Short-Term Memory (LSTM):
A special type of RNN that can remember for a long time. It uses gates to
control what to remember or forget.
Useful in music generation, chatbots, or video analysis.

Advantages of Neural Networks


1. Learns from data
o Can automatically learn patterns and relationships without
hardcoding rules.
2. Handles complex problems
o Works well for tasks like image recognition, voice detection, and
natural language processing.
3. Good with large data
o Performs better as the size of data increases.
4. No need for feature engineering
o Automatically extracts important features (especially in CNNs).
5. Can generalize
o Once trained properly, it can give accurate predictions on unseen
data.
6. Flexible and powerful
o Can be designed for classification, regression, generation,
translation, etc.

Disadvantages of Neural Networks


1. Requires a lot of data
o Needs large datasets to learn well. Not ideal for small data.
2. Takes time and power
o Training can be slow and needs powerful hardware (especially
deep networks).
3. Hard to interpret (Black box)
o Difficult to understand why it made a specific decision.
4. Overfitting risk
o If not managed well, the model may memorize the training data
and perform poorly on new data.
5. Need for tuning
o Requires a lot of trial-and-error to choose the right architecture,
learning rate, etc.
6. No guarantee of success
o Might not work well without proper preprocessing or parameter
tuning.

Applications of Neural Networks


Neural networks have numerous applications across various fields:
1. Image and Video Recognition: CNNs are extensively used in
applications such as facial recognition, autonomous driving and medical
image analysis.
2. Natural Language Processing (NLP): RNNs and transformers power
language translation, chatbots and sentiment analysis.
3. Finance: Predicting stock prices, fraud detection and risk management.
4. Healthcare: Neural networks assist in diagnosing diseases, analyzing
medical images and personalizing treatment plans.
5. Gaming and Autonomous Systems: Neural networks enable real-time
decision-making, enhancing user experience in video games and enabling
autonomous systems like self-driving cars.

Building and training neural networks using TensorFlow and Keras


What Are TensorFlow and Keras?
• TensorFlow: An open-source library developed by Google for numerical
computation and machine learning. It provides robust tools to build and
train deep learning models.
• Keras: A high-level API integrated with TensorFlow that simplifies the
creation of neural networks with a user-friendly interface.
Step 1: Import Libraries

import numpy as np
import pandas as pd
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

Step 2: Create and Load Dataset


data = {
'feature1': [0.1, 0.2, 0.3, 0.4, 0.5],
'feature2': [0.5, 0.4, 0.3, 0.2, 0.1],
'label': [0, 0, 1, 1, 1]
}
df = pd.DataFrame(data)

# Convert to NumPy arrays


X = df[['feature1', 'feature2']].values
y = df['label'].values

Step 3: Define the Model Architecture


model = Sequential([
Dense(8, activation='relu', input_shape=(2,)), # 2 input features
Dense(4, activation='relu'),
Dense(1, activation='sigmoid') # 1 output for binary classification
])
Step 4: Compile the Model
model.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=['accuracy'])

Step 5: Train the Model


model.fit(X, y, epochs=20, batch_size=1)

Step 6: Evaluate the Model


print(f"Accuracy: {accuracy * 100:.2f}%")

Output:
Accuracy: 100.00%
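
Once trained, the model can be used on new samples; a brief usage sketch (the new_data values here are hypothetical):

import numpy as np

new_data = np.array([[0.25, 0.35]])        # hypothetical unseen sample
prob = model.predict(new_data)             # sigmoid output between 0 and 1
print("Probability:", prob, "Predicted label:", int(prob[0][0] > 0.5))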

Convolutional Neural Networks (CNN)


Convolutional Neural Networks (CNNs) are deep learning models designed to process data with a grid-like topology, such as images. They are the foundation of most modern computer vision applications, detecting features within visual data.
How CNNs Work?
1. Input Image: CNN receives an input image which is preprocessed to
ensure uniformity in size and format.
2. Convolutional Layers: Filters are applied to the input image to extract
features like edges, textures and shapes.
3. Pooling Layers: The feature maps generated by the convolutional layers
are downsampled to reduce dimensionality.
4. Fully Connected Layers: The downsampled feature maps are passed
through fully connected layers to produce the final output, such as a
classification label.
5. Output: The CNN outputs a prediction, such as the class of the image.
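
A minimal Keras sketch of this layer sequence (the input size and number of classes are illustrative assumptions):

from tensorflow.keras import layers, models

cnn = models.Sequential([
    layers.Input(shape=(28, 28, 1)),                # e.g., 28x28 grayscale images
    layers.Conv2D(32, (3, 3), activation='relu'),   # convolution: extract features
    layers.MaxPooling2D((2, 2)),                    # pooling: downsample feature maps
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),            # fully connected layer
    layers.Dense(10, activation='softmax'),         # output: probabilities for 10 classes
])
cnn.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
cnn.summary()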
How to Train a Convolutional Neural Network?
CNNs are trained using a supervised learning approach. This means that the
CNN is given a set of labeled training images. The CNN learns to map the input
images to their correct labels.
The training process for a CNN involves the following steps:
1. Data Preparation: The training images are preprocessed to ensure that
they are all in the same format and size.
2. Loss Function: A loss function is used to measure how well the CNN is
performing on the training data. The loss function is typically calculated
by taking the difference between the predicted labels and the actual labels
of the training images.
3. Optimizer: An optimizer is used to update the weights of the CNN in
order to minimize the loss function.
4. Backpropagation: Backpropagation is a technique used to calculate the
gradients of the loss function with respect to the weights of the CNN. The
gradients are then used to update the weights of the CNN using the
optimizer.
How to Evaluate CNN Models
The performance of a CNN can be evaluated using a variety of criteria. Among the most popular metrics are:
• Accuracy: Accuracy is the percentage of test images that the CNN
correctly classifies.
• Precision: Precision is the percentage of test images that the CNN
predicts as a particular class and that are actually of that class.
• Recall: Recall is the percentage of test images that are of a particular
class and that the CNN predicts as that class.
• F1 Score: The F1 Score is a harmonic mean of precision and recall. It is a
good metric for evaluating the performance of a CNN on classes that are
imbalanced.
Advantages of CNN
• High Accuracy: They can achieve high accuracy in various image
recognition tasks.
• Efficiency: They are efficient, especially when implemented on GPUs.
• Robustness: They are robust to noise and variations in input data.
• Adaptability: They can be adapted to different tasks by modifying their architecture.
Disadvantages of CNN
• Complexity: They can be complex and difficult to train, especially for large datasets.
• Resource-Intensive: They require significant computational resources for training and deployment.
• Data Requirements: They need large amounts of labeled data for training.
• Interpretability: They can be difficult to interpret, making it challenging to understand their predictions.
Recurrent Neural Networks (RNN)

Definition:
A Recurrent Neural Network (RNN) is a type of neural network used for sequential data.
Unlike traditional neural networks, RNNs have a memory — they can remember previous
inputs due to internal loops. This makes them powerful for tasks like:
• Time Series Forecasting
• Natural Language Processing
• Speech Recognition
• Music Generation

Applications of RNNs
• Text generation
• Language translation
• Stock price prediction
• Handwriting recognition
• Chatbots

Components of Recurrent Neural Network (RNN)


1. Input (xₜ)
At each time step t, the RNN receives an input xₜ.
This input is part of a sequence and is processed one step at a time.
It provides the data required for the RNN to perform computation at that specific time step.

2. Hidden State (hₜ)

The hidden state hₜ represents the internal memory of the RNN at time step t.
It is updated based on the current input and the previous hidden state.
This helps the network retain information over time and learn temporal patterns in the data.

3. Output (yₜ)

The output yₜ is the prediction or result produced by the RNN at time step t.
It is generated using the current hidden state.
This output may be used at each time step or only at the final step, depending on the task.
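
A small illustrative sketch of these components with Keras's SimpleRNN layer, predicting the next number in a sequence (the task and values are assumptions for illustration, not the module's original example):

import numpy as np
from tensorflow.keras import layers, models

# Build (input, target) pairs: 3 consecutive numbers -> the next number
series = np.arange(0, 100, dtype='float32')
X = np.array([series[i:i + 3] for i in range(len(series) - 3)])
y = series[3:]
X = X.reshape((-1, 3, 1))   # (samples, time steps, features per step) -> inputs x_t

rnn = models.Sequential([
    layers.Input(shape=(3, 1)),
    layers.SimpleRNN(16, activation='tanh'),  # hidden state h_t carries memory across steps
    layers.Dense(1),                          # output y_t produced from the final hidden state
])
rnn.compile(optimizer='adam', loss='mse')
rnn.fit(X, y, epochs=20, verbose=0)
print(rnn.predict(X[:1]))   # prediction for the first input sequence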

Types of Recurrent Neural Networks (RNNs)


Recurrent Neural Networks (RNNs) come in several types, each designed to handle different
challenges in sequence learning. The main types are explained below:

1. Simple RNN
Simple RNN is the basic form of a recurrent neural network.
It processes input sequences one step at a time and uses hidden states to remember
information.
However, it struggles with long sequences due to the vanishing gradient problem.

2. LSTM (Long Short-Term Memory)


LSTM is a special type of RNN that can learn long-term dependencies.
It uses three gates — input gate, forget gate, and output gate — to control the flow of
information.
LSTM is widely used in applications like language modeling, translation, and speech
recognition.

3. GRU (Gated Recurrent Unit)


GRU is a simplified version of LSTM.
It combines the input and forget gates into a single update gate and uses fewer parameters.
GRU trains faster than LSTM and performs well in many tasks involving sequential data.

4. Bidirectional RNN
In a bidirectional RNN, the input sequence is processed in both forward and backward
directions.
This allows the network to have context from both past and future time steps.
It is useful in tasks where full sequence information is important, such as text classification.

5. Deep or Stacked RNN


A deep RNN has multiple RNN layers stacked on top of each other.
Each layer learns more complex patterns from the sequence.
Stacked RNNs are useful for tasks that require high-level feature extraction from time-series
or sequential data.

Advantages:
1. RNNs can handle sequential and time-series data.
2. They remember previous inputs using hidden states.
3. Same weights are used at each time step, reducing complexity.
4. Suitable for variable-length input and output sequences.
5. Useful for tasks like text generation, translation, and speech recognition.

Disadvantages:
1. Struggle with long-term memory due to vanishing gradient.
2. Training is slow and difficult to parallelize.
3. Requires careful tuning of hyperparameters.
4. Standard RNNs forget earlier inputs in long sequences.
5. Less effective than LSTM or GRU for long dependencies.
