DEEP LEARNING
2 MARKS
1. **What is an artificial neural network?**
- An artificial neural network (ANN) is a computational model inspired by the structure and
function of the human brain. It consists of interconnected layers of nodes (neurons) that
process input data, learn patterns, and produce outputs. ANNs are used in machine learning to
solve complex problems like image recognition, natural language processing, and more.
2. **Write the mathematical equation of the sigmoid activation function?**
- The sigmoid function maps any real-valued input \( z \) to the range \( (0, 1) \):
\[
\sigma(z) = \frac{1}{1 + e^{-z}}
\]
3. **Name any two techniques to train feed-forward neural networks?**
- **Gradient Descent**: An optimization algorithm used to minimize the error function by
iteratively adjusting the weights of the network.
- **Backpropagation**: A method for calculating the gradient of the loss function with respect
to each weight by applying the chain rule, allowing for efficient training of deep networks.
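The two techniques above can be illustrated together. Below is a minimal sketch (with invented data and learning rate) of one training step for a single sigmoid neuron: backpropagation computes the gradients via the chain rule, and gradient descent applies the update.

```python
import numpy as np

# Toy data: two inputs and one binary target (made up for illustration).
x = np.array([0.5, -1.2])
t = 1.0

# Parameters of a single sigmoid neuron.
w = np.array([0.1, 0.4])
b = 0.0
lr = 0.1  # learning rate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass.
z = w @ x + b
y = sigmoid(z)
loss = 0.5 * (y - t) ** 2

# Backpropagation: chain rule gives dL/dw and dL/db.
dL_dy = y - t
dy_dz = y * (1.0 - y)        # derivative of the sigmoid
dL_dz = dL_dy * dy_dz
grad_w = dL_dz * x
grad_b = dL_dz

# Gradient-descent update.
w -= lr * grad_w
b -= lr * grad_b
print(loss, w, b)
```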
4. **How does deep learning address the fast-food problem?**
- In the fast-food problem, deep learning uses a neural network to predict the total cost of a
meal based on the number of servings of burgers, fries, and sodas. The network learns the
optimal weights (prices) for each item through training on a dataset of meal combinations and
their corresponding total costs.
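A hedged NumPy sketch of this idea (not the book's code): a single linear neuron whose weights converge to the hidden per-item prices. The prices, dataset size, and learning rate below are invented for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hidden "true" prices of burger, fries, soda (assumed values for the demo).
true_prices = np.array([4.0, 2.0, 1.5])

# Training data: random serving counts and the resulting meal totals.
servings = rng.integers(0, 5, size=(200, 3)).astype(float)
totals = servings @ true_prices

# A linear neuron: predicted_total = servings . w
w = np.zeros(3)
lr = 0.01
for _ in range(500):
    pred = servings @ w
    grad = servings.T @ (pred - totals) / len(totals)  # gradient of the mean squared error
    w -= lr * grad

print(w)  # approaches true_prices
```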
5. **List two real-world applications of artificial neural networks.**
- **Image Recognition**: Used in applications like facial recognition, medical imaging, and
self-driving cars.
- **Natural Language Processing (NLP)**: Used in chatbots, language translation, and
sentiment analysis.
6. **How does the information flow in a feed-forward neural network?**
- In a feed-forward neural network, information flows in one direction: from the input layer
through one or more hidden layers to the output layer. There are no cycles or loops in the
network, meaning data does not flow backward.
7. **Difference between overfitting and underfitting.**
- **Overfitting**: Occurs when a model learns the training data too well, including noise and
outliers, resulting in poor performance on new, unseen data.
- **Underfitting**: Occurs when a model is too simple to capture the underlying patterns in
the data, resulting in poor performance on both training and test data.
8. **Define batch gradient.**
- Batch gradient descent is a variant of gradient descent where the gradient of the loss
function is computed using the entire training dataset. This method updates the model
parameters once per epoch, making it computationally expensive for large datasets.
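A minimal sketch of batch gradient descent on a synthetic linear-regression problem (data, learning rate, and epoch count invented for illustration); note that each epoch performs exactly one update computed over the entire dataset.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic linear-regression data (invented for the example).
X = rng.normal(size=(1000, 2))
true_w = np.array([2.0, -3.0])
y = X @ true_w + 0.1 * rng.normal(size=1000)

w = np.zeros(2)
lr = 0.1
for epoch in range(100):
    grad = X.T @ (X @ w - y) / len(y)  # gradient over the ENTIRE dataset
    w -= lr * grad                     # exactly one parameter update per epoch

print(w)  # close to true_w
```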
9. **How to reduce the cost of convolutional network training?**
- **Batch Normalization**: Normalizes the inputs to each layer, reducing internal covariate
shift and accelerating training.
- **Parameter Sharing**: Reduces the number of parameters by sharing weights across
different parts of the input, which is a key feature of convolutional layers.
10. **Write briefly about parameter sharing in neural networks.**
- Parameter sharing is a technique used in convolutional neural networks (CNNs) where the
same set of weights (filters) is applied to different regions of the input. This reduces the number
of parameters and allows the network to detect features regardless of their position in the
input.
11. **What is the role of an optimizer in deep learning?**
- The role of an optimizer in deep learning is to adjust the weights of the neural network to
minimize the loss function. Optimizers like Gradient Descent, Adam, and RMSProp help the
network converge to the optimal solution efficiently.
12. **What is the effect of dilation on the receptive field in a CNN?**
- Dilation in CNNs refers to the spacing between the elements of a convolutional filter.
Increasing the dilation rate expands the receptive field without increasing the number of
parameters, allowing the network to capture larger contextual information.
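A small illustration of the standard relation between dilation and receptive field: a filter of size \( k \) with dilation \( d \) covers an effective region of \( k + (k-1)(d-1) \) input positions while keeping the same number of parameters.

```python
# Effective kernel size of a dilated convolution:
# k_eff = k + (k - 1) * (d - 1); the parameter count stays at k*k for a 2D filter.
def effective_kernel_size(k, d):
    return k + (k - 1) * (d - 1)

for d in (1, 2, 4):
    k_eff = effective_kernel_size(3, d)
    print(f"3x3 filter, dilation {d}: covers a {k_eff}x{k_eff} input region")
```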
13. **How does max pooling differ from average pooling?**
- **Max Pooling**: Selects the maximum value from a region of the feature map, emphasizing
the most prominent features.
- **Average Pooling**: Computes the average value from a region of the feature map,
providing a smoother representation of the features.
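A tiny NumPy example (values invented) contrasting the two operations on one pooling region.

```python
import numpy as np

# One 2x2 region of a feature map (values made up for illustration).
region = np.array([[1.0, 3.0],
                   [2.0, 8.0]])

print("max pooling:", region.max())      # 8.0 -> keeps the most prominent activation
print("average pooling:", region.mean()) # 3.5 -> smoother summary of the region
```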
14. **What is a convolutional network?**
- A convolutional network (CNN) is a type of deep neural network designed to process grid-like
data, such as images. It uses convolutional layers to automatically and adaptively learn spatial
hierarchies of features from the input data.
15. **Why is parameter sharing used in CNN?**
- Parameter sharing is used in CNNs to reduce the number of parameters, making the network
more efficient and less prone to overfitting. It also allows the network to detect features
regardless of their position in the input, providing translation invariance.
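A back-of-the-envelope comparison (input size chosen only for illustration) showing how sharing a small filter across positions keeps the parameter count far below that of a fully connected layer.

```python
# Parameter counts for a 32x32x3 input (size chosen only for illustration).
h, w, c = 32, 32, 3

# Fully connected layer mapping the flattened image to 64 units:
dense_params = (h * w * c) * 64 + 64

# Convolutional layer with 64 filters of size 3x3 (weights shared across all positions):
conv_params = (3 * 3 * c) * 64 + 64

print("dense:", dense_params)  # 196,672
print("conv :", conv_params)   # 1,792
```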
16. **Write the formula to find how many neurons to fit for a network.**
- The number of neurons in a layer is typically determined by the problem complexity and the
architecture design. There is no fixed formula, but a common heuristic is to start with a number
of neurons between the size of the input layer and the output layer, and adjust based on
performance.
17. **List the different types of recurrent neural networks.**
- **Vanilla RNN**: The basic form of RNN with simple recurrent connections.
- **Long Short-Term Memory (LSTM)**: A type of RNN designed to remember long-term
dependencies.
- **Gated Recurrent Unit (GRU)**: A simplified version of LSTM with fewer parameters.
18. **Name the function of transducer in the design patterns of RNN.**
- In RNN design patterns, a transducer is used to map an input sequence to an output
sequence of the same length, such as in sequence labeling tasks.
19. **Name the function of acceptor in the design patterns of RNN.**
- In RNN design patterns, an acceptor is used to map an input sequence to a single output,
such as in classification tasks where the final output represents the class label.
20. **Define unfolding in the context of RNNs.**
- Unfolding in RNNs refers to the process of visualizing the network as a sequence of layers,
where each time step is represented as a separate layer. This helps in understanding the flow of
information through time and applying backpropagation through time (BPTT) for training.
16 MARKS
### **1. Explain the concept of the linear perceptron as a model for neurons and list its functions and its limitations in detail (16 marks)**
---
#### **Concept of Linear Perceptron**:
The linear perceptron is one of the earliest and simplest models of an artificial neuron. It was
introduced by Frank Rosenblatt in 1957 and is inspired by the biological neurons in the human
brain. The perceptron is a computational unit that takes multiple inputs, applies weights to
them, and produces an output based on a linear combination of the inputs. Mathematically, the
output \( y \) is given by:
\[
y = f\!\left(\sum_{i=1}^{n} w_i x_i + b\right)
\]
where \( x_i \) are the inputs, \( w_i \) the weights, and \( b \) the bias. The perceptron uses a
**threshold function** (e.g., the step function) as \( f \) to classify inputs into binary
categories (e.g., 0 or 1). For example:
\[
f(z) = \begin{cases} 1 & \text{if } z \geq 0 \\ 0 & \text{otherwise} \end{cases}
\]
The perceptron is a **linear classifier**, meaning it can only separate data points that are
linearly separable. It works by finding a hyperplane (a straight line in 2D) that divides the input
space into two regions, each corresponding to one of the two classes.
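A minimal NumPy sketch (dataset, labels, and learning rate invented) of a perceptron with a step activation trained by the classic perceptron learning rule on a linearly separable toy problem.

```python
import numpy as np

# Tiny linearly separable dataset (invented): class 1 if x1 + x2 > 1, else 0.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0]])
t = np.array([0, 0, 0, 1, 1])

w = np.zeros(2)
b = 0.0
lr = 0.1

def step(z):
    return 1 if z >= 0 else 0

# Perceptron learning rule: nudge weights whenever a point is misclassified.
for epoch in range(20):
    for x_i, t_i in zip(X, t):
        y_i = step(w @ x_i + b)
        w += lr * (t_i - y_i) * x_i
        b += lr * (t_i - y_i)

print([step(w @ x_i + b) for x_i in X], w, b)  # all five points classified correctly
```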
#### **Functions of Linear Perceptron**:
1. **Binary Classification**:
- The primary function of the perceptron is to classify inputs into two categories. For example,
it can classify whether an email is spam or not spam based on features like the presence of
certain keywords.
- The decision boundary is a hyperplane defined by the weights and bias. The perceptron
adjusts the weights and bias during training to minimize classification errors.
2. **Learning**:
- The perceptron can learn the optimal weights and bias using algorithms like the **Delta
Rule** or **Gradient Descent**. During training, the perceptron updates its weights based on
the error between the predicted output and the actual target.
3. **Feature Extraction**:
- The perceptron can learn to emphasize important features in the input data by assigning
higher weights to them. For example, in image recognition, the perceptron might learn to focus
on edges or textures.
4. **Linear Decision Boundary**:
- The perceptron creates a linear decision boundary in the input space. This boundary
separates the input data into two classes based on the learned weights and bias.
---
#### **Limitations of Linear Perceptron**:
1. **Linear Separability**:
- The perceptron can only solve problems that are **linearly separable**. This means that the
data points of the two classes must be separable by a straight line (in 2D) or a hyperplane (in
higher dimensions).
- For example, the perceptron cannot solve the **XOR problem**, where the data points are
arranged in such a way that no straight line can separate them.
2. **Limited Expressiveness**:
- The perceptron is a simple model with no hidden layers. It cannot model complex, non-linear
relationships between inputs and outputs.
- This makes it unsuitable for tasks like image recognition or natural language processing,
where the relationships between inputs and outputs are highly non-linear.
3. **No Hidden Layers**:
- The perceptron lacks hidden layers, which are essential for learning hierarchical features. In
deep learning, hidden layers allow the network to learn intermediate representations of the
data, such as edges, shapes, and objects in images.
4. **Sensitivity to Initialization**:
- The performance of the perceptron can be highly dependent on the initial values of the
weights and bias. Poor initialization can lead to slow convergence or getting stuck in local
minima.
5. **Vanishing Gradient Problem**:
- Although the perceptron itself does not suffer from the vanishing gradient problem (since it
has no hidden layers), it is a limitation of more complex neural networks that build upon the
perceptron. In deep networks, gradients can become very small during backpropagation,
making it difficult to update the weights effectively.
6. **Overfitting**:
- The perceptron can overfit to the training data if the number of parameters (weights) is large
relative to the size of the dataset. Overfitting occurs when the model learns the noise in the
training data instead of the underlying patterns.
7. **No Probabilistic Output**:
- The perceptron produces a binary output (0 or 1) based on a threshold function. It does not
provide a probabilistic measure of the output, which is useful in many classification tasks.
---
#### **Example from the Book**:
In the book **"Fundamentals of Deep Learning" by Nikhil Buduma**, the perceptron is
introduced as a foundational model for understanding neural networks. The book explains how
the perceptron can be used to classify data points in a 2D plane. For example, given a dataset of
points labeled as either "above" or "below" a line, the perceptron can learn to classify new
points based on their position relative to the line.
The book also highlights the limitations of the perceptron, particularly its inability to solve non-
linear problems like XOR. This limitation motivated the development of more complex models,
such as multi-layer perceptrons (MLPs) and deep neural networks, which can learn non-linear
decision boundaries.
---
#### **Conclusion**:
The linear perceptron is a simple yet powerful model for binary classification tasks. It serves as
the foundation for more complex neural networks. However, its limitations, such as the inability
to solve non-linear problems and the lack of hidden layers, make it unsuitable for many real-
world applications. These limitations led to the development of more advanced models, such as
multi-layer perceptrons and convolutional neural networks, which can handle complex, non-
linear relationships in data.
### **3. Explain sigmoid, Tanh, and ReLU neurons and discuss their differences, advantages, and disadvantages in the context of neural networks (16 marks)**
#### **Introduction**:
Activation functions are a critical component of neural networks. They introduce non-linearity
into the model, enabling the network to learn complex patterns and relationships in the data.
Three commonly used activation functions are **Sigmoid**, **Tanh**, and **ReLU**. Each of
these functions has unique properties, advantages, and disadvantages, which make them
suitable for different scenarios.
#### **1. Sigmoid Neuron**:
- **Mathematical Function**:
The sigmoid function is defined as:
\[
\sigma(z) = \frac{1}{1 + e^{-z}}
\]
where \( z \) is the input to the neuron.
- **Output Range**:
The output of the sigmoid function is in the range \( (0, 1) \).
- **Advantages**:
1. **Smooth Gradient**: The sigmoid function is differentiable, which makes it suitable for
gradient-based optimization algorithms like backpropagation.
2. **Probabilistic Interpretation**: The output can be interpreted as a probability, making it
useful for binary classification tasks.
3. **Historical Significance**: Sigmoid was one of the first activation functions used in neural
networks, and it played a key role in the development of early models.
- **Disadvantages**:
1. **Vanishing Gradients**: For very large or very small inputs, the gradient of the sigmoid
function becomes very small. This slows down learning and makes it difficult to train deep
networks.
2. **Non-Zero-Centered Outputs**: The output of the sigmoid function is always positive,
which can lead to inefficient weight updates during gradient descent.
3. **Computationally Expensive**: The exponential function in the sigmoid is computationally
expensive compared to simpler functions like ReLU.
---
#### **2. Tanh Neuron**:
- **Mathematical Function**:
The hyperbolic tangent (Tanh) function is defined as:
\[
\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}
\]
where \( z \) is the input to the neuron.
- **Output Range**:
The output of the Tanh function is in the range \( (-1, 1) \).
- **Advantages**:
1. **Zero-Centered Outputs**: The Tanh function is zero-centered, which helps in faster
convergence during training compared to sigmoid.
2. **Stronger Gradients**: The gradients of the Tanh function are stronger than those of the
sigmoid function, reducing the vanishing gradient problem to some extent.
3. **Smooth Gradient**: Like sigmoid, Tanh is differentiable, making it suitable for gradient-
based optimization.
- **Disadvantages**:
1. **Vanishing Gradients**: Although Tanh performs better than sigmoid, it still suffers from
the vanishing gradient problem for very large or very small inputs.
2. **Computationally Expensive**: The Tanh function involves exponential calculations, which
are computationally expensive.
---
#### **3. ReLU Neuron**:
- **Mathematical Function**:
The Rectified Linear Unit (ReLU) function is defined as:
\[
f(z) = \max(0, z)
\]
where \( z \) is the input to the neuron.
- **Output Range**:
The output of the ReLU function is in the range \( [0, \infty) \).
- **Advantages**:
1. **Computationally Efficient**: ReLU is computationally cheap because it involves simple
thresholding at zero.
2. **Sparsity**: ReLU outputs zero for negative inputs, which introduces sparsity in the
activations. This can lead to more efficient and faster learning.
3. **No Vanishing Gradients for Positive Inputs**: For positive inputs, the gradient of ReLU is
always 1, which avoids the vanishing gradient problem and speeds up training.
4. **Widely Used in Deep Learning**: ReLU is the most commonly used activation function in
modern deep learning models due to its simplicity and effectiveness.
- **Disadvantages**:
1. **Dying ReLU Problem**: For negative inputs, the gradient of ReLU is zero. If too many
neurons become inactive (output zero), the network can stop learning. This is known as the
"dying ReLU" problem.
2. **Not Zero-Centered**: Like sigmoid, ReLU outputs are not zero-centered, which can affect
the efficiency of weight updates during training.
---
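To tie the three neurons together, here is a minimal NumPy sketch (sample inputs invented) that evaluates each activation and its gradient, making the saturation of sigmoid and Tanh visible next to ReLU's constant gradient for positive inputs.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print("sigmoid:", sigmoid(z))    # in (0, 1), saturates at both ends
print("tanh   :", np.tanh(z))    # in (-1, 1), zero-centered
print("relu   :", relu(z))       # in [0, inf), zero for negative inputs

# Gradients at the same points: sigmoid/tanh gradients shrink toward 0 for large |z|
# (vanishing gradients); ReLU keeps a gradient of 1 for positive inputs.
print("d sigmoid:", sigmoid(z) * (1 - sigmoid(z)))
print("d tanh   :", 1 - np.tanh(z) ** 2)
print("d relu   :", (z > 0).astype(float))
```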
#### **Differences Between Sigmoid, Tanh, and ReLU**:

| Property | Sigmoid | Tanh | ReLU |
|---|---|---|---|
| Output range | \( (0, 1) \) | \( (-1, 1) \) | \( [0, \infty) \) |
| Zero-centered | No | Yes | No |
| Vanishing gradients | Yes, saturates for large \( \lvert z \rvert \) | Yes, but less severe than sigmoid | No for positive inputs (dying ReLU for negatives) |
| Computational cost | High (exponential) | High (exponential) | Low (simple thresholding) |
| Typical use | Output layer for binary classification | Hidden layers when zero-centered outputs help | Hidden layers in modern deep networks |

This table also serves as the advantages and disadvantages summary: each row corresponds to a strength or weakness discussed in the sections above.
---
#### **Context in Neural Networks**:
- **Sigmoid**: Historically used in early neural networks, but its limitations (vanishing
gradients, computational cost) have made it less popular in modern deep learning. It is still used
in the output layer for binary classification tasks.
- **Tanh**: Performs better than sigmoid due to zero-centered outputs and stronger gradients.
However, it is still prone to vanishing gradients and is computationally expensive.
- **ReLU**: The most widely used activation function in deep learning due to its simplicity,
efficiency, and ability to avoid vanishing gradients. However, the dying ReLU problem can be
mitigated using variants like **Leaky ReLU** or **Parametric ReLU (PReLU)**.
UNIT 2: refer to the CNN PPT.
UNIT 3
1. Recurrent Neural Network
### **Concept of Gradients in RNNs (16 marks)**
---
#### **Introduction**:
Recurrent Neural Networks (RNNs) are designed to handle sequential data, such as time
series, text, or speech. Unlike feedforward neural networks, RNNs have connections that
form cycles, allowing them to maintain a "memory" of previous inputs through hidden
states. However, training RNNs involves computing gradients over time, which
introduces unique challenges, such as **vanishing** and **exploding gradients**.
These challenges are critical to understanding how RNNs learn and why they sometimes
fail to converge.
---
#### **1. Gradients in RNNs**:
- In RNNs, gradients are computed using **Backpropagation Through Time (BPTT)**,
which is an extension of the standard backpropagation algorithm. BPTT unrolls the RNN
over time and computes gradients for each time step.
- The goal of gradient computation is to update the weights of the RNN to minimize the
loss function. The loss function measures the difference between the predicted output
and the actual target at each time step.
---
#### **2. Backpropagation Through Time (BPTT)**:
- **Unrolling the RNN**:
- An RNN processes sequential data one time step at a time. During BPTT, the network
is "unrolled" over time, creating a computational graph that represents the network's
operations at each time step.
- For example, if the input sequence has length \( T \), the RNN is unrolled into \( T \)
layers, each corresponding to a time step.
- **Gradient Computation**:
- At each time step \( t \), the RNN computes a hidden state \( h_t \) and an output \( \hat{y}_t \):
\[
h_t = \tanh(W_h h_{t-1} + W_x x_t + b_h), \qquad \hat{y}_t = W_y h_t + b_y
\]
- **Gradient Propagation**:
- During BPTT, gradients are computed for each time step and propagated backward
through the unrolled network. The gradient of the loss with respect to the weights
\( W_h \), \( W_x \), and \( W_y \) is computed using the chain rule.
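A minimal NumPy sketch (not taken from the book) of this unrolled forward pass; the sequence length, layer sizes, and random data are invented for illustration. It makes explicit that the same \( W_h \) is reused at every time step, which is exactly the repeated multiplication that BPTT differentiates through.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions chosen only for illustration.
T, input_dim, hidden_dim, output_dim = 5, 3, 4, 2

W_x = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_h = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
W_y = rng.normal(scale=0.1, size=(output_dim, hidden_dim))
b_h = np.zeros(hidden_dim)
b_y = np.zeros(output_dim)

xs = rng.normal(size=(T, input_dim))  # one input vector per time step
h = np.zeros(hidden_dim)

# Unrolled forward pass: the SAME W_h is applied at every time step,
# which is why gradients through time get multiplied by it repeatedly.
hidden_states, outputs = [], []
for t in range(T):
    h = np.tanh(W_x @ xs[t] + W_h @ h + b_h)
    y = W_y @ h + b_y
    hidden_states.append(h)
    outputs.append(y)

print(np.array(outputs).shape)  # (5, 2): one output per time step
```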
---
#### **3. Challenges with Gradients in RNNs**:
1. **Vanishing Gradients**:
- **Cause**: When the gradients are propagated backward through time, they are
multiplied by the same weight matrix \( W_h \) repeatedly. If the eigenvalues of \( W_h
\) are less than 1, the gradients shrink exponentially as they propagate backward,
leading to vanishing gradients.
- **Effect**: The weights are not updated effectively, and the network stops learning.
This makes it difficult for the RNN to capture long-term dependencies in the data.
- **Example**: In a language model, the RNN may fail to learn relationships between
words that are far apart in a sentence.
2. **Exploding Gradients**:
- **Cause**: If the eigenvalues of \( W_h \) are greater than 1, the gradients grow
exponentially as they propagate backward, leading to exploding gradients.
- **Effect**: The weights are updated with very large values, causing the network to
become unstable and diverge.
- **Example**: The loss function may become NaN (Not a Number) due to numerical
overflow.
---
#### **4. Solutions to Gradient Problems**:
1. **Gradient Clipping**:
- To address exploding gradients, the gradients are rescaled whenever their norm exceeds a
maximum threshold. This prevents the gradients from growing too large and stabilizes training
(see the sketch after this list).
2. **Weight Initialization**:
- Proper initialization of the weight matrices \( W_h \), \( W_x \), and \( W_y \) can
help mitigate vanishing and exploding gradients. For example, initializing the weights
using the Xavier or He initialization methods ensures that the gradients remain stable.
3. **Advanced RNN Architectures**:
- **Long Short-Term Memory (LSTM)**: LSTMs use gating mechanisms to control the
flow of information through the network. This allows them to capture long-term
dependencies and avoid vanishing gradients.
- **Gated Recurrent Unit (GRU)**: GRUs are a simplified version of LSTMs that also
use gating mechanisms to address gradient problems.
4. **Batch Normalization**:
- Batch normalization normalizes the inputs to each layer, reducing internal covariate
shift and stabilizing the gradients.
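A minimal NumPy sketch of gradient clipping by global norm, as described in item 1 above; the threshold and the stand-in gradient arrays are invented for illustration, and deep learning frameworks provide equivalent built-in utilities.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient arrays so their combined L2 norm is at most max_norm."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads

# Stand-in gradients (e.g., for W_h, W_x, W_y) with artificially large entries.
grads = [np.array([[100.0, -50.0]]), np.array([0.5, 1.5])]
clipped = clip_by_global_norm(grads, max_norm=5.0)
print([np.linalg.norm(g) for g in clipped])  # combined norm is now at most 5.0
```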
---
#### **5. Example from the Book**:
In the book **"Fundamentals of Deep Learning" by Nikhil Buduma**, the concept of
gradients in RNNs is explained in detail. The book highlights the challenges of training
RNNs, particularly the vanishing and exploding gradient problems. It also discusses
solutions like gradient clipping, weight initialization, and advanced architectures like
LSTMs and GRUs.
For example, the book explains how LSTMs use input, forget, and output gates to control
the flow of information and gradients through the network. This allows LSTMs to
maintain long-term memory and avoid vanishing gradients, making them more effective
for tasks like language modeling and machine translation.
---
#### **Conclusion**:
Gradients play a crucial role in training RNNs, but they also introduce challenges like
vanishing and exploding gradients. These challenges arise due to the repeated
multiplication of the same weight matrix during backpropagation through time.
Solutions like gradient clipping, proper weight initialization, and advanced architectures
like LSTMs and GRUs have been developed to address these issues. Understanding the
concept of gradients in RNNs is essential for designing and training effective models for
sequential data.
---
2. Outline the architecture of a recurrent neural network
- Refer to: Recurrent Neural Network (RNN) Architecture Explained | by Sushmita Poudel | Medium
3. Types of recurrent neural networks
- Refer to: Types of Recurrent Neural Networks (RNN) in Tensorflow - GeeksforGeeks