Reinforcement Learning and Deep Learning

The document discusses key concepts in reinforcement learning, including Markov decision processes, SARSA vs Q-learning, and actor-critic methods like A2C and A3C. It also covers the importance of the Bellman equation, the need for target networks in DQN, and the challenges of POMDPs. Additionally, it explains meta-learning, model-based techniques, and various properties of dynamic programming, alongside practical applications in fields like robotics and healthcare.

Question 1

1(a) What are the main components of a Markov decision process? (2 marks)
 States (S): All possible situations the agent can be in.
 Actions (A): All possible moves/choices the agent can take.
 Transition probabilities (P): Probability of moving from one
state to another after an action.
 Reward function (R): Immediate feedback (positive/negative)
after an action.
 Policy (π): Strategy that defines which action to take in a given
state.
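
A minimal sketch of how these components can be written down in Python for a toy two-state problem (the states, actions, probabilities, and rewards below are made-up illustrative values, not taken from the document):

# Hypothetical two-state MDP written as plain Python dictionaries.
states = ["s0", "s1"]
actions = ["stay", "move"]

# P[(s, a)] -> {next_state: probability}  (illustrative numbers)
P = {
    ("s0", "stay"): {"s0": 0.9, "s1": 0.1},
    ("s0", "move"): {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "move"): {"s0": 0.7, "s1": 0.3},
}

# R[(s, a)] -> immediate reward (illustrative numbers)
R = {("s0", "stay"): 0.0, ("s0", "move"): 1.0,
     ("s1", "stay"): 2.0, ("s1", "move"): 0.0}

# A simple deterministic policy: which action to take in each state.
policy = {"s0": "move", "s1": "stay"}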

1(b) When to use SARSA over Q-learning? (2 marks)


 SARSA (on-policy): Used when we want the agent to learn
based on the policy it actually follows (including exploration).
 Q-learning (off-policy): Learns optimal policy regardless of
current behavior.
👉 Use SARSA when the environment is risky/unstable and we
want safer learning (since it accounts for exploratory actions).
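
A minimal sketch of the two update rules in Python, which makes the on-policy/off-policy difference concrete (Q is assumed to be a dictionary of state-action values; alpha, gamma, and the variable names are illustrative):

# SARSA (on-policy): uses the action a_next actually chosen by the current policy.
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    td_target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])

# Q-learning (off-policy): uses the greedy (max) action in the next state.
def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    td_target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])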

1(c) What is the difference between A2C and A3C actor-critic? (2 marks)
 A2C (Advantage Actor-Critic): Synchronous version where
multiple agents run in parallel and update gradients together.
 A3C (Asynchronous Advantage Actor-Critic): Asynchronous
version where each agent updates independently at different
times.
👉 Difference = synchronous (A2C) vs asynchronous (A3C)
training.

1(d) How are multi-agent systems different from distributed systems? (2 marks)
 Multi-agent systems: Multiple intelligent agents interact,
cooperate, or compete to achieve goals (focus = decision-
making).
 Distributed systems: Multiple computers share resources and
coordinate tasks (focus = computation + fault tolerance).
👉 Multi-agent = intelligent decisions, Distributed = resource
distribution.

1(e) What is a real-world example of reinforcement learning? (2 marks)
 Example: Self-driving cars → Learn to drive by interacting with
environment (traffic, signals, pedestrians) and maximizing
safety & efficiency.
Other examples: Robotics, Game playing (AlphaGo),
Recommendation systems.

Question 2
2(a) Why is the Bellman equation important in reinforcement
learning? How to solve it? (5 marks)
 Importance:
o It provides a recursive way to calculate value of a state.
o Forms the foundation of Dynamic Programming, Q-
learning, and Value Iteration.
 Bellman equation (for value function):
V(s) = R(s) + γ ∑_{s'} P(s'|s,a) V(s')
 How to solve:
o Iterative methods: Value Iteration, Policy Iteration.
o Approximation: Monte Carlo, Temporal Difference
learning.
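
One way to solve the Bellman equation numerically is value iteration; a minimal sketch, assuming the dictionary-style P[(s, a)] and R[(s, a)] shown in the earlier MDP example (all values are placeholders):

def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-6):
    # Start with zero value for every state and sweep until values stop changing.
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # Bellman backup: best one-step return over all actions.
            best = max(
                R[(s, a)] + gamma * sum(p * V[s2] for s2, p in P[(s, a)].items())
                for a in actions
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V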

2(b) Why do we need a target network in DQN? How can we improve the DQN model? (5 marks)
 Need of target network:
o Prevents unstable learning by keeping target Q-values
fixed for some steps instead of updating immediately.
o Reduces oscillations and divergence.
 Improvements to DQN:
o Double DQN (removes overestimation bias).
o Dueling DQN (separates value and advantage functions).
o Prioritized experience replay.
o Using larger neural networks and better optimizers.
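
A minimal PyTorch-flavoured sketch of how the target network is held fixed and refreshed only every few steps (the network sizes, update period, and variable names are assumptions for illustration, not a full DQN implementation):

import copy
import torch
import torch.nn as nn

# Online Q-network and a frozen copy used to compute targets.
policy_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net = copy.deepcopy(policy_net)
target_net.eval()

TARGET_UPDATE_EVERY = 1000  # steps between target refreshes (illustrative)

def td_targets(rewards, next_states, dones, gamma=0.99):
    # Targets come from the frozen network, so they stay stable between syncs.
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
    return rewards + gamma * next_q * (1 - dones)  # dones: 0/1 float tensor

def maybe_sync(step):
    # Periodically copy the online weights into the target network.
    if step % TARGET_UPDATE_EVERY == 0:
        target_net.load_state_dict(policy_net.state_dict())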

Question 3
3(a) What is stochastic policy? What is the formula for the policy of reinforcement learning? Explain. (5 marks)
A stochastic policy in reinforcement learning is a type of decision-making rule where the agent does not always choose the same action for a given state, but instead selects an action based on a probability distribution. This means that for the same state, different actions may be chosen at different times, which introduces randomness and allows better exploration of the environment. This is different from a deterministic policy, where the action is always fixed for each state.
The general formula for a stochastic policy is:
π(a|s) = P(A = a | S = s)
This represents the probability of choosing action a when the agent is in state s.
Stochastic policies are very important in reinforcement learning, especially in complex or continuous environments, because they prevent the agent from getting stuck in local optima and promote exploration. For example, in a game-playing agent, using a stochastic policy ensures that the agent sometimes tries out less common moves, which may eventually lead to discovering better strategies. Many modern algorithms such as Policy Gradient, REINFORCE, and Actor-Critic methods rely on stochastic policies for efficient learning.
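
A minimal sketch of a stochastic (softmax) policy over action preferences, using NumPy; the preference values are illustrative:

import numpy as np

def softmax_policy(preferences):
    # Convert unnormalized action preferences into a probability distribution.
    z = np.exp(preferences - np.max(preferences))  # subtract max for stability
    return z / z.sum()

# Illustrative preferences for three actions in some state s.
probs = softmax_policy(np.array([2.0, 1.0, 0.5]))   # pi(a|s)
action = np.random.choice(len(probs), p=probs)      # sample a ~ pi(.|s)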

3(b) What is meta-learning in reinforcement learning? What are the applications of meta-learning? (5 marks)
Meta-learning, also called “learning to learn,” is a process in reinforcement learning where the agent does not only learn to solve a single task but also learns how to quickly adapt its knowledge to new tasks with minimal data and training. Instead of starting from scratch for each task, the agent develops a general learning strategy that can transfer across different environments.
The goal of meta-learning is to build agents that are flexible and adaptable, much like humans who can apply previous experience to new problems. For instance, once a robot learns how to walk on flat ground, it should quickly adapt to walking on sand, stairs, or rocky terrain without learning everything again.
Applications of meta-learning include:
 Robotics: Robots adapting quickly to different terrains, tasks, or objects.
 Healthcare: Personalized treatment recommendations based on patient-specific data.
 Few-shot learning: Training models that can classify or act correctly with very few examples.
 Recommendation systems: Adapting quickly to changing user preferences.
Meta-learning is therefore extremely powerful because it makes reinforcement learning agents more generalizable, efficient, and closer to human-like adaptability.

Question 4
4(a) What are the challenges associated with using a POMDP? Explain the key components of the POMDP. (5 marks)
A POMDP (Partially Observable Markov Decision Process) is an extension of the MDP where the agent does not have complete knowledge of the state of the environment. Instead, it receives only partial observations that provide incomplete information about the true state.
Challenges associated with POMDP:
 High complexity: Solving POMDPs is computationally difficult, as the agent must reason about all possible hidden states.
 Uncertainty handling: Since the agent never knows the true state, it must maintain a belief (probability distribution over states), which is mathematically challenging (a small belief-update sketch is given after this answer).
 Memory requirement: The agent must often remember past actions and observations to make better decisions, unlike in standard MDPs where the current state is enough.
 Scalability issues: As the environment grows larger, maintaining beliefs and making optimal policies becomes nearly impossible in real time.
Key components of POMDP:
 States (S): True states of the environment (hidden from the agent).
 Actions (A): Choices available to the agent.
 Transition function (P): Probability of moving to a new state given an action.
 Rewards (R): Immediate feedback after actions.
 Observations (O): What the agent can perceive from the environment.
 Observation function: Probability of receiving an observation given the hidden state.
In short, POMDPs model real-world situations better than MDPs (since we rarely know the full state of the world), but their solution is much harder and often requires approximation methods.
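
A minimal NumPy sketch of the belief update mentioned above, assuming arrays T[a][s, s'] = P(s'|s,a) for transitions and Z[a][s', o] = P(o|s',a) for the observation function (the names and shapes are assumptions for illustration):

import numpy as np

def belief_update(b, a, o, T, Z):
    # b: belief over hidden states; a: action taken; o: observation received.
    # Predict: push the belief through the transition model for action a.
    predicted = T[a].T @ b                  # sum_s P(s'|s,a) * b(s)
    # Correct: weight by how likely observation o is in each candidate state.
    weighted = Z[a][:, o] * predicted
    return weighted / weighted.sum()        # renormalize to a distribution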

4(b) What are model-based techniques? Is the AlphaZero model based on RL? (5 marks)
In reinforcement learning, model-based techniques are methods where the agent builds or uses a model of the environment to plan and make decisions. The model usually includes the transition dynamics (probability of going from one state to another) and the reward function. By simulating possible future states using the model, the agent can evaluate different strategies before actually executing them in the environment.
This is in contrast to model-free techniques (like Q-learning or Policy Gradient methods), where the agent directly learns from trial and error without trying to predict future states explicitly. Model-based methods are often more data-efficient but computationally expensive.
Examples of model-based techniques:
 Dynamic programming
 Monte Carlo Tree Search (MCTS)
 Planning algorithms in robotics
AlphaZero: Yes, AlphaZero is based on reinforcement learning, and it combines model-based planning with deep learning. It uses:
 Monte Carlo Tree Search (MCTS): A planning algorithm that simulates future game moves (model-based).
 Neural networks: To approximate the value function and policy.
 Self-play reinforcement learning: The system plays against itself, improving iteratively without human data.
This combination makes AlphaZero a hybrid model that leverages the strengths of both model-based RL (planning) and deep learning (function approximation). It has been successfully used to master games like chess, shogi, and Go at superhuman levels.

Q1 (Attempt any five, 5 marks each)


(a) What is Reinforcement Learning (RL)?
Reinforcement Learning is a type of machine learning where an agent
learns by interacting with an environment. The agent takes actions,
receives rewards or penalties, and improves its strategy over time to
maximize total rewards. Unlike supervised learning, RL does not
require labeled data, but instead focuses on trial-and-error learning.
Examples include training robots, game-playing (like AlphaGo), and
self-driving cars.

(b) What do you mean by Metadata?


Metadata means “data about data.” It gives information about other
data, such as how it is created, stored, or used. For example, a photo
file may have metadata such as date taken, camera type, and
resolution. In machine learning and databases, metadata helps in
organizing, retrieving, and understanding data better.

(c) Explain the two required properties of Dynamic Programming.


Dynamic Programming (DP) is used when problems can be broken
into smaller subproblems. The two key properties are:
1. Optimal substructure: The solution to a big problem can be
built from solutions of smaller subproblems.
2. Overlapping subproblems: The same subproblems occur
multiple times, so storing and reusing results saves time.
For example, shortest path problems and Fibonacci calculation
use DP.
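
As a concrete illustration of both properties, a memoized Fibonacci in Python (overlapping subproblems are cached and reused; each value is built from smaller solutions):

from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Overlapping subproblems: fib(n-1) and fib(n-2) are reused, not recomputed.
    if n < 2:
        return n
    # Optimal substructure: the answer is built directly from smaller answers.
    return fib(n - 1) + fib(n - 2)

print(fib(30))  # 832040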

(d) List out the requirements for Monte Carlo method.


Monte Carlo methods are techniques that use random sampling to
estimate values. Requirements:
1. Environment model or simulator for running episodes.
2. Many random samples or trials.
3. Reward function to evaluate outcomes.
4. Sufficient episodes to average results for accuracy.
It is often used in reinforcement learning when the
environment’s dynamics are not fully known.
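
A minimal sketch of the Monte Carlo idea: run many sampled episodes in a simulator and average the returns. The fake simulator below is purely an illustrative assumption:

import random

def simulate_episode():
    # Stand-in for a real environment/simulator: returns one sampled return.
    return sum(random.gauss(1.0, 0.5) for _ in range(10))

def monte_carlo_estimate(num_episodes=10_000):
    # Average the observed returns over many random trials.
    returns = [simulate_episode() for _ in range(num_episodes)]
    return sum(returns) / len(returns)

print(monte_carlo_estimate())  # converges near the true expected return (about 10)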

(e) Differentiate between meta-learning and model-agent based learning used in RL.
 Meta-learning: “Learning to learn.” The agent learns a general
way of learning so it can quickly adapt to new tasks. Example: A
robot trained to walk on flat ground can easily adapt to sand or
stairs.
 Model-agent based learning: The agent builds or uses a model
of the environment (states, actions, transitions) to plan
decisions. Example: Using simulations to decide the next move
in chess.

(f) What is Deep Learning?


Deep Learning is a branch of machine learning that uses artificial
neural networks with many layers to learn complex patterns from
data. It can automatically extract features from raw data like images,
sound, or text, without requiring manual feature engineering.
Applications include speech recognition, image classification, and
natural language processing.

(g) List out the types of Neural Networks.


1. Feedforward Neural Networks (FNN)
2. Convolutional Neural Networks (CNN)
3. Recurrent Neural Networks (RNN)
4. Generative Adversarial Networks (GANs)
5. Autoencoders
6. Radial Basis Function Networks

(h) Define Vector Space Model.


Vector Space Model (VSM) is a way to represent text documents as
vectors of numbers. Each document is expressed as a vector of
terms, and similarity is measured using methods like cosine similarity.
It is widely used in Information Retrieval (e.g., search engines) to find
documents similar to a given query.
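
A minimal term-count sketch of the Vector Space Model with cosine similarity (the toy document and query are made up for illustration; real systems typically use TF-IDF weights):

import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    # Dot product over shared terms divided by the product of vector lengths.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

doc = Counter("reinforcement learning with reward signals".split())
query = Counter("reward learning".split())
print(cosine(doc, query))  # higher score = more similar document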

UNIT – I (12.5 marks)


Q2. Discuss the features and elements of Reinforcement Learning.
Reinforcement Learning has some key features:
 Trial and error learning – Agent learns by interacting.
 Feedback-driven – Rewards or penalties guide behavior.
 Exploration vs. Exploitation – Agent must balance trying new
actions vs using known best actions.
 Sequential decision making – Actions affect future outcomes.
Elements of RL framework:
1. Agent – learner/decision-maker.
2. Environment – everything the agent interacts with.
3. State (S) – current situation of the agent.
4. Action (A) – choices agent can make.
5. Reward (R) – feedback after action.
6. Policy (π) – strategy followed by the agent.
7. Value function (V) – long-term expected reward from a state.

OR Q3. Illustrate the Markov Decision Process and RL Framework.
A Markov Decision Process (MDP) is the mathematical framework
behind RL. It consists of:
 States (S), Actions (A), Transition probabilities (P), Reward
function (R), and Policy (π).
The Markov property means that the next state depends only
on the current state and action, not the past history.
The RL framework is built on top of MDP where the agent interacts
with the environment, updates its policy, and learns optimal
behavior. Example: In chess, the state is the board position, action is
a move, and reward is winning/losing.

UNIT – II (12.5 marks)


Q4. State and explain the various policy-based methods used in RL.
Policy-based methods directly learn a parameterized policy (πθ)
without using value functions. Examples:
1. Policy Gradient methods (REINFORCE): Adjusts policy
parameters in the direction that increases expected reward.
2. Actor-Critic methods: Combination of policy-based (actor) and
value-based (critic).
3. Trust Region Policy Optimization (TRPO): Improves policy while
preventing large harmful updates.
4. Proximal Policy Optimization (PPO): Simplified version of TRPO,
widely used in practice.
Advantages: Work well in continuous action spaces, provide
stochastic policies, and are stable for large problems.
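
A minimal REINFORCE-style sketch in PyTorch showing the core policy-gradient step (the network shape, states, actions, and returns are illustrative placeholders, not a complete training loop):

import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def reinforce_step(states, actions, returns):
    # log pi_theta(a|s) for the actions that were actually taken.
    log_probs = torch.log_softmax(policy(states), dim=1)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    # Gradient ascent on expected return == descent on the negative objective.
    loss = -(chosen * returns).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()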

OR Q5. Explain with an example the working of the model-based RL approach.
In model-based RL, the agent builds a model of the environment,
which includes transition probabilities and reward functions. The
agent then uses this model to simulate different possible futures and
choose the best actions.
Example: AlphaZero in chess. It uses Monte Carlo Tree Search (MCTS)
as a planning method. The agent simulates many possible moves
ahead (like a human imagining future moves) and chooses the one
with the best expected outcome. This makes model-based RL more
sample-efficient compared to model-free methods.

UNIT – III (12.5 marks)


Q6. Discuss the working principle of deep learning with practical
examples.
Deep learning works on the principle of using multi-layer neural
networks where each layer extracts increasingly complex features
from the input data. The first layers capture simple features (edges in
images), while deeper layers combine them into complex patterns
(faces, objects).
The network is trained using backpropagation, where errors are
propagated backward, and weights are adjusted using optimization
algorithms like gradient descent.
Examples:
 Image recognition (CNN): Identifying cats, dogs, or humans in
photos.
 Speech recognition (RNN, LSTM): Converting spoken language
into text.
 Medical diagnosis: Detecting tumors from X-rays or MRI scans.
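
A minimal PyTorch sketch of the training principle described above: a forward pass, a loss, backpropagation of errors, and a gradient-descent weight update (the data and network size are illustrative assumptions):

import torch
import torch.nn as nn

# Tiny two-layer network on made-up data, just to show the training loop.
net = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 1))
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)

x = torch.randn(64, 10)          # illustrative inputs
y = torch.randn(64, 1)           # illustrative targets

for epoch in range(100):
    pred = net(x)                # forward pass: layer-by-layer features
    loss = loss_fn(pred, y)      # how wrong the prediction is
    optimizer.zero_grad()
    loss.backward()              # backpropagation: errors flow backward
    optimizer.step()             # gradient descent adjusts the weights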

OR Q7. Illustrate the Convolutional Neural Network and its real-time applications.
A Convolutional Neural Network (CNN) is a type of neural network
specialized for image and spatial data. It uses convolutional layers to
automatically extract features like edges, shapes, and textures.
Pooling layers reduce dimensions, and fully connected layers classify
the output.
Applications in real-time:
 Face recognition (unlocking phones).
 Self-driving cars (object detection like pedestrians, traffic
lights).
 Medical imaging (cancer detection).
 Security systems (CCTV image recognition).
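
A minimal PyTorch sketch of the convolution, pooling, and fully connected pipeline described above, for small grayscale images (the image size, channel counts, and class count are assumptions for illustration):

import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),   # detect edges/textures
            nn.ReLU(),
            nn.MaxPool2d(2),                             # shrink spatial size
            nn.Conv2d(8, 16, kernel_size=3, padding=1),  # combine into shapes
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(16 * 7 * 7, num_classes)  # assumes 28x28 input

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(start_dim=1))

logits = TinyCNN()(torch.randn(1, 1, 28, 28))  # one fake 28x28 grayscale image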

UNIT – IV (12.5 marks)


Q8. Explain with an example how deep learning can be utilized in Natural Language Processing (NLP).
Deep learning has transformed NLP by replacing hand-crafted
features with automatic learning from text data. Models like RNNs,
LSTMs, GRUs, and Transformers can capture sequential dependencies
and context.
Examples:
 Machine translation: Google Translate uses deep learning.
 Chatbots & assistants: Siri, Alexa.
 Sentiment analysis: Detecting emotions in tweets or reviews.
 Text summarization: Generating concise summaries of long
articles.

OR Q9. Draw and explain the deep learning architecture for Computer Vision.
Computer Vision architectures are mostly based on CNNs.
Architecture steps:
1. Input layer – image pixels.
2. Convolutional layers – detect features (edges, corners).
3. Pooling layers – reduce size and preserve important info.
4. Fully connected layers – combine extracted features.
5. Output layer – classification (e.g., cat, dog, car).
This layered architecture makes CNNs very effective in vision tasks
like face recognition, autonomous driving, and medical image
analysis.
