Reg No.:_______________ Name:__________________________


1000CST497122203
APJ ABDUL KALAM TECHNOLOGICAL UNIVERSITY
Seventh Semester [Link] Degree (Honours) Examination December 2023 (2020 Admission)

Course Code: CST497


Course Name: REINFORCEMENT LEARNING
Max. Marks: 100 Duration: 3 Hours

PART A
Answer all questions; each carries 3 marks.

1 Each of two persons A and B tosses three fair coins. What is the probability that both get the same number of heads? (3)
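For quick self-checking, a counting question like this can be verified by brute-force enumeration (an illustrative sketch, not part of the paper):

```python
from itertools import product
from fractions import Fraction

# All possible outcomes for three fair coin tosses (0 = tail, 1 = head).
outcomes = list(product([0, 1], repeat=3))

# Count pairs of outcome-sequences (A's tosses, B's tosses) with equal head counts.
same = sum(1 for a, b in product(outcomes, repeat=2) if sum(a) == sum(b))

p = Fraction(same, len(outcomes) ** 2)
print(p)  # 5/16, i.e. sum_k C(3,k)^2 / 64 = 20/64
```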
2 Two components of a laptop computer have the following joint probability density function for their useful lifetimes X and Y (in years):

Find the marginal probability density function of X, fX(x). (3)


3 What is a Markov Decision Process? Give a suitable example. (3)
4 What is the difference between the state value function and the state-action value function? (3)
5 List any three advantages of Monte Carlo methods over dynamic programming techniques. (3)
6 Brief the concept of Monte Carlo Estimation of Action values. (3)
7 Why is Q-learning considered an off-policy control method? (3)
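As background for this question: Q-learning is called off-policy because its update target maximizes over next actions (a greedy target policy), regardless of which exploratory behaviour policy generated the transition. A minimal tabular sketch (function name, step size, and discount are illustrative, not from the paper):

```python
def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning update on the transition (s, a, r, s_next)."""
    # Off-policy: the target bootstraps from max_a' Q(s', a') -- the greedy
    # policy -- no matter which (possibly epsilon-greedy) action produced the data.
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    td_error = r + gamma * best_next - Q.get((s, a), 0.0)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * td_error
    return Q
```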

8 Differentiate N-step bootstrapping and Q-learning. (3)

9 Explain Feature Construction for Linear Methods (3)


10 Differentiate Stochastic-gradient and Semi-gradient Methods. (3)
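For context on question 10: in a semi-gradient method the weight update differentiates only the prediction, not the bootstrapped target. A linear TD(0) sketch (feature vectors, step size, and discount are assumed for illustration):

```python
import numpy as np

def semi_gradient_td0(w, x_s, x_next, r, alpha=0.1, gamma=0.9):
    """One semi-gradient TD(0) step with linear value function v(s) = w . x_s."""
    # TD error: the bootstrapped target r + gamma*v(s') is treated as a constant.
    delta = r + gamma * (w @ x_next) - (w @ x_s)
    # "Semi"-gradient: only grad_w v(s) = x_s appears, never grad_w of the target.
    return w + alpha * delta * x_s
```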
PART B
Answer any one full question from each module, each carries 14 marks.
Module I
11 a) Dick and Jane have agreed to meet for lunch between noon (12:00 p.m.) and 1:00 p.m. Denote Jane's arrival time by X, Dick's by Y, and suppose X and Y are independent with probability density functions (7)

Find the probability that Jane arrives before Dick; that is, find P(X < Y).
b) Let A and B be two independent events such that P(A) = 0.2 and P(B) = 0.8. Find P(A and B), P(A or B), P(B but not A), and P(neither A nor B). (7)
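For reference, the required quantities follow directly from independence and inclusion-exclusion; a quick numerical check (an illustrative sketch, not part of the paper):

```python
pA, pB = 0.2, 0.8
p_a_and_b = pA * pB                # independence: P(A)P(B) = 0.16
p_a_or_b = pA + pB - p_a_and_b     # inclusion-exclusion: 0.84
p_b_not_a = pB * (1 - pA)          # B and the complement of A are also independent: 0.64
p_neither = (1 - pA) * (1 - pB)    # both complements are independent: 0.16
```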

OR
12 a) 10% of the bulbs produced in a factory are red in colour and 2% are red and defective. If one bulb is picked at random, determine the probability of its being defective given that it is red. (5)
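This is a direct application of conditional probability, P(defective | red) = P(red and defective) / P(red); a one-line check (illustrative, not part of the paper):

```python
p_red = 0.10
p_red_and_defective = 0.02
# Conditional probability: restrict the sample space to red bulbs.
p_defective_given_red = p_red_and_defective / p_red  # ~= 0.2
```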

b) A discrete random variable X has the following probability distribution:

Find the value of C. Also find the mean of the distribution. (9)


Module II
13 a) Discuss the limitations and scope of Reinforcement Learning. (5)

b) Explain the agent-environment interaction in a Markov Decision Process with a diagrammatic representation. (9)
OR
14 a) Justify the involvement of policies and value functions in a reinforcement learning algorithm. (7)

b) Reinforcement Learning Algorithm involves estimating value functions. Justify. (7)


Module III
15 a) With respect to the Expected SARSA algorithm, is exploration required as it is in the normal SARSA and Q-learning algorithms? Justify. (7)
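For context: Expected SARSA replaces the sampled next action in the SARSA target with an expectation under the current policy, removing that source of sampling variance (exploration is still needed to visit state-action pairs at all). A side-by-side sketch of the three targets (names, discount, and the tiny Q-table are illustrative):

```python
def sarsa_target(Q, r, s_next, a_next, gamma=0.9):
    # SARSA: bootstrap from the action actually taken next (on-policy sample).
    return r + gamma * Q.get((s_next, a_next), 0.0)

def expected_sarsa_target(Q, r, s_next, actions, policy_probs, gamma=0.9):
    # Expected SARSA: average over next actions under the current policy.
    return r + gamma * sum(policy_probs[a] * Q.get((s_next, a), 0.0) for a in actions)

def q_learning_target(Q, r, s_next, actions, gamma=0.9):
    # Q-learning: maximize over next actions (greedy target policy).
    return r + gamma * max(Q.get((s_next, a), 0.0) for a in actions)
```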

b) Suppose you are given a finite set of transition data. Assuming that the Markov model that can be formed from the given data is the actual MDP from which the data is generated, will the value functions calculated by the MC and TD methods necessarily agree? Justify. (7)
OR
16 a) Brief the concept of Monte Carlo Control. How can we avoid the unlikely assumption of exploring starts? (9)

b) For a specific MDP, suppose we have a policy that we want to evaluate using Monte Carlo methods and actual experience in the environment alone. We decide to use the first-visit approach along with the technique of always picking the start state at random from the available set of states. Will this approach ensure complete evaluation of the action value function corresponding to the policy? (5)
Module IV
17 a) What is the difference between Monte Carlo simulations and Markov Chain Monte Carlo (MCMC)? (7)

b) What are the advantages and disadvantages of temporal difference learning and Monte Carlo methods? (7)

OR
18 a) Why do we use the Monte Carlo simulations? Justify your answer. (7)
b) What is the difference between Q-learning and Sarsa? (7)

Module V
19 a) What is the difference between a state value function V(s) and a state-action value function Q(s, a)? (7)

b) Justify the concept of Monte Carlo methods in reinforcement learning. (7)

OR
20 a) What is an intuitive explanation of tile coding function approximation in reinforcement learning? (9)
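As background for this question: tile coding overlays several offset grids ("tilings") on the state space; a state activates exactly one tile per tiling, giving a sparse binary feature vector for linear function approximation, and nearby states share tiles, which produces generalization. A minimal 1-D sketch (the tiling counts and state range are illustrative):

```python
def active_tiles(x, n_tilings=4, tiles=8, low=0.0, high=1.0):
    """Return one active tile index per tiling for a 1-D state x in [low, high]."""
    width = (high - low) / tiles              # width of one tile
    indices = []
    for t in range(n_tilings):
        offset = t * width / n_tilings        # each tiling is shifted a fraction of a tile
        idx = int((x - low + offset) / width)
        idx = min(idx, tiles)                 # shifted tilings need one extra edge tile
        indices.append(t * (tiles + 1) + idx) # globally unique index per (tiling, tile)
    return indices
```

Because the tilings use disjoint index ranges, the returned list can be fed directly into a sparse linear approximator: the estimated value is just the sum of the weights at the active indices.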

b) Why are function approximators required? (5)

****

