Reg No.:_______________ Name:__________________________
1000CST497122203
APJ ABDUL KALAM TECHNOLOGICAL UNIVERSITY
Seventh Semester B.Tech Degree (Honours) Examination December 2023 (2020 Admission)
Course Code: CST497
Course Name: REINFORCEMENT LEARNING
Max. Marks: 100 Duration: 3 Hours
PART A
Answer all questions, each carries 3 marks.
1 Two persons, A and B, each toss three fair coins. What is the probability that (3)
both get the same number of heads?
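A minimal Python sketch (illustrative, not part of the paper) that enumerates the 64 equally likely outcome pairs and confirms the probability is 20/64 = 5/16:

    from itertools import product
    from fractions import Fraction

    outcomes = list(product([0, 1], repeat=3))        # 0 = tail, 1 = head
    same = sum(sum(a) == sum(b)
               for a in outcomes for b in outcomes)   # equal head counts
    print(Fraction(same, len(outcomes) ** 2))         # 5/16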
2 Two components of a laptop computer have the following joint probability density (3)
function for their useful lifetimes X and Y (in years):
Find the marginal probability density function of X, fX(x).
3 What is a Markov Decision Process? Give a suitable example. (3)
4 What is the difference between the state value function and the state-action (3)
value function?
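For reference, the standard definitions (in Sutton and Barto's notation), where G_t is the discounted return:

\[
v_\pi(s) = \mathbb{E}_\pi\left[\, G_t \mid S_t = s \,\right],
\qquad
q_\pi(s,a) = \mathbb{E}_\pi\left[\, G_t \mid S_t = s,\, A_t = a \,\right],
\qquad
G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}.
\]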
5 List any three advantages of Monte Carlo methods over dynamic programming (3)
techniques.
6 Briefly explain the concept of Monte Carlo estimation of action values. (3)
7 Why is Q-learning considered an off-policy control method? (3)
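An illustrative Python sketch of the Q-learning update (the helper names and the dict-based Q table are assumptions for illustration): actions are selected by an epsilon-greedy behaviour policy, but the bootstrapped target maximises over next actions, i.e. it evaluates the greedy target policy, which is what makes the method off-policy:

    import random

    def epsilon_greedy(Q, s, actions, epsilon):
        # Behaviour policy: explores with probability epsilon.
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(s, a)])

    def q_learning_update(Q, s, a, r, s_next, actions, alpha, gamma):
        # The target uses the greedy (max) action, regardless of the
        # action the behaviour policy will actually take next.
        target = r + gamma * max(Q[(s_next, b)] for b in actions)
        Q[(s, a)] += alpha * (target - Q[(s, a)])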
8 Differentiate between n-step bootstrapping and Q-learning. (3)
9 Explain Feature Construction for Linear Methods. (3)
10 Differentiate between stochastic-gradient and semi-gradient methods. (3)
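A sketch of a semi-gradient TD(0) update with linear function approximation (the feature vectors x_s, x_next and step size alpha are assumed inputs): unlike a true stochastic-gradient method, the bootstrapped target is treated as a constant, so the gradient is taken only through the current estimate:

    import numpy as np

    def semi_gradient_td0_update(w, x_s, x_next, r, alpha, gamma, done):
        # Linear value estimates: v(s, w) = w . x(s)
        v_s = w @ x_s
        v_next = 0.0 if done else w @ x_next
        # The target (r + gamma * v_next) also depends on w, but we do
        # NOT differentiate through it -- hence "semi-gradient".
        w = w + alpha * (r + gamma * v_next - v_s) * x_s
        return w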
PART B
Answer any one full question from each module, each carries 14 marks.
Module I
11 a) Dick and Jane have agreed to meet for lunch between noon (12:00 p.m.) and (7)
1:00 p.m. Denote Jane's arrival time by X, Dick's by Y, and suppose X and Y
are independent with probability density functions
Find the probability that Jane arrives before Dick. That is, find P(X < Y).
b) Let A and B be two independent events such that P(A) = 0.2 and P(B) = 0.8. (7)
Find P(A and B), P(A or B), P(B and not A), and P(neither A nor B).
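One way to work these out, using independence throughout:

\[
P(A \cap B) = P(A)\,P(B) = 0.16, \qquad
P(A \cup B) = P(A) + P(B) - P(A \cap B) = 0.84,
\]
\[
P(B \cap A^{c}) = P(B)\,\bigl(1 - P(A)\bigr) = 0.64, \qquad
P(A^{c} \cap B^{c}) = \bigl(1 - P(A)\bigr)\bigl(1 - P(B)\bigr) = 0.16.
\]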
OR
12 a) 10% of the bulbs produced in a factory are red in colour and 2% are red and (5)
defective. If one bulb is picked at random, determine the probability of it being
defective, given that it is red.
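This is a direct application of the definition of conditional probability:

\[
P(\text{defective} \mid \text{red})
= \frac{P(\text{red} \cap \text{defective})}{P(\text{red})}
= \frac{0.02}{0.10} = 0.2.
\]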
b) A discrete random variable X has the following probability distribution: (9)
Find the value of C. Also find the mean of the distribution.
Module II
13 a) What are the limitations and the scope of reinforcement learning? (5)
b) Explain the agent-environment interaction in a Markov Decision Process with a (9)
diagrammatic representation.
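A minimal Python sketch of that interaction loop (the two-state toy environment and its reset/step methods are invented for illustration): at each step the agent observes state S_t, chooses action A_t, and the environment replies with reward R_{t+1} and next state S_{t+1}:

    import random

    class ToyEnv:
        # Two-state toy MDP, used only to illustrate the loop.
        def reset(self):
            self.state = 0
            return self.state

        def step(self, action):
            reward = 1.0 if action == self.state else 0.0
            self.state = random.choice([0, 1])
            done = random.random() < 0.1      # episode ends at random
            return self.state, reward, done

    env = ToyEnv()
    state = env.reset()
    done = False
    while not done:
        action = random.choice([0, 1])        # the agent's (random) policy
        state, reward, done = env.step(action)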
OR
14 a) How do you justify the involvement of policies and value functions in a (7)
reinforcement learning algorithm?
b) Reinforcement learning algorithms involve estimating value functions. Justify. (7)
Module III
15 a) With respect to the Expected SARSA algorithm, is exploration required as it is (7)
in the normal SARSA and Q-learning algorithms? Justify.
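An illustrative sketch of the Expected SARSA target (the probs helper, returning the current policy's action probabilities, is an assumption): the update averages over all next actions instead of sampling one, which removes the sampling variance of the next action, although the behaviour policy itself must still explore:

    def expected_sarsa_target(Q, s_next, actions, r, gamma, probs):
        # probs(s) -> {action: probability} under the current policy.
        expected_q = sum(p * Q[(s_next, a)]
                         for a, p in probs(s_next).items())
        return r + gamma * expected_q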
b) Suppose you are given a finite set of transition data. Assuming that the Markov (7)
model that can be formed with the given data is the actual MDP from which the
data is generated, will the value functions calculated by the MC and TD methods
necessarily agree? Justify.
OR
16 a) Briefly explain the concept of Monte Carlo control. How can we avoid the (9)
unlikely assumption of exploring starts?
b) For a specific MDP, suppose we have a policy that we want to evaluate using (5)
only actual experience in the environment and Monte Carlo methods. We decide
to use the first-visit approach along with the technique of always picking the
start state at random from the available set of states. Will this approach ensure
complete evaluation of the action-value function corresponding to the policy?
Module IV
17 a) What is the difference between Monte Carlo simulations and Markov Chain (7)
Monte Carlo (MCMC)?
b) What are the advantages and disadvantages of temporal-difference learning and (7)
Monte Carlo methods?
OR
18 a) Why do we use Monte Carlo simulations? Justify your answer. (7)
b) What is the difference between Q-learning and SARSA? (7)
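The contrast shows up directly in the standard update targets (Sutton and Barto's notation):

\[
\text{SARSA (on-policy):}\quad
Q(S_t, A_t) \leftarrow Q(S_t, A_t)
+ \alpha \bigl[ R_{t+1} + \gamma\, Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t) \bigr]
\]
\[
\text{Q-learning (off-policy):}\quad
Q(S_t, A_t) \leftarrow Q(S_t, A_t)
+ \alpha \bigl[ R_{t+1} + \gamma \max_{a} Q(S_{t+1}, a) - Q(S_t, A_t) \bigr]
\]

SARSA bootstraps from the action the current policy actually takes next; Q-learning bootstraps from the greedy action.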
Module V
19 a) What is the difference between a state value function V(s) and a state-action (7)
value function Q(s, a)?
b) Justify the use of Monte Carlo methods in reinforcement learning. (7)
OR
20 a) What is an intuitive explanation of tile coding function approximation in (9)
reinforcement learning?
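A minimal 1-D tile coding sketch (the parameter names and the offset scheme are illustrative choices): each of several overlapping tilings maps the continuous input to exactly one active binary feature, so the approximate value is the sum of a handful of learned weights, giving coarse generalisation with finer resolution where tilings overlap:

    import numpy as np

    def active_tiles(x, n_tilings=4, tiles_per_tiling=10, low=0.0, high=1.0):
        # One active tile index per tiling for an input x in [low, high).
        width = (high - low) / tiles_per_tiling
        indices = []
        for t in range(n_tilings):
            offset = t * width / n_tilings          # each tiling is shifted
            i = int((x - low + offset) / width)
            i = min(i, tiles_per_tiling)            # extra tile absorbs the shift
            indices.append(t * (tiles_per_tiling + 1) + i)
        return indices

    w = np.zeros(4 * 11)                            # one weight per tile

    def value(x):
        # Only n_tilings weights are active for any given x.
        return sum(w[i] for i in active_tiles(x))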
b) Why are function approximators required? (5)
****