
Markov Decision Processes Exam (100 Marks)

Time: 2 Hours

Section A: Basic Concepts (20 Marks)


1. (a) State the Markov property mathematically and explain its significance in MDPs. (5 Marks)
(b) Compute the return $G_t$ for $R_{t+1} = 5$, $R_{t+2} = -3$, $R_{t+3} = 2$, with $\gamma = 0.8$. (5 Marks)

2. (a) Define the Bellman equation for a Markov Reward Process (MRP). (5 Marks)
(b) For the Student MRP (Page 11), calculate the immediate reward $R_s$ for state "Pub". (5 Marks)

Section B: Value Functions & Bellman Equations (30 Marks)
3. Given the Student MRP (Page 11) with $\gamma = 0.9$:

(a) Write the Bellman equation for state "C3". (10 Marks)
(b) Solve for $v(\text{C3})$ using matrix inversion. (10 Marks)

4. For the Student Markov Chain (Page 9):

(a) Construct the transition matrix $P$. (5 Marks)
(b) Calculate the probability of transitioning from "C1" to "Sleep" in 3 steps. (5 Marks)

Section C: Policy & Value Iteration (25 Marks)


5. For the Student MDP (Page 25) with policy $\pi(\text{Study} \mid s) = 0.6$, $\pi(\text{Facebook} \mid s) = 0.4$:

(a) Calculate $P^{\pi}$ and $R^{\pi}$ for state "C1". (10 Marks)
(b) Perform one policy evaluation iteration starting with $V(\text{C1}) = 0$. (15 Marks)

Section D: Optimal Policies & Bellman Optimality (25 Marks)
6. For the Student MDP (Page 38):

(a) Write the Bellman optimality equation for state "C1". (10 Marks)
(b) Compute $v_*(\text{C1})$ using value iteration (2 steps, $V_0 = 0$). (15 Marks)

Solutions
Section A
1. (a) Markov property:
$$\mathbb{P}[S_{t+1} \mid S_t] = \mathbb{P}[S_{t+1} \mid S_1, \ldots, S_t]$$
Significance: the future is conditionally independent of the past given the present state, so value functions and planning only need to condition on the current state rather than the full history.
(b)
$$G_t = 5 + 0.8(-3) + 0.8^2(2) = 5 - 2.4 + 1.28 = 3.88$$
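As a quick check on the arithmetic, a minimal Python sketch of the same discounted-return computation (the rewards and discount are taken from Question 1(b)):

```python
# Discounted return: G_t = R_{t+1} + gamma*R_{t+2} + gamma^2*R_{t+3} + ...
rewards = [5, -3, 2]        # R_{t+1}, R_{t+2}, R_{t+3} from Question 1(b)
gamma = 0.8

G_t = sum(gamma**k * r for k, r in enumerate(rewards))
print(G_t)                  # 3.88 (up to float rounding)
```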
2. (a) Bellman equation for an MRP:
$$v(s) = R_s + \gamma \sum_{s'} P_{ss'}\, v(s')$$

(b) From Page 11: $R_{\text{Pub}} = +1$.
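For illustration, a small Python sketch of a single Bellman backup for one MRP state; the successor values and probabilities below are made-up placeholders, not the Student MRP figures from Page 11:

```python
# One Bellman backup for a single MRP state:
#   v(s) = R_s + gamma * sum_{s'} P(s'|s) * v(s')
def bellman_backup(R_s, gamma, transitions, v):
    """transitions: dict successor -> P(s'|s); v: dict state -> value estimate."""
    return R_s + gamma * sum(p * v[s_next] for s_next, p in transitions.items())

# Hypothetical numbers, purely for illustration.
v_est = {"Pass": 10.0, "Pub": 0.8}
print(bellman_backup(R_s=-2.0, gamma=0.9,
                     transitions={"Pass": 0.6, "Pub": 0.4}, v=v_est))
```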

Section B
3. (a) Bellman equation for "C3":
$$v(\text{C3}) = -2 + 0.9\left(0.6\,v(\text{Pass}) + 0.4\,v(\text{Pub})\right)$$
(b) Using $v = (I - \gamma P)^{-1} R$:
$$v(\text{C3}) = \frac{-2 + 0.9 \times 0.4 \times 1}{1 - 0.9 \times 0.6} = \frac{-1.64}{0.46} \approx -3.57$$
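A sketch of the closed-form evaluation $v = (I - \gamma P)^{-1} R$ with NumPy; the transition matrix and reward vector are illustrative stand-ins, since the full Student MRP model is only given on Page 11:

```python
import numpy as np

# Closed-form MRP evaluation: v = (I - gamma * P)^{-1} R.
# P and R are illustrative placeholders (3 toy states), not the Student MRP.
P = np.array([[0.0, 0.6, 0.4],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])   # row-stochastic transition matrix
R = np.array([-2.0, 10.0, 0.0])   # expected immediate reward per state
gamma = 0.9

v = np.linalg.solve(np.eye(3) - gamma * P, R)
print(v)                          # state values under this toy model
```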
4. (a) Transition matrix (abbreviated):
$$P = \begin{pmatrix} 0.5 & 0.5 & 0 & \cdots \\ 0.8 & 0 & 0.2 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}$$
(b) 3-step probability: $P^{(3)}_{\text{C1},\,\text{Sleep}} = 0.5 \times 0.2 \times 1 = 0.1$.
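A sketch of obtaining an $n$-step transition probability from the matrix power $P^n$; the entries below are placeholders rather than the chain from Page 9:

```python
import numpy as np

# (P^n)[i, j] is the probability of occupying state j after n steps from state i.
# Toy 3-state chain (order: C1, C2, Sleep) with placeholder entries.
P = np.array([[0.0, 0.5, 0.5],
              [0.0, 0.8, 0.2],
              [0.0, 0.0, 1.0]])   # Sleep is absorbing

P3 = np.linalg.matrix_power(P, 3)
print(P3[0, 2])                   # P(in Sleep after 3 steps | start in C1)
```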

Section C
5. (a)
$$P^{\pi}_{\text{C1}} = 0.6\,P^{\text{Study}}_{\text{C1},\,\text{C2}} + 0.4\,P^{\text{Facebook}}_{\text{C1},\,\text{FB}}$$
$$R^{\pi}_{\text{C1}} = 0.6(-2) + 0.4(-1) = -1.6$$
(b) Updated value:
$$V(\text{C1}) = -1.6 + 0.9 \times 0 = -1.6$$
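A minimal sketch of one synchronous policy-evaluation sweep for "C1" under the stated policy; the one-step transition model is an assumption for illustration (only the action probabilities and the rewards −2 and −1 come from the question):

```python
# One synchronous sweep of iterative policy evaluation for state "C1":
#   V(s) <- sum_a pi(a|s) * ( R(s,a) + gamma * sum_{s'} P(s'|s,a) * V(s') )
gamma = 0.9
policy = {"Study": 0.6, "Facebook": 0.4}     # pi(a|C1), from the question
# Assumed one-step model for "C1": action -> (reward, {successor: prob}).
model = {
    "Study":    (-2.0, {"C2": 1.0}),
    "Facebook": (-1.0, {"FB": 1.0}),
}
V = {"C1": 0.0, "C2": 0.0, "FB": 0.0}        # V_0 = 0 everywhere

V_C1 = sum(policy[a] * (r + gamma * sum(p * V[s2] for s2, p in succ.items()))
           for a, (r, succ) in model.items())
print(V_C1)                                  # -1.6, matching the update above
```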

Section D
6. (a) Bellman optimality equation:
$$v_*(\text{C1}) = \max\left\{-2 + 0.9\,v_*(\text{C2}),\; -1 + 0.9\,v_*(\text{FB})\right\}$$
(b) Value iteration:
$$V_1(\text{C1}) = \max\{-2, -1\} = -1$$
$$V_2(\text{C1}) = \max\{-2 + 0.9(-1),\; -1 + 0.9(0)\} = \max\{-2.9, -1\} = -1$$
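A sketch of running two value-iteration backups on a toy deterministic model; the transition structure is assumed for illustration, and only the rewards used above (−2 for Study, −1 for Facebook) are taken from the solution:

```python
# Value iteration: V_{k+1}(s) = max_a [ R(s,a) + gamma * V_k(s') ] (deterministic moves)
gamma = 0.9
# Assumed deterministic toy model: state -> {action: (reward, successor)}.
model = {
    "C1":  {"Study": (-2.0, "C2"), "Facebook": (-1.0, "FB")},
    "C2":  {"Study": (-2.0, "C3"), "Sleep":    ( 0.0, "End")},
    "FB":  {"Facebook": (-1.0, "FB"), "Quit":  ( 0.0, "C1")},
    "C3":  {"Study": (10.0, "End")},
    "End": {"Stay": ( 0.0, "End")},
}

V = {s: 0.0 for s in model}          # V_0 = 0
for _ in range(2):                   # two backups, as in Question 6(b)
    V = {s: max(r + gamma * V[s2] for r, s2 in acts.values())
         for s, acts in model.items()}
print(V["C1"])                       # -1.0 under this toy model
```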
