Markov Decision Processes Exam (100 Marks)
Time: 2 Hours
Section A: Basic Concepts (20 Marks)
1. (a) State the Markov property mathematically and explain its significance in MDPs. (5 Marks)
(b) Compute the return G_t for R_{t+1} = 5, R_{t+2} = −3, R_{t+3} = 2, with γ = 0.8. (5 Marks)
2. (a) Define the Bellman equation for a Markov Reward Process (MRP). (5 Marks)
(b) For the Student MRP (Page 11), state the immediate reward R_s for the state "Pub". (5 Marks)
Section B: Value Functions & Bellman Equations (30 Marks)
3. Given the Student MRP (Page 11) with γ = 0.9:
(a) Write the Bellman equation for state "C3". (10 Marks)
(b) Solve for v(C3) using matrix inversion. (10 Marks)
4. For the Student Markov Chain (Page 9):
(a) Construct the transition matrix P. (5 Marks)
(b) Calculate the probability of transitioning from "C1" to "Sleep" in 3 steps. (5 Marks)
Section C: Policy & Value Iteration (25 Marks)
5. For the Student MDP (Page 25) with policy π(Study|s) = 0.6, π(Facebook|s) = 0.4:
(a) Calculate P^π and R^π for state "C1". (10 Marks)
(b) Perform one policy evaluation iteration starting with V(C1) = 0. (15 Marks)
Section D: Optimal Policies & Bellman Optimality (25 Marks)
6. For the Student MDP (Page 38):
(a) Write the Bellman optimality equation for state "C1". (10 Marks)
(b) Compute v_*(C1) using value iteration (2 steps, V_0 = 0). (15 Marks)
Solutions
Section A
1. (a) Markov property:
P[S_{t+1} | S_t] = P[S_{t+1} | S_1, . . . , S_t]
Significance: the future is independent of the past given the present, so the current state is a sufficient statistic of the history. This allows value functions and policies to be defined over states alone, which keeps MDP computations tractable.
(b)
G_t = 5 + 0.8(−3) + 0.8^2(2) = 5 − 2.4 + 1.28 = 3.88
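As a quick numerical check, here is a minimal Python sketch using only the reward values and γ given in 1(b):

# Check of G_t = R_{t+1} + γ R_{t+2} + γ^2 R_{t+3} for the values in 1(b)
rewards = [5, -3, 2]   # R_{t+1}, R_{t+2}, R_{t+3}
gamma = 0.8
G_t = sum(gamma**k * r for k, r in enumerate(rewards))
print(G_t)             # 3.88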
2. (a) Bellman equation for an MRP:
v(s) = R_s + γ Σ_{s′} P_{ss′} v(s′)
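As an illustrative sketch, this backup can be applied repeatedly until the values converge; the two-state MRP below uses placeholder rewards and transition probabilities, not the Page 11 Student MRP:

# Iterative evaluation of v(s) = R_s + γ Σ_{s'} P_{ss'} v(s')
# for a small placeholder MRP (not the Student MRP from Page 11).
R = {"A": -1.0, "B": 2.0}                       # immediate rewards R_s
P = {"A": {"A": 0.5, "B": 0.5},                 # transition probabilities P_{ss'}
     "B": {"A": 0.1, "B": 0.9}}
gamma = 0.9
v = {s: 0.0 for s in R}
for _ in range(1000):                           # sweep until approximately converged
    v = {s: R[s] + gamma * sum(p * v[s2] for s2, p in P[s].items()) for s in R}
print(v)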
(b) From Page 11: R_{Pub} = +1.
Section B
3. (a) Bellman equation for "C3":
v(C3) = −2 + 0.9 (0.6 v(Pass) + 0.4 v(Pub))
(b) Using v = (I − γP)^{−1} R:
v(C3) = (−2 + 0.9 × 0.4 × 1) / (1 − 0.9 × 0.6) = −1.64 / 0.46 ≈ −3.57
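The closed form v = (I − γP)^{−1} R is easy to check numerically. The NumPy sketch below uses a placeholder 2-state MRP rather than the full Student MRP from Page 11, and solves the linear system instead of forming the inverse explicitly:

import numpy as np

# Direct solution of (I − γP) v = R for a small placeholder MRP.
# Substitute the full Student MRP matrices from Page 11 for the exam answer.
P = np.array([[0.5, 0.5],
              [0.1, 0.9]])      # P_{ss'}
R = np.array([-1.0, 2.0])       # R_s
gamma = 0.9
v = np.linalg.solve(np.eye(2) - gamma * P, R)   # more stable than inverting explicitly
print(v)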
4. (a) Transition matrix (abbreviated):
P = [ 0.5  0.5  0    ···
      0.8  0    0.2  ···
       ⋮    ⋮    ⋮       ]
(b) 3-step probability: P^{(3)}_{C1,Sleep} = 0.5 × 0.2 × 1 = 0.1 (the path C1 → C2 → Sleep → Sleep, with Sleep absorbing).
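Since n-step transition probabilities are the entries of P^n, this can be checked by raising the full transition matrix to the third power. The entries below are an assumed reconstruction of the Page 9 Student Markov Chain (state order C1, C2, C3, Pass, Pub, FB, Sleep), not values printed in this paper:

import numpy as np

# 3-step transition probability via the matrix power P^(3) = P·P·P.
# Assumed Student Markov Chain, state order [C1, C2, C3, Pass, Pub, FB, Sleep].
P = np.array([
    [0.0, 0.5, 0.0, 0.0, 0.0, 0.5, 0.0],   # C1
    [0.0, 0.0, 0.8, 0.0, 0.0, 0.0, 0.2],   # C2
    [0.0, 0.0, 0.0, 0.6, 0.4, 0.0, 0.0],   # C3
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0],   # Pass
    [0.2, 0.4, 0.4, 0.0, 0.0, 0.0, 0.0],   # Pub
    [0.1, 0.0, 0.0, 0.0, 0.0, 0.9, 0.0],   # FB
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0],   # Sleep (absorbing)
])
P3 = np.linalg.matrix_power(P, 3)
print(P3[0, 6])                             # entry [C1, Sleep] -> 0.1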
Section C
5. (a)
P^π_{C1} = 0.6 P^{Study}_{C1,C2} + 0.4 P^{Facebook}_{C1,FB}
R^π_{C1} = 0.6(−2) + 0.4(−1) = −1.6
(b) Updated value:
V(C1) = −1.6 + 0.9 × 0 = −1.6
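A minimal sketch of this backup follows; the per-action rewards and deterministic successors mirror part (a) above, and the rest of the Page 25 MDP is not modelled:

# One policy-evaluation backup for state C1 under π(Study|·)=0.6, π(Facebook|·)=0.4:
# V(C1) <- Σ_a π(a|C1) · (R(C1,a) + γ · V(successor of a))
gamma = 0.9
pi = {"Study": 0.6, "Facebook": 0.4}
R = {"Study": -2.0, "Facebook": -1.0}            # R(C1, a), as in part (a)
successor = {"Study": "C2", "Facebook": "FB"}    # assumed deterministic successors
V = {"C2": 0.0, "FB": 0.0}                       # initial values V = 0

V_C1 = sum(pi[a] * (R[a] + gamma * V[successor[a]]) for a in pi)
print(V_C1)                                      # -1.6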
Section D
6. (a) Bellman optimality equation:
v_*(C1) = max{−2 + 0.9 v_*(C2), −1 + 0.9 v_*(FB)}
(b) Value iteration:
V_1(C1) = max{−2, −1} = −1
V_2(C1) = max{−2 + 0.9(−1), −1 + 0.9(0)} = max{−2.9, −1} = −1
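The two backups can be reproduced with a short loop. The action model below (rewards and deterministic successors) matches the terms used above for C1 but is otherwise an assumption about the Page 38 MDP, so the intermediate values for C2 and FB may differ from the written solution:

# Two synchronous value-iteration sweeps, V_{k+1}(s) = max_a [R(s,a) + γ V_k(s')].
gamma = 0.9
# For each state: (reward, successor) per action; deterministic successors assumed.
actions = {
    "C1": [(-2.0, "C2"), (-1.0, "FB")],    # Study -> C2, Facebook -> FB
    "C2": [(-2.0, "C3"), (0.0, "Sleep")],  # Study -> C3, Sleep -> terminal
    "FB": [(-1.0, "FB"), (0.0, "C1")],     # Facebook -> FB, Quit -> C1
}
V = {s: 0.0 for s in ["C1", "C2", "C3", "FB", "Sleep"]}   # V_0 = 0
for k in range(2):
    V = {s: (max(r + gamma * V[s2] for r, s2 in actions[s]) if s in actions else 0.0)
         for s in V}
    print(k + 1, V["C1"])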