Lecture 02 - Markov Decision Process

Uploaded by

junaiddbz01
Reinforcement Learning

Lecture 02: Markov Decision Process

Slides credit: David Silver


Agenda
• Markov Processes
• Markov Reward Processes
• Markov Decision Processes
Action taking in reinforcement learning
• Making a choice out of presented options (the options are out of the agent's control!)
• Discrete actions
• Move left or right in Atari Breakout game
• Recommend an item to a target user
"Reinforcement learning is learning what to do — how to map situations to actions — so as to maximize a numerical reward signal."
- Sutton & Barto, 2018

Reward in reinforcement learning
• A scalar feedback signal about the action taken
• Suggests a good or bad immediate consequence of the action
• Score in an Atari game
• User clicks/purchases in a recommender system
• Delayed feedback
• The game of Go
• Generating a sentence in a chatbot
• Goal of learning: maximize cumulative reward
• Reward hypothesis: “All goals can be described by the maximization of expected cumulative reward.”
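
The goal of maximizing cumulative reward can be made concrete with a small sketch. The function name `discounted_return` and the discount factor `gamma` are assumptions introduced here for illustration; with `gamma=1.0` the result is simply the sum of the rewards collected in one episode.

```python
def discounted_return(rewards, gamma=1.0):
    """Cumulative reward of one episode.

    rewards: list of scalar rewards r_1, r_2, ... in the order received.
    gamma:   assumed discount factor in [0, 1]; 1.0 means no discounting.
    """
    total = 0.0
    # Work backwards so each step applies G_t = r_{t+1} + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


undiscounted = discounted_return([1, 1, 1])           # plain sum of rewards
discounted = discounted_return([1, 2, 4], gamma=0.5)  # 1 + 0.5*2 + 0.25*4
```

A delayed reward, such as a win at the end of a game of Go, shows up here as a reward list that is zero everywhere except in its last entry.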

How to take an action
• With respect to the current observation

[Figure: agent-environment interaction loop showing observation o_t, action a_t, and reward r_t]
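
The interaction between observation, action, and reward can be sketched as a simple loop. Everything named below (`CorridorEnv`, `always_right`, `run_episode`, and the `reset`/`step` convention) is a hypothetical toy setup chosen for illustration, not something defined in the slides.

```python
class CorridorEnv:
    """Hypothetical toy environment: positions 0..4, start at 2, goal at 4."""

    def reset(self):
        self.position = 2
        return self.position  # initial observation o_0

    def step(self, action):
        # action is -1 (move left) or +1 (move right); position stays in [0, 4].
        self.position = max(0, min(4, self.position + action))
        done = self.position == 4
        reward = 1.0 if done else 0.0
        return self.position, reward, done


def always_right(observation):
    """A fixed policy: choose the action from the current observation only."""
    return +1


def run_episode(env, policy, max_steps=20):
    observation = env.reset()
    total_reward = 0.0
    trajectory = []
    for _ in range(max_steps):
        action = policy(observation)                       # a_t from o_t
        next_observation, reward, done = env.step(action)  # environment responds
        trajectory.append((observation, action, reward))
        total_reward += reward
        observation = next_observation
        if done:
            break
    return total_reward, trajectory


total, trajectory = run_episode(CorridorEnv(), always_right)
```

The policy sees only the current observation, which is the point of the slide: the action is chosen with respect to what the agent observes at that step.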
Introduction to MDPs
• A Markov Decision Process (MDP) is a mathematical framework used
in Reinforcement Learning (RL) to model decision-making problems
where an agent interacts with an environment to maximize rewards.
• Almost all RL problems can be formalized as MDPs
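
As a rough illustration of what "formalized as an MDP" means, a finite MDP is commonly written as a tuple of states, actions, transition probabilities, rewards, and a discount factor. The class `FiniteMDP`, its field names, and the two-state example below are assumptions made up for this sketch.

```python
import random
from dataclasses import dataclass


@dataclass
class FiniteMDP:
    states: tuple
    actions: tuple
    transitions: dict  # (state, action) -> list of (next_state, probability)
    rewards: dict      # (state, action) -> expected immediate reward
    gamma: float       # discount factor in [0, 1]

    def is_valid(self, tol=1e-9):
        """Check that each (state, action) has probabilities summing to 1."""
        return all(
            abs(sum(p for _, p in outcomes) - 1.0) <= tol
            for outcomes in self.transitions.values()
        )

    def step(self, state, action, rng):
        """Sample a next state and return it with the immediate reward."""
        next_states, probs = zip(*self.transitions[(state, action)])
        next_state = rng.choices(next_states, weights=probs, k=1)[0]
        return next_state, self.rewards[(state, action)]


# Hypothetical two-state example, not taken from the slides.
toy_mdp = FiniteMDP(
    states=("A", "B"),
    actions=("stay", "go"),
    transitions={
        ("A", "stay"): [("A", 1.0)],
        ("A", "go"): [("B", 0.8), ("A", 0.2)],
        ("B", "stay"): [("B", 1.0)],
        ("B", "go"): [("A", 1.0)],
    },
    rewards={("A", "stay"): 0.0, ("A", "go"): -1.0,
             ("B", "stay"): 2.0, ("B", "go"): 0.0},
    gamma=0.9,
)

next_state, reward = toy_mdp.step("B", "stay", random.Random(0))
```

The next state depends only on the current state and action, which is the Markov property that the rest of the lecture builds on.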
Markov Property
State Transition Probability
Markov Process
Student Markov Chain
Student Markov Chain
Student Markov State Transition
Markov Reward Process
Student MRP
Return
Why Discount?
Value Function
Student MRP Return
State Value Function for Student MRP (1)
State Value Function for Student MRP (2)
State Value Function for Student MRP (3)
Bellman Equation for MRPs
Bellman Equation for MRPs (2)
Example: Bellman Equation for Student MRP
Markov Decision Process
Example: Student MDP
Policy
Value Function
State Value Function for Student MDP
Bellman Expectation Equation
Bellman Expectation Equation for 𝑉𝜋
Bellman Expectation Equation for 𝑄𝜋
Example: Bellman Expectation Equation in Student MDP
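
The slide titles above name the Bellman equation for MRPs, but the slide bodies are not reproduced in this text. As a minimal sketch under the standard form v(s) = R(s) + γ Σ P(s, s') v(s'), the state values of a small MRP can be approximated by applying that equation repeatedly. The function `evaluate_mrp` and the two-state numbers are assumptions for illustration and are not the Student MRP from the lecture.

```python
def evaluate_mrp(P, R, gamma, tol=1e-10, max_iters=10_000):
    """Iteratively apply v(s) = R(s) + gamma * sum_s' P[s][s'] * v(s').

    P: row-stochastic transition matrix as a list of lists.
    R: list of expected immediate rewards, one per state.
    """
    n = len(R)
    v = [0.0] * n
    for _ in range(max_iters):
        new_v = [
            R[s] + gamma * sum(P[s][s2] * v[s2] for s2 in range(n))
            for s in range(n)
        ]
        # Stop once no state value changes by more than the tolerance.
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


# Hypothetical MRP: state 0 pays reward 1 and moves to the absorbing,
# zero-reward state 1 with probability 0.5.
P = [[0.5, 0.5],
     [0.0, 1.0]]
R = [1.0, 0.0]
values = evaluate_mrp(P, R, gamma=0.9)
```

For this example the equation can also be solved by hand: v(1) = 0 and v(0) = 1 + 0.9 · 0.5 · v(0), so v(0) = 1 / 0.55, which the iteration approaches.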
