AI-832 Reinforcement Learning
Instructor: Dr. Zuhair Zafar
Lecture # 22: Value Function Approximation
Recap
• Monte Carlo Learning
  • Prediction
  • Control
• Temporal-Difference Learning
  • Prediction: TD(0), n-step TD, λ-return, TD(λ)
  • Control (on-policy learning): SARSA(0), n-step SARSA, SARSA(λ), Expected SARSA
  • Q-learning (off-policy learning)
Today’s Agenda
• Value Function Approximation
• Gradient Descent
• Stochastic Gradient Descent
• Linear Value Function Approximation
Large-Scale Reinforcement Learning
Value Function Approximation
Types of Value Function Approximation
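One common way to summarize the types of value function approximation is by what the approximator takes as input and returns as output. Here w is the parameter vector being learned and a_1, ..., a_m are the available actions:

```latex
\begin{aligned}
&\text{state in, state value out:} && \hat{v}(s, \mathbf{w}) \approx v_\pi(s) \\
&\text{state and action in, action value out:} && \hat{q}(s, a, \mathbf{w}) \approx q_\pi(s, a) \\
&\text{state in, one action value per action out:} && \hat{q}(s, a_1, \mathbf{w}), \dots, \hat{q}(s, a_m, \mathbf{w})
\end{aligned}
```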
Tabular Methods vs. Approximation Methods
Monte Carlo / Temporal Difference Learning
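Whether the estimate is stored in a table or produced by an approximator, Monte Carlo and TD prediction both move it toward a target. MC uses the full return G_t = R_{t+1} + γR_{t+2} + ..., and TD(0) uses the bootstrapped target R_{t+1} + γ v̂(S_{t+1}, w). The minimal sketch below computes both. The function names and the example rewards are hypothetical, chosen only for illustration.

```python
def mc_return(rewards, gamma):
    """Monte Carlo target: discounted sum of the rewards that follow time t."""
    g = 0.0
    # Accumulate backwards so each step computes g <- r + gamma * g.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def td0_target(reward, next_value, gamma, terminal=False):
    """TD(0) target: one real reward plus the discounted current estimate of S_{t+1}."""
    # At a terminal transition there is no next state to bootstrap from.
    return reward if terminal else reward + gamma * next_value


print(mc_return([1.0, 0.0, 2.0], gamma=0.9))        # about 2.62 (1 + 0.9*0 + 0.81*2)
print(td0_target(1.0, next_value=5.0, gamma=0.9))   # 5.5
```

The MC target needs the episode to finish, while the TD(0) target is available after a single step because it reuses the current value estimate.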
Neural Network
Which Function Approximator?
Gradient Descent
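Gradient descent adjusts a parameter vector w in the direction of the negative gradient of a differentiable objective J(w). One common way to write the step is Δw = −½ α ∇_w J(w), where the ½ cancels the factor of 2 that comes from differentiating a squared error. The minimal sketch below runs full-batch gradient descent on a mean-squared value error for a linear estimate. The three states, their features, and their "true" values v_π are hypothetical. Assuming those true values are known is exactly what cannot be done in real RL.

```python
# Hypothetical setup: 3 states with 2 features each, and known true values v_pi(s).
FEATURES = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V_TRUE = [1.0, 2.0, 3.0]


def predict(w, x):
    """Linear estimate v_hat(s, w) = x(s) . w."""
    return sum(wi * xi for wi, xi in zip(w, x))


def mse(w):
    """J(w): mean over states of (v_pi(s) - v_hat(s, w))^2."""
    return sum((v - predict(w, x)) ** 2 for x, v in zip(FEATURES, V_TRUE)) / len(V_TRUE)


def grad_mse(w):
    """Exact gradient of J(w): -2 * mean of (v_pi(s) - v_hat(s, w)) * x(s)."""
    g = [0.0] * len(w)
    for x, v in zip(FEATURES, V_TRUE):
        err = v - predict(w, x)
        for i, xi in enumerate(x):
            g[i] += -2.0 * err * xi / len(V_TRUE)
    return g


def gradient_descent(alpha=0.5, steps=200):
    w = [0.0, 0.0]
    for _ in range(steps):
        # Delta w = -1/2 * alpha * grad J(w): move against the gradient.
        w = [wi - 0.5 * alpha * gi for wi, gi in zip(w, grad_mse(w))]
    return w


w = gradient_descent()
print(w, mse(w))  # w approximately [1.0, 2.0], error close to 0
```

Each step uses the gradient of the whole objective, which means touching every state. That cost is what stochastic gradient descent avoids.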
Value Function Approximation by Stochastic Gradient Descent
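Stochastic gradient descent replaces the expectation in J(w) = E_π[(v_π(S) − v̂(S, w))²] with a single sampled state. This gives the update Δw = α (v_π(S) − v̂(S, w)) ∇_w v̂(S, w). The minimal sketch below assumes an oracle v_π is available for the hypothetical states and uses a linear v̂, whose gradient is simply the feature vector. In practice no oracle exists, and an MC or TD target stands in for v_π(S).

```python
import random

# Hypothetical states: feature vectors x(s) and oracle values v_pi(s).
FEATURES = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V_TRUE = [1.0, 2.0, 3.0]


def v_hat(w, x):
    """Linear estimate x(s) . w."""
    return sum(wi * xi for wi, xi in zip(w, x))


def grad_v_hat(w, x):
    # For a linear approximator the gradient with respect to w is x(s).
    return list(x)


def sgd(alpha=0.1, steps=5000, seed=0):
    rng = random.Random(seed)
    w = [0.0, 0.0]
    for _ in range(steps):
        i = rng.randrange(len(FEATURES))   # sample one state S
        x, target = FEATURES[i], V_TRUE[i]
        err = target - v_hat(w, x)         # v_pi(S) - v_hat(S, w)
        w = [wi + alpha * err * gi for wi, gi in zip(w, grad_v_hat(w, x))]
    return w


print(sgd())  # approximately [1.0, 2.0]
```

Each update costs one state's worth of work. Because these hypothetical values are exactly representable, the sampled updates settle at the same weights that full gradient descent would find.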
Feature Vectors
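A feature vector represents a state S by n real-valued features. In the usual notation this is written as:

```latex
\mathbf{x}(S) = \begin{pmatrix} x_1(S) \\ \vdots \\ x_n(S) \end{pmatrix}
```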
Linear Value Function Approximation
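In the linear case the estimate is v̂(S, w) = x(S)ᵀw = Σ_j x_j(S) w_j. Its gradient with respect to w is just x(S), so the SGD update becomes Δw = α (target − v̂(S, w)) x(S). In words, the update is step size × prediction error × feature value. The minimal sketch below wraps this in a class. The name `LinearVFA` and its methods are hypothetical and not taken from any library.

```python
from typing import List


class LinearVFA:
    """Hypothetical helper: v_hat(S, w) = x(S)^T w, trained by SGD toward a given target."""

    def __init__(self, n_features: int, alpha: float):
        self.w: List[float] = [0.0] * n_features
        self.alpha = alpha

    def value(self, x: List[float]) -> float:
        # Linear combination of features: sum_j x_j(S) * w_j
        return sum(xj * wj for xj, wj in zip(x, self.w))

    def update(self, x: List[float], target: float) -> None:
        # The gradient of a linear v_hat with respect to w is x(S), so
        # Delta w = alpha * (target - v_hat(S, w)) * x(S).
        error = target - self.value(x)
        self.w = [wj + self.alpha * error * xj for wj, xj in zip(self.w, x)]


vfa = LinearVFA(n_features=2, alpha=0.1)
vfa.update([1.0, 2.0], target=5.0)
print(vfa.w)                  # [0.5, 1.0]
print(vfa.value([1.0, 2.0]))  # 2.5
```

The `target` argument is left open on purpose. Passing an MC return gives gradient Monte Carlo prediction, and passing a TD(0) target gives semi-gradient TD(0).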
Table Lookup Features
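Table lookup is a special case of linear value function approximation. It uses one-hot features: x(S) has a 1 in the slot for state S and 0 everywhere else. Then v̂(S, w) = w_S, and the linear update only changes w_S, which is exactly the tabular update V(S) ← V(S) + α (target − V(S)). The minimal sketch below checks this equivalence on a hypothetical 3-state example. All function names are illustrative.

```python
def one_hot(state, n_states):
    """Table-lookup feature vector: 1.0 in the state's slot, 0.0 elsewhere."""
    x = [0.0] * n_states
    x[state] = 1.0
    return x


def linear_update(w, x, target, alpha):
    """Linear VFA update: w + alpha * (target - x . w) * x."""
    v = sum(wi * xi for wi, xi in zip(w, x))
    return [wi + alpha * (target - v) * xi for wi, xi in zip(w, x)]


def tabular_update(V, state, target, alpha):
    """Tabular update applied to a copy of the value table."""
    V = list(V)
    V[state] += alpha * (target - V[state])
    return V


w = [0.0, 1.0, 2.0]
print(linear_update(w, one_hot(1, 3), target=3.0, alpha=0.5))  # [0.0, 2.0, 2.0]
print(tabular_update(w, 1, target=3.0, alpha=0.5))             # [0.0, 2.0, 2.0]
```

With one-hot features there is no generalization between states, which is why the tabular methods from earlier lectures fit inside the same framework.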