Deep Reinforcement Learning, Decision Making, and Control
CS 285
Instructor: Sergey Levine
UC Berkeley

Option 1: Understand the problem, design a solution
Option 2: Set it up as a machine learning problem
[diagram: data → supervised learning vs. data → reinforcement learning]
What is reinforcement learning?
• Mathematical formalism for learning-based decision making
• Approach for learning decision making and control from experience
How is this different from other machine learning topics?

Standard (supervised) machine learning usually assumes:
• i.i.d. data
• known ground truth outputs in training

Reinforcement learning:
• Data is not i.i.d.: previous outputs influence future inputs!
• Ground truth answer is not known; we only know if we succeeded or failed
  • more generally, we know the reward
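To make the contrast concrete, here is a minimal Python sketch of how data arrives in each setting; the model, policy, and env objects and their methods are hypothetical placeholders, not part of the lecture:

```python
# Supervised learning: a fixed dataset of i.i.d. (input, label) pairs.
# The model's predictions never change which examples it sees next.
def supervised_training(model, inputs, labels):
    for x, y in zip(inputs, labels):
        model.update(x, y)  # the ground-truth answer y is given

# Reinforcement learning: the agent generates its own data by acting,
# so samples are not i.i.d. and the only supervision is a scalar reward.
def rl_interaction(policy, env, num_steps=1000):
    obs = env.reset()
    for _ in range(num_steps):
        action = policy.act(obs)                   # the agent's own outputs...
        next_obs, reward, done = env.step(action)  # ...determine its future inputs
        policy.update(obs, action, reward)         # no ground-truth action, only reward
        obs = env.reset() if done else next_obs
```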
[diagram: the agent sends decisions (actions) to the environment; the consequences come back as observations (states) and rewards]

Example domains:
• Actions: muscle contractions; Observations: sight, smell; Rewards: food
• Actions: motor current or torque; Observations: camera images; Rewards: task success measure (e.g., running speed)
• Actions: what to purchase; Observations: inventory levels; Rewards: profit
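The inventory example can be written as a minimal environment with the usual reset/step interface. This is only an illustrative sketch; the InventoryEnv class and its dynamics are invented here, not taken from the lecture:

```python
import numpy as np

class InventoryEnv:
    """Toy inventory-management environment (illustrative only).
    Actions: units to purchase. Observations: inventory level. Rewards: profit."""

    def __init__(self, price=2.0, cost=1.0, holding_cost=0.1, max_inventory=100):
        self.price, self.cost, self.holding_cost = price, cost, holding_cost
        self.max_inventory = max_inventory

    def reset(self):
        self.inventory = 0
        return self.inventory  # observation: current inventory level

    def step(self, action):
        # action: number of units to purchase
        self.inventory = min(self.inventory + action, self.max_inventory)
        demand = np.random.poisson(5)               # random customer demand
        sold = min(self.inventory, demand)
        self.inventory -= sold
        reward = (self.price * sold                 # revenue from sales
                  - self.cost * action              # purchase cost
                  - self.holding_cost * self.inventory)  # holding cost
        return self.inventory, reward, False        # obs, reward, done
```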
Complex physical tasks… (Rajeswaran et al., 2018)
Unexpected solutions… (Mnih et al., 2015)
In the real world… (Kalashnikov et al., 2018)
Not just games and robots! (Cathy Wu)
Why should we care about deep reinforcement learning?
How do we build intelligent machines?
• Intelligent machines must be able to adapt
• Deep learning helps us handle unstructured environments
• Reinforcement learning provides a formalism for behavior
[diagram: agent-environment loop: decisions (actions) → consequences (observations, rewards); example videos from Schulman et al. '14 & '15, Mnih et al. '13, and Levine*, Finn*, et al. '16]
What is deep RL, and why should we care?

standard computer vision: features (e.g. HOG) → mid-level features (e.g. DPM) → classifier (e.g. SVM) (Felzenszwalb '08)
deep learning: end-to-end training
standard reinforcement learning: features (?) → more features (?) → linear policy or value func. → action
deep reinforcement learning: end-to-end training → action
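The end-to-end rows can be made concrete with a network that maps raw pixels directly to action scores, with no hand-designed feature stage. A minimal PyTorch sketch (layer sizes are arbitrary choices for illustration):

```python
import torch
import torch.nn as nn

class EndToEndPolicy(nn.Module):
    """Maps raw pixels directly to action scores: no hand-designed features."""
    def __init__(self, num_actions):
        super().__init__()
        self.net = nn.Sequential(
            # conv layers play the role of the hand-designed feature stages
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
            # fully connected layers replace the hand-designed policy features
            nn.LazyLinear(256), nn.ReLU(),
            nn.Linear(256, num_actions),  # scores over discrete actions
        )

    def forward(self, pixels):  # pixels: (batch, 3, H, W)
        return self.net(pixels)

# Every layer is trained jointly, rather than each stage being
# designed and tuned separately.
policy = EndToEndPolicy(num_actions=4)
action_scores = policy(torch.zeros(1, 3, 84, 84))
```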
What does end-to-end learning mean for sequential decision making?
[diagram: perception followed by action ("run away"), shown first as separate perception and action stages, then as a single sensorimotor loop]
Example: robotics
robotic control pipeline: observations → state estimation (e.g. vision) → modeling & prediction → planning → low-level control → controls
end-to-end: a single network from observations to controls, containing a tiny, highly specialized "visual cortex" and a tiny, highly specialized "motor cortex"
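As a sketch of the difference: the pipeline composes separately designed stages, while the end-to-end approach learns one function. Every function below is a trivial stand-in invented for illustration:

```python
import numpy as np

# Classical pipeline: each stage is designed and tuned separately.
def estimate_state(observation):             # e.g. vision-based state estimation
    return np.asarray(observation, dtype=float)

def plan_trajectory(state, goal, steps=10):  # modeling & prediction + planning
    return np.linspace(state, goal, num=steps)

def low_level_control(plan, state):          # track the first step of the plan
    return plan[1] - state                   # controls: correction toward the plan

def pipeline_controller(observation, goal):
    state = estimate_state(observation)
    plan = plan_trajectory(state, goal)
    return low_level_control(plan, state)

# End-to-end alternative: a single learned mapping from observations to
# controls, trained jointly, so no stage needs to be designed by hand.
def end_to_end_controller(observation, policy_network):
    return policy_network(observation)
```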
Deep models are what allow reinforcement learning algorithms to solve complex problems end to end!
[diagram: the same agent-environment loop and example domains as above]
The reinforcement learning problem is the AI problem!
Why should we study this now?
1. Advances in deep learning
2. Advances in reinforcement learning
3. Advances in computational capability
These ideas are not new:
• Tesauro, 1995
• L.-J. Lin, "Reinforcement learning for robots using neural networks." 1993
Recent successes:

Atari games:
• Q-learning: V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, et al. "Playing Atari with Deep Reinforcement Learning." (2013).
• Policy gradients: J. Schulman, S. Levine, P. Moritz, M. I. Jordan, and P. Abbeel. "Trust Region Policy Optimization." (2015); V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. P. Lillicrap, et al. "Asynchronous methods for deep reinforcement learning." (2016).

Real-world robots:
• Guided policy search: S. Levine*, C. Finn*, T. Darrell, P. Abbeel. "End-to-end training of deep visuomotor policies." (2015).
• Q-learning: D. Kalashnikov et al. "QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation." (2018).

Beating Go champions:
• Supervised learning + policy gradients + value functions + Monte Carlo tree search: D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, et al. "Mastering the game of Go with deep neural networks and tree search." Nature (2016).
What other problems do we need to solve to
enable real-world sequential decision making?
Beyond learning from reward
• Basic reinforcement learning deals with maximizing rewards (the objective is sketched below)
• This is not the only problem that matters for sequential decision making!
• We will cover more advanced topics:
  • Learning reward functions from examples (inverse reinforcement learning)
  • Transferring knowledge between domains (transfer learning, meta-learning)
  • Learning to predict and using prediction to act
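For reference, "maximizing rewards" is usually formalized as finding policy parameters that maximize expected total reward over trajectories; in standard notation (with policy π_θ and trajectory τ = (s_1, a_1, …, s_T, a_T)):

```latex
\theta^{\star} = \arg\max_{\theta}\; \mathbb{E}_{\tau \sim p_{\theta}(\tau)}\!\left[\sum_{t=1}^{T} r(\mathbf{s}_t, \mathbf{a}_t)\right],
\qquad
p_{\theta}(\tau) = p(\mathbf{s}_1)\prod_{t=1}^{T} \pi_{\theta}(\mathbf{a}_t \mid \mathbf{s}_t)\, p(\mathbf{s}_{t+1} \mid \mathbf{s}_t, \mathbf{a}_t)
```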
Where do rewards come from?
Are there other forms of supervision?
• Learning from demonstrations
• Directly copying observed behavior (behavior cloning; a sketch follows this list)
• Inferring rewards from observed behavior (inverse reinforcement learning)
• Learning from observing the world
• Learning to predict
• Unsupervised learning
• Learning from other tasks
• Transfer learning
• Meta-learning: learning to learn
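Directly copying observed behavior reduces to supervised learning on expert (observation, action) pairs, i.e. behavior cloning. A minimal PyTorch sketch, assuming a dataset of expert observations and discrete expert actions (all names here are hypothetical):

```python
import torch
import torch.nn as nn

def behavior_cloning(expert_obs, expert_actions, num_actions, epochs=10):
    """Fit a policy to expert (observation, action) pairs by supervised learning."""
    policy = nn.Sequential(
        nn.Linear(expert_obs.shape[1], 64), nn.ReLU(),
        nn.Linear(64, num_actions),             # scores over discrete actions
    )
    optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()             # "copy the expert's action"
    for _ in range(epochs):
        logits = policy(expert_obs)
        loss = loss_fn(logits, expert_actions)  # expert action = training label
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return policy

# Usage sketch: expert_obs is a float tensor (N, obs_dim),
# expert_actions an int64 tensor (N,) of expert action indices.
```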
Imitation learning (Bojarski et al. 2016)
More than imitation: inferring intentions (Warneken & Tomasello)
Inverse RL examples (Finn et al. 2016)
Prediction
Prediction for real-world control (Ebert et al. 2017)
Using tools with predictive models (Xie et al. 2019)
Playing games with predictive models… but sometimes there are issues
[video frames: predicted vs. real; Kaiser et al. 2019]
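"Using prediction to act" typically means rolling candidate action sequences through a learned model and executing the best first action. A minimal random-shooting sketch; the dynamics_model and reward_fn arguments are hypothetical placeholders for a learned model and a task reward:

```python
import numpy as np

def plan_with_model(state, dynamics_model, reward_fn,
                    horizon=10, num_candidates=100, action_dim=2):
    """Pick the first action of the best random action sequence under the model."""
    best_return, best_action = -np.inf, None
    for _ in range(num_candidates):
        actions = np.random.uniform(-1, 1, size=(horizon, action_dim))
        s, total_reward = state, 0.0
        for a in actions:
            s = dynamics_model(s, a)         # predicted next state
            total_reward += reward_fn(s, a)  # predicted reward
        if total_reward > best_return:
            best_return, best_action = total_reward, actions[0]
    return best_action  # execute it, then replan from the next real state (MPC)
```

When the model's predictions diverge from reality, as in the predicted vs. real frames above, the plan optimizes the wrong world, which is exactly the failure mode the slide points at.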
How do we build intelligent machines?
• Imagine you have to build an intelligent machine, where do you start?
Learning as the basis of intelligence
• Some things we can all do (e.g. walking)
• Some things we can only learn (e.g. driving a car)
• We can learn a huge variety of things, including very difficult things
• Therefore our learning mechanism(s) are likely powerful enough to do
everything we associate with intelligence
• But it may still be very convenient to “hard-code” a few really important bits
A single algorithm?
• An algorithm for each “module”?
• Or a single flexible algorithm?
[figures: "Seeing with your tongue" (BrainPort); auditory cortex rewiring (Martinez et al.; Roe et al.); adapted from A. Ng]
What must that single algorithm do?
• Interpret rich sensory inputs
• Choose complex actions
Why deep reinforcement learning?
• Deep = can process complex sensory input
  • …and also compute really complex functions
• Reinforcement learning = can choose complex actions
Some evidence in favor of deep learning
Some evidence for reinforcement learning
• Percepts that anticipate reward become associated with firing patterns similar to those evoked by the reward itself
• The basal ganglia appear to be related to the reward system
• Model-free RL-like adaptation is often a good fit for experimental data of animal adaptation
• But not always…
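The first bullet above is often modeled with temporal-difference learning: a reward-predicting stimulus inherits value from the reward that follows it. A minimal TD(0) sketch on an invented two-state toy chain:

```python
import numpy as np

# States: 0 = cue (predicts reward), 1 = reward state, 2 = terminal.
# TD(0) update: V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))
def td_learning(num_episodes=500, alpha=0.1, gamma=0.9):
    V = np.zeros(3)
    for _ in range(num_episodes):
        # episode: cue -> reward state (r=0), reward state -> terminal (r=1)
        for s, s_next, r in [(0, 1, 0.0), (1, 2, 1.0)]:
            V[s] += alpha * (r + gamma * V[s_next] - V[s])
    return V

V = td_learning()
# The cue's value converges toward gamma * 1.0: the reward-predicting
# percept acquires value, mirroring the neural finding.
print(V)  # roughly [0.9, 1.0, 0.0]
```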
What can deep learning & RL do well now?
• Acquire a high degree of proficiency in domains governed by simple, known rules
• Learn simple skills with raw sensory inputs, given enough experience
• Learn by imitating enough human-provided expert behavior
What has proven challenging so far?
• Humans can learn incredibly quickly
  • Deep RL methods are usually slow
• Humans can reuse past knowledge
  • Transfer learning in deep RL is an open problem
• Not clear what the reward function should be
• Not clear what the role of prediction should be
"Instead of trying to produce a program to simulate the adult mind, why not rather try to produce one which simulates the child's? If this were then subjected to an appropriate course of education one would obtain the adult brain."
- Alan Turing

[diagram: general learning algorithm interacting with the environment, receiving observations and emitting actions]