Model-Based Reinforcement Learning
CS 285
Instructor: Sergey Levine
UC Berkeley
Today’s Lecture
1. Basics of model-based RL: learn a model, use model for control
• Why does the naïve approach not work?
• The effect of distributional shift in model-based RL
2. Uncertainty in model-based RL
3. Model-based RL with complex observations
4. Next time: policy learning with model-based RL
• Goals:
• Understand how to build model-based RL algorithms
• Understand the important considerations for model-based RL
• Understand the tradeoffs between different model class choices
Why learn the model?
Does it work? Yes!
• Essentially how system identification works in classical robotics
• Some care should be taken to design a good base policy
• Particularly effective if we can hand-engineer a dynamics representation using our knowledge of physics, and fit just a few parameters
Does it work? No!
(figure: from limited data the model infers "go right to get higher!", so the planner exploits this error and leaves the region where the model is accurate)
• The distribution mismatch problem is exacerbated as we use more expressive model classes
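Stated more formally (this is the usual way the issue is written; the notation is standard rather than copied from the slide): the model is fit to be accurate under the data-collection policy's state distribution $p_{\pi_0}(s_t)$, but planning queries it under the resulting policy's distribution $p_{\pi_f}(s_t)$, and in general $p_{\pi_f}(s_t) \neq p_{\pi_0}(s_t)$, so the planner steers into states the model was never trained on.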
Can we do better?
What if we make a mistake?
Can we do better?
(algorithm loop: execute the plan, append the resulting data to the dataset, and re-fit the model every N steps)
This will be on HW4!
How to replan?
(same outer loop, re-fitting the model every N steps, but now re-planning at every time step and executing only the first planned action, i.e. model predictive control)
• The more you replan, the less perfect each individual plan needs to be
• Can use shorter horizons
• Even random sampling can often work well here! (see the sketch below)
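A minimal random-shooting MPC sketch (illustrative only; dynamics_model, reward_fn, and the horizon/candidate counts are placeholder assumptions, not anything prescribed by the lecture or HW4):

import numpy as np

def random_shooting_mpc(state, dynamics_model, reward_fn, action_dim,
                        horizon=15, num_candidates=1000,
                        action_low=-1.0, action_high=1.0):
    """Pick the first action of the best random action sequence under the learned model."""
    # Sample candidate open-loop action sequences uniformly at random.
    candidates = np.random.uniform(action_low, action_high,
                                   size=(num_candidates, horizon, action_dim))
    returns = np.zeros(num_candidates)
    for i, actions in enumerate(candidates):
        s = state
        for a in actions:
            returns[i] += reward_fn(s, a)   # score each predicted transition
            s = dynamics_model(s, a)        # roll the learned model forward
    best = int(np.argmax(returns))
    # MPC: execute only the first action, then re-plan from the next observed state.
    return candidates[best, 0]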
Uncertainty in Model-Based RL
A performance gap in model-based RL
(figure: learning curves; pure model-based training takes about 10 minutes of real time, while model-free training takes about 10 days, yet the model-free learner reaches much higher final performance)
Nagabandi, Kahn, Fearing, L. ICRA 2018
Why the performance gap?
(figure: the learned model needs to not overfit in low-data regions, but still have high capacity elsewhere)
Why the performance gap?
(figure: with the outer loop re-fitting the model every N steps, the planner finds it very tempting to go to states where the erroneous model predicts unrealistically high reward)
How can uncertainty estimation help?
The expected reward under a high-variance prediction is very low, even though the mean is the same!
Intuition behind uncertainty-aware RL
(same loop as before, re-fitting the model every N steps)
Only take actions for which we think we'll get high reward in expectation (w.r.t. the uncertain dynamics).
This avoids "exploiting" the model.
The model will then adapt and get better.
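One common way to write this objective (standard notation, not copied from the slide): choose the action sequence that maximizes the return in expectation over the model posterior, $J(a_1,\dots,a_H) = \mathbb{E}_{\theta \sim p(\theta \mid \mathcal{D})}\big[\sum_{t=1}^{H} r(s_t, a_t)\big]$, where $s_{t+1} \sim p_\theta(s_{t+1} \mid s_t, a_t)$.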
There are a few caveats…
Need to explore to get better
Expected value is not the same as pessimistic value
Expected value is not the same as optimistic value
…but expected value is often a good start
Uncertainty-Aware Neural Net Models
How can we have uncertainty-aware models?
Idea 1: use output entropy
why is this not enough?
Two types of uncertainty:
• aleatoric (statistical) uncertainty: noise inherent in the data itself; even a perfect model has it
• epistemic (model) uncertainty: "the model is certain about the data, but we are not certain about the model"
(figure: a model fit to noisy data; what is the variance here?)
How can we have uncertainty-aware models?
Idea 2: estimate model uncertainty
"the model is certain about the data, but we are not certain about the model"
Estimate the posterior over model parameters given the data; the entropy of this posterior tells us the model uncertainty!
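In symbols (the standard formulation): instead of the point estimate $\hat\theta = \arg\max_\theta \log p(\mathcal{D} \mid \theta)$, estimate the posterior $p(\theta \mid \mathcal{D})$ and predict by marginalizing over it, $\int p(s_{t+1} \mid s_t, a_t, \theta)\, p(\theta \mid \mathcal{D})\, d\theta$.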
Quick overview of Bayesian neural networks
(figure: every weight carries a distribution rather than a single value; its mean is the expected weight and its variance is the uncertainty about the weight)
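A common mean-field approximation (standard in the BNN literature rather than specific to this slide): $p(\theta \mid \mathcal{D}) \approx \prod_i q(\theta_i)$ with $q(\theta_i) = \mathcal{N}(\mu_i, \sigma_i^2)$, so $\mu_i$ is the expected weight and $\sigma_i$ the uncertainty about that weight.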
For more, see:
Blundell et al., Weight Uncertainty in Neural Networks
Gal et al., Concrete Dropout
We’ll learn more about variational inference later!
Bootstrap ensembles
Train multiple models and see if they agree!
How to train?
Main idea: need to generate "independent" datasets to get "independent" models (classically, by resampling the training data with replacement for each model).
Bootstrap ensembles in deep learning
This basically works, with two caveats:
• Very crude approximation, because the number of models is usually small (< 10)
• Resampling with replacement is usually unnecessary, because SGD and random initialization usually make the models sufficiently independent
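A minimal sketch of ensemble training (the make_model factory, the torch-based training loop, and the hyperparameters are placeholder assumptions; each member maps a concatenated state-action vector to the predicted next state):

import numpy as np
import torch

def train_ensemble(states, actions, next_states, make_model,
                   num_models=5, epochs=50, lr=1e-3, bootstrap=False):
    """Train several dynamics models; their disagreement estimates epistemic uncertainty."""
    models = []
    n = len(states)
    for _ in range(num_models):
        model = make_model()  # fresh random initialization per ensemble member
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        # Classical bootstrap resamples with replacement; with SGD + random init
        # this is often unnecessary, so it is optional here.
        idx = np.random.randint(n, size=n) if bootstrap else np.arange(n)
        s = torch.as_tensor(states[idx], dtype=torch.float32)
        a = torch.as_tensor(actions[idx], dtype=torch.float32)
        s_next = torch.as_tensor(next_states[idx], dtype=torch.float32)
        for _ in range(epochs):
            pred = model(torch.cat([s, a], dim=-1))
            loss = torch.mean((pred - s_next) ** 2)
            opt.zero_grad()
            loss.backward()
            opt.step()
        models.append(model)
    return models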
Planning with Uncertainty, Examples
How to plan with uncertainty
An ensemble gives a (uniform) distribution over deterministic models: evaluate each candidate action sequence under every model and average the resulting returns.
Other options: moment matching, more complex posterior estimation with BNNs, etc.
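For instance, the candidate-scoring step inside the planner might look like this sketch (the ensemble interface and the reward function are assumptions; each member is assumed to be a callable mapping a state and action to the next state):

import numpy as np

def score_action_sequence(state, actions, ensemble, reward_fn):
    """Average the predicted return of one candidate action sequence over all ensemble members."""
    returns = []
    for model in ensemble:
        s, total = state, 0.0
        for a in actions:
            total += reward_fn(s, a)
            s = model(s, a)  # each member gives its own deterministic rollout
        returns.append(total)
    # Averaging over members approximates the expected return under the model posterior:
    # sequences that look good only under one member get pulled back toward reality.
    return float(np.mean(returns))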
Example: model-based RL with ensembles
Exceeds the performance of model-free training after 40k steps (about 10 minutes of real time)
More recent example: PDDM
Deep Dynamics Models for Learning Dexterous Manipulation. Nagabandi et al. 2019
Further readings
• Deisenroth et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search.
Recent papers:
• Nagabandi et al. Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning.
• Chua et al. Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models.
• Feinberg et al. Model-Based Value Expansion for Efficient Model-Free Reinforcement Learning.
• Buckman et al. Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion.
Model-Based RL with Images
What about complex observations?
What is hard about this?
• High dimensionality
• Redundancy
• Partial observability
(figure: observations are high-dimensional but not dynamic; the underlying state is low-dimensional but dynamic)
State space (latent space) models
• observation model
• dynamics model
• reward model
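Written out (the standard conditional forms, reconstructed rather than quoted from the slide): observation model $p(o_t \mid s_t)$, dynamics model $p(s_{t+1} \mid s_t, a_t)$, and reward model $p(r_t \mid s_t)$.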
How to train?
• standard (fully observed) model
• latent space model
(objectives sketched below)
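Writing the two objectives out (a hedged reconstruction in the usual maximum-likelihood notation): for the standard fully observed model, $\max_\phi \frac{1}{N}\sum_{i=1}^{N}\sum_t \log p_\phi(s_{t+1,i} \mid s_{t,i}, a_{t,i})$; for the latent space model the states are unobserved, so we take an expectation under an approximate posterior (the "encoder"), $\max_\phi \frac{1}{N}\sum_{i=1}^{N}\sum_t \mathbb{E}_{s \sim q_\psi}\big[\log p_\phi(s_{t+1,i} \mid s_{t,i}, a_{t,i}) + \log p_\phi(o_{t,i} \mid s_{t,i})\big]$.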
Model-based RL with latent space models
How should we represent the "encoder" (the approximate posterior over latent states given observations)?
• full smoothing posterior: + most accurate, - most complicated
• single-step encoder: + simplest, - least accurate
We'll talk about the single-step encoder for now.
We will discuss variational inference in more detail next week!
Model-based RL with latent space models
Simple special case: a deterministic encoder that maps each observation directly to a single latent state.
Everything is differentiable, so the whole model can be trained with backprop.
Model-based RL with latent space models
(objective terms: latent space dynamics, image reconstruction, reward model; see the sketch below)
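Putting the terms together (a hedged reconstruction in the usual notation, writing the deterministic encoder as $s_t = g_\psi(o_t)$): $\max_{\phi,\psi} \frac{1}{N}\sum_{i=1}^{N}\sum_t \log p_\phi\big(g_\psi(o_{t+1,i}) \mid g_\psi(o_{t,i}), a_{t,i}\big) + \log p_\phi\big(o_{t,i} \mid g_\psi(o_{t,i})\big) + \log p_\phi\big(r_{t,i} \mid g_\psi(o_{t,i})\big)$, where the three terms are the latent space dynamics, image reconstruction, and reward model, respectively.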
Many practical methods use a stochastic encoder to model uncertainty
Model-based RL with latent space models
(same loop as before: plan with the learned latent-space model, execute, and re-fit every N steps)
Learn directly in observation space
Finn, L. Deep Visual Foresight for Planning Robot Motion. ICRA 2017.
Ebert, Finn, Lee, L. Self-Supervised Visual Planning with Temporal Skip Connections. CoRL 2017.
Use predictions to complete tasks
(figure: the user marks a designated pixel and a goal pixel; the planner picks actions whose predicted video moves the designated pixel to the goal pixel, then executes the task)