PRACTICAL DIFFICULTIES OF DP
• The curse of dimensionality
− Exponential growth of the computational and
storage requirements as the number of state
variables and control variables increases
− Quick explosion of the number of states in
combinatorial problems
− Intractability of imperfect state information
problems
• The curse of modeling
− Mathematical models
− Computer/simulation models
• There may be real-time solution constraints
− A family of problems may need to be addressed; the
data of the particular problem to be solved is given
with little advance notice
− The problem data may change as the system
is controlled – need for on-line replanning
COST-TO-GO FUNCTION APPROXIMATION
• Use a policy computed from the DP equation
where the optimal cost-to-go function Jk+1 is
replaced by an approximation J˜k+1 . (Sometimes
the expected value E{gk } is also replaced by an
approximation.)
• Apply µk (xk ), which attains the minimum in

\[
\min_{u_k \in U_k(x_k)} E\Big\{ g_k(x_k, u_k, w_k) + \tilde{J}_{k+1}\big(f_k(x_k, u_k, w_k)\big) \Big\}
\]
• There are several ways to compute J˜k+1 :
− Off-line approximation: The entire function
J˜k+1 is computed for every k, before the con-
trol process begins.
− On-line approximation: Only the values J˜k+1 (xk+1 )
at the relevant next states xk+1 are com-
puted and used to compute uk just after the
current state xk becomes known.
− Simulation-based methods: These are off-
line and on-line methods that share the com-
mon characteristic that they are based on
Monte-Carlo simulation. Some of these meth-
ods are suitable for problems of very large
size.
CERTAINTY EQUIVALENT CONTROL (CEC)
• Idea: Replace the stochastic problem with a
deterministic problem
• At each time k, the future uncertain quantities
are fixed at some “typical” values
• On-line implementation for a perfect state info
problem. At each time k:
(1) Fix the wi , i ≥ k, at some nominal values w̄i .
Solve the deterministic problem:

\[
\text{minimize} \quad g_N(x_N) + \sum_{i=k}^{N-1} g_i(x_i, u_i, \bar{w}_i)
\]

where xk is known, and
ui ∈ Ui , xi+1 = fi (xi , ui , w̄i ).
(2) Use as control the first element in the opti-
mal control sequence found.
• So we apply µ̄k (xk ) that minimizes

\[
g_k(x_k, u_k, \bar{w}_k) + \tilde{J}_{k+1}\big(f_k(x_k, u_k, \bar{w}_k)\big)
\]

where J˜k+1 is the optimal cost of the corresponding
deterministic problem.
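A minimal Python sketch of this on-line CEC step, assuming small finite control sets so the deterministic problem can be solved by brute-force enumeration (a real implementation would use deterministic DP or a shortest-path solver); the names f, g, g_N, U, and w_bar mirror fk , gk , gN , Ui , and the nominal values w̄i , and are illustrative:

```python
from itertools import product

def cec_control(k, x_k, N, f, g, g_N, U, w_bar):
    """Certainty equivalent control at time k: fix w_i = w_bar[i] for i >= k,
    solve the resulting deterministic problem, and return the first control."""
    best_cost, best_first_u = float("inf"), None
    # Brute-force enumeration of control sequences (u_k, ..., u_{N-1});
    # adequate only for short horizons and small control sets U(i).
    for seq in product(*(U(i) for i in range(k, N))):
        x, cost = x_k, 0.0
        for i, u in zip(range(k, N), seq):
            cost += g(i, x, u, w_bar[i])
            x = f(i, x, u, w_bar[i])
        cost += g_N(x)
        if cost < best_cost:
            best_cost, best_first_u = cost, seq[0]
    return best_first_u
```

At run time, cec_control is called once per stage as each new state xk is observed, and only the returned first control is applied.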
ALTERNATIVE OFF-LINE IMPLEMENTATION
• Let µd0 (x0 ), . . . , µdN−1 (xN−1 ) be an optimal
controller obtained from the DP algorithm for the
deterministic problem

\[
\text{minimize} \quad g_N(x_N) + \sum_{k=0}^{N-1} g_k\big(x_k, \mu_k(x_k), \bar{w}_k\big)
\]

subject to xk+1 = fk (xk , µk (xk ), w̄k ), µk (xk ) ∈ Uk
• The CEC applies at time k the control input
µdk (xk ).
• In an imperfect info version, xk is replaced by
an estimate xk (Ik ).
[Block diagram: the system xk+1 = fk (xk , uk , wk ) with disturbance wk and the measurement zk = hk (xk , uk−1 , vk ) with noise vk feed an estimator that produces the estimate xk (Ik ); the actuator applies uk = µdk (xk (Ik )), and a delay element supplies uk−1 .]
PARTIALLY STOCHASTIC CEC
• Instead of fixing all future disturbances to their
typical values, fix only some, and treat the rest as
stochastic.
• Important special case: Treat an imperfect
state information problem as one of perfect state
information, using an estimate xk (Ik ) of xk as if
it were exact.
• Multiaccess communication example: Con-
sider controlling the slotted Aloha system (Ex-
ample 5.1.1 in the text) by optimally choosing
the probability of transmission of waiting pack-
ets. This is a hard problem of imperfect state
info, whose perfect state info version is easy.
• Natural partially stochastic CEC:

\[
\tilde{\mu}_k(I_k) = \min\Big\{ 1, \ \frac{1}{x_k(I_k)} \Big\},
\]
where xk (Ik ) is an estimate of the current packet
backlog based on the entire past channel history
of successes, idles, and collisions (which is Ik ).
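A one-function sketch of this rule; how the estimate xk (Ik ) is maintained from the channel history is left unspecified, and the function name is illustrative:

```python
def aloha_transmission_prob(backlog_estimate):
    """Partially stochastic CEC for slotted Aloha: treat the backlog estimate
    x_k(I_k) as if it were the exact backlog and apply the perfect-state-info
    rule min{1, 1/x}."""
    # Clamp the estimate at 1 to avoid division by zero; the rule gives
    # probability 1 for any backlog estimate <= 1 anyway.
    return min(1.0, 1.0 / max(backlog_estimate, 1.0))
```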
GENERAL COST-TO-GO APPROXIMATION
• One-step lookahead (1SL) policy: At each k
and state xk , use the control µk (xk ) that attains
the minimum in

\[
\min_{u_k \in U_k(x_k)} E\Big\{ g_k(x_k, u_k, w_k) + \tilde{J}_{k+1}\big(f_k(x_k, u_k, w_k)\big) \Big\},
\]
where
− J˜N = gN .
− J˜k+1 : approximation to true cost-to-go Jk+1
• Two-step lookahead policy: At each k and
xk , use the control µ̃k (xk ) attaining the minimum
above, where the function J˜k+1 is obtained using
a 1SL approximation (solve a 2-step DP problem).
• If J˜k+1 is readily available and the minimiza-
tion above is not too hard, the 1SL policy is im-
plementable on-line.
• Sometimes one also replaces Uk (xk ) above with
a subset of “most promising controls” Ūk (xk ).
• As the length of lookahead increases, the re-
quired computation quickly explodes.
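A sketch of the on-line 1SL computation for the case of finitely many controls and finitely many disturbance values, so the expectation can be evaluated exactly; here U(x) is Uk (xk ), P_w(k, x, u) returns the disturbance distribution as (value, probability) pairs, and f, g, J_tilde stand for fk , gk , J˜k+1 (illustrative names):

```python
def one_step_lookahead_control(k, x, U, P_w, f, g, J_tilde):
    """Return the control minimizing the expected stage cost plus the
    approximate cost-to-go at the resulting next state."""
    def q_factor(u):
        # Exact expectation over a finite disturbance distribution.
        return sum(p * (g(k, x, u, w) + J_tilde(k + 1, f(k, x, u, w)))
                   for w, p in P_w(k, x, u))
    return min(U(x), key=q_factor)
```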
PERFORMANCE BOUNDS FOR 1SL
• Let J̄k (xk ) be the cost-to-go from (xk , k) of the
1SL policy, based on functions J˜k .
• Assume that for all (xk , k), we have

\[
\hat{J}_k(x_k) \le \tilde{J}_k(x_k), \qquad (*)
\]

where ĴN = gN and for all k,

\[
\hat{J}_k(x_k) = \min_{u_k \in U_k(x_k)} E\Big\{ g_k(x_k, u_k, w_k) + \tilde{J}_{k+1}\big(f_k(x_k, u_k, w_k)\big) \Big\}
\]

[so Ĵk (xk ) is computed along with µk (xk )]. Then

\[
\bar{J}_k(x_k) \le \hat{J}_k(x_k), \qquad \text{for all } (x_k, k).
\]
• Important application: When J˜k is the cost-to-
go of some heuristic policy (then the 1SL policy is
called the rollout policy).
• The bound can be extended to the case where
there is an error δk in the RHS of (*), i.e.,
Ĵk (xk ) ≤ J˜k (xk ) + δk . Then

\[
\bar{J}_k(x_k) \le \tilde{J}_k(x_k) + \delta_k + \cdots + \delta_{N-1}
\]
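The inductive step behind the bound (it mirrors the rollout cost improvement proof given later): assuming J̄k+1 (xk+1 ) ≤ Ĵk+1 (xk+1 ) ≤ J˜k+1 (xk+1 ) for all xk+1 ,

\begin{align*}
\bar{J}_k(x_k)
  &= E\Big\{ g_k\big(x_k, \mu_k(x_k), w_k\big)
        + \bar{J}_{k+1}\big(f_k(x_k, \mu_k(x_k), w_k)\big) \Big\} \\
  &\le E\Big\{ g_k\big(x_k, \mu_k(x_k), w_k\big)
        + \tilde{J}_{k+1}\big(f_k(x_k, \mu_k(x_k), w_k)\big) \Big\}
   = \hat{J}_k(x_k),
\end{align*}

since µk (xk ) attains the minimum defining Ĵk (xk ); combined with (*), this also gives J̄k (xk ) ≤ J˜k (xk ).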
COMPUTATIONAL ASPECTS
• Sometimes nonlinear programming can be used
to calculate the 1SL or the multistep version [par-
ticularly when Uk (xk ) is not a discrete set]. Con-
nection with stochastic programming methods
(see text).
• The choice of the approximating functions J˜k
is critical; they can be computed in a variety of ways.
• Some approaches:
(a) Problem Approximation: Approximate the
optimal cost-to-go with some cost derived
from a related but simpler problem
(b) Parametric Cost-to-Go Approximation: Ap-
proximate the optimal cost-to-go with a func-
tion of a suitable parametric form, whose pa-
rameters are tuned by some heuristic or sys-
tematic scheme (Neuro-Dynamic Program-
ming)
(c) Rollout Approach: Approximate the opti-
mal cost-to-go with the cost of some subop-
timal policy, which is calculated either ana-
lytically or by simulation
PROBLEM APPROXIMATION
• Many (problem-dependent) possibilities
− Replace uncertain quantities by nominal val-
ues, or simplify the calculation of expected
values by limited simulation
− Simplify difficult constraints or dynamics
• Example of enforced decomposition: Route m
vehicles that move over a graph. Each node has a
“value.” The first vehicle that passes through the
node collects its value. Maximize the total collected
value, subject to initial and final time constraints
(plus time windows and other constraints).
• Usually the 1-vehicle version of the problem is
much simpler. This motivates an approximation
obtained by solving single vehicle problems.
• 1SL scheme: At time k and state xk (positions
of the vehicles and set of already collected value
nodes), consider all possible kth moves by the
vehicles, and at each resulting state approximate
the optimal value-to-go by the value collected when
the vehicle routes are optimized one-at-a-time.
PARAMETRIC COST-TO-GO APPROXIMATION
• Use a cost-to-go approximation from a parametric
class J˜(x, r), where x is the current state and
r = (r1 , . . . , rm ) is a vector of “tunable” scalars
(weights).
• By adjusting the weights, one can change the
“shape” of the approximation J˜ so that it is rea-
sonably close to the true optimal cost-to-go func-
tion.
• Two key issues:
− The choice of the parametric class J˜(x, r)
(the approximation architecture).
− Method for tuning the weights (“training”
the architecture).
• Successful application strongly depends on how
these issues are handled, and on insight about the
problem.
• Sometimes a simulation-based algorithm is used,
particularly when there is no mathematical model
of the system.
• We will look in detail at these issues after a few
lectures.
APPROXIMATION ARCHITECTURES
• Divided into linear and nonlinear [i.e., linear or
nonlinear dependence of J˜(x, r) on r].
• Linear architectures are easier to train, but non-
linear ones (e.g., neural networks) are richer.
• Architectures based on feature extraction:

[Diagram: state x → feature extraction mapping → feature vector y → cost approximator with parameter vector r → cost approximation Jˆ(y, r).]
• Ideally, the features will encode much of the
nonlinearity inherent in the cost-to-go function
being approximated, so that the approximation may
be quite accurate without a complicated architecture.
• Sometimes the state space is partitioned, and
“local” features are introduced for each subset of
the partition (they are 0 outside the subset).
• With a well-chosen feature vector y(x), we can
use a linear architecture

\[
\tilde{J}(x, r) = \hat{J}\big(y(x), r\big) = \sum_i r_i \, y_i(x)
\]
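A small Python sketch of such a linear feature-based architecture; the quadratic features and the least-squares tuning below are illustrative choices, not prescribed here:

```python
import numpy as np

def features(x):
    """Illustrative feature vector y(x) for a scalar state x."""
    return np.array([1.0, x, x * x])

def J_tilde(x, r):
    """Linear architecture: J~(x, r) = sum_i r_i * y_i(x)."""
    return float(np.dot(r, features(x)))

def fit_weights(states, targets):
    """Tune r by least squares against sampled target costs (e.g., costs
    obtained by simulating some policy from the sampled states)."""
    Y = np.array([features(x) for x in states])
    r, *_ = np.linalg.lstsq(Y, np.asarray(targets, dtype=float), rcond=None)
    return r
```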
AN EXAMPLE - COMPUTER CHESS
• Programs use a feature-based position evaluator
that assigns a score to each move/position
[Diagram: chess position → feature extraction (material balance, mobility, safety, etc.) → weighting of features → score produced by the position evaluator.]
• Many context-dependent special features.
• Most often the weighting of features is linear
but multistep lookahead is involved.
• Most often the training is done by trial and
error.
ROLLOUT ALGORITHMS
• One-step lookahead policy: At each k and
state xk , use the control µk (xk ) that attains the
minimum in

\[
\min_{u_k \in U_k(x_k)} E\Big\{ g_k(x_k, u_k, w_k) + \tilde{J}_{k+1}\big(f_k(x_k, u_k, w_k)\big) \Big\},
\]
where
− J˜N = gN .
− J˜k+1 : approximation to true cost-to-go Jk+1
• Rollout algorithm: When J˜k is the cost-to-go
of some heuristic policy (called the base policy)
• Cost improvement property (to be shown): The
rollout algorithm achieves no worse (and usually
much better) cost than the base heuristic starting
from the same state.
• Main difficulty: Calculating J˜k (xk ) may be com-
putationally intensive if the cost-to-go of the base
policy cannot be analytically calculated.
− May involve Monte Carlo simulation if the
problem is stochastic.
− Things improve in the deterministic case.
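A sketch of the Monte Carlo evaluation step mentioned in the bullets above for a stochastic problem: the cost-to-go of the base policy at a candidate next state is estimated by averaging simulated trajectories; f, g, g_N, base_policy, and sample_w are assumed given (illustrative names):

```python
def mc_base_policy_cost(k, x, N, f, g, g_N, base_policy, sample_w, n_sims=100):
    """Estimate the base policy's cost-to-go from state x at time k by
    averaging n_sims simulated trajectories."""
    total = 0.0
    for _ in range(n_sims):
        xi, cost = x, 0.0
        for i in range(k, N):
            u = base_policy(i, xi)
            w = sample_w(i, xi, u)      # draw one disturbance sample
            cost += g(i, xi, u, w)
            xi = f(i, xi, u, w)
        total += cost + g_N(xi)
    return total / n_sims
```

The rollout control at (k, xk ) is then obtained by using this estimate in place of J˜k+1 in the one-step lookahead minimization above.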
EXAMPLE: THE QUIZ PROBLEM
• A person is given N questions; answering cor-
rectly question i has probability pi , reward vi .
Quiz terminates at the first incorrect answer.
• Problem: Choose the ordering of questions so
as to maximize the total expected reward.
• Assuming no other constraints, it is optimal to
use the index policy: Answer questions in decreas-
ing order of pi vi /(1 − pi ).
• With minor changes in the problem, the index
policy need not be optimal. Examples:
− A limit (< N ) on the maximum number of
questions that can be answered.
− Time windows, sequence-dependent rewards,
precedence constraints.
• Rollout with the index policy as base policy:
Convenient because at a given state (subset of
questions already answered), the index policy and
its expected reward can be easily calculated.
• Very effective for solving the quiz problem and
important generalizations in scheduling (see Bert-
sekas and Castanon, J. of Heuristics, Vol. 5, 1999).
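A Python sketch of rollout for a constrained variant of the quiz problem, here with a limit M < N on the number of questions that can be answered (the constraint is chosen only for illustration), using the index policy as base heuristic; questions are (p, v) pairs with p < 1:

```python
def index_order(questions):
    """Base heuristic: order questions by the index p*v/(1 - p), decreasing."""
    return sorted(questions, key=lambda q: q[0] * q[1] / (1.0 - q[0]), reverse=True)

def expected_reward(order, limit):
    """Expected reward of answering 'order' (at most 'limit' questions),
    with the quiz ending at the first incorrect answer."""
    total, surv = 0.0, 1.0
    for p, v in order[:limit]:
        surv *= p          # probability of having answered correctly so far
        total += surv * v
    return total

def rollout_quiz(questions, limit):
    """Rollout: at each step pick the next question whose index-policy
    completion gives the largest expected reward."""
    chosen, remaining = [], list(questions)
    while remaining and len(chosen) < limit:
        def completion_value(i):
            rest = index_order(remaining[:i] + remaining[i + 1:])
            return expected_reward(chosen + [remaining[i]] + rest, limit)
        best = max(range(len(remaining)), key=completion_value)
        chosen.append(remaining.pop(best))
    return chosen, expected_reward(chosen, limit)
```

For example, rollout_quiz([(0.9, 1.0), (0.5, 4.0), (0.8, 2.0), (0.3, 10.0)], limit=2) builds a two-question order by comparing index-policy completions; its expected reward is no smaller than that of the plain index order truncated to two questions.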
COST IMPROVEMENT PROPERTY
• Let
J̄k (xk ): Cost-to-go of the rollout policy
Hk (xk ): Cost-to-go of the base policy
• We claim that J̄k (xk ) ≤ Hk (xk ) for all xk , k
• Proof by induction: We have J̄N (xN ) = HN (xN )
for all xN . Assume that

\[
\bar{J}_{k+1}(x_{k+1}) \le H_{k+1}(x_{k+1}), \qquad \forall\ x_{k+1}.
\]

Then, for all xk ,

\[
\begin{aligned}
\bar{J}_k(x_k) &= E\Big\{ g_k\big(x_k, \bar{\mu}_k(x_k), w_k\big) + \bar{J}_{k+1}\big(f_k(x_k, \bar{\mu}_k(x_k), w_k)\big) \Big\} \\
&\le E\Big\{ g_k\big(x_k, \bar{\mu}_k(x_k), w_k\big) + H_{k+1}\big(f_k(x_k, \bar{\mu}_k(x_k), w_k)\big) \Big\} \\
&\le E\Big\{ g_k\big(x_k, \mu_k(x_k), w_k\big) + H_{k+1}\big(f_k(x_k, \mu_k(x_k), w_k)\big) \Big\} \\
&= H_k(x_k)
\end{aligned}
\]

where µ̄k denotes the rollout policy and µk the base policy at time k.
− Induction hypothesis ==> 1st inequality
− Min selection of µ̄k (xk ) ==> 2nd inequality
− Definition of Hk , µk ==> last equality
DISCRETE DETERMINISTIC PROBLEMS
• Any discrete optimization problem (with finite
number of choices/feasible solutions) can be repre-
sented sequentially by breaking down the decision
process into stages.
• A tree/shortest path representation. The leaves
of the tree correspond to the feasible solutions.
• Decisions can be made in stages.
− May complete partial solutions, one stage at
a time.
− May apply rollout with any heuristic that
can complete a partial solution.
− No costly stochastic simulation needed.
• Example: Traveling salesman problem. Find a
minimum cost tour that goes exactly once through
each of N cities.
[Tree of partial tours for cities A, B, C, D with origin node A (node s): the root A branches to AB, AC, AD; these branch to ABC, ABD, ACB, ACD, ADB, ADC; the leaves ABCD, ABDC, ACBD, ACDB, ADBC, ADCB correspond to the complete tours.]
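A Python sketch of rollout for the traveling salesman problem, using a nearest-neighbor completion as the base heuristic (an illustrative choice; any heuristic that can complete a partial tour will do); dist[a][b] is the cost of arc (a, b):

```python
def nn_complete(partial, cities, dist):
    """Base heuristic: extend the partial tour greedily to the nearest
    unvisited city until all cities are included."""
    tour = list(partial)
    remaining = [c for c in cities if c not in tour]
    while remaining:
        nxt = min(remaining, key=lambda c: dist[tour[-1]][c])
        tour.append(nxt)
        remaining.remove(nxt)
    return tour

def tour_cost(tour, dist):
    """Cost of the closed tour (returns to the starting city)."""
    return sum(dist[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))

def rollout_tsp(cities, dist, start):
    """Rollout: at each stage append the city whose nearest-neighbor
    completion yields the cheapest full tour."""
    tour = [start]
    while len(tour) < len(cities):
        candidates = [c for c in cities if c not in tour]
        best = min(candidates,
                   key=lambda c: tour_cost(nn_complete(tour + [c], cities, dist), dist))
        tour.append(best)
    return tour, tour_cost(tour, dist)
```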
EXAMPLE: THE BREAKTHROUGH PROBLEM
[Figure: a binary tree with N stages rooted at a node labeled "root"; blocked arcs are crossed out and a free root-to-leaf path is drawn with thick lines.]
• Given a binary tree with N stages.
• Each arc is either free or is blocked (crossed out
in the figure).
• Problem: Find a free path from the root to the
leaves (such as the one shown with thick lines).
• Base heuristic (greedy): Follow the right branch
if free; else follow the left branch if free.
• This is a rare rollout instance that admits a
detailed analysis.
• For large N and given prob. of free branch:
the rollout algorithm requires O(N ) times more
computation, but has O(N ) times larger prob. of
finding a free path than the greedy algorithm.
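A small simulation sketch comparing the greedy base heuristic with its rollout on random instances; the arc encoding and the parameters below (depth, arc probability, number of trials) are illustrative:

```python
import random

def make_tree(N, p_free, rng):
    """Arc (d, i, b): from node i at depth d along branch b (0 = left,
    1 = right); each arc is free with probability p_free."""
    return {(d, i, b): rng.random() < p_free
            for d in range(N) for i in range(2 ** d) for b in (0, 1)}

def greedy(tree, N, d=0, i=0):
    """Base heuristic: take the right branch if free, else the left if free;
    return True if a leaf is reached."""
    if d == N:
        return True
    for b in (1, 0):
        if tree[(d, i, b)]:
            return greedy(tree, N, d + 1, 2 * i + b)
    return False

def rollout(tree, N, d=0, i=0):
    """Rollout: move along a free branch from which the greedy heuristic
    breaks through; if neither lookahead succeeds, fall back to greedy's choice."""
    if d == N:
        return True
    free = [b for b in (1, 0) if tree[(d, i, b)]]
    for b in free:
        if greedy(tree, N, d + 1, 2 * i + b):
            return rollout(tree, N, d + 1, 2 * i + b)
    return rollout(tree, N, d + 1, 2 * i + free[0]) if free else False

rng = random.Random(0)
trials = [make_tree(12, 0.5, rng) for _ in range(100)]
print(sum(greedy(t, 12) for t in trials), sum(rollout(t, 12) for t in trials))
```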
DET. EXAMPLE: ONE-DIMENSIONAL WALK
• A person takes either a unit step to the left or
a unit step to the right. Minimize the cost g(i) of
the point i where he will end up after N steps.
[Figure: the points reachable from (0, 0) in N steps, spanning (N, −N) to (N, N), and a plot of the terminal cost g(i) for −N ≤ i ≤ N, with a point ī marked between 0 and N.]
• Base heuristic: Always go to the right. Rollout
finds the rightmost local minimum.
• Base heuristic: Compare “always go to the right”
and “always go to the left,” and choose the best of
the two. Rollout finds a global minimum.
A ROLLOUT ISSUE FOR DISCRETE PROBLEMS
• The base heuristic need not constitute a policy
in the DP sense.
• Reason: Depending on its starting point, the
base heuristic may not apply the same control at
the same state.
• As a result the cost improvement property may
be lost (except if the base heuristic has a property
called sequential consistency; see the text for a
formal definition).
• The cost improvement property can be restored
in two ways:
− The base heuristic has a property called se-
quential improvement (see the text for a for-
mal definition).
− A variant of the rollout algorithm, called for-
tified rollout, is used, which enforces cost
improvement. Roughly speaking, the “best”
solution found so far is maintained, and it
is followed whenever the standard version of
the algorithm tries to follow a “worse” solution
(see the text).
ROLLING HORIZON WITH ROLLOUT
• We can use a rolling horizon approximation in
calculating the cost-to-go of the base heuristic.
• Because the heuristic is suboptimal, the ratio-
nale for a long rolling horizon becomes weaker.
• Example: N -stage stopping problem where the
stopping cost is 0, the continuation cost is either
−ǫ or 1, where 0 < ǫ << 1, and the first state
with continuation cost equal to 1 is state m. Then
the optimal policy is to stop at state m, and the
optimal cost is −mǫ.
[Figure: states 0, 1, 2, . . . , m, . . . , N in a line, with continuation cost −ǫ at the states before m and cost 1 at state m; stopping leads to a separate stopped state.]
• Consider the heuristic that continues at every
state, and the rollout policy that is based on this
heuristic, with a rolling horizon of l ≤ m steps.
• The rollout policy will continue through the first
m − l + 1 stages (states 0, . . . , m − l), thus accumulating
a cost of −(m − l + 1)ǫ. The rollout performance
improves as l becomes shorter!
• Limited vision may work to our advantage!
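A small sketch that reproduces the −(m − l + 1)ǫ figure by simulating the rollout with an l-step rolling horizon; the continuation cost is taken as −eps before state m and 1 from state m on, an illustrative choice consistent with the example:

```python
def stopping_rollout_cost(m, l, eps, N):
    """Cost accumulated by rolling out the 'always continue' base heuristic
    with an l-step rolling horizon on the stopping example."""
    def c(i):                        # continuation cost at state i
        return -eps if i < m else 1.0
    cost, i = 0.0, 0
    while i < N:
        horizon = sum(c(j) for j in range(i, min(i + l, N)))
        if horizon >= 0.0:           # stopping (cost 0) looks at least as good
            break
        cost += c(i)                 # continue: pay c(i) and move on
        i += 1
    return cost

# e.g. stopping_rollout_cost(m=20, l=5, eps=0.01, N=40) gives
# -(20 - 5 + 1) * 0.01 = -0.16 (up to floating-point rounding).
```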