
Abstract

... LSPI [14], which allows searching for an optimal control more efficiently; however, it is a batch algorithm which does ... For example, incremental natural actor-critic algorithms are presented in [17 ... TD is used as the actor part instead of LSTD, mostly because of the inability of the latter ...

Key takeaways

  • KTD is a second order algorithm (and thus sample efficient): it updates the mean parameter vector, but also the associated variance matrix.
  • This is comparable to approaches such as LSTD [8] (nevertheless with the additional ability to handle nonlinear parameterization).
  • Thus, KTD-V fails to handle the stochastic case, as expected; however, it converges much faster than LSTD or TD in the deterministic one.
  • The optimistic policy iteration scheme used in this experiment implies non-stationarity of the learned Q-function, which explains why LSTD fails to learn a near-optimal policy.
  • TD is used as the actor part instead of LSTD, mostly because of the inability of the latter to handle non-stationarity.
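The first takeaway can be illustrated concretely. The sketch below contrasts a plain first-order TD(0) update with a Kalman-filter-style update that maintains both a mean parameter vector and its covariance matrix, on a tiny deterministic two-transition chain with one-hot linear features. This is a simplified linear sketch of the general idea, not the authors' unscented KTD algorithm; the chain, feature map, and noise variances (`prior_var`, `obs_var`) are illustrative assumptions.

```python
import numpy as np

GAMMA = 0.9
# Deterministic chain (illustrative): s0 -(r=0)-> s1 -(r=1)-> terminal.
# True values: V(s1) = 1, V(s0) = 0.9.
TRANSITIONS = [(0, 0.0, 1), (1, 1.0, None)]  # (state, reward, next state)

def phi(s, n=2):
    """One-hot features; the terminal state maps to the zero vector."""
    f = np.zeros(n)
    if s is not None:
        f[s] = 1.0
    return f

def td0(episodes=200, alpha=0.1):
    """First-order TD(0): updates only the mean parameter vector."""
    theta = np.zeros(2)
    for _ in range(episodes):
        for s, r, s_next in TRANSITIONS:
            delta = r + GAMMA * phi(s_next) @ theta - phi(s) @ theta
            theta += alpha * delta * phi(s)
    return theta

def kalman_td(episodes=20, prior_var=10.0, obs_var=1e-3):
    """Second-order Kalman-style TD sketch: treats theta as the hidden
    state and the reward as a noisy linear observation of it, so both
    the mean theta and its covariance P are updated at each step."""
    theta = np.zeros(2)
    P = prior_var * np.eye(2)          # parameter uncertainty
    for _ in range(episodes):
        for s, r, s_next in TRANSITIONS:
            H = phi(s) - GAMMA * phi(s_next)   # observation vector
            e = r - H @ theta                  # TD innovation
            S = H @ P @ H + obs_var            # innovation variance (scalar)
            K = P @ H / S                      # Kalman gain
            theta = theta + K * e              # mean update
            P = P - np.outer(K, H @ P)         # covariance shrinks with data
    return theta
```

On this deterministic problem the Kalman-style learner recovers the exact values in a handful of episodes, while TD(0) needs far more sweeps, matching the sample-efficiency claim; adding process noise to P would be the natural way to track the non-stationarity discussed in the last takeaways.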