REINFORCE tutorial

This repository contains a collection of scripts and notes that explain the basics of the REINFORCE algorithm, a method for estimating the derivative of an expected value with respect to the parameters of the distribution over which the expectation is taken.

The method was introduced into the reinforcement learning literature by Ronald J. Williams in "Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning" (Machine Learning, 1992) but has earlier precedents.
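
To make the idea concrete, here is a minimal sketch of the estimator in plain Python. It is not taken from the scripts in this repository; the function name `reinforce_gradient`, the unit-variance Gaussian, and the test function `f(x) = x * x` are assumptions chosen for illustration. The estimate averages `f(x)` times the derivative of the log-density with respect to the parameter, which for a Gaussian with unit variance and mean `mu` is simply `x - mu`.

```python
import random


def reinforce_gradient(f, mu, n_samples=100_000, seed=0):
    """Estimate d/dmu E[f(x)] for x ~ Normal(mu, 1) from samples.

    Uses the identity d/dmu E[f(x)] = E[f(x) * d/dmu log p(x; mu)].
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(mu, 1.0)
        score = x - mu  # d/dmu log Normal(x; mu, 1)
        total += f(x) * score
    return total / n_samples


# Illustrative choice: f(x) = x^2, so E[f(x)] = mu^2 + 1 and the
# analytic derivative is 2 * mu = 3.0 at mu = 1.5.
estimate = reinforce_gradient(lambda x: x * x, mu=1.5)
print(estimate)
```

Note that `f` is only ever evaluated, never differentiated, so the same recipe applies when `f` is a black box such as the return of a simulated episode. The price is a noisy estimate, which is why a large number of samples is used here.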

This repository was created to provide background material for a talk I gave on 6 March 2017 at the Berlin machine learning meet-up. The slides from the talk are also available here, although they are not completely self-explanatory.

I have also included a few theoretical notes which explain various aspects of REINFORCE, Trust Region Policy Optimization, and other policy gradient methods:

"A Few Observations About Policy Gradient Approximations" contains an introductory description of the REINFORCE method;
"Policy Exploration without Back-Looking Terms" explains a term-dropping trick that reduces the variance of the gradient estimate without changing its mean;
"A Minimal Working Example of Empirical Gradient Ascent" explicitly computes the distribution and mean of the gradient estimate in a simple example;
"Policy Exploration in a Cold Universe" illustrates how the REINFORCE algorithm deals with the exploration/exploitation trade-off in a particularly malicious case;
"Is Randomization Necessary?" explains why stochastic policies may be better than deterministic ones when the policy class isn't convex.

These papers were originally written for internal use in my company, the robot software company micropsi industries, but are now freely available.
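
As a small illustration of the kind of term-dropping trick mentioned in the list above, the sketch below computes the exact mean and variance of two gradient estimates in a two-step toy problem. Everything in it is an assumption made for this example rather than something taken from the notes: the Bernoulli policy with a single logit `theta`, the reward `r_t = a_t`, and the choice of dropped term (each action's score multiplied by rewards received before that action).

```python
import math
from itertools import product


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def estimator_moments(theta, drop_past_rewards):
    """Exact mean and variance of a two-step gradient estimate.

    Hypothetical toy problem: actions a1, a2 are independent
    Bernoulli(p) draws with p = sigmoid(theta); the reward at
    step t is r_t = a_t.
    """
    p = sigmoid(theta)
    mean = 0.0
    second_moment = 0.0
    for a1, a2 in product((0, 1), repeat=2):
        prob = (p if a1 else 1 - p) * (p if a2 else 1 - p)
        s1, s2 = a1 - p, a2 - p  # d/dtheta log pi(a_t) for a logit policy
        r1, r2 = a1, a2
        if drop_past_rewards:
            # Each score multiplies only the rewards that come at or
            # after its own action; the term s2 * r1 is dropped.
            g = s1 * (r1 + r2) + s2 * r2
        else:
            # Every score multiplies the full return.
            g = (s1 + s2) * (r1 + r2)
        mean += prob * g
        second_moment += prob * g * g
    return mean, second_moment - mean ** 2


full_mean, full_var = estimator_moments(0.0, drop_past_rewards=False)
reduced_mean, reduced_var = estimator_moments(0.0, drop_past_rewards=True)
print(full_mean, full_var)
print(reduced_mean, reduced_var)
```

At `theta = 0` both estimates have mean `2 * p * (1 - p) = 0.5`, the true derivative of the expected return `2 * p`, while the variance falls from 0.75 to 0.375 once the term `s2 * r1` is dropped. That term has zero mean because the second action is sampled independently of the first reward.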
