The credit assignment problem
If a sequence ends in a terminal
state with a high reward, how do
we determine which of the actions
in that sequence were
responsible for it?
This is the credit assignment
problem.
The structural credit assignment problem
How is credit assigned to the internal workings of a complex structure?
The backpropagation algorithm addresses structural credit assignment for
artificial neural networks.
Reinforcement learning principles lead to a number of alternatives.
In these methods, a single reinforcement signal is uniformly broadcast to all the
sites of learning, either neurons or individual synapses.
Any task that can be learned via error backpropagation can also be learned
using this approach, although possibly more slowly.
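
As an illustration of the broadcast idea, the sketch below trains a single linear unit by weight perturbation. Every weight adds its own random noise, the unit receives one scalar reinforcement (the drop in error caused by the noise), and each weight updates using only that scalar and its own noise. Weight perturbation is one assumed instance of such a method, not necessarily the one intended here; the data set, function names, and hyperparameters are hypothetical.

```python
import random


def loss(weights, data):
    """Mean squared error of a linear unit over the data set."""
    total = 0.0
    for inputs, target in data:
        output = sum(w * x for w, x in zip(weights, inputs))
        total += (output - target) ** 2
    return total / len(data)


def train_broadcast(data, n_weights, steps=2000, lr=0.05, sigma=0.1, seed=0):
    """Weight perturbation: one scalar signal is shared by every weight."""
    rng = random.Random(seed)
    weights = [0.0] * n_weights
    for _ in range(steps):
        # Each synapse draws its own private perturbation.
        noise = [rng.gauss(0.0, sigma) for _ in weights]
        perturbed = [w + n for w, n in zip(weights, noise)]
        # The single broadcast signal: how much the perturbation reduced the error.
        reinforcement = loss(weights, data) - loss(perturbed, data)
        # Local update: broadcast scalar times the synapse's own noise.
        weights = [w + lr * reinforcement * n / sigma ** 2
                   for w, n in zip(weights, noise)]
    return weights


# Hypothetical task: target = 2*x1 - x2 + 0.5 (third input is a constant bias).
DATA = [
    ((0.0, 0.0, 1.0), 0.5),
    ((0.0, 1.0, 1.0), -0.5),
    ((1.0, 0.0, 1.0), 2.5),
    ((1.0, 1.0, 1.0), 1.5),
]

initial_loss = loss([0.0, 0.0, 0.0], DATA)
learned_weights = train_broadcast(DATA, n_weights=3)
final_loss = loss(learned_weights, DATA)
print(initial_loss, final_loss)
```

No weight ever sees a per-weight error signal, so the same rule could be applied to weights anywhere in a network. The price is a noisier estimate of the gradient, which is in line with the remark that learning may be slower than with backpropagation.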
These network learning methods are consistent with the role of diffusely projecting neural
pathways by which neuromodulators can be widely and nonspecifically distributed.
Hypothesis: Dopamine mediates synaptic enhancement in the
corticostriatal pathway in the manner of a broadcast reinforcement
signal (Wickens, 1990).
The temporal credit assignment problem
How can reinforcement learning work when the learner’s behavior
is temporally extended and evaluations occur at varying and
unpredictable times?
This problem is especially relevant in motor control because movements
extend over time and evaluative feedback may become available,
for example, only after the end of a movement.
To address this, reinforcement learning is not only the process of
improving behavior according to given evaluative feedback; it also
includes learning how to improve the evaluative feedback itself.
Methods that do this are called adaptive critic methods.
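
A minimal sketch of how a critic can learn better evaluative feedback, assuming a TD(0) update rule (one common choice for a critic, not necessarily the one intended here). The task is a hypothetical chain of states traversed left to right, with a reward of 1 delivered only on the final step; the function name and parameters are illustrative.

```python
def train_critic(n_states=5, episodes=500, alpha=0.1, gamma=0.9):
    """Learn state values on a left-to-right chain with a delayed reward."""
    # values[n_states] is the terminal state and stays at 0.
    values = [0.0] * (n_states + 1)
    for _ in range(episodes):
        for state in range(n_states):
            next_state = state + 1
            # Evaluative feedback arrives only at the end of the sequence.
            reward = 1.0 if next_state == n_states else 0.0
            # TD error: the critic's own moment-to-moment evaluation signal.
            td_error = reward + gamma * values[next_state] - values[state]
            values[state] += alpha * td_error
    return values[:n_states]


critic_values = train_critic()
print([round(v, 3) for v in critic_values])
```

After training, states earlier in the chain hold discounted predictions of the final reward, so the TD error supplies feedback at every step even though the external reward appears only at the end.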