
Brain-inspired learning in artificial neural networks: A review
Cite as: APL Mach. Learn. 2, 021501 (2024); doi: 10.1063/5.0186054
Submitted: 3 November 2023 • Accepted: 27 March 2024 •
Published Online: 9 May 2024

Samuel Schmidgall,1,a) Rojin Ziaei,2 Jascha Achterberg,3,4 Louis Kirsch,5 S. Pardis Hajiseyedrazi,6
and Jason Eshraghian7

AFFILIATIONS
1 Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, Maryland 21212, USA
2 Department of Information Science, University of Maryland, College Park, Maryland 20742, USA
3 Intel Labs, Santa Clara, California 95054, USA
4 Centre for Neural Circuits and Behavior, University of Oxford, Oxford OX1 3SR, United Kingdom
5 The Swiss AI Lab, IDSIA, 6962 Viganello, Switzerland
6 Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland 20742, USA
7 Department of Electrical and Computer Engineering, University of California, Santa Cruz, Santa Cruz, California 95064, USA

a) Author to whom correspondence should be addressed: sschmi46@[Link]

ABSTRACT
Artificial neural networks (ANNs) have emerged as an essential tool in machine learning, achieving remarkable success across diverse
domains, including image and speech generation, game playing, and robotics. However, there exist fundamental differences between ANNs’
operating mechanisms and those of the biological brain, particularly concerning learning processes. This paper presents a comprehensive
review of current brain-inspired learning representations in artificial neural networks. We investigate the integration of more biologically
plausible mechanisms, such as synaptic plasticity, to improve these networks’ capabilities. Moreover, we delve into the potential advantages
and challenges accompanying this approach. In this review, we pinpoint promising avenues for future research in this rapidly advancing field,
which could bring us closer to understanding the essence of intelligence.
© 2024 Author(s). All article content, except where otherwise noted, is licensed under a Creative Commons Attribution (CC BY) license ([Link]).

INTRODUCTION

The dynamic interrelationship between memory and learning is a fundamental hallmark of intelligent biological systems. It empowers organisms to not only assimilate new knowledge but also to continuously refine their existing abilities, enabling them to adeptly respond to changing environmental conditions. This adaptive characteristic is relevant on various time scales, encompassing both long-term learning and rapid short-term learning via short-term plasticity mechanisms, highlighting the complexity and adaptability of biological neural systems.1–3 The development of artificial systems that draw high-level, hierarchical inspiration from the brain has been a long-standing scientific pursuit spanning several decades. While earlier attempts were met with limited success, the most recent generation of artificial intelligence (AI) algorithms has achieved significant breakthroughs in many challenging tasks. These tasks include, but are not limited to, the generation of images and text from human-provided prompts,4–7 the control of complex robotic systems,8–10 the mastery of strategy games such as Chess and Go,11 and a multimodal amalgamation of these.12

While ANNs have made significant advancements in various fields, there are still major limitations in their ability to continuously learn and adapt as biological brains do.13–15 Unlike current models of machine intelligence, animals can learn throughout their entire lifespan, which is essential for stable adaptation to changing environments. This ability, known as lifelong learning, remains a significant challenge for artificial intelligence, which primarily optimizes problems consisting of fixed labeled datasets, causing it to struggle to generalize to new tasks or retain information across repeated learning iterations.14 Addressing this challenge is an active area of research, and the potential implications of developing AI with lifelong learning abilities could have far-reaching impacts across multiple domains.

In this paper, we offer a unique review that seeks to identify the mechanisms of learning in the brain that have inspired current artificial intelligence algorithms. The scope of this review is for algorithms that modify the parameters of a neural network, such as synaptic plasticity, and how they relate to the brain. To better understand the biological processes underlying natural intelligence, the first section will explore the low-level components that shape neuromodulation, from synaptic plasticity to the role of local and global dynamics that shape neural activity. This will be related back to ANNs in the third section, where we compare and contrast ANNs with biological neural systems. This will give us a logical basis that seeks to justify why the brain has more to offer AI beyond the inheritance of current artificial models. Following that, we will delve into algorithms for artificial learning that emulate these processes to improve the capabilities of AI systems. Finally, we will discuss various applications of these AI techniques in real-world scenarios, highlighting their potential impact on fields such as robotics, lifelong learning, and neuromorphic computing. By doing so, we aim to provide a comprehensive understanding of the interplay between learning mechanisms in the biological brain and artificial intelligence, highlighting the potential benefits that can arise from this synergistic relationship. We hope our findings will encourage a new generation of brain-inspired learning algorithms.

PROCESSES THAT SUPPORT LEARNING IN THE BRAIN

A grand effort in neuroscience aims at identifying the underlying processes of learning in the brain. Several mechanisms have been proposed to explain the biological basis of learning at varying levels of granularity—from the synapse to population-level activity. However, the vast majority of biologically plausible models of learning are characterized by plasticity that emerges from the interaction between local and global events.16 Below, we introduce various forms of plasticity and how these processes interact in more detail.

Synaptic plasticity: Plasticity in the brain refers to the capacity of experience to modify the function of neural circuits. The plasticity of synapses specifically refers to the modification of the strength of synaptic transmission based on activity and is currently the most widely investigated mechanism by which the brain adapts to new information.17,18 There are two broader classes of synaptic plasticity: short- and long-term plasticity. Short-term plasticity acts on the scale of tens of milliseconds to minutes and has an important role in short-term adaptation to sensory stimuli and short-lasting memory formation.19 Long-term plasticity acts on the scale of minutes and longer and is thought to be one of the primary processes underlying long-term behavioral changes and memory storage.20

Neuromodulation: In addition to the plasticity of synapses, another important mechanism by which the brain adapts to new information is through neuromodulation.3,21,22 Neuromodulation refers to the regulation of neural activity by chemical signaling molecules, often referred to as neurotransmitters or hormones. These signaling molecules can alter the excitability of neural circuits and the strength of synapses and can have both short- and long-term effects on neural function. Different types of neuromodulation have been identified, including acetylcholine, dopamine, and serotonin, which have been linked to various functions such as attention, learning, and emotion.23 Neuromodulation has been suggested to play a role in various forms of plasticity, including short-19 and long-term plasticity.22

Metaplasticity: The ability of neurons to modify both their function and structure based on activity is what characterizes synaptic plasticity. These modifications that occur at the synapse must be precisely organized so that changes occur at the right time and in the right quantity. This regulation of plasticity is referred to as metaplasticity, or the "plasticity of synaptic plasticity," and plays a vital role in safeguarding the constantly changing brain from its own saturation.24–26 Essentially, metaplasticity alters the ability of synapses to generate plasticity by inducing a change in the physiological state of neurons or synapses. Metaplasticity has been proposed as a fundamental mechanism in memory stability, learning, and regulating neural excitability. While similar, metaplasticity can be distinguished from neuromodulation, with metaplastic and neuromodulatory events often overlapping in time during the modification of a synapse.

Neurogenesis: The process by which newly formed neurons are integrated into existing neural circuits is referred to as neurogenesis. Neurogenesis is most active during embryonic development but is also known to occur throughout the adult lifetime, particularly in the subventricular zone of the lateral ventricles,27 the amygdala,28 and in the dentate gyrus of the hippocampal formation.29 In adult mice, neurogenesis has been demonstrated to increase when living in enriched environments vs in standard laboratory conditions.30 In addition, many environmental factors, such as exercise31,32 and stress,33,34 have been demonstrated to change the rate of neurogenesis in the rodent hippocampus. Overall, while the role of neurogenesis in learning is not fully understood, it is believed to play an important role in supporting learning in the brain.

Glial cells: Glial cells, or neuroglia, play a vital role in supporting learning and memory by modulating neurotransmitter signaling at synapses, the small gaps between neurons where neurotransmitters are released and received.35 Astrocytes, one type of glial cell, can release and reuptake neurotransmitters, as well as metabolize and detoxify them. This helps to regulate the balance and availability of neurotransmitters in the brain, which is essential for normal brain function and learning.36 Microglia, another type of glial cell, can also modulate neurotransmitter signaling and participate in the repair and regeneration of damaged tissue, which is important for learning and memory.37 In addition to repair and modulation, structural changes in synaptic strength require the involvement of different types of glial cells, with the most notable influence coming from astrocytes.36 However, despite their crucial involvement, we have yet to fully understand the role of glial cells. Understanding the mechanisms by which glial cells support learning at synapses is an important area of ongoing research.

DEEP NEURAL NETWORKS AND PLASTICITY

Artificial and spiking neural networks

Artificial neural networks have played a vital role in machine learning over the past several decades. These networks have seen tremendous progress toward solving a variety of challenging problems.


FIG. 1. Graphical depiction of long-term potentiation (LTP) and depression (LTD) at the synapse of biological neurons. (a) Synaptically connected pre- and post-synaptic
neurons. (b) Synaptic terminal, the connection point between neurons. (c) Synaptic growth (LTP) and synaptic weakening (LTD). (d) (Top) Membrane potential dynamics in
the axon hillock of the neuron. (Bottom) Pre- and post-synaptic spikes. (e) Spike-timing dependent plasticity curve depicting experimental recordings of LTP and LTD.

Many of the most impressive accomplishments in AI have been realized through the use of large ANNs trained on tremendous amounts of data. While there have been many technical advancements, many of the accomplishments in AI can be explained by innovations in computing technology, such as large-scale GPU accelerators and the accessibility of data. While the application of large-scale ANNs has led to major innovations, there are still many challenges ahead. A few of the most pressing practical limitations of ANNs are that they are not efficient in terms of power consumption and they are not very good at processing dynamic and noisy data. In addition, ANNs are not able to learn beyond their training period (e.g., during deployment), and their training data are assumed to take an independent and identically distributed (IID) form without time, which does not reflect physical reality where information is highly temporally and spatially correlated. These limitations have led to their application requiring vast amounts of energy when deployed in large-scale settings38 and have also presented challenges toward integration into edge computing devices, such as robotics and wearable devices.39

Looking toward neuroscience for a solution, researchers have been exploring spiking neural networks (SNNs) as an alternative to ANNs40 (Fig. 1). SNNs are a class of ANNs that are designed to more closely resemble the behavior of biological neurons. The primary difference between ANNs and SNNs is the idea that SNNs incorporate the notion of timing into their communication. Spiking neurons accumulate information across time from connected (presynaptic) neurons (or via sensory input) in the form of a membrane potential. Once a neuron's membrane potential surpasses a threshold value, it fires a binary "spike" to all of its outgoing (post-synaptic) connections. Spikes have been theoretically demonstrated to contain more information than rate-based representations of information (such as in ANNs), despite being both binary and sparse in time.41 In addition, modeling studies have shown the advantages of SNNs, such as better energy efficiency, the ability to process noisy and dynamic data, and the potential for more robust and fault-tolerant computing.42 These benefits are not solely attributed to their increased biological plausibility but also to the unique properties of spiking neural networks that distinguish them from conventional artificial neural networks. A simple working model of a leaky integrate-and-fire neuron is described as follows:

$$\tau_m \frac{dV}{dt} = E_L - V(t) + R_m I_{\mathrm{inj}}(t),$$

where V(t) is the membrane potential at time t, τ_m is the membrane time constant, E_L is the resting potential, R_m is the membrane resistance, I_inj(t) is the injected current, V_th is the threshold potential, and V_reset is the reset potential. When the membrane potential reaches the threshold potential, the neuron spikes, and the membrane potential is reset to the reset potential [if V(t) ≥ V_th, then V(t) ← V_reset].
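To make these dynamics concrete, the following minimal sketch integrates the equation above with the forward Euler method and applies the threshold-and-reset rule. The constants (time step, membrane parameters, and injected current) are illustrative assumptions rather than values taken from the text.

```python
import numpy as np

# Forward-Euler simulation of a leaky integrate-and-fire neuron.
dt, T = 1e-4, 0.5                      # time step and duration (s)
tau_m, E_L, R_m = 0.02, -0.065, 1e7    # membrane constant (s), rest (V), resistance (Ohm)
V_th, V_reset = -0.050, -0.065         # spike threshold and reset potential (V)
I_inj = 2e-9                           # constant injected current (A)

steps = int(T / dt)
V = E_L
spike_times = []

for t in range(steps):
    # tau_m * dV/dt = E_L - V(t) + R_m * I_inj(t)
    V += dt * (E_L - V + R_m * I_inj) / tau_m
    if V >= V_th:                      # threshold crossing: emit spike, reset
        spike_times.append(t * dt)
        V = V_reset

print(f"{len(spike_times)} spikes in {T} s")
```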


Homeostatic regulation is an additional process that maintains the stability of internal conditions against external changes. This regulation is achieved through feedback mechanisms that adjust physiological processes, ensuring optimal functioning and equilibrium within the organism. In SNNs, homeostatic regulation also involves adjusting the spike threshold of neurons to stabilize network activity. This adjustment can be mathematically described as

$$V_{th}(t+1) = V_{th}(t) + \beta \, (A_{target} - A_{actual}),$$

where V_th represents the neuron's threshold potential, β is a modulation parameter, and A_target and A_actual denote the target and actual firing rates. Homeostatic regulation has been shown to be useful in learning applications of SNNs.43–45
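A minimal sketch of this kind of threshold regulation is shown below. All values are assumptions, and the update is written with the sign convention in which firing above the target rate raises the threshold, which is the stabilizing direction.

```python
import numpy as np

# Homeostatic spike-threshold regulation toward a target firing rate.
rng = np.random.default_rng(0)
n_neurons, n_steps = 100, 2000
beta, A_target = 0.01, 0.1              # adaptation rate; target rate (spikes/step)
V_th = np.ones(n_neurons)

for _ in range(n_steps):
    drive = 2.0 * rng.random(n_neurons)           # stand-in for fluctuating input
    A_actual = (drive >= V_th).astype(float)      # 1 if the neuron fired this step
    V_th += beta * (A_actual - A_target)          # nudge threshold toward target rate

empirical = (2.0 * rng.random((1000, n_neurons)) >= V_th).mean()
print("firing rate after adaptation:", empirical)  # should sit near A_target
```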
Despite these potential advantages, SNNs are still in the early stages of development, and there are several challenges that need to be addressed before they can be used more widely. One of the most pressing challenges is regarding how to optimize the synaptic weights of these models, as traditional backpropagation-based methods from ANNs fail due to the discrete and sparse nonlinearity. Irrespective of these challenges, there do exist some works that push the boundaries of what was thought possible with modern spiking networks, such as large spike-based transformer models.46 Spiking models are of great importance for this review since they form the basis of many brain-inspired learning algorithms.

Hebbian and spike-timing dependent plasticity

Hebbian and spike-timing dependent plasticity (STDP) are two prominent models of synaptic plasticity that play important roles in shaping neural circuitry and behavior. The Hebbian learning rule, first proposed by Hebb in 1949,47 posits that synapses between neurons are strengthened when they are coactive, such that the activation of one neuron causally leads to the activation of another. STDP, on the other hand, is a more recently proposed model of synaptic plasticity that takes into account the precise timing of pre- and post-synaptic spikes48 to determine synaptic strengthening or weakening. It is widely believed that STDP plays a key role in the formation and refinement of neural circuits during development and in the ongoing adaptation of circuits in response to experience. In the following subsection, we will provide an overview of the basic principles of Hebbian learning and STDP.

Hebbian learning: Hebbian learning is based on the idea that the synaptic strength between two neurons should be increased if they are both active at the same time and decreased if they are not. Hebb suggested that this increase should occur when one cell "repeatedly or persistently takes part in firing" another cell (with causal implications). However, this principle is often expressed correlatively, as in the famous aphorism "cells that fire together, wire together" (variously attributed to Löwel49 or Shatz50).

Hebbian learning is often used as an unsupervised learning algorithm, where the goal is to identify patterns in the input data without explicit feedback.51 An example of this process is the Hopfield network, in which large binary patterns are easily stored in a fully connected recurrent network by applying a Hebbian rule to the (symmetric) weights.52 It can also be adapted for use in supervised learning algorithms, where the rule is modified to take into account the desired output of the network. In this case, the Hebbian learning rule is combined with a teaching signal that indicates the correct output for a given input.

A simple Hebbian learning rule can be described mathematically using the following equation:

$$\Delta w_{ij} = \eta x_i x_j,$$

where Δw_ij is the change in the weight between neuron i and neuron j, η is the learning rate, and x_i and x_j are the "activities" of neurons i and j, often thought of as the neuron firing rates. This rule states that if the two neurons are activated at the same time, their connection should be strengthened.

One potential drawback of the basic Hebbian rule is its instability. For example, if x_i and x_j are initially weakly positively correlated, this rule will increase the weight between the two, which will in turn reinforce the correlation, leading to even larger weight increases, etc. Thus, some form of stabilization is needed. This can be done simply by bounding the weights or by more complex rules that take into account additional factors such as the history of the pre- and post-synaptic activity or the influence of other neurons in the network (see Ref. 53 for a practical review of many such rules).
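The rule and the weight-bounding stabilization just described can be expressed in a few lines; the rate model, sizes, and bounds below are assumptions for illustration.

```python
import numpy as np

# Hebbian updates with weight clipping to prevent runaway growth.
rng = np.random.default_rng(1)
eta, w_max = 0.01, 1.0
n_pre, n_post = 20, 5
W = rng.normal(0, 0.1, (n_post, n_pre))

for _ in range(500):
    x_pre = rng.random(n_pre)              # presynaptic firing rates
    x_post = W @ x_pre                     # linear postsynaptic activity
    W += eta * np.outer(x_post, x_pre)     # dw_ij = eta * x_i * x_j
    np.clip(W, -w_max, w_max, out=W)       # simple stabilization by bounding

print("weight range:", W.min(), W.max())
```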
Three-factor rules: Hebbian reinforcement learning: By incorporating information about rewards, Hebbian learning can also be used for reinforcement learning. An apparently plausible idea is simply to multiply the Hebbian update by the reward directly, as follows:

$$\Delta w_{ij} = \eta x_i x_j R,$$

with R being the reward (for this time step or the whole episode). Unfortunately, this idea does not produce reliable reinforcement learning. This can be perceived intuitively by noticing that, if w_ij is already at its optimal value, the rule above will still produce a net change and, thus, drive w_ij away from the optimum.

More formally, as pointed out by Frémaux et al.,54 to properly track the actual covariance between inputs, outputs, and rewards, at least one of the terms in the x_i x_j R product must be centered, that is, replaced by zero-mean fluctuations around its expected value. One possible solution is to center the rewards by subtracting a baseline from R, generally equal to the expected value of R for this trial. While helpful, in practice, this solution is generally insufficient.

A more effective solution is to remove the mean value from the outputs. This can be done easily by subjecting neural activations x_j to occasional random perturbations Δx_j, taken from a suitable zero-centered distribution—and then using the perturbation Δx_j, rather than the raw post-synaptic activation x_j, in the three-factor product,

$$\Delta w_{ij} = \eta x_i \Delta x_j R.$$

This is the so-called "node perturbation" rule proposed by Fiete and Seung.55,56 Intuitively, notice that the effect of the x_i Δx_j increment is to push future x_j responses (when encountering the same x_i input) in the direction of the perturbation: larger if the perturbation was positive, smaller if the perturbation was negative. Multiplying this shift by R results in pushing future responses toward the perturbation if R is positive and away from it if R is negative. Even if R is not zero-mean, the net effect (in expectation) will still be to drive w_ij toward a higher R, although the variance will be higher.


This rule turns out to implement the REINFORCE algorithm (Williams' original paper57 actually proposes an algorithm that is exactly node-perturbation for spiking stochastic neurons) and, thus, estimates the theoretical gradient of R over w_ij. It can also be implemented in a biologically plausible manner, allowing recurrent networks to learn non-trivial cognitive or motor tasks from sparse, delayed rewards.58
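A minimal sketch of the node-perturbation rule on a single linear layer follows; the task (matching a hidden teacher mapping) and all constants are assumptions, and the reward of the unperturbed response serves as the baseline subtracted from R.

```python
import numpy as np

# Node perturbation: reinforce output perturbations that increase reward.
rng = np.random.default_rng(3)
n_in, n_out = 10, 3
eta, sigma = 0.05, 0.1
W = rng.normal(0, 0.1, (n_out, n_in))
W_teacher = rng.normal(0, 0.5, (n_out, n_in))    # defines the toy task

for trial in range(5000):
    x = rng.random(n_in)
    y = W @ x
    dy = rng.normal(0, sigma, n_out)             # zero-mean perturbation of outputs
    R = -np.sum((y + dy - W_teacher @ x) ** 2)   # reward of the perturbed response
    R_base = -np.sum((y - W_teacher @ x) ** 2)   # baseline: unperturbed reward
    # three-factor product: dw_ij = eta * x_i * dy_j * (R - baseline)
    W += eta * (R - R_base) * np.outer(dy, x)

print("distance to teacher:", np.linalg.norm(W - W_teacher))
```

In expectation, this update follows the reward gradient, so the distance to the teacher weights shrinks over trials.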
Spike-timing dependent plasticity: Spike-timing dependent plasticity (STDP) is a theoretical model of synaptic plasticity that allows the strength of connections between neurons to be modified based on the relative timing of their spikes. Unlike the Hebbian learning rule, which relies on the simultaneous activation of pre- and post-synaptic neurons, STDP takes into account the precise timing of the pre- and post-synaptic spikes. In particular, STDP suggests that if a presynaptic neuron fires just before a post-synaptic neuron, the connection between them should be strengthened. Conversely, if the post-synaptic neuron fires just before the presynaptic neuron, the connection should be weakened.

STDP has been observed in a variety of biological systems, including the neocortex, hippocampus, and cerebellum. The rule has been shown to play a crucial role in the development and plasticity of neural circuits, including learning and memory processes. STDP has also been used as a basis for the development of artificial neural networks, which are designed to mimic the structure and function of the brain.

The mathematical equation for STDP is more complex than the Hebbian learning rule and can vary depending on the specific implementation. However, a common formulation is

$$\Delta w_{ij} = \begin{cases} A_+ \exp(-\Delta t / \tau_+) & \text{if } \Delta t > 0, \\ -A_- \exp(\Delta t / \tau_-) & \text{if } \Delta t < 0, \end{cases}$$

where Δw_ij is the change in the weight between neuron i and neuron j, Δt is the time difference between the pre- and post-synaptic spikes, A_+ and A_− are the amplitudes of the potentiation and depression, respectively, and τ_+ and τ_− are the time constants for the potentiation and depression, respectively. This rule states that the strength of the connection between the two neurons will be increased or decreased depending on the timing of their spikes relative to each other.
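The formulation above translates directly into code; the amplitudes and time constants below are typical illustrative values, not values prescribed by the text.

```python
import numpy as np

# Pair-based STDP: weight change as a function of delta_t = t_post - t_pre.
A_plus, A_minus = 0.01, 0.012
tau_plus, tau_minus = 20.0, 20.0       # ms

def stdp(delta_t_ms):
    """Weight change for a single pre/post spike pair."""
    if delta_t_ms > 0:                 # pre before post -> potentiation
        return A_plus * np.exp(-delta_t_ms / tau_plus)
    if delta_t_ms < 0:                 # post before pre -> depression
        return -A_minus * np.exp(delta_t_ms / tau_minus)
    return 0.0

for dt in (-40, -10, -1, 1, 10, 40):
    print(f"delta_t = {dt:+3d} ms -> dw = {stdp(dt):+.5f}")
```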


PROCESSES THAT SUPPORT LEARNING IN ARTIFICIAL NEURAL NETWORKS

There are two primary approaches for weight optimization in artificial neural networks: error-driven global learning and brain-inspired local learning. In the first approach, the network weights are modified by driving a global error to its minimum value. This is achieved by delegating errors to each weight and synchronizing modifications between each weight. In contrast, brain-inspired local learning algorithms aim to learn in a more biologically plausible manner by modifying weights from dynamical equations using locally available information. Both optimization approaches have unique benefits and drawbacks. In the following sections, we will discuss the most commonly utilized form of error-driven global learning, backpropagation, followed by in-depth discussions of brain-inspired local algorithms. It is worth mentioning that these two approaches are not mutually exclusive and will often be integrated in order to complement their respective strengths.59–62

Backpropagation

Backpropagation is a powerful error-driven global learning method that changes the weight of connections between neurons in a neural network to produce a desired target behavior.63 This is accomplished through the use of a quantitative metric (an objective function) that describes the quality of behavior given sensory information (e.g., visual input, written text, and robotic joint positions). The backpropagation algorithm consists of two phases: the forward pass and the backward pass. In the forward pass, the input is propagated through the network, and the output is calculated. During the backward pass, the error between the predicted output and the "true" output is calculated, and the gradients of the loss function with respect to the weights of the network are calculated by propagating the error backward through the network. These gradients are then used to update the weights of the network using an optimization algorithm such as stochastic gradient descent. This process is repeated for many iterations until the weights converge to a set of values that minimize the loss function.

Here, we provide a brief mathematical explanation of backpropagation. First, we define a desired loss function, which is a function of the network's outputs and the true values,

$$L(y, \hat{y}) = \frac{1}{2} \sum_i (y_i - \hat{y}_i)^2,$$

where y is the true output and ŷ is the network's output. In this case, we are minimizing the squared error, but we could very well optimize for any smooth and differentiable loss function.

Next, we use the chain rule to calculate the gradient of the loss with respect to the weights of the network. Let w_ij^l be the weight between neuron i in layer l and neuron j in layer l + 1, and let a_i^l be the activation of neuron i in layer l. Then, the gradients of the loss with respect to the weights are given by

$$\frac{\partial L}{\partial w_{ij}^{l}} = \frac{\partial L}{\partial a_{j}^{l+1}} \frac{\partial a_{j}^{l+1}}{\partial z_{j}^{l+1}} \frac{\partial z_{j}^{l+1}}{\partial w_{ij}^{l}},$$

where z_j^{l+1} is the weighted sum of the inputs to neuron j in layer l + 1. We can then use these gradients to update the weights of the network using gradient descent,

$$w_{ij}^{l} = w_{ij}^{l} - \alpha \frac{\partial L}{\partial w_{ij}^{l}},$$

where α is the learning rate. By repeatedly calculating the gradients and updating the weights, the network gradually learns to minimize the loss function and make more accurate predictions. In practice, gradient descent methods are often combined with approaches to incorporate momentum in the gradient estimate, which has been shown to significantly improve generalization.64
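The two passes and the update rule above are sketched below for a two-layer network on a toy regression problem; the architecture, task, and learning rate are assumptions.

```python
import numpy as np

# Backpropagation for a tiny two-layer network (tanh hidden layer).
rng = np.random.default_rng(4)
alpha = 0.1
W1 = rng.normal(0, 0.5, (8, 2))        # input -> hidden
W2 = rng.normal(0, 0.5, (1, 8))        # hidden -> output
X = rng.uniform(-1, 1, (256, 2))
Y = X[:, :1] * X[:, 1:]                # target: product of the two inputs

for epoch in range(2000):
    # forward pass
    z1 = X @ W1.T
    a1 = np.tanh(z1)
    y_hat = a1 @ W2.T
    # backward pass: chain rule from L = 0.5 * sum((y_hat - y)^2)
    d_out = (y_hat - Y) / len(X)              # dL/dz2 (linear output layer)
    grad_W2 = d_out.T @ a1
    d_hid = (d_out @ W2) * (1 - a1 ** 2)      # propagate error through tanh'
    grad_W1 = d_hid.T @ X
    # gradient descent: w <- w - alpha * dL/dw
    W2 -= alpha * grad_W2
    W1 -= alpha * grad_W1

print("final MSE:", float(np.mean((y_hat - Y) ** 2)))
```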
The impressive accomplishments of backpropagation have led neuroscientists to investigate whether it can provide a better understanding of learning in the brain. While it remains debated as to whether backpropagation variants could occur in the brain,65,66 it is clear that backpropagation in its current formulation is biologically implausible. Alternative theories suggest that complex feedback circuits or the interaction of local activity and top-down signals (a "third-factor") could support a similar form of backprop-like learning.65


Despite its impressive performance, there are still fundamental algorithmic challenges that follow from repeatedly applying backpropagation to network weights. One such challenge is a phenomenon known as catastrophic forgetting, where a neural network forgets previously learned information when training on new data.13 This can occur when the network is fine-tuned on new data or when the network is trained on a sequence of tasks without retaining the knowledge learned from previous tasks. Catastrophic forgetting is a significant hurdle for developing neural networks that can continuously learn from diverse and changing environments. Another challenge is that backpropagation requires backpropagating information through all the layers of the network, which can be computationally expensive and time-consuming, especially for very deep networks. This can limit the scalability of deep learning algorithms and make it difficult to train large models on limited computing resources. Nonetheless, backpropagation has remained the most widely used and successful algorithm for applications involving artificial neural networks.

Evolutionary and genetic algorithms

Another class of global learning algorithms that has gained significant attention in recent years is evolutionary and genetic algorithms. These algorithms are inspired by the process of natural selection and, in the context of ANNs, aim to optimize the weights of a neural network by mimicking the evolutionary process. In genetic algorithms,67 a population of neural networks is initialized with random weights, and each network is evaluated on a specific task or problem. The networks that perform better on the task are then selected for reproduction, whereby they produce offspring with slight variations in their weights. This process is repeated over several generations, with the best-performing networks being used for reproduction, making their behavior more likely across generations. Evolutionary algorithms operate similarly to genetic algorithms but use a different approach by approximating a stochastic gradient.68,69 This is accomplished by perturbing the weights and combining the network objective function performances to update the parameters. This results in a more global search of the weight space, which can be more efficient at finding optimal solutions compared to local search methods such as backpropagation.70

One advantage of these algorithms is their ability to search a vast parameter space efficiently, making them suitable for problems with large numbers of parameters or complex search spaces. In addition, they do not require a differentiable objective function, which can be useful in scenarios where the objective function is difficult to define or calculate (e.g., spiking neural networks). However, these algorithms also have some drawbacks. One major limitation is the high computational cost required to evaluate and evolve a large population of networks. Another challenge is the potential for the algorithm to become stuck in local optima or to converge too quickly, resulting in suboptimal solutions. In addition, the use of random mutations can lead to instability and unpredictability in the learning process.

Regardless, evolutionary and genetic algorithms have shown promising results in various applications, particularly when optimizing non-differentiable and non-trivial parameter spaces. Ongoing research is focused on improving the efficiency and scalability of these algorithms, as well as discovering where and when it makes sense to use these approaches instead of gradient descent.
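The perturb-and-combine gradient approximation described above can be written compactly; the objective and all constants below are assumptions, chosen to be non-differentiable so gradient descent would not apply directly.

```python
import numpy as np

# Evolution-strategies-style search: estimate a gradient from the fitness
# of randomly perturbed parameter vectors.
rng = np.random.default_rng(5)
n_params, pop_size = 50, 64
sigma, lr = 0.1, 0.05
theta = rng.normal(0, 1, n_params)

def fitness(params):
    # Toy non-differentiable objective (rounding breaks gradients).
    return -np.sum(np.round(params) ** 2) - np.sum(params ** 2)

for generation in range(300):
    eps = rng.normal(0, 1, (pop_size, n_params))   # population of perturbations
    f = np.array([fitness(theta + sigma * e) for e in eps])
    f = (f - f.mean()) / (f.std() + 1e-8)          # normalize fitness scores
    # combine perturbations weighted by fitness into a search-gradient step
    theta += lr / (pop_size * sigma) * eps.T @ f

print("best fitness found:", fitness(theta))
```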
BRAIN-INSPIRED REPRESENTATIONS OF LEARNING IN ARTIFICIAL NEURAL NETWORKS

Local learning algorithms

Unlike global learning algorithms such as backpropagation, which require information to be propagated through the entire network, local learning algorithms focus on updating synaptic weights based on local information from nearby or synaptically connected neurons (Fig. 2). These approaches are often strongly inspired by the plasticity of biological synapses. As will be seen, by leveraging local learning algorithms, ANNs can learn more efficiently and adapt to changing input distributions, making them better suited for real-world applications. In this section, we will review recent advances in brain-inspired local learning algorithms and their potential for improving the performance and robustness of ANNs (see Fig. 2).

Backpropagation-derived local learning

Backpropagation-derived local learning algorithms are a class of local learning algorithms that attempt to emulate the mathematical properties of backpropagation. Unlike the traditional backpropagation algorithm, which involves propagating error signals back through the entire network, backpropagation-derived local learning algorithms update synaptic weights based on local error gradients computed using backpropagation. This approach is computationally efficient and allows for online learning, making it suitable for applications where training data are continually arriving.

One prominent example of backpropagation-derived local learning algorithms is the Feedback Alignment (FA) algorithm,71,72 which replaces the weight transport matrix used in backpropagation with a fixed random matrix, allowing the error signal to propagate from direct connections, thus avoiding the need for backpropagating error signals. A brief mathematical description of feedback alignment is as follows: let w_out be the weight matrix connecting the last layer of the network to the output, and w_in be the weight matrix connecting the input to the first layer. In feedback alignment, the error signal is propagated from the output to the input using the fixed random matrix B rather than the transpose of w_out. The weight updates are then computed using the product of the input and the error signal, Δw_in = −ηxz, where x is the input, η is the learning rate, and z is the error signal propagated backward through the network, similar to traditional backpropagation.
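A sketch of feedback alignment on a small two-layer network follows; the sizes, task, and constants are assumptions. The only difference from backpropagation is that the hidden-layer error arrives through the fixed random matrix B instead of the transpose of the forward weights.

```python
import numpy as np

# Feedback alignment: the backward pass uses a fixed random matrix B.
rng = np.random.default_rng(6)
eta = 0.05
W_in = rng.normal(0, 0.5, (16, 4))     # input -> hidden
W_out = rng.normal(0, 0.5, (2, 16))    # hidden -> output
B = rng.normal(0, 0.5, (16, 2))        # fixed random feedback matrix (never trained)
W_true = rng.normal(0, 0.5, (2, 4))    # defines a toy regression target

for step in range(3000):
    x = rng.uniform(-1, 1, 4)
    h = np.tanh(W_in @ x)
    y = W_out @ h
    e = y - W_true @ x                 # output error
    z = (B @ e) * (1 - h ** 2)         # hidden error via B, not W_out.T
    W_out -= eta * np.outer(e, h)
    W_in -= eta * np.outer(z, x)       # dw_in = -eta * z * x

print("output error norm:", float(np.linalg.norm(e)))
```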
Direct Feedback Alignment72 (DFA) simplifies the weight transport chain compared with FA by directly connecting the output layer error to each hidden layer. The Sign-Symmetry (SS) algorithm is similar to FA except the feedback weights symmetrically share signs. Recent progress in feedback alignment explores incorporating backprojections with cortical hierarchies73 and is completely phase free (no forward or backward passes). While FA has exhibited impressive results on small datasets such as MNIST and CIFAR, its performance on larger datasets such as ImageNet is often suboptimal.74 On the other hand, recent studies have shown that the SS algorithm is capable of achieving comparable performance to backpropagation, even on large-scale datasets.75


FIG. 2. Feedforward neural network computes an output given an input by propagating the input information downstream. The precise value of the output is determined by
the weight of synaptic coefficients. To improve the output for a task given an input, the synaptic weights are modified. Synaptic plasticity algorithms represent computational
models that emulate the brain’s ability to strengthen or weaken synapses—connections between neurons—based on their activity, thereby facilitating learning and memory
formation. Three-factor plasticity refers to a model of synaptic plasticity in which changes to the strength of neural connections are determined by three factors: presynaptic
activity, post-synaptic activity, and a modulatory signal, facilitating more nuanced and adaptive learning processes. The feedback alignment algorithm is a learning technique
in which artificial neural networks are trained using random, fixed feedback connections rather than symmetric weight matrices, demonstrating that successful learning can
occur without precise backpropagation. Backpropagation is a fundamental algorithm in machine learning and artificial intelligence used to train neural networks by calculating
the gradient of the loss function with respect to the weights in the network.

Eligibility propagation60,76 (e-prop) extends the idea of feedback alignment for spiking neural networks, combining the advantages of both traditional error backpropagation and biologically plausible learning rules, such as spike-timing-dependent plasticity (STDP). For each synapse, the e-prop algorithm computes and maintains an eligibility trace e_ji(t) = dz_j(t)/dW_ji, derived based on real-time recurrent learning.77,78 Eligibility traces measure the total contribution of this synapse to the neuron's current output, taking into account all past inputs.3 This can be computed and updated in a purely forward manner, without backward passes. This eligibility trace is then multiplied by an estimate of the gradient of the error over the neuron's output, L_j(t) = dE(t)/dz_j(t), to obtain the actual weight gradient dE(t)/dW_ji. L_j(t) itself is computed from the error at the output neurons, either by using symmetric feedback weights or by using fixed feedback weights, as in feedback alignment. A possible drawback of e-prop is that it requires a real-time error signal L_j(t) at each point in time since it only takes into account past events and is blind to future errors. In particular, it cannot learn from delayed error signals that extend beyond the time scales of individual neurons (including short-term adaptation),60 in contrast with methods such as REINFORCE and node-perturbation. In addition, the weight update is an approximation of the true gradient, which can lead to difficulties in spatial scaling.
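As a drastically simplified caricature of this forward-only trace computation, consider a single linear leaky neuron, for which the eligibility trace dz(t)/dw can be computed exactly with a running filter; the rate-based neuron, the squared-error signal, and all constants below are assumptions far simpler than the spiking setting e-prop targets.

```python
import numpy as np

# Forward-only eligibility traces for one linear leaky neuron.
rng = np.random.default_rng(2)
n_in, T = 10, 500
eta, decay = 0.01, 0.5
w = rng.normal(0, 0.1, n_in)
w_star = rng.normal(0, 0.5, n_in)      # hidden teacher defining the target
z = z_star = 0.0
trace = np.zeros(n_in)

for t in range(T):
    x = rng.random(n_in)
    z = decay * z + w @ x              # leaky neuron output z(t)
    z_star = decay * z_star + w_star @ x
    L = z - z_star                     # L(t) = dE/dz for E = 0.5*(z - z*)^2
    trace = decay * trace + x          # e(t) = dz(t)/dw, updated forward in time
    w -= eta * L * trace               # dw = -eta * L(t) * e(t)

print("distance to teacher:", float(np.linalg.norm(w - w_star)))
```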
In the work of Refs. 79 and 80, a normative theory for synaptic learning based on recent genetic findings81 of neuronal signaling architectures is demonstrated. They propose that neurons communicate their contribution to the learning outcome to nearby neurons via cell-type-specific local neuromodulation and that neuron-type diversity and neuron-type-specific local neuromodulation may be critical pieces of the biological credit-assignment puzzle. In this work, the authors instantiate a simplified computational model based on eligibility propagation to explore this theory and show that their model, which includes both dopamine-like temporal difference and neuropeptide-like local modulatory signaling, leads to improvements over previous methods such as e-prop and feedback alignment.

Generalization properties: Techniques in deep learning have made tremendous strides toward understanding the generalization of their learning algorithms. A particularly useful discovery was that flat minima tend to lead to better generalization.82 What is meant by this is that, given a perturbation ε in the parameter space (synaptic weight values), more significant performance degradation is observed around narrower minima. Learning algorithms that find flatter minima in parameter space ultimately lead to better generalization.

Recent work has explored the generalization properties exhibited by (brain-inspired) backpropagation-derived local learning rules.83 Compared with backpropagation through time, backpropagation-derived local learning rules exhibit worse and more variable generalization, which does not improve by scaling the step size due to the gradient approximation being poorly aligned with the true gradient. While it is perhaps unsurprising that local approximations of an optimization process are going to have worse generalization properties than their complete counterpart, this work opens the door toward asking new questions about what the best approach toward designing brain-inspired learning algorithms is. It also opens the question as to whether backpropagation-derived local learning rules are even worth exploring given that they are fundamentally going to exhibit sub-par generalization.


In conclusion, while backpropagation-derived local learning rules present themselves as a promising approach to designing brain-inspired learning algorithms, they come with limitations that must be addressed. The poor generalization of these algorithms highlights the need for further research to improve their performance and to explore alternative brain-inspired learning rules.
Meta-optimized plasticity rules

Meta-optimized plasticity rules offer an effective balance between error-driven global learning and brain-inspired local learning. Meta-learning can be defined as the automation of the search for learning algorithms themselves, where, instead of relying on human engineering to describe a learning algorithm, a search process to find that algorithm is employed.84 The idea of meta-learning naturally extends to brain-inspired learning algorithms, such that the brain-inspired mechanism of learning itself can be optimized, thereby allowing for the discovery of more efficient learning without manual tuning of the rule. In the following section, we discuss various aspects of this research, starting with differentiably optimized synaptic plasticity rules.

Differentiable plasticity: One instantiation of this principle in the literature is differentiable plasticity, which is a framework that focuses on optimizing synaptic plasticity rules in neural networks through gradient descent.85,86 In these rules, the plasticity rules are described in such a way that the parameters governing their dynamics are differentiable, allowing for backpropagation to be used for meta-optimization of the plasticity rule parameters (e.g., the η term in the simple Hebbian rule or the A_+ term in the STDP rule). This allows the weight dynamics to precisely solve a task that requires the weights to be optimized during execution time, referred to as intra-lifetime learning.

Differentiable plasticity rules are also capable of the differentiable optimization of neuromodulatory dynamics.61,86 This framework includes two main variants of neuromodulation: global neuromodulation, where the direction and magnitude of weight changes are controlled by a network-output-dependent global parameter, and retroactive neuromodulation, where the effect of past activity is modulated by a dopamine-like signal within a short time window. This is enabled by the use of eligibility traces, which are used to keep track of which synapses contributed to recent activity, and the dopamine signal modulates the transformation of these traces into actual plastic changes.

Methods involving differentiable plasticity have seen improvements in a wide range of applications, from sequential associative tasks87 to familiarity detection88 to robotic noise adaptation.61 This method has also been used to optimize short-term plasticity rules,88,89 which exhibit improved performance in reinforcement and temporal supervised learning problems. While these methods show much promise, differentiable plasticity approaches take a tremendous amount of memory, as backpropagation is used to optimize multiple parameters for each synapse over time. Practical advancements with these methods will likely require parameter sharing90 or a more memory-efficient form of backpropagation.91
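A compressed PyTorch sketch of this idea, loosely following the plastic-weight formulation of Refs. 85 and 86, is shown below: each synapse carries a fixed weight plus a Hebbian trace scaled by a learnable coefficient, and the plasticity parameters (including η) are meta-optimized by backpropagating through the inner loop. The pattern-completion task and all sizes are assumptions.

```python
import torch

torch.manual_seed(0)
n = 10
w = torch.nn.Parameter(0.1 * torch.randn(n, n))       # fixed (slow) weights
alpha = torch.nn.Parameter(torch.full((n, n), 0.01))  # per-synapse plasticity scale
eta = torch.nn.Parameter(torch.tensor(0.1))           # meta-learned plasticity rate
opt = torch.optim.Adam([w, alpha, eta], lr=1e-2)

for meta_step in range(200):
    pattern = torch.sign(torch.randn(n))        # episode: store, then recall
    hebb = torch.zeros(n, n)
    x = pattern
    for t in range(5):                          # inner loop with plastic updates
        y = torch.tanh(x @ (w + alpha * hebb))  # effective weight = w + alpha*Hebb
        # differentiable Hebbian trace: hebb <- (1 - eta)*hebb + eta*outer(x, y)
        hebb = (1 - eta) * hebb + eta * torch.outer(x, y)
        x = y
    cue = pattern * (torch.rand(n) < 0.5).float()   # degraded recall cue
    recall = torch.tanh(cue @ (w + alpha * hebb))
    loss = ((recall - pattern) ** 2).mean()     # meta-loss: pattern completion
    opt.zero_grad()
    loss.backward()
    opt.step()

print("final meta-loss:", float(loss))
```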
Plasticity with spiking neurons: Recent advances in backpropagating through the non-differentiable part of spiking neurons with surrogate gradients have allowed for differentiable plasticity to be used to optimize plasticity rules in spiking neural networks.61 In Ref. 62, the capability of this optimization paradigm is demonstrated through the use of a differentiable spike-timing dependent plasticity rule to enable "learning to learn" on an online one-shot continual learning problem and an online one-shot image class recognition problem. A similar method was used to optimize the third-factor signal using the gradient approximation of e-prop as the plasticity rule, introducing a meta-optimization form of e-prop.92 Recurrent neural networks tuned by evolution can also be used for meta-optimized learning rules. Evolvable Neural Units93 (ENUs) introduce a gating structure that controls how the input is processed and stored, and how dynamic parameters are updated. This work demonstrates the evolution of individual somatic and synaptic compartment models of neurons and shows that a network of ENUs can learn to solve a T-maze environment task, independently discovering spiking dynamics and reinforcement-type learning rules. Meta-learning has also been introduced to optimize the natural physical structure of spiking reservoir systems to determine the optimal initialization before a task is learned.94

Plasticity in RNNs and Transformers: Independent of research aiming at learning plasticity using update rules, Transformers have recently been shown to be good intra-lifetime learners.5,95,96 The process of in-context learning works not through the update of synaptic weights but purely within the network activations. As in Transformers, this process can also happen in recurrent neural networks.97 While in-context learning initially appears to be a different mechanism from synaptic plasticity, these processes have been demonstrated to exhibit a strong relationship. One exciting connection discussed in the literature is the realization that parameter-sharing by the meta-learner often leads to the interpretation of activations as weights.98 This demonstrates that, while these models may have fixed weights, they exhibit some of the same learning capabilities as models with plastic weights. Another connection is that self-attention in the Transformer involves outer and inner products that can be cast as learned weight updates99 that can even implement gradient descent.100,101

Evolutionary and genetic meta-optimization: Much like differentiable plasticity, evolutionary and genetic algorithms have been used to optimize the parameters of plasticity rules in a variety of applications,102 including adaptation to limb damage on robotic systems.103,104 Recent work has also enabled the optimization of both plasticity coefficients and plasticity rule equations through the use of Cartesian genetic programming,105 presenting an automated approach for discovering biologically plausible plasticity rules based on the specific task being solved. In these methods, the genetic or evolutionary optimization process acts similarly to the differentiable process in that it optimizes the plasticity parameters in an outer-loop process, while the plasticity rule optimizes the reward in an inner-loop process. These methods are appealing since they have a much lower memory footprint compared to differentiable methods since they do not require backpropagating errors over time. However, while memory efficient, they often require a tremendous amount of data to get comparable performance to gradient-based methods.106

Self-referential meta-learning: While synaptic plasticity has two levels of learning, the meta-learner and the discovered learning rule, self-referential meta-learning107,108 extends this hierarchy.


In plasticity approaches, only a subset of the network parameters are updated (e.g., the synaptic weights), whereas the meta-learned update rule remains fixed after meta-optimization. Self-referential architectures enable a neural network to modify all of its parameters in a recursive fashion. Thus, the learner can also modify the meta-learner. This, in principle, allows arbitrary levels of learning, meta-learning, meta–meta-learning, etc. Some approaches meta-learn the parameter initialization of such a system.107,109 Finding this initialization still requires a hardwired meta-learner. In other works, the network self-modifies in a way that eliminates even this meta-learner.108,110 Sometimes the learning rule to be discovered has structural search space restrictions that simplify self-improvement, where a gradient-based optimizer can discover itself111 or an evolutionary algorithm can optimize itself.112 Despite their differences, both synaptic plasticity and self-referential approaches aim to achieve self-improvement and adaptation in neural networks.

Generalization of meta-optimized learning rules: The extent to which discovered learning rules generalize to a wide range of tasks is a significant open question—in particular, when should they replace manually derived general-purpose learning rules such as backpropagation? A particular observation that poses a challenge to these methods is that when the search space is large and few restrictions are put on the learning mechanism,97,113,114 generalization is shown to become more difficult. However, toward amending this, in variable shared meta-learning,98 flexible learning rules were parameterized by parameter-shared recurrent neural networks that locally exchange information to implement learning algorithms that generalize across classification problems not seen during meta-optimization. Similar results have also been shown for the discovery of reinforcement learning algorithms.115

APPLICATIONS OF BRAIN-INSPIRED LEARNING

Neuromorphic computing: Neuromorphic computing represents a paradigm shift in the design of computing systems, with the goal of creating hardware that mimics the structure and functionality of the biological brain.42,116,117 This approach seeks to develop artificial neural networks that not only replicate the brain's learning capabilities but also its energy efficiency and inherent parallelism. Neuromorphic computing systems often incorporate specialized hardware, such as neuromorphic chips or memristive devices, to enable the efficient execution of brain-inspired learning algorithms.117,118 These systems have the potential to drastically improve the performance of machine learning applications, particularly in edge computing and real-time processing scenarios.

A key aspect of neuromorphic computing lies in the development of specialized hardware architectures that facilitate the implementation of spiking neural networks, which more closely resemble the information processing mechanisms of biological neurons. Neuromorphic systems operate based on the principle of brain-inspired local learning, which allows them to achieve high energy efficiency, low-latency processing, and robustness against noise, which are critical for real-world applications.119 The integration of brain-inspired learning techniques with neuromorphic hardware is vital for the successful application of this technology.

In recent years, advances in neuromorphic computing have led to the development of various platforms, such as Intel's Loihi,120 IBM's TrueNorth,121 and SpiNNaker,122 which offer specialized hardware architectures for implementing SNNs and brain-inspired learning algorithms. These platforms provide a foundation for further exploration of neuromorphic computing systems, enabling researchers to design, simulate, and evaluate novel neural network architectures and learning rules. As neuromorphic computing continues to progress, it is expected to play a pivotal role in the future of artificial intelligence, driving innovation and enabling the development of more efficient, versatile, and biologically plausible learning systems.123,124

Robotic learning: Brain-inspired learning in neural networks has the potential to overcome many of the current challenges present in the field of robotics by enabling robots to learn and adapt to their environment in a more flexible way.125,126 Traditional robotics systems rely on pre-programmed behaviors, which are limited in their ability to adapt to changing conditions. In contrast, as we have shown in this review, neural networks can be trained to adapt to new situations by adjusting their internal parameters based on the data they receive.

Because of their natural relationship to robotics, brain-inspired learning algorithms have a long history in robotics.125 Toward this end, synaptic plasticity rules have been introduced for adapting robotic behavior to domain shifts such as motor gains and rough terrain,61,127–129 as well as for obstacle avoidance130–132 and articulated (arm) control.133,134 Brain-inspired learning rules have also been used to explore how learning occurs in the insect brain using robotic systems as an embodied medium.135–138

Deep reinforcement learning (DRL) represents a significant success of brain-inspired learning algorithms, combining the strengths of neural networks with the theory of reinforcement learning in the brain to create autonomous agents capable of learning complex behaviors through interaction with their environment.139–141 By utilizing a reward-driven learning process emulating the activity of dopamine neurons142 as opposed to the minimization of, e.g., classification or regression error, DRL algorithms guide robots toward learning optimal strategies to achieve their goals, even in highly dynamic and uncertain environments.143,144 This powerful approach has been demonstrated in a variety of robotic applications, including dexterous manipulation, robotic locomotion,145 and multi-agent coordination.146

Lifelong and online learning: Lifelong and online learning are essential applications of brain-inspired learning in artificial intelligence, as they enable systems to adapt to changing environments and continuously acquire new skills and knowledge.14 Traditional machine learning approaches, in contrast, are typically trained on a fixed dataset and lack the ability to adapt to new information or changing environments. The mature brain is an incredible medium for lifelong learning, as it is constantly learning while remaining relatively fixed in size across the span of a lifetime.147 As this review has demonstrated, neural networks endowed with brain-inspired learning mechanisms, similar to the brain, can be trained to learn and adapt continuously, improving their performance over time.

The development of brain-inspired learning algorithms that enable artificial systems to exhibit this capability has the potential to significantly enhance their performance and capabilities and has wide-ranging implications for a variety of applications.

One of the primary objectives in the field of lifelong learning is to alleviate a major issue associated with the continuous application of backpropagation to ANNs, a phenomenon known as catastrophic forgetting.13 Catastrophic forgetting refers to the tendency of an ANN to abruptly forget previously learned information upon learning new data. This happens because the weights that were optimized for earlier tasks are drastically altered to accommodate the new task, erasing or overwriting the previous information; the backpropagation algorithm does not inherently factor in the need to preserve previously acquired knowledge while facilitating new learning. Solving this problem has remained a significant hurdle in AI for decades. We posit that by employing brain-inspired learning algorithms that emulate the dynamic learning mechanisms of the brain, we may be able to capitalize on the proficient problem-solving strategies inherent to biological organisms.
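One influential remedy, elastic weight consolidation,13 makes this trade-off explicit by penalizing changes to weights that mattered for earlier tasks. The sketch below shows the quadratic penalty of ref. 13 in schematic form; the network, the data, and the uniform importance values are placeholders, and a real implementation would estimate the diagonal Fisher information from old-task data.

```python
import torch
import torch.nn as nn

# EWC-style penalty, after ref. 13:
#   loss = task_loss + (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2,
# where theta*_i are the weights learned on the old task and F_i
# estimates how important each weight was for that task.
def ewc_penalty(model, old_params, fisher, lam=100.0):
    penalty = torch.zeros(())
    for name, p in model.named_parameters():
        penalty = penalty + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return 0.5 * lam * penalty

model = nn.Linear(4, 2)  # placeholder network
old_params = {n: p.detach().clone() for n, p in model.named_parameters()}
fisher = {n: torch.ones_like(p) for n, p in model.named_parameters()}  # dummy importances

x, y = torch.randn(8, 4), torch.randn(8, 2)
task_loss = nn.functional.mse_loss(model(x), y)
loss = task_loss + ewc_penalty(model, old_params, fisher)
loss.backward()  # gradients now balance new-task fit against retention
```

Variants of this idea differ mainly in how the importance estimates are computed and accumulated across tasks.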
Toward understanding the brain: The worlds of artificial intelligence and neuroscience have been greatly benefiting from each other. Deep neural networks, specially tailored for certain tasks, show striking similarities to the human brain in how they handle spatial150–152 and visual153–155 information. This overlap hints at the potential of artificial neural networks (ANNs) as useful models in our efforts to better understand the brain’s complex mechanics. A new movement referred to as the neuroconnectionist research program156 embodies this combined approach, using ANNs as a computational language to form and test ideas about how the brain computes. This perspective brings together different research efforts, offering a common computational framework and tools to test specific theories about the brain.

While this review highlights a range of algorithms that imitate the brain’s functions, we still have a substantial amount of work to do to fully grasp how learning actually happens in the brain. The use of backpropagation and backpropagation-like local learning rules to train large neural networks may provide a good starting point for modeling brain function. Much productive investigation has examined which processes in the brain may operate similarly to backpropagation,65 leading to new perspectives and theories in neuroscience. Even though backpropagation in its current form might not occur in the brain, the idea that the brain might develop internal representations similar to those of ANNs despite such different mechanisms of learning is an exciting open question that may lead to a deeper understanding of both the brain and AI.

Explorations are now extending beyond static network dynamics to networks that unfold as a function of time, much like the brain. As we further develop algorithms for continual and lifelong learning, it may become clear that our models need to reflect the learning mechanisms observed in nature more closely. This shift in focus calls for the integration of local learning rules, those that mirror the brain’s own methods, into ANNs; a minimal example of such a rule is sketched below.
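What makes a rule “local” is that each weight update depends only on quantities available at the synapse itself: presynaptic activity, postsynaptic activity, and the current weight, with no global error signal. The following sketch implements Oja’s stabilized variant of a Hebbian rule on random inputs; the dimensionality and learning rate are illustrative.

```python
import numpy as np

# A local learning rule: each weight changes using only its own
# presynaptic activity x, postsynaptic activity y, and current value.
# This is Oja's stabilized Hebbian rule; no global error signal is used.
rng = np.random.default_rng(1)
w = rng.normal(scale=0.1, size=8)   # weights of one postsynaptic unit
eta = 0.01                          # learning rate (illustrative)

for _ in range(1000):
    x = rng.normal(size=8)          # presynaptic activity
    y = w @ x                       # postsynaptic activity
    w += eta * y * (x - y * w)      # local update: Hebbian term + decay

print(np.linalg.norm(w))  # Oja's decay keeps the weight norm bounded (~1)
```

Because nothing in the update references a loss gradient from elsewhere in the network, rules of this form remain applicable while the network runs continuously, which is precisely the lifelong-learning setting discussed above.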
We are convinced that adopting more biologically authentic learning rules within ANNs will not only yield the aforementioned benefits but also serve to point neuroscience researchers in the right direction. In other words, it is a strategy with a two-fold benefit: it promises to invigorate innovation in engineering, and it brings us closer to unraveling the intricate processes at play within the brain. With more realistic models, we can probe deeper into the complexities of brain computation from the novel perspective of artificial intelligence.

CONCLUSION

In this review, we investigate the integration of more biologically plausible learning mechanisms into ANNs. This integration presents an important step for both neuroscience and artificial intelligence. It is particularly relevant amid the tremendous progress that has been made in artificial intelligence with large language models and embedded systems, which are in critical need of more energy-efficient approaches for learning and execution. In addition, while ANNs are making great strides in these applications, there are still major limitations in their ability to adapt as biological brains do; we see this adaptability as a primary application of brain-inspired learning mechanisms.

As we strategize for future collaboration between neuroscience and AI toward more detailed brain-inspired learning algorithms, it is important to acknowledge that the past influences of neuroscience on AI have seldom been about a straightforward application of ready-made solutions to machines.157 More often than not, neuroscience has stimulated AI researchers by posing intriguing algorithmic-level questions about aspects of animal learning and intelligence. It has provided preliminary guidance toward vital mechanisms that support learning. Our perspective is that by harnessing the insights drawn from neuroscience, we can significantly accelerate advancements in the learning mechanisms used in ANNs. Similarly, experiments using brain-like learning algorithms in AI can accelerate our understanding of neuroscience.

ACKNOWLEDGMENTS

We thank the OpenBioML collaborative workspace, to which several of the authors of this work were connected. This material is based upon work supported by the National Science Foundation Graduate Research Fellowship for Comp/IS/Eng—Robotics under Grant Nos. DGE2139757 and 2235440.

AUTHOR DECLARATIONS

Conflict of Interest

The authors have no conflicts to disclose.

Author Contributions

Samuel Schmidgall: Conceptualization (lead); Investigation (lead); Visualization (lead); Writing – original draft (lead); Writing – review & editing (lead). Rojin Ziaei: Conceptualization (equal); Writing – original draft (equal); Writing – review & editing (equal). Jascha Achterberg: Writing – review & editing (equal). Louis Kirsch: Writing – original draft (supporting); Writing – review & editing (supporting). S. Pardis Hajiseyedrazi: Visualization (equal); Writing – review & editing (equal). Jason Eshraghian: Supervision (equal); Writing – original draft (supporting); Writing – review & editing (equal).
DATA AVAILABILITY

Data sharing is not applicable to this article as no new data were created or analyzed in this study.

REFERENCES

1. K. M. Newell, Y.-T. Liu, and G. Mayer-Kress, “Time scales in motor learning and development,” Psychol. Rev. 108, 57 (2001).
2. M. G. Stokes, “‘Activity-silent’ working memory in prefrontal cortex: A dynamic coding framework,” Trends Cognit. Sci. 19, 394–405 (2015).
3. W. Gerstner, M. Lehmann, V. Liakoni, D. Corneil, and J. Brea, “Eligibility traces and plasticity on behavioral time scales: Experimental support of NeoHebbian three-factor learning rules,” Front. Neural Circuits 12, 53 (2018).
4. I. Beltagy, K. Lo, and A. Cohan, “SciBERT: A pretrained language model for scientific text,” arXiv:1903.10676 (2019).
5. T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell et al., “Language models are few-shot learners,” Adv. Neural Inf. Process. Syst. 33, 1877–1901 (2020).
6. A. Ramesh, P. Dhariwal, A. Nichol, C. Chu, and M. Chen, “Hierarchical text-conditional image generation with clip latents,” arXiv:2204.06125 (2022).
7. C. Saharia, W. Chan, S. Saxena, L. Li, J. Whang, E. Denton, S. K. S. Ghasemipour, B. K. Ayan, S. S. Mahdavi, R. G. Lopes et al., “Photorealistic text-to-image diffusion models with deep language understanding,” arXiv:2205.11487 (2022).
8. A. Kumar, Z. Fu, D. Pathak, and J. Malik, “RMA: Rapid motor adaptation for legged robots,” arXiv:2107.04034 (2021).
9. T. Miki, J. Lee, J. Hwangbo, L. Wellhausen, V. Koltun, and M. Hutter, “Learning robust perceptive locomotion for quadrupedal robots in the wild,” Sci. Robot. 7, eabk2822 (2022).
10. Z. Fu, X. Cheng, and D. Pathak, “Deep whole-body control: Learning a unified policy for manipulation and locomotion,” arXiv:2210.10044 (2022).
11. D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel et al., “A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play,” Science 362, 1140–1144 (2018).
12. D. Driess, F. Xia, M. S. Sajjadi, C. Lynch, A. Chowdhery, B. Ichter, A. Wahid, J. Tompson, Q. Vuong, T. Yu et al., “PaLM-E: An embodied multimodal language model,” arXiv:2303.03378 (2023).
13. J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska et al., “Overcoming catastrophic forgetting in neural networks,” Proc. Natl. Acad. Sci. U. S. A. 114, 3521–3526 (2017).
14. G. I. Parisi, R. Kemker, J. L. Part, C. Kanan, and S. Wermter, “Continual lifelong learning with neural networks: A review,” Neural Networks 113, 54–71 (2019).
15. D. Kudithipudi, M. Aguilar-Simon, J. Babb, M. Bazhenov, D. Blackiston, J. Bongard, A. P. Brna, S. Chakravarthi Raja, N. Cheney, J. Clune et al., “Biological underpinnings for lifelong learning machines,” Nat. Mach. Intell. 4, 196–210 (2022).
16. V. M. Ho, J.-A. Lee, and K. C. Martin, “The cell biology of synaptic plasticity,” Science 334, 623–628 (2011).
17. A. Citri and R. C. Malenka, “Synaptic plasticity: Multiple forms, functions, and mechanisms,” Neuropsychopharmacology 33, 18–41 (2008).
18. W. C. Abraham, O. D. Jones, and D. L. Glanzman, “Is plasticity of synapses the mechanism of long-term memory storage?,” NPJ Sci. Learn. 4, 9–10 (2019).
19. R. S. Zucker and W. G. Regehr, “Short-term synaptic plasticity,” Annu. Rev. Physiol. 64, 355–405 (2002).
20. R. Yuste and T. Bonhoeffer, “Morphological changes in dendritic spines associated with long-term synaptic plasticity,” Annu. Rev. Neurosci. 24, 1071–1089 (2001).
21. N. Frémaux and W. Gerstner, “Neuromodulated spike-timing-dependent plasticity, and theory of three-factor learning rules,” Front. Neural Circuits 9, 85 (2016).
22. Z. Brzosko, S. B. Mierau, and O. Paulsen, “Neuromodulation of spike-timing-dependent plasticity: Past, present, and future,” Neuron 103, 563–581 (2019).
23. D. A. McCormick, D. B. Nestvogel, and B. J. He, “Neuromodulation of brain state and behavior,” Annu. Rev. Neurosci. 43, 391–415 (2020).
24. W. C. Abraham and M. F. Bear, “Metaplasticity: The plasticity of synaptic plasticity,” Trends Neurosci. 19, 126–130 (1996).
25. W. C. Abraham, “Metaplasticity: Tuning synapses and networks for plasticity,” Nat. Rev. Neurosci. 9, 387 (2008).
26. P. Yger and M. Gilson, “Models of metaplasticity: A review of concepts,” Front. Comput. Neurosci. 9, 138 (2015).
27. D. A. Lim and A. Alvarez-Buylla, “The adult ventricular–subventricular zone (V-SVZ) and olfactory bulb (OB) neurogenesis,” Cold Spring Harbor Perspect. Biol. 8, a018820 (2016).
28. S. S. Roeder, P. Burkardt, F. Rost, J. Rode, L. Brusch, R. Coras, E. Englund, K. Håkansson, G. Possnert, M. Salehpour et al., “Evidence for postnatal neurogenesis in the human amygdala,” Commun. Biol. 5, 366 (2022).
29. H. G. Kuhn, H. Dickinson-Anson, and F. H. Gage, “Neurogenesis in the dentate gyrus of the adult rat: Age-related decrease of neuronal progenitor proliferation,” J. Neurosci. 16, 2027–2033 (1996).
30. G. Kempermann, H. G. Kuhn, and F. H. Gage, “Experience-induced neurogenesis in the senescent dentate gyrus,” J. Neurosci. 18, 3206–3212 (1998).
31. H. Van Praag, T. Shubert, C. Zhao, and F. H. Gage, “Exercise enhances learning and hippocampal neurogenesis in aged mice,” J. Neurosci. 25, 8680–8685 (2005).
32. M. S. Nokia, S. Lensu, J. P. Ahtiainen, P. P. Johansson, L. G. Koch, S. L. Britton, and H. Kainulainen, “Physical exercise increases adult hippocampal neurogenesis in male rats provided it is aerobic and sustained,” J. Physiol. 594, 1855–1873 (2016).
33. E. D. Kirby, S. E. Muroy, W. G. Sun, D. Covarrubias, M. J. Leong, L. A. Barchas, and D. Kaufer, “Acute stress enhances adult rat hippocampal neurogenesis and activation of newborn neurons via secreted astrocytic FGF2,” eLife 2, e00362 (2013).
34. S.-H. Baik, V. Rajeev, D. Y.-W. Fann, D.-G. Jo, and T. V. Arumugam, “Intermittent fasting increases adult hippocampal neurogenesis,” Brain Behav. 10, e01444 (2020).
35. K. J. Todd, A. Serrano, J.-C. Lacaille, and R. Robitaille, “Glial cells in synaptic plasticity,” J. Physiol. 99, 75–83 (2006).
36. W.-S. Chung, N. J. Allen, and C. Eroglu, “Astrocytes control synapse formation, function, and elimination,” Cold Spring Harbor Perspect. Biol. 7, a020370 (2015).
37. M. Zhou, J. Cornell, S. Salinas, and H. Y. Huang, “Microglia regulation of synaptic plasticity and learning and memory,” Neural Regener. Res. 17, 705 (2022).
38. R. Desislavov, F. Martínez-Plumed, and J. Hernández-Orallo, “Compute and energy consumption trends in deep learning inference,” arXiv:2109.05472 (2021).
39. F. Daghero, D. J. Pagliari, and M. Poncino, “Energy-efficient deep learning inference on edge devices,” in Advances in Computers (Elsevier, 2021), Vol. 122, pp. 247–301.
40. M. Pfeiffer and T. Pfeil, “Deep learning with spiking neurons: Opportunities and challenges,” Front. Neurosci. 12, 774 (2018).
41. W. Maass, “Networks of spiking neurons: The third generation of neural network models,” Neural Networks 10, 1659–1671 (1997).
42. C. D. Schuman, S. R. Kulkarni, M. Parsa, J. P. Mitchell, P. Date, and B. Kay, “Opportunities for neuromorphic computing algorithms and applications,” Nat. Comput. Sci. 2, 10–19 (2022).
43. A. Gilra and W. Gerstner, “Predicting non-linear dynamics by stable local learning in a recurrent spiking neural network,” eLife 6, e28295 (2017).
44. K. D. Carlson, M. Richert, N. Dutt, and J. L. Krichmar, “Biologically plausible models of homeostasis and STDP: Stability and learning in spiking neural networks,” in 2013 International Joint Conference on Neural Networks (IJCNN) (IEEE, 2013), pp. 1–8.
45. B. Walters, C. Lammie, S. Yang, M. V. Jacob, and M. Rahimi Azghadi, “Unsupervised character recognition with graphene memristive synapses,” Neural Comput. Appl. 36, 1569–1584 (2023).
46. R.-J. Zhu, Q. Zhao, and J. K. Eshraghian, “SpikeGPT: Generative pre-trained language model with spiking neural networks,” arXiv:2302.13939 (2023).
47. D. O. Hebb, The Organization of Behavior: A Neuropsychological Theory (Psychology Press, 2005).
48. H. Markram, W. Gerstner, and P. J. Sjöström, “A history of spike-timing-dependent plasticity,” Front. Synaptic Neurosci. 3, 4 (2011).
49. S. Löwel and W. Singer, “Selection of intrinsic horizontal connections in the visual cortex by correlated neuronal activity,” Science 255, 209–212 (1992).
50. C. J. Shatz, “The developing brain,” Sci. Am. 267, 60–67 (1992).
51. W. Gerstner, W. M. Kistler, R. Naud, and L. Paninski, Neuronal Dynamics: From Single Neurons to Networks and Models of Cognition (Cambridge University Press, 2014), Chap. 10.
52. J. J. Hopfield, “Neural networks and physical systems with emergent collective computational abilities,” Proc. Natl. Acad. Sci. U. S. A. 79, 2554–2558 (1982).
53. Z. Vasilkoski, H. Ames, B. Chandler, A. Gorchetchnikov, J. Léveillé, G. Livitz, E. Mingolla, and M. Versace, “Review of stability properties of neural plasticity rules for implementation on memristive neuromorphic hardware,” in 2011 International Joint Conference on Neural Networks (IEEE, 2011), pp. 2563–2569.
54. N. Frémaux, H. Sprekeler, and W. Gerstner, “Functional requirements for reward-modulated spike-timing-dependent plasticity,” J. Neurosci. 30, 13326–13337 (2010).
55. I. R. Fiete and H. S. Seung, “Gradient learning in spiking neural networks by dynamic perturbation of conductances,” Phys. Rev. Lett. 97, 048104 (2006).
56. I. R. Fiete, M. S. Fee, and H. S. Seung, “Model of birdsong learning based on gradient estimation by dynamic perturbation of neural conductances,” J. Neurophysiol. 98, 2038–2057 (2007).
57. R. J. Williams, “Simple statistical gradient-following algorithms for connectionist reinforcement learning,” Reinf. Learn. 173, 5–32 (1992).
58. T. Miconi, “Biologically plausible learning in recurrent neural networks reproduces neural dynamics observed during cognitive tasks,” eLife 6, e20899 (2017).
59. J. C. Whittington, T. H. Muller, S. Mark, G. Chen, C. Barry, N. Burgess, and T. E. Behrens, “The Tolman-Eichenbaum machine: Unifying space and relational memory through generalization in the hippocampal formation,” Cell 183, 1249–1263.e23 (2020).
60. G. Bellec, F. Scherr, A. Subramoney, E. Hajek, D. Salaj, R. Legenstein, and W. Maass, “A solution to the learning dilemma for recurrent networks of spiking neurons,” Nat. Commun. 11, 3625 (2020).
61. S. Schmidgall, J. Ashkanazy, W. Lawson, and J. Hays, “SpikePropamine: Differentiable plasticity in spiking neural networks,” Front. Neurorobotics 15, 629210 (2021).
62. S. Schmidgall and J. Hays, “Meta-spikePropamine: Learning to learn with synaptic plasticity in spiking neural networks,” Front. Neurosci. 17, 671 (2023).
63. D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Nature 323, 533–536 (1986).
64. S. Ruder, “An overview of gradient descent optimization algorithms,” arXiv:1609.04747 (2016).
65. T. P. Lillicrap, A. Santoro, L. Marris, C. J. Akerman, and G. Hinton, “Backpropagation and the brain,” Nat. Rev. Neurosci. 21, 335–346 (2020).
66. J. C. Whittington and R. Bogacz, “Theories of error back-propagation in the brain,” Trends Cognit. Sci. 23, 235–250 (2019).
67. J. H. Holland, “Genetic algorithms,” Sci. Am. 267, 66–72 (1992).
68. K. De Jong, “Evolutionary computation: A unified approach,” in Proceedings of the 2016 on Genetic and Evolutionary Computation Conference Companion (Association for Computing Machinery, 2017), pp. 185–199.
69. T. Salimans, J. Ho, X. Chen, S. Sidor, and I. Sutskever, “Evolution strategies as a scalable alternative to reinforcement learning,” arXiv:1703.03864 (2017).
70. X. Zhang, J. Clune, and K. O. Stanley, “On the relationship between the OpenAI evolution strategy and stochastic gradient descent,” arXiv:1712.06564 (2017).
71. T. P. Lillicrap, D. Cownden, D. B. Tweed, and C. J. Akerman, “Random feedback weights support learning in deep neural networks,” arXiv:1411.0247 (2014).
72. A. Nøkland, “Direct feedback alignment provides learning in deep neural networks,” in Advances in Neural Information Processing Systems 29 (Curran Associates, 2016).
73. K. Max, L. Kriener, G. Pineda García, T. Nowotny, W. Senn, and M. A. Petrovici, “Learning efficient backprojections across cortical hierarchies in real time,” in International Conference on Artificial Neural Networks (Springer, 2023), pp. 556–559.
74. S. Bartunov, A. Santoro, B. Richards, L. Marris, G. E. Hinton, and T. Lillicrap, “Assessing the scalability of biologically-motivated deep learning algorithms and architectures,” in Advances in Neural Information Processing Systems 31 (Curran Associates, 2018).
75. W. Xiao, H. Chen, Q. Liao, and T. Poggio, “Biologically-plausible learning algorithms can scale to large datasets,” arXiv:1811.03567 (2018).
76. G. Bellec, F. Scherr, E. Hajek, D. Salaj, A. Subramoney, R. Legenstein, and W. Maass, “Eligibility traces provide a data-inspired alternative to backpropagation through time,” in Real Neurons Hidden Units: Future Directions at the Intersection of Neuroscience and Artificial Intelligence @ NeurIPS 2019, 2019.
77. R. J. Williams and D. Zipser, “A learning algorithm for continually running fully recurrent neural networks,” Neural Comput. 1, 270–280 (1989).
78. J. K. Eshraghian, M. Ward, E. O. Neftci, X. Wang, G. Lenz, G. Dwivedi, M. Bennamoun, D. S. Jeong, and W. D. Lu, “Training spiking neural networks using lessons from deep learning,” Proc. IEEE 111, 1016 (2023).
79. Y. H. Liu, S. Smith, S. Mihalas, E. Shea-Brown, and U. Sümbül, “Cell-type–specific neuromodulation guides synaptic credit assignment in a spiking neural network,” Proc. Natl. Acad. Sci. U. S. A. 118, e2111821118 (2021).
80. Y. H. Liu, S. Smith, S. Mihalas, E. Shea-Brown, and U. Sümbül, “Biologically-plausible backpropagation through arbitrary timespans via local neuromodulators,” arXiv:2206.01338 (2022).
81. S. J. Smith, U. Sümbül, L. T. Graybuck, F. Collman, S. Seshamani, R. Gala, O. Gliko, L. Elabbady, J. A. Miller, T. E. Bakken et al., “Single-cell transcriptomic evidence for dense intracortical neuropeptide networks,” eLife 8, e47889 (2019).
82. S. Hochreiter and J. Schmidhuber, “Flat minima,” Neural Comput. 9, 1–42 (1997).
83. Y. H. Liu, A. Ghosh, B. A. Richards, E. Shea-Brown, and G. Lajoie, “Beyond accuracy: Generalization properties of bio-plausible temporal credit assignment rules,” arXiv:2206.00823 (2022).
84. J. Schmidhuber, “Evolutionary principles in self-referential learning. On learning how to learn: The meta-meta...-hook,” Ph.D. thesis, Technische Universität München, 1987.
85. T. Miconi, K. Stanley, and J. Clune, “Differentiable plasticity: Training plastic neural networks with backpropagation,” in International Conference on Machine Learning (PMLR, 2018), pp. 3559–3568.
86. T. Miconi, A. Rawal, J. Clune, and K. O. Stanley, “Backpropamine: Training self-modifying neural networks with differentiable neuromodulated plasticity,” arXiv:2002.10585 (2020).
87. Y. Duan, Z. Jia, Q. Li, Y. Zhong, and K. Ma, “Hebbian and gradient-based plasticity enables robust memory and rapid learning in RNNs,” arXiv:2302.03235 (2023).
88. D. Tyulmankov, G. R. Yang, and L. Abbott, “Meta-learning synaptic plasticity and memory addressing for continual familiarity detection,” Neuron 110, 544–557 (2022).
89. H. G. Rodriguez, Q. Guo, and T. Moraitis, “Short-term plasticity neurons learning to learn and forget,” in International Conference on Machine Learning (PMLR, 2022), pp. 18704–18722.
90. R. B. Palm, E. Najarro, and S. Risi, “Testing the genomic bottleneck hypothesis in hebbian meta-learning,” in NeurIPS 2020 Workshop on Pre-Registration in Machine Learning (PMLR, 2021), pp. 100–110.
91. A. Gruslys, R. Munos, I. Danihelka, M. Lanctot, and A. Graves, “Memory-efficient backpropagation through time,” in Advances in Neural Information Processing Systems 29 (Curran Associates, 2016).
92. F. Scherr, C. Stöckl, and W. Maass, “One-shot learning with spiking neural networks,” bioRxiv:156513v1 (2020).
93. P. Bertens and S.-W. Lee, “Network of evolvable neural units can learn synaptic learning rules and spiking dynamics,” Nat. Mach. Intell. 2, 791–799 (2020).
94. R. Zhu, J. Eshraghian, and Z. Kuncic, “Memristive reservoirs learn to learn,” in Proceedings of the 2023 International Conference on Neuromorphic Systems (Association for Computing Machinery, 2023), pp. 1–7.
95. S. Garg, D. Tsipras, P. S. Liang, and G. Valiant, “What can transformers learn in-context? A case study of simple function classes,” Adv. Neural Inf. Process. Syst. 35, 30583–30598 (2022).
96. L. Kirsch, J. Harrison, J. Sohl-Dickstein, and L. Metz, “General-purpose in-context learning by meta-learning transformers,” arXiv:2212.04458 (2022).
97. S. Hochreiter, A. S. Younger, and P. R. Conwell, “Learning to learn using gradient descent,” in Artificial Neural Networks—ICANN 2001: International Conference Vienna, Austria, August 21–25, 2001 Proceedings 11 (Springer, 2001), pp. 87–94.
98. L. Kirsch and J. Schmidhuber, “Meta learning backpropagation and improving it,” Adv. Neural Inf. Process. Syst. 34, 14122–14134 (2021).
99. I. Schlag, K. Irie, and J. Schmidhuber, “Linear transformers are secretly fast weight programmers,” in International Conference on Machine Learning (PMLR, 2021), pp. 9355–9366.
100. E. Akyürek, D. Schuurmans, J. Andreas, T. Ma, and D. Zhou, “What learning algorithm is in-context learning? Investigations with linear models,” arXiv:2211.15661 (2022).
101. J. von Oswald, E. Niklasson, E. Randazzo, J. Sacramento, A. Mordvintsev, A. Zhmoginov, and M. Vladymyrov, “Transformers learn in-context by gradient descent,” arXiv:2212.07677 (2022).
102. A. Soltoggio, K. O. Stanley, and S. Risi, “Born to learn: The inspiration, progress, and future of evolved plastic artificial neural networks,” Neural Networks 108, 48–67 (2018).
103. S. Schmidgall, “Adaptive reinforcement learning through evolving self-modifying neural networks,” in Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion (Association for Computing Machinery, 2020), pp. 89–90.
104. E. Najarro and S. Risi, “Meta-learning through Hebbian plasticity in random networks,” Adv. Neural Inf. Process. Syst. 33, 20719–20731 (2020).
105. J. Jordan, M. Schmidt, W. Senn, and M. A. Petrovici, “Evolving interpretable plasticity for spiking networks,” eLife 10, e66273 (2021).
106. P. Pagliuca, N. Milano, and S. Nolfi, “Efficacy of modern neuro-evolutionary strategies for continuous control optimization,” Front. Robot. AI 7, 98 (2020).
107. J. Schmidhuber, “A ‘self-referential’ weight matrix,” in ICANN’93: Proceedings of the International Conference on Artificial Neural Networks, Amsterdam, The Netherlands, 13–16 September 1993 (Springer, 1993), pp. 446–450.
108. L. Kirsch and J. Schmidhuber, “Eliminating meta optimization through self-referential meta learning,” arXiv:2212.14392 (2022).
109. K. Irie, I. Schlag, R. Csordás, and J. Schmidhuber, “A modern self-referential weight matrix that learns to modify itself,” in International Conference on Machine Learning (PMLR, 2022), pp. 9660–9677.
110. L. Kirsch and J. Schmidhuber, “Self-referential meta learning,” in First Conference on Automated Machine Learning (Late-Breaking Workshop), 2022.
111. L. Metz, C. D. Freeman, N. Maheswaranathan, and J. Sohl-Dickstein, “Training learned optimizers with randomly initialized learned optimizers,” arXiv:2101.07367 (2021).
112. R. T. Lange, T. Schaul, Y. Chen, T. Zahavy, V. Dallibard, C. Lu, S. Singh, and S. Flennerhag, “Discovering evolution strategies via meta-black-box optimization,” arXiv:2211.11260 (2022).
113. J. X. Wang, Z. Kurth-Nelson, D. Tirumala, H. Soyer, J. Z. Leibo, R. Munos, C. Blundell, D. Kumaran, and M. Botvinick, “Learning to reinforcement learn,” arXiv:1611.05763 (2016).
114. Y. Duan, J. Schulman, X. Chen, P. L. Bartlett, I. Sutskever, and P. Abbeel, “RL2: Fast reinforcement learning via slow reinforcement learning,” arXiv:1611.02779 (2016).
115. L. Kirsch, S. Flennerhag, H. van Hasselt, A. Friesen, J. Oh, and Y. Chen, “Introducing symmetries to black box meta reinforcement learning,” Proc. AAAI Conf. Artif. Intell. 36, 7202–7210 (2022).
116. C. D. Schuman, T. E. Potok, R. M. Patton, J. D. Birdwell, M. E. Dean, G. S. Rose, and J. S. Plank, “A survey of neuromorphic computing and neural networks in hardware,” arXiv:1705.06963 (2017).
117. J.-Q. Yang, R. Wang, Y. Ren, J.-Y. Mao, Z.-P. Wang, Y. Zhou, and S.-T. Han, “Neuromorphic engineering: From biological to spike-based hardware nervous systems,” Adv. Mater. 32, 2003610 (2020).
118. M. R. Azghadi, C. Lammie, J. K. Eshraghian, M. Payvand, E. Donati, B. Linares-Barranco, and G. Indiveri, “Hardware implementation of deep network accelerators towards healthcare and biomedical applications,” IEEE Trans. Biomed. Circuits Syst. 14, 1138–1159 (2020).
119. L. Khacef, P. Klein, M. Cartiglia, A. Rubino, G. Indiveri, and E. Chicca, “Spike-based local synaptic plasticity: A survey of computational models and neuromorphic circuits,” arXiv:2209.15536 (2022).
120. M. Davies, N. Srinivasa, T.-H. Lin, G. Chinya, Y. Cao, S. H. Choday, G. Dimou, P. Joshi, N. Imam, S. Jain et al., “Loihi: A neuromorphic manycore processor with on-chip learning,” IEEE Micro 38, 82–99 (2018).
121. F. Akopyan, J. Sawada, A. Cassidy, R. Alvarez-Icaza, J. Arthur, P. Merolla, N. Imam, Y. Nakamura, P. Datta, G.-J. Nam et al., “TrueNorth: Design and tool flow of a 65 mW 1 million neuron programmable neurosynaptic chip,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 34, 1537–1557 (2015).
122. E. Painkras, L. A. Plana, J. Garside, S. Temple, F. Galluppi, C. Patterson, D. R. Lester, A. D. Brown, and S. B. Furber, “SpiNNaker: A 1-W 18-core system-on-chip for massively-parallel neural network simulation,” IEEE J. Solid-State Circuits 48, 1943–1953 (2013).
123. F. Modaresi, M. Guthaus, and J. K. Eshraghian, “OpenSpike: An OpenRAM SNN accelerator,” arXiv:2302.01015 (2023).
124. A. Mehonic and J. Eshraghian, “Brains and bytes: Trends in neuromorphic technology,” APL Mach. Learn. 1, 020401 (2023).
125. D. Floreano, A. J. Ijspeert, and S. Schaal, “Robotics and neuroscience,” Curr. Biol. 24, R910–R920 (2014).
126. Z. Bing, C. Meschede, F. Röhrbein, K. Huang, and A. C. Knoll, “A survey of robotics control based on learning-inspired spiking neural networks,” Front. Neurorobotics 12, 35 (2018).
127. E. Grinke, C. Tetzlaff, F. Wörgötter, and P. Manoonpong, “Synaptic plasticity in a recurrent neural network for versatile and adaptive behaviors of a walking robot,” Front. Neurorobotics 9, 11 (2015).
128. J. Kaiser, M. Hoff, A. Konle, J. C. Vasquez Tieck, D. Kappel, D. Reichard, A. Subramoney, R. Legenstein, A. Roennau, W. Maass, and R. Dillmann, “Embodied synaptic plasticity with online reinforcement learning,” Front. Neurorobotics 13, 81 (2019).
129. S. Schmidgall and J. Hays, “Synaptic motor adaptation: A three-factor learning rule for adaptive robotic control in spiking neural networks,” in Proceedings of the 2023 International Conference on Neuromorphic Systems, ICONS’23 (Association for Computing Machinery, New York, NY, 2023).
130. P. Arena, S. De Fiore, L. Patané, M. Pollino, and C. Ventura, “Insect inspired unsupervised learning for tactic and phobic behavior enhancement in a hybrid robot,” in 2010 International Joint Conference on Neural Networks (IJCNN) (IEEE, 2010), pp. 1–8.
131. D. Hu, X. Zhang, Z. Xu, S. Ferrari, and P. Mazumder, “Digital implementation of a spiking neural network (SNN) capable of spike-timing-dependent plasticity (STDP) learning,” in 14th IEEE International Conference on Nanotechnology (IEEE, 2014), pp. 873–876.
132. X. Wang, Z.-G. Hou, F. Lv, M. Tan, and Y. Wang, “Mobile robots modular navigation controller using spiking neural networks,” Neurocomputing 134, 230–238 (2014).
133. S. A. Neymotin, G. L. Chadderdon, C. C. Kerr, J. T. Francis, and W. W. Lytton, “Reinforcement learning of two-joint virtual arm reaching in a computer model of sensorimotor cortex,” Neural Comput. 25, 3263–3293 (2013).
134. S. Dura-Bernal, X. Zhou, S. A. Neymotin, A. Przekwas, J. T. Francis, and W. W. Lytton, “Cortical spiking network interfaced with virtual musculoskeletal arm and robotic arm,” Front. Neurorobotics 9, 13 (2015).
135. W. Ilg and K. Berns, “A learning architecture based on reinforcement learning for adaptive control of the walking machine LAURON,” Robot. Auton. Syst. 15, 321–334 (1995).
136. A. J. Ijspeert, “Biorobotics: Using robots to emulate and investigate agile locomotion,” Science 346, 196–203 (2014).
137. F. Faghihi, A. A. Moustafa, R. Heinrich, and F. Wörgötter, “A computational model of conditioning inspired by Drosophila olfactory system,” Neural Networks 87, 96–108 (2017).
138. N. S. Szczecinski, C. Goldsmith, W. Nourse, and R. D. Quinn, “A perspective on the neuromorphic control of legged locomotion in past, present, and future insect-like robots,” Neuromorphic Comput. Eng. 3, 023001 (2023).
139. M. Botvinick, J. X. Wang, W. Dabney, K. J. Miller, and Z. Kurth-Nelson, “Deep reinforcement learning and its neuroscientific implications,” Neuron 107, 603–616 (2020).
140. K. Arulkumaran, M. P. Deisenroth, M. Brundage, and A. A. Bharath, “A brief survey of deep reinforcement learning,” arXiv:1708.05866 (2017).
141. V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski et al., “Human-level control through deep reinforcement learning,” Nature 518, 529–533 (2015).
142. M. Watabe-Uchida, N. Eshel, and N. Uchida, “Neural circuitry of reward prediction error,” Annu. Rev. Neurosci. 40, 373–394 (2017).
143. L. P. Kaelbling, M. L. Littman, and A. W. Moore, “Reinforcement learning: A survey,” J. Artif. Intell. Res. 4, 237–285 (1996).
144. R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (MIT Press, 2018).
145. X. B. Peng, P. Abbeel, S. Levine, and M. Van de Panne, “DeepMimic: Example-guided deep reinforcement learning of physics-based character skills,” ACM Trans. Graphics 37, 1–14 (2018).
146. R. Lowe, Y. I. Wu, A. Tamar, J. Harb, O. Pieter Abbeel, and I. Mordatch, “Multi-agent actor-critic for mixed cooperative-competitive environments,” in Advances in Neural Information Processing Systems 30 (Curran Associates, 2017).
147. C. La Rosa, R. Parolisi, and L. Bonfanti, “Brain structural plasticity: From adult neurogenesis to immature neurons,” Front. Neurosci. 14, 75 (2020).
148. T. Lesort, V. Lomonaco, A. Stoian, D. Maltoni, D. Filliat, and N. Díaz-Rodríguez, “Continual learning for robotics: Definition, framework, learning strategies, opportunities and challenges,” Inf. Fusion 58, 52–68 (2020).
149. K. Shaheen, M. A. Hanif, O. Hasan, and M. Shafique, “Continual learning for real-world autonomous systems: Algorithms, challenges and frameworks,” J. Intell. Robot. Syst. 105, 9 (2022).
150. A. Banino, C. Barry, B. Uria, C. Blundell, T. Lillicrap, P. Mirowski, A. Pritzel, M. J. Chadwick, T. Degris, J. Modayil et al., “Vector-based navigation using grid-like representations in artificial agents,” Nature 557, 429–433 (2018).
151. C. J. Cueva and X.-X. Wei, “Emergence of grid-like representations by training recurrent neural networks to perform spatial localization,” arXiv:1803.07770 (2018).
152. Y. Gao, “A computational model of learning flexible navigation in a maze by layout-conforming replay of place cells,” Front. Comput. Neurosci. 17, 1053097 (2023).
153. M. Schrimpf, J. Kubilius, H. Hong, N. J. Majaj, R. Rajalingham, E. B. Issa, K. Kar, P. Bashivan, J. Prescott-Roy, F. Geiger et al., “Brain-score: Which artificial neural network for object recognition is most brain-like?,” bioRxiv:407007 (2018).
154. C. Zhuang, S. Yan, A. Nayebi, M. Schrimpf, M. C. Frank, J. J. DiCarlo, and D. L. K. Yamins, “Unsupervised neural network models of the ventral visual stream,” Proc. Natl. Acad. Sci. U. S. A. 118, e2014196118 (2021).
155. G. Jacob, R. Pramod, H. Katti, and S. Arun, “Qualitative similarities and differences in visual object representations between brains and deep networks,” Nat. Commun. 12, 1872 (2021).
156. A. Doerig, R. Sommers, K. Seeliger, B. Richards, J. Ismael, G. Lindsay, K. Kording, T. Konkle, M. A. Van Gerven, N. Kriegeskorte et al., “The neuroconnectionist research programme,” arXiv:2209.03718 (2022).
157. D. Hassabis, D. Kumaran, C. Summerfield, and M. Botvinick, “Neuroscience-inspired artificial intelligence,” Neuron 95, 245–258 (2017).