Barendregt jcns19
Barendregt jcns19
[Link]
Received: 25 March 2019 / Revised: 25 September 2019 / Accepted: 1 October 2019 / Published online: 16 November 2019
© Springer Science+Business Media, LLC, part of Springer Nature 2019
Abstract
Decision-making in dynamic environments typically requires adaptive evidence accumulation that weights new evidence
more heavily than old observations. Recent experimental studies of dynamic decision tasks require subjects to make
decisions for which the correct choice switches stochastically throughout a single trial. In such cases, an ideal observer’s
belief is described by an evolution equation that is doubly stochastic, reflecting stochasticity in the both observations
and environmental changes. In these contexts, we show that the probability density of the belief can be represented
using differential Chapman-Kolmogorov equations, allowing efficient computation of ensemble statistics. This allows us
to reliably compare normative models to near-normative approximations using, as model performance metrics, decision
response accuracy and Kullback-Leibler divergence of the belief distributions. Such belief distributions could be obtained
empirically from subjects by asking them to report their decision confidence. We also study how response accuracy is
affected by additional internal noise, showing optimality requires longer integration timescales as more noise is added.
Lastly, we demonstrate that our method can be applied to tasks in which evidence arrives in a discrete, pulsatile fashion,
rather than continuously.
subjects use approximately normative decision strategies, The average direction of motion, which we call the state
and when and how they fail to do so, can be computa- s(t), switches in time between states s+ (right-moving) and
tionally challenging. For instance, one may wish to study s− (left-moving) as a two-state continuous time Markov
how a subject’s estimate of the environmental timescale process with hazard rate h, so P (s(t +t) = s(t)) = ht +
impacts their response accuracy, or how heuristic evidence- o(t).1 The observer is interrogated at a random time,
discounting strategies compare to optimal ones (Glaze et al. T , and reports their belief about the current direction of
2018; Radillo et al. 2019). To address these questions, pre- motion, s(T ). The most reliable state estimate is obtained by
vious work has primarily relied on Monte Carlo simulations computing the log-likelihood ratio (LLR) between choices
(Veliz-Cuba et al. 2016; Piet et al. 2018), which can be from (noisy) observations, ξ(t), of the moving dot stimulus.
computationally expensive. Assuming the observer maintains a fixed estimate of the
Here, we show how to reframe dynamic decision environmental hazard rate, h̃, this evidence-accumulation
models by deriving corresponding differential Chapman- process converges to a single stochastic differential equation
+ |ξ(t))
Kolmogorov (CK) equations (See Eq. (6)). This approach (SDE) for the belief (or LLR) y(t) = log PP (s (s− |ξ(t)) of the
allows us to quickly compute observer beliefs and observer (See Veliz-Cuba et al. 2016 and Appendix A for
performance, and compare models. Realizations of our modeling assumptions and details):
models are described by stochastic differential equations
dy(t) = g(t)dt + ρdWt − 2h̃ sinh(y(t))dt, (1)
with a drift term that switches according to a two-state
Markov process, and leak terms that discount evidence. To where g(t) is a telegraph process that switches between
describe these models using CK equations, we treat the two values, ±g, defined as the expected LLR update from
switching process as a source of dichotomous noise, and a new observation given the environmental state, with
condition on its state to track conditional belief densities. transition rate h; this provides evidence about the current
These methods allow us to quickly answer questions state, s(t). The increment of a Wiener process dWt is
about how characteristics of optimal models and their scaled by ρ, defined as the standard deviation of the LLR
approximations vary across ranges of task parameters. update from a new observation given the environmental
Nonlinear, normative models can thus be compared state. The observer’s assumed hazard rate h̃ shapes the
to approximate linear and cubic discounting models, evidence discounting process. If we assume observations of
models with internal noise, and explicitly solvable bounded the state s(t) are drawn from normal distributions, the input
accumulation models with no flux boundaries. These to the evidence accumulation model can be described by a
models all can obtain near-optimal response accuracy, but single parameter (Veliz-Cuba et al. 2016). Combining our
each has very different belief distributions. This suggests assumptions and rescaling time as ht → t, we obtain the
that subject confidence reports could be used to distinguish following SDE for the observer’s belief (See Appendix A)
subject decision strategies in data. in rescaled time (different from the units in Eq. (1)):
Detailed analyses, including belief distribution calcula-
tions, can be performed rapidly and accurately with our √ h̃
dy(t) = x(t) · m · dt + 2mdWt − 2 sinh(y(t))dt , (2)
methods, allowing us to see why each approximate model
drift noise h
performs better at different task difficulty levels. Monte nonlinear leak
Carlo methods fare much worse in terms of computation where x(t) ∈ ±1 is a telegraph process with switching rate
time and accuracy (See Fig. 9). Our methods also extend equal to 1. The parameter m gives the mean information
to tasks with pulsatile evidence, where drift and diffusion gain of the observer over the average length of time the
are replaced by jump terms. Our work thus demonstrates environment remains the same (h−1 in original units, 1 in
how partial differential equation descriptions of stochas- rescaled units). As m increases the task becomes easier.
tic decision models, previously successful in understanding Thus, we refer to m as the evidence strength. If we take
decision making in static environments (Busemeyer and h̃ = h, the explicit dependence of Eq. (2) on h vanishes,
Townsend 1992; Moehlis et al. 2004; Bogacz et al. 2006), and, as we show, the observer obtains maximal response
can be extended to dynamic environments. accuracy.
We are primarily interested in how variations of the
evidence strength, m, true hazard rate, h, and the observer’s
2 Normative models for dynamic hazard rate estimate, h̃, impact the response accuracy of
decision-making
1 Thenotation o(t) means all other terms are of smaller order than
We begin by considering the dynamic RDMD task (Glaze
t. More precisely, o(t) represents a function f (t) with the
et al. 2015; Veliz-Cuba et al. 2016); an observer looks at property lim f (t)
t = 0.
a screen of dots which move, on average, right or left. t↓0
J Comput Neurosci (2019) 47:205–222 207
an observer whose belief is represented by Eq. (2). These arises from the Wiener process. This equation could be
quantities can be changed by varying psychophysical task useful for model fitting, since an experimenter would know
parameters (Glaze et al. 2015, 2018; Piet et al. 2018), the realization of x(t), and could then fit the single free
and so provide a means of validating Eq. (2) and its parameter h̃ using response data. A simulation using a
approximations. In addition, a thorough understanding of fixed realization of x(t) shown in Fig. 1a, reveals how the
the normative model’s performance can provide insights belief density tracks the state changes, and the peak of the
into task parameter ranges in which a subject’s belief, distribution tends towards the fixed points ȳ± of Eq. (2)
y(t), is sensitive to the strategy they use (Radillo et al. where 0 = ∓m + 2 hh̃ sinh(ȳ± ).
2019). Obtaining statistics of the solutions to Eq. (2) The evolution of the belief and performance across
requires estimating the distribution of the stochastically trials is determined by extending our model to include the
evolving belief y(t) across time. Monte Carlo approaches distribution of possible realizations of x(t). Treating x(t)
can require many realizations to accurately characterize as dichotomous noise and defining the joint probability
belief distributions (See Fig. 9 in Appendix C), and can thus densities p± (y, t) := p(y, t|s(t) = s± ), we obtain a set of
be computationally prohibitive. coupled differential CK equations (Gardiner 2004):
a c
Fig. 1 The evolution of solutions to the differential Chapman- transfer probability between densities. The drift ±mdt and diffusion
√
Kolmogorov (CK) equations. a Evolution of the probability density 2mdWt terms move probability along the belief variable y. Chang-
given by Eq. (3) for a fixed realization of the telegraph process, ing variables, we define ps (y, t), where changes in state transfer
x(t). A single realization of the belief, y, (solid) and the instan- probability across the y = 0 axis. c Sample evolution of densities
taneous fixed point of Eq. (2) in the absence of Wiener process p± (y, t) and ps (y, t) using Eq. (4a), (4b), and (6) respectively. Shaded
noise (dashed) are superimposed. b Schematic of the differential CK region shows probability that contributes to accuracy of observer, com-
equations. Based on the two-state continuous time Markov process, puted using Eq. (5). Details on numerical methods are provided in
s(t), we define probability densities p± (y, t), where state transitions Appendix C
208 J Comput Neurosci (2019) 47:205–222
Response accuracy – the probability of a correct Solving the CK equations numerically, we observe several
response – is a common measure of subject performance notable features of p± (y, t) and ps (y, t) (Fig. 1c). First, the
in decision making tasks (Gold and Shadlen 2007; Ratcliff densities p± (y, t) are reflections of one another (p+ (y, t) =
and McKoon 2008). Experimentally, response accuracy is p− (−y, t)) due to the symmetry of Eq. (4a), (4b). Second,
defined as the fraction of correct responses at a specific all densities obtain stationarity on the timescale h−1 of the
interrogation time T (Glaze et al. 2015; Piet et al. 2018). In environment, so each is a unimodal function peaked on the
our model, optimal observers make choices in accordance correct side of y = 0. Stationary is reached due to the
with the sign of their belief, sign[y(t)], and response eventual equilibration between the drift and state switching.
accuracy can be computed from solutions to Eq. (4a), (4b) Most of the mass of the stationary densities is on the correct
by computing side of y = 0, and Acc(T ) > 0.5. The long tail of the
∞ 0 distribution ps (y, t) is due to both the constant transfer
Acc(T ) = p+ (y, T ) dy + p− (y, T ) dy. (5) of probability from y to −y due to the switching and the
0 −∞ Wiener process noise. Both the nonlinear leak and switching
The belief, y(t), is correct if it has the same sign as s(t). cause the accuracy Acc(T ) to saturate over time.
This fact along with the inherent odd symmetry of Eq. (4a), Before going further, we note that Eq. (6) satisfies the
(4b) suggests a change of variables −y → y in Eq. (4b). conditions for existence of an ergodic process (Gardiner
The sum, ps (y, t) := p+ (y, t) + p− (−y, t), then evolves 2004): The nonzero jump probabilities, and a positive dif-
according to fusion coefficient, ensure that the differential CK equa-
tion converges to a unique stationary density as t → ∞.
∂ps ∂ps h̃ ∂ ∂ 2 ps
= −m +2 [sinh(y)ps (y, t)] + m 2 This occurs in a relatively short time period; we there-
∂t ∂y h ∂y ∂y fore focus the remainder of our study on steady-state
drift nonlinear leak noise cases. Typically, experimental dynamic decision trials are
+ [ps (−y, t) − ps (y, t)] . (6) sufficiently long to make this assumption of stationarity
switching reasonable (Glaze et al. 2015; Piet et al. 2018).
The new density ps (y, t) defines all belief values y > 0 as 2.2 Evaluating accuracy for mistuned
correct, since the sign of beliefs in the state s− have been evidence-discounting
flipped, −y → y, while the sign of all beliefs in state
s+ remain the same. The density ps (y, t) thus describes Subjects performing decision tasks often must learn the task
beliefs relative to the state, s(t), with each environmental parameters online to improve their performance. Our model
change flipping the sign of the belief, y(t) (See Fig. 1b). can be extended to consider hazard rate learning (Radillo
Equation (5) can therefore be rewritten more simply as et al. 2017; Glaze et al. 2018), but for now we assume that
∞
Acc(T ) = 0 ps (y, T ) dy. By symmetry, we can recover the observer uses a fixed estimate h̃ of the hazard rate for
the two original densities as p± (y, t) = 12 ps (±y, t). their evidence discounting strategy (Glaze et al. 2015).
a c e
accuracy
5
10-4 0.8
-5
10-6 0.6
0 1 0 1 0 5
b d f
sensitivity
0.003
accuracy
0.5 0.02
0.35 0.01
-10 10 0 1.5 0 50
Fig. 2 Performance of normative and mistuned nonlinear observer state, s(t), is shown below (red: s+ ; yellow: s− ). d Observer accuracy,
models. a Evolution of the density ps (y, t) for m = 50 given by for varied h̃, following a change point at t = 0. e Observer steady-
Eq. (6). b Stationary densities, p̄s (y), for m = 5, 50. With stronger state accuracy as a function of h̃ for different values of m is maximized
evidence (m = 50), the stationary distribution has more mass above when h̃ = h = 1. f Curvature of the accuracy functions in e at the max-
y = 0. c Realizations of the belief variable y for different estimated imum, h = 1, as a function of m, shows the observer is most sensitive
hazard rates h̃ and fixed true hazard rate, h = 1. The environmental to changes in their estimated hazard rate h̃ when m ≈ 5
J Comput Neurosci (2019) 47:205–222 209
How does the response accuracy of an observer whose observer’s performance is sensitive to changes in the model.
belief is described by Eq. (2) change when h̃ is mistuned? See Radillo et al. (2019) for a similar analysis for a dynamic
Veliz-Cuba et al. (2016) addressed this question using decisions using pulsatile evidence.
Monte Carlo sampling, but computational costs prevented a This example illustrates how CK equations can be used
complete answer. Since Eq. (2) is rescaled, we take h = 1 to obtain response accuracy statistics, and to compare
for the remainder of our investigation; all other cases can be normative models to related nonlinear models in which the
recovered by rescaling time. Before asking how changing evidence discounting is mistuned. Such approximate models
h̃ alters accuracy, we first briefly mention how accuracy may offer plausible descriptions of subject’s strategies, but
varies with evidence strength, fixing h̃ = h = 1. The only capture some of the possibilities. In the next section,
density ps (y, t) computed using Eq. (6) rapidly converges we develop and analyze linear discounting models that
to the stationary solution, with most of its mass above zero approximate the adaptive evidence accumulation properties
(Fig. 2a). As m, increases, more mass of the stationary of the normative model and can be tuned to obtain near-
distribution moves to positive values (Fig. 2b), but the total optimal response accuracy.
mass, equal to limT →∞ Acc(T ), always saturates at a value
less than 1 due to discounting and state switching.
When the observer misestimates the hazard rate, h̃ = h, 3 Linear evidence discounting in dynamic
we expect the long term accuracy to suffer. Effects on environments
accuracy are subtle, but do follow a general pattern: over-
estimating the hazard rate (h̃ > h) causes the observer to The nonlinear model defined by Eq. (2) describes the
discount prior evidence too strongly, resulting in more errors optimal evidence-accumulation strategy when the estimated
driven by observation noise (Fig. 2c). On the other hand, hazard rate is correct. However, approximate models can
observers that underestimate the hazard rate (0 < h̃ < h) also obtain response accuracy that is near-optimal. Glaze
discount evidence too slowly and are less adaptive to change et al. (2015) and Veliz-Cuba et al. (2016) demonstrated this
points. Change point triggered response accuracy plots using a model that includes a linear leak term, −λy, in place
show both of these trends (Fig. 2d). Accuracy obtains a of the nonlinearity in the normative model. The linear model
lower ceiling value during longer epochs without environ- is more tractable and can capture the dynamics of subjects’
mental changes when the discounting rate h̃ is too high. On beliefs in behavioral data (Ossmy et al. 2013; Glaze et al.
the other hand, accuracy recovers more slowly following 2015; Piet et al. 2018). We are interested in how well its
changes when the discounting rate h̃ is too low. This bias- statistics can be matched to that of the nonlinear model and
variance tradeoff is common to binary choice experiments how sensitive this match is to perturbations in the leak rate.
in dynamic environments (Glaze et al. 2015, 2018): Low The linear discounting model is the doubly stochastic
discounting rates lead to averaging over longer sequences differential equation,
of observations thus reducing the effect of observational √
noise while increasing bias. On the other hand, high dis- dy = x(t) · mdt + 2mdWt − λydt, (7)
counting rates decrease bias but increase susceptibility to
where λ is a parameter we tune. As before, we can write
observational noise, resulting in higher variability. An opti-
differential CK equations corresponding to Eq. (7), and
mal observer balances these two sources of inaccuracy at a
define ps (y, t) = p+ (y, t) + p− (−y, t) to obtain the
given environmental hazard rate.
evolution equation
An experimenter may not be able to change the discoun-
ting rate a subject uses, but can control the strength of ∂ps ∂ps ∂ ∂ 2 ps
= −m +λ [yps ]+m 2 +[ps (−y, t) − ps (y, t)] . (8)
evidence the subject integrates. We therefore asked how the ∂t ∂y ∂y ∂y
accuracy of both ideal and mistuned observers is impacted As in the nonlinear model, an attracting stationary solution
by changes in m. Not surprisingly, accuracy increases as the to Eq. (8) exists as long as λ > 0 (Gardiner 2004). We thus
strength of evidence increases (Fig. 2e). More interestingly, focus on stationary solutions, p̄s (y), and make comparisons
the sensitivity (or curvature) of the accuracy function at with the normative model. Our goal is to see how the leak
the optimum, where h̃ = h, varies nonmonotonically with rate, λ, can be tuned so that the behavior of an observer
m, obtaining a peak at m ≈ 5 (Fig. 2f). Thus response whose belief evolves according to Eq. (7) best matches that
accuracy is most sensitive to model mistuning for tasks of of an observer using the normative model, Eq. (2).
intermediate difficulty. Intuitively, an observer will always We use two metrics to compare our models: first, we
perform close to chance (Acc ≈ 0.5) when the task is consider the accuracy of the linear model
hard (m is low), regardless of the h̃ they use. The observer ∞
will perform well (Acc ≈ 1), again regardless of h̃, if the Acc∞ (λ) = lim ps (y, t; λ) dy, (9)
task is easy (m is high). At intermediate values of m, the t→∞ 0
210 J Comput Neurosci (2019) 47:205–222
and aim to tune λ so Acc∞ (λ) is maximized. Second, to analysis of the stationary densities of the normative and
quantify the distance between the belief distributions, we linear models, as we discuss below.
compute the Kullback-Leibler (KL) divergence How important is it to tune λ in the linear model?
If linear models are more sensitive to fine tuning for
∞ pN
s (y) some task parameter ranges than the normative model,
s ||p s =
DKL p N L
pN
s (y) ln dy (10)
−∞ pL
s (y; λ)
experimentalists could use these task parameter ranges to
distinguish subjects’ strategies. When m is small the belief
between the stationary normative distribution, p N s (y), distributions p̄sL (y) and p̄sN (y) are close whether λ = λAcc
opt ,
obtained from Eq. (6), and the stationary distribution or λ = λKL opt (Fig. 3c,e), but this agreement is sensitive to
of the linear approximation, pL s (y; λ), obtained from changes in λ (Fig. 3d). Thus, both the accuracy and KL
Eq. (8). While it is possible for models to have nearby divergence are sensitive to λ when m is small. For large m
belief distributions but different realizations within trials, the two belief distributions are not close (Fig. 3e), and the
minimizing KL divergence still penalizes models with KL divergence and difference in accuracy are insensitive
divergent belief distributions, sure to have distinct trial wise to changes in λ. This disagreement in belief distribution at
realizations. We show that choosing the leak rate, λ, that high m is less important when optimizing the accuracy of
maximizes accuracy or minimizes the KL divergence leads the linear model, as we only need to maximize the mass of
to different biases (Fig. 3a,b). p̄sL above y = 0.
Similar to the nonlinear model, the response accuracy of Differences between the two models become apparent
an observer using linear discounting varies nonmonotonically if we interrogate observers about their confidence and not
with λ (Fig. 3a). Observers using small λ adapt too just their choice (Fig. 3e). We hypothesize that if one
slowly to change points, and those using a high λ exhibit were to fit behavioral data using response accuracy, and
more noise-driven errors in the state estimate. The optimal compare them to fits using subject’s confidence reports, the
value of λ is achieved by balancing these error sources, second approach would result in stronger leak rates. Indeed,
obtaining response accuracy levels very close to those of considerations of subject confidence as a proxy for LLR has
the normative model. Furthermore, the λ = λAcc opt that been an important development in recent decision making
maximizes response accuracy increases as m is increased, studies (Kiani and Shadlen 2009; Van Den Berg et al. 2016),
since evidence needs to be discounted more rapidly in and we will revisit this view in Section 7. However, most
environments with higher evidence strengths (Fig. 3c). The behavioral studies of decisions in dynamic environments
KL divergence also varies nonmonotonically with λ for all do not include confidence reports, and so model fits are
values of m (Fig. 3b), obtaining a minimum at a value typically performed by considering accuracy data. We thus
λ = λKL opt that also increases with m, but is higher than focus primarily on this measure for the remainder of our
λAcc
opt . Understanding this result requires a more detailed study. An example using confidence reports to determine
a c e Normative
Max Acc
Max Acc 3×10-3 Min KL
accuracy
14 Min KL
0.75
0.55 2 0
0 20 0 50 -10 10
b d 0.1
3 3×10 -3
sensitivity
0.004
KL
0 0 0.01 0
0 20 0 50 -10 10
Fig. 3 Linear two-alternative forced-choice task model. a Perfor- metric as a function of m. d Sensitivity of discounting rate on optimal-
mance of the linear observer as a function of discounting rate, λ, for ity metric as a function of m. e Comparison of the normative density
different evidence strengths, m. b KL divergence between the den- (black), and the linear density with λ = λAcc
opt chosen to maximize accu-
sities evolving under the normative and linear model as a function racy (red), or with λ = λKLopt chosen to minimize KL divergence (blue)
of λ, for different values of m. c Optimal discounting rate for each for m = 5 (top) and m = 50 (bottom)
J Comput Neurosci (2019) 47:205–222 211
the discounting parameter λ a subject uses is given in In either model, the Wiener processes, dWt and dXt , are
Appendix D (Fig. 10). independent.
As before, we can derive an evolution equation for the
ensemble of realizations of these stochastic processes (See
4 Tuning evidence accumulation to account Appendix E). For the nonlinear Eq. (11), the corresponding
for internal noise differential CK equation is
∂ps ∂ps h̃ ∂
We next explore the impact of additional noise sources on = −m +2 [sinh(y)ps ]
the performance of both the nonlinear and linear models. ∂t ∂y h ∂y
Since the nervous system is inherently noisy (Faisal et al. ∂ 2 ps
+ [m + D] + [ps (−y, t) − ps (y, t)] , (13)
2008), it is important to consider sources of variability on ∂y 2
top of the stochasticity of observations when developing and the corresponding differential CK equation for Eq. (12)
and fitting decision models (Smith 2010). Brunton et al. is
(2013) showed that the responses of humans and rats in
∂ps ∂ps ∂ ∂ 2 ps
an auditory clicks task are best described by models that = −m + λ [yps ] + [m + D]
include internally generated noise. Piet et al. (2018) showed ∂t ∂y ∂y ∂y 2
that the same is the case in a dynamic clicks task. With this + [ps (−y, t) − ps (y, t)] . (14)
in mind, we extend our analysis to incorporate an additional Thus, only the constants scaling the diffusion term change
independent noise source. Such variability could arise in as compared to the models without internal noise, Eqs. (6)
early sensory areas or as part of the decision process (Bankó and (8).
et al. 2011). For simplicity, we model the source of noise as How should evidence accumulation be adjusted to
an independent Wiener process, Xt , with variance scaled by maximize response accuracy as the evidence strength and
a parameter D. The nonlinear model then takes the form, internal noise change? As noted earlier, humans and rats
√ √ h̃ integrate internal noise in addition to information obtained
dy = x(t)·mdt + 2mdWt + 2DdXt −2 sinh(y)dt. (11) from the stimulus itself (Glaze et al. 2015; Piet et al.
h
2018). It is therefore plausible that they adjust their stimulus
Adding internal noise means that Eq. (11) is no longer
integration strategies to limit the impact of internal noise
a normative model: When D > 0, noise corrupts state
on performance. Increasing the magnitude of internal noise,
estimates (Fig. 4a), and maximal response accuracy is
D, reduces response accuracy in the nonlinear model
achieved when h̃ < h, as we show. The linear model is
(Fig. 4a,b), and there there is a steep drop off in response
updated similarly,
√ √ accuracy when D slightly exceeds m (Fig. 4b). This occurs
dy = x(t) · mdt + 2mdWt + 2DdXt − λydt. (12) whether h̃ is fixed or allowed to vary. When h̃ can be tuned,
a b
accuracy
0.8
0.6
0 1 10-2 103
c acc
0.8 d
2 10
1
-1 0.6 -3 3
0 2 0 50
Fig. 4 Internal noise reduces accuracy of dynamic decisions. a Super- model Eq. (11) with added internal noise as a function of h̃ and D for
imposed realizations of the nonlinear model described by Eq. (11) m = 10. The blue line shows the optimal value of h̃ for a given D,
with added internal noise as the magnitude of internal noise, D, is var- when the environmental hazard rate, h = 1, is fixed. d Leak rate λ that
ied (legend). b Performance of nonlinear model Eq. (13) as internal maximizes accuracy for a given evidence strength m and internal noise
noise amplitude D is varied for different evidence strengths m (leg- amplitude D
end). Vertical lines correspond to D = m. c Accuracy of nonlinear
212 J Comput Neurosci (2019) 47:205–222
the value of h̃ that maximizes response accuracy decreases whose statistics we expect to match those of more detailed
as D is increased (Fig. 4c): The observer must integrate over models. Figure 5a shows example realizations of this
longer timescales to average the increased internal noise stochastic process as the boundary location, β, is varied,
and obtain a reliable estimate of the state. However, this is illustrating how encounters with the no-flux boundaries
balanced by the need to adapt to change points as quickly as serve to discount evidence.
possible. Similarly, in the linear model the leak rate, λ, that The steady-state solution of the differential CK equations
maximizes accuracy decreases as D is increased (Fig. 4d). corresponding to Eq. (15) can be obtained exactly. The
In general, as internal noise increases the observer must thus evolution equations are
integrate over longer timescales to obtain the most accurate
estimate of the state. ∂p± (y, t) ∂p± (y, t) ∂ 2 p± (y, t)
= ∓m +m + p∓ (y, t) − p± (y, t) ,
∂t ∂y ∂y 2
(16)
5 Discounting by bounding observer
confidence and the no flux (Robin) boundary conditions imply that
∂p± (±β, t)
As an alternative to models that discount evidence with leak ±mp± (±β, t) − m = 0.
terms, we next consider models with no leak, and no-flux ∂y
boundaries at y = ±β (Glaze et al. 2015). This prevents the
Since Eq. (16) is an advection-diffusion equation, the proper
belief from straying outside of the range −β ≤ y ≤ β:
reflecting boundary is a Robin boundary (Gardiner 2004). It
⎧ √ can be shown that the stationary solution p̄s (y) = p̄+ (y) +
⎪
⎪ x(t) · mdt + 2m · dWt , y ∈ (−β, β),
⎨ √ p̄− (−y) to Eq. (16) restricted by the boundary conditions is
dy = min(x(t) · mdt + 2m · dWt , 0), y ≥ +β,
⎪
⎪ √
⎩ p̄s (y) = C1 + C2 eqy + (mq − (m + 1)) e−qy . (17)
max(x(t) · mdt + 2m · dWt , 0), y ≤ −β.
(15)
The constants C1 and C2 and details of the derivation are
Unlike classic DDMs for two alternative free response given in Appendix F.
tasks (Smith and Ratcliff 2004; Bogacz et al. 2006; Gold The distribution is more shallow for higher β, as the
and Shadlen 2007), the process does not terminate when the stochastic trajectories spread over the admissible belief
belief, y, reaches one of the boundaries ±β. range (Fig. 5b). This is analogous to the sharpening
More careful treatments of the reflecting boundary are (broadening) of the stationary distributions of the linear
possible, by considering the limit of a discrete-time biased model that occurs as the leak rate is increased (decreased).
random walk on a lattice (Erban and Chapman 2007), Note here that decreasing β strengthens the discounting
but we prefer the more intuitive description of Eq. (15), effect of the reflecting boundaries.
a c e
sensitivity
accuracy
3 0.8 0.1
-3
0.5 0
b0 1 0
d
5
f
0
Bounded
50
1 -0.001 Linear
3
0 1 -0.005
-3 3 0 50 0 50
Fig. 5 Bounded accumulator dynamics and performance. a Superim- accumulator as a function of m. e Sensitivity of the bounded accumu-
posed stochastic realizations of Eq. (15) with different boundaries, β lator model to changes in β near βopt as a function of m. f Difference
(legend). b Superimposed steady-state distributions, p̄s (y), computed in accuracy between the optimally tuned bounded accumulator and
from Eq. (17). c Tuning the boundary β, allows the accuracy of the normative models (Acc := AccB (m) − AccN (m)) compared to accu-
bounded accumulator, Eq. (18), to closely match that of the norma- racy difference between optimally tuned linear and normative models
tive model described by Eq. (2) (dashed lines) for various m (legend). (Acc := AccL (m) − AccN (m)) as a function of m
d The optimal bound, βopt , that maximizes accuracy of the bounded
J Comput Neurosci (2019) 47:205–222 213
To compute steady state accuracy of the bounded or even the linear model (Fig. 3e,f). In this respect, fitting
accumulator model, we can integrate Eq. (17) to obtain a subject confidence reports using the bounded accumulator
formula that depends on m and β: would give very different results; we return to this point in
Section 7.
β
Acc∞ (β) = p̄s (y) dy = C1 β
0
6 Generalized discounting functions
C2 1 − e−qβ qβ
+ e + (mq − (m + 1)) . (18) with a cubic example
q
a b e linear
8 4
0.64
0.6
0
0 10
c d
0.003
accuracy
0.8
0 0.5
0 50 -20 0 cubic
Fig. 6 Cubic discounting functions: fC (y) = −λ1 y − λ2 y 3 . a Com- AccC (m) − AccL (m)) increase with m. d Accuracy as a function of λ1
pared with the normative fN (y) = −2 sinh(y) and linear fL (y) = for different m (legend). e Schematic of potential functions of linear
−λ1 y discounting functions. b Accuracy as a function of discount- model (top) and cubic model (bottom) given the state s(t). The cubic
ing rates (λ1 , λ2 ), for m = 1. Dot denotes maximizer. c Difference in model has been mistuned so that λ2 = 1 and λ1 < 0, resulting in a
accuracy between optimally tuned cubic and linear models (Acc := double-potential well function
214 J Comput Neurosci (2019) 47:205–222
probability density p̄sC (y) and that of the normative model, the choice of discounting function, and use KL divergence
p̄sN (y), though the model is more complex (Friedman et al. to quantify differences between different models.
2001). Since the linear model already obtains near-optimal
accuracy (Fig. 3a), we find, as expected, that the best cubic
model is only slightly more accurate. In fact, accuracy drops 7 Revisiting KL divergence for fitting
rapidly as λ2 is changed from its optimal value (Fig. 6b). We observer belief distributions
also calculated the difference between the accuracy of the
optimal cubic model and optimal linear model, Acc (m) = Subject reports of confidence in decision-making tasks
AccC (m) − AccL (m) (Fig. 6c). Accuracy improves by can be associated with LLRs of normative evidence
incorporating the cubic term at high m values, since accumulation models (Kiani and Shadlen 2009). Thus,
nonlinear discounting is most needed at higher y values. it may be possible to empirically estimate the belief
However, mistuning the cubic model can considerably distribution, ps (y, t), represented by our models, by asking
limit accuracy when the attractor structure of Eq. (19) with subjects to report confidence in their choice. This provides
f (y) = fC (y) is qualitatively changed (Fig. 6d). Equilibria an additional advantage of our approach over Monte
of the noise-free model are identified by fixing x(t) = ±1 Carlo simulations, as using the latter to estimate belief
and solving the cubic equation 0 = ±m − λ1 y − λ2 y 3 . distributions can be costly and inaccurate (See Fig. 9 in
Fixing λ2 > 0, we find a critical value, λc1 < 0 for which Appendix C). As we show here, a better understanding
the attractor structure of the model switches from a single of how our normative and approximate models deviate
stable fixed point to two (Fig. 6e). Such bistable systems from one another can be gleaned by comparing their belief
can be advantageous for working memory (Brody et al. distributions and computing KL divergence measures.
2003), but can hinder belief switches necessary in dynamic We provide intuition for the differences we will see by
environments after state transitions. Figure 6d shows that plotting single stochastic realizations of all four models
the accuracy of the model decreases as λ1 is decreased (Fig. 7a). The linear and (even better) the cubic models
and the potential wells deepen. In these cases, the observer closely track the belief trajectory of the normative model,
retains an erroneous belief long after the state has changed. while the bounded accumulator model strays the farthest.
Thus, the cubic nonlinearity only marginally improves Comparing belief distributions of all four models that
accuracy, but can have deleterious effects if mistuned. minimize KL divergence with the normative model at m =
However, we may wish to use other measures of a subject’s 50 (Fig. 7b), we see that the cubic model matches the
belief to fit and validate models. In the next section we normative model far better than the linear model. This is due
therefore ask how the full belief distribution changes with to the nonlinearity incorporated by the cubic term, which
Normative Cubic
a Linear Bounded b
3×10-3
0.6
0 0
0 1 -15 15
c Bounded
d
0.4 0.1
KL
Linear
0 0
0 50 0 50
Fig. 7 Minimizing KL divergence in our approximate models. a Sin- gives normative, linear, and cubic scale, while right axis gives bounded
gle realization of linear, cubic, and bounded models tuned to minimize accumulator scale. c Minimum KL divergence for linear and bounded
KL divergence m = 50. Normative realization is shown for compar- models as a function of m. d Difference in KL between optimal linear
ison (legend). b Distributions of linear, cubic, and bounded models and cubic models (KL := KLL (m) − KLC (m)) as a function of m
tuned to minimize KL divergence, again for m = 50. Left vertical axis
J Comput Neurosci (2019) 47:205–222 215
attenuates the tail of the distribution at high y values. On the or absence of left or right clicks at each time t. See Piet
other hand, the best fit bounded accumulator distribution is et al. (2018) and Radillo et al. (2019) for details. At an
far from that of the normative model. interrogation time, T , the observer must respond which side
Computing the KL divergence between models, we arrive currently has the higher click rate r+ .
at two main conclusions: First, despite the fact that the The model for an ideal observer’s belief y(t) =
+ |ξ(t))
bounded accumulator obtains near optimal accuracy, the log P(s
P(s− |ξ(t)) is given by
corresponding belief distribution, p̄sB (y), deviates from that ∞
dy
of the normative model, p̄sN (y), at all evidence strength =κ
j j
δ t − tR − δ t − tL − 2h sinh(y), (20)
values, m (Fig. 7c). On the other hand, though the cubic dt
j =1
model only mildly increases response accuracy over the
r+
linear model, the corresponding belief distribution, p̄sC (y), where κ = ln r− is the height of each evidence increment,
matches that of the normative model, p̄sN (y), far better j
tR and
j
tLare the right and left click times, and h is the
(Fig. 7d). Our differential CK framework allowed us to hazard rate. Additionally, we define the inputs’ signal-to-
obtain these results quickly and accurately. noise ratio (SNR) as SNR = √r+r −r −
(Skellam 1946). See
+ +r−
Radillo et al. (2019) for a detailed discussion of how the
SNR shapes model response accuracy.
8 Chapman-Kolmogorov equations Rather than carrying out another detailed analysis
for clicks-task models of several different approximations and perturbations to
Eq. (20), we simply wish to show that our CK approach
Thus far, we have been concerned with models that
works and provides useful insights for click stimulus
represent evidence accumulation in a RDMD task (Glaze
models. To sample the space of possible approximate
et al. 2015), in which subjects receive a continuous flow
models, we fix h = 1 and focus on a linear discounting and
of evidence during a trial. We next examine models of
bounded accumulator model. Since Piet et al. (2018) was
observers accumulating discretely timed, pulsatile evidence.
specifically interested in internal noise perturbed versions
Piet et al. (2018) showed rats can perform an auditory
of the linear model, we start with this example, and then
clicks task of this type near-optimally. In this experiment,
consider a bounded accumulator without internal noise that
subjects are presented with two trains of clicks (one to the
affords us explicit results.
left ear, the other to the right), each generated by a Poisson
Our linear model with internal noise takes the form
process with instantaneous rates rL (t) and rR (t). The rates ∞
√
j j
evolve according to a two-state continuous time Markov dy = κ δ t − tR − δ t − tL dt − λy · dt + 2DdXt . (21)
process with hazard rate h so that P(rj (t + dt) = rj (t)) = j =1
h · dt + o(dt), rR (t) = rL (t) always, and rR,L (t) ∈ r± Here λ is the leak rate, D is the strength of the
with r+ > r− . We define the state (rR (t), rL (t)) = (r± , r∓ ) internal noise, and Xt is a Wiener process. We then
as s± . Observations ξ(t) are now comprised of the presence define conditional densities p+ (y, t) and p− (y, t) as before,
writing coupled differential CK equations as
∂p± (y, t)
= r± p± (y − κ, t) − p± (y, t) + r∓ p± (y + κ, t) − p± (y, t)
∂t
∂ ∂ 2 p± (y, t)
+λ yp± (y, t) + D + p∓ (y, t) − p± (y, t), (22)
∂y ∂y 2
Unlike our differential CK equations for models with the range of possible jump heights. Simulating Eq. (22)
continuously arriving evidence, the pulses flow probability directly, we study how response accuracy depends on the
between y±κ and y, preventing us from combining p± (y, t) leak and the click rates (Radillo et al. 2019). Similar to our
with a change of variables and obtaining a single CK linear model with a drift diffusion signal, accuracy varies
equation. Note, Piet et al. (2018) considered a generalization nonmonotonically with λ (Fig. 8a), and is maximized at
of Eq. (21) in which click amplitudes were variable too, λ = λopt for a given pair (r+ , r− ) as plotted in Fig. 8b.
introducing additional stochasticity. Such a formulation Fixing SNR does not fix λopt as Radillo et al. (2019) showed
would introduce integrals into Eq. (22) to account for for the normative model. As either r+ or r− is increased,
216 J Comput Neurosci (2019) 47:205–222
a b
accuracy
0.8 30 4
1
0.6 0
0 8 0 90
c d
0.9 3
accuracy
30
0.7 1
0
1 10 0 40
Fig. 8 Performance of heuristic strategies on the dynamic clicks task. and r− for the linear model. c Accuracy of the bounded accumulator
a Accuracy of the linear model varies nonmonotonically with the leak model given by Eq. (24) varies with the boundary value β for different
rate, λ for different SNR = √r+r −r−
(Radillo et al. 2019) with r− = 30 SNR as r− = 30 is fixed. d Heatmap of optimal β = βopt as a function
+ +r−
is fixed. b Heatmap of optimal leak rate, λ = λopt , as a function of r+ of r+ and r− for bounded accumulator clicks model
λopt increases (Fig. 8b), suggesting that increasing the rate The dependence of the optimal β = βopt , which
of true or erroneous pulses warrants stronger evidence maximizes response accuracy, on r+ and r− is nuanced.
discounting. Fixing r− = 30, for a given SNR = √r+r −r −
, there is an
+ +r−
We also consider a bounded accumulator with no explicit optimal β = βopt maximizing response accuracy (Fig. 8c).
discounting function. In parallel with Eq. (16), we define However, there is a surprisingly large range of (r+ , r− )
the model as values within which β = 1 is optimal (Fig. 8d). There
⎧ is a range of (r+ , r− ) for which the discounting timescale
⎪ ∞ j j
⎪
⎪ κ j =1 δ t − tR − δ t − tL , y ∈ (−κβ, κβ), (breadth of the interval [−βκ, βκ]) increases with task
dy ⎨ ∞ j j
= min(κ j =1 δ t − tR − δ t − tL , 0), y ≥ +κβ, difficulty, as long as r+ is sufficiently far from r− . However,
dt ⎪
⎪
⎩ max(κ ∞ δ t − t j − δ t − t j , 0), y ≤ −κβ.
⎪ when r− ≈ r+ , βopt = 1, despite task difficulty. We
j =1 R L
conjecture this bound improves performance by instituting
The observer’s belief is restricted to the interval [−κβ, κβ] a “two click” strategy, in which the observer needs to only
for some positive integer β ∈ Z>0 . The corresponding hear two clicks on the high click rate side to register a
differential CK equations can be written as a discretized correct current belief. This limits the size of erroneous
system since y only visits integer multiples of κ between excursions wrong clicks can cause, as the bounds limit the
−κβ to κβ. Rescaling n = y/κ yields the following system effect of many misatributed clicks in a row.
Our methods thus extend to models with pulsatile
∂p± evidence accumulation, illustrating their broad applicability.
= r± p± (n − 1, t) − p± +r∓ p± (n + 1, t) − p± +p∓ −p± , We can efficiently study model performance and its
∂t
(23) dependence on task parameters, and even explicitly analyze
the resulting equations to determine how approximate
for n = −β + 1, ..., β − 1, along with the boundary models perform.
equations
0 = −r± p± (−β, t) + r∓ p± (−β + 1, t) + p∓ (−β, t) − p± (−β, t),
0 = r± p± (β − 1, t) − r∓ p± (β, t) + p∓ (β, t) − p± (β, t).
9 Conclusion
As with the continuum version of the bounded accumulator, Decision-making models are key to understanding how
the stationary solution p̄(n) can be obtained explicitly animals integrate evidence to make choices in nature.
Animals most likely use heuristic strategies in dynamic
p̄s (n) = C1 + C2 q+
n
+ C3 q−
n
, (24) tasks as they can be easier to implement, and have utility that
is close to optimal (Rahnev and Denison 2018). Normative
with constants q± , C1 , C2 , and C3 and the derivation given models are still useful, however, as subject performance
in Appendix G. can be benchmarked against them, allowing possible
J Comput Neurosci (2019) 47:205–222 217
insights into how and why organisms fail to perform Models can thus guide one’s choice of task parameters
optimally (Geisler 2003). Investigating optimal models when setting up experiments to determine the strategies
and their approximations requires simulations across large subjects use to make decisions in dynamic environments.
parameter spaces; these necessarily require rapid simulation As in Radillo et al. (2019), we found that accuracy is most
techniques to obtain refined results. Efficient computational sensitive to one’s choice of model and tuning when tasks
methods are therefore essential for the analysis of evidence are of intermediate difficulty. In contrast, tasks that are
accumulation models, and their application to experiment easy (hard) are performed well (poorly) by most models.
design. Also, the full belief distributions generated by our methods
Using differential CK equations to describe ensembles could be subsampled to produce randomized responses
of decision model realizations speeds up computation for comparison with subject data (Drugowitsch 2016). It
and describes the time-dependent probability density may also be feasible to use our differential CK equations
of an observer’s belief. Thus, traditional metrics of to model trial-to-trial belief distributions of subjects, as
performance (e.g., accuracy) and other less common model affected by internal noise hidden to the experimentalist. This
comparison metrics (KL divergence) can be computed approach was recently developed in Piet et al. (2019) to
rapidly. This opens new avenues for comparing normative account for for variability in subject responses.
and heuristic decision making models, and for determining Model development can also help to inspire new
task parameter ranges to distinguish models. There is experimental tasks, based on predictions and ideas that
also hope that in high throughput experiments, sufficient arise from mathematically describing subjects’ decision
data could be collected to specify subject confidence processes. One possible extension of the tasks we have
distributions, which could be fit, or compared to model discussed here could consider stochastic switches in
predictions (Piet et al. 2019). evidence quality within trials. Past work has focused on both
Doubly stochastic and jump-diffusion models appear theoretical predictions and experimental results associated
in a number of other contexts in neuroscience and with task difficulty switching between trials (Drugowitsch
beyond (Hanson 2007; Horsthemke and Lefever 2006). et al. 2012; Zhang et al. 2014), suggesting subjects’
For instance, dichotomous and white noise have been decision thresholds may vary with time as task difficulty
included in linear integrate and fire (LIF) models to model is inferred throughout the trial. When both the state and
voltage or channel fluctuations (Droste and Lindner 2014; difficulty switch stochastically within a trial, the effective
2017; Salinas and Sejnowski 2002). The interspike interval state is governed by a multi-state continuous time Markov
statistics of these models can be analyzed directly by process. Details that could be introduced into such multi-
considering the corresponding differential CK equations. state models, such as asymmetric evidence qualities and
Unlike the models we consider here, the LIF model the ability to turn off evidence, offer a rich framework
includes a single absorbing boundary and reset condition, for applying the stochastic methods we have developed
which must be treated carefully when defining the flow of here. Another extension we could consider in our models is
probability through state space. the recently developed click task model with stochastically
We have studied a number heuristic models and drawn click heights (Piet et al. 2018; Radillo et al. 2019).
computed how their performance depends on both task Jumps would then be represented by an integral over the
parameters and evidence discounting parameters. Well entire belief space, requiring new computational methods
tuned heuristic models, such as the linear and bounded for efficient simulation of the associated differential CK
accumulator models, can in fact exhibit near-normative equations.
performance (Glaze et al. 2015; Veliz-Cuba et al. 2016; The analysis we have presented only considers environ-
Radillo et al. 2019). There are specific parameter regimes ments governed by two-state continuous time Markov pro-
(low versus high m) in which certain heuristic models cesses with symmetric transition rates, since this paradigm
perform better; our differential CK methods have allowed has been the focus of recent experimental work in humans
us to explore these regimes rapidly. Importantly, Brunton and rodents (Glaze et al. 2015, 2018; Piet et al. 2018). Dif-
et al. (2013), and Piet et al. (2018) have shown that ferential CK equations could also be formulated in the case
internal noise determines subject performance in decision of asymmetric transition rates as well as multiple discrete
tasks, in addition to the variability of the signal. We have state or even continuum state Markov processes (Veliz-Cuba
confirmed that including internal noise causes optimal et al. 2016), and it would be interesting to see how previ-
evidence-discounting to be weakened as noise increases, ous work on such models could be extended by leveraging
and that accuracy drops off precipitously once the amplitude this approach. In addition, there has been previous work
of internal noise reaches that of the signal. analyzing the effects of adaptive evidence integration under
Our approach can also be used by experimentalists non-stationary psychophysical task conditions (Heath 1992;
testing observer performance in dynamic-decision tasks. Eckhoff et al. 2008; Ossmy et al. 2013). While this past
218 J Comput Neurosci (2019) 47:205–222
so we can compute the limits of Eq. (29) as We start by conditioning on x(t) = +1 to find an
equation for p+ (y, t). In the absence of state transitions in
2μ2 the background Markov process, the evolution equation for
g(t) = lim gt (t) ∈ ± = ±g, (30a) p+ (y, t) is now given by the Fokker-Planck equation
t→0 σ2
4μ2 ∂p+ ∂p+ h̃ ∂ ∂ 2 p+
ρ 2 (t) = lim ρt
2
(t) = 2 = ρ 2 , (30b) = −m +2 sinh(y)p+ + m .
t→0 σ ∂t ∂y h ∂y ∂y 2
where g(t) ∈ {+g, −g} is a telegraph process with On the other hand, in the absence of drift and diffusion,
probability masses P(±g, t) evolving as Ṗ(±g, t) = the evolution of p+ (y, t) is governed exclusively by the
h [P(∓g, t) − P(±g, t)] and ρ 2 (t) = ρ 2 remains constant. two-state Markov process; in this case, we can write the
Therefore, the continuum limit (t → 0) of Eq. (28) is evolution equation in the form of a discrete master equation:
dy(t) = g(t)dt + ρdWt − 2h sinh(y(t))dt, (31) dp+
= p− − p+ .
where dW is a standard Wiener process. Equation (31) dt
provide the normative model of evidence accumulation for Note here that because of the rescaling of time in Eq. (2),
an observer who knows the hazard rate h and wishes to infer the transition rates of the Markov process equal one. Given
the sign of g(t) at time t with maximal accuracy (Glaze et al. these two components, we can follow Gardiner (2004)
2015; Veliz-Cuba et al. 2016). to write the CK equation that describes the evolution of
However, we are also interested in near-normative p+ (y, t) when drift, diffusion, and switching are all present
models in which the observer assumes an incorrect hazard as
rate h̃ = h. In such a case, the analysis proceeds as before, ∂p+ ∂p+ h̃ ∂ ∂ 2 p+
= −m +2 sinh(y)p+ + m + p− − p+ ,
with the probabilistic inference process simply involving h̃ ∂t ∂y h ∂y ∂y 2
now rather than h, and the result is which is Eqs. (4a); (4b) is obtained similarly by
dy(t) = g(t)dt + ρdWt − 2h̃ sinh(y(t))dt. (32) conditioning on x(t) = −1.
To determine response accuracy, we are interested in
Lastly, note that if indeed the original observations ξ are the probability the observer responds with the correct state
drawn from normal distributions, Eq. (30a), (30b) states of the Markov process. Therefore, at an interrogation time
g(t) ∈ ±g where g = 2μ2 /σ 2 and ρ 2 = 2g. Rescaling t = T where x(T ) = +1, the observer is correct if y > 0.
time ht → t, we can then express Eq. (32) in terms of the ∞
This probability is given by the integral 0 p+ (y, T ) dy.
following rescaled equation Similarly, if x(T ) = −1, the observer is correct if
0
√ h̃ y < 0, which happens with probability −∞ p− (y, T ) dy.
dy(t) = x(t)mdt + 2mdWt − 2 sinh(y)dt, Therefore, the total response accuracy at time T is given by
h
summing these two integrals:
where m = 2μ /(hσ ) and x(t) ∈ ±1 is a telegraph process
2 2
∞ 0
with hazard rate 1, as shown in Eq. (2) of the main text.
Acc(T ) = p+ (y, T ) dy + p− (y, T ) dy,
0 −∞
which is Eq. (5).
Appendix B: Derivation Finally, in order to more efficiently simulate the
of the Chapman-Kolmogorov equations evolution of p+ (y, t) and p− (y, t), we notice that because
the nonlinear leak is an odd function, a change of variables
Here we outline the derivation of the CK equations given −y → y in Eq. (4b) allows us to combine Eqs. (4a) and
by Eqs. (4a), (4b) and (6) in the main text. Our goal is to (4b) into a single PDE. Defining ps (y, t) := p+ (y, t) +
write a PDE that describes the evolution of an ensemble p− (−y, t), we obtain the evolution equation
of belief trajectories described by the SDE in Eq. (2)
across all realizations of both sources of stochasticity – ∂ps ∂ps h̃ ∂ ∂2
= −m +2 [sinh(y)ps ]+m 2 +[ps (−y, t) − ps (y, t)] ,
observation noise and state switching. We cannot simply ∂t ∂y h ∂y ∂y
write a Fokker-Planck equation to describe the desired which is Eq. (6).
probability distribution, as the process is doubly stochastic;
the diffusion is described by a scaled Wiener process and
the sign of the drift is controlled by a two-state Markov Appendix C: Finite difference methods
process. Therefore, following Droste and Lindner (2017), for Chapman-Kolmogorov equations
we condition our process on the state of x(t), and seek to
obtain evolution equations for the conditional distributions We used a finite difference method to simulate the
p± (y, t) := p(y, t|x(t) = ±). differential CK equations. The method is exemplified here
220 J Comput Neurosci (2019) 47:205–222
a 103 105 b
104 106 c
Time (s)
Time (s)
accuracy
0.76 102 0.1 101 10-2
Error
Error
0.72 1 0 10-1 -3 10-4
0.5 1.5 103 samples 106 10 10-1
Fig. 9 Comparison of CK equations with Monte Carlo sampling. simulations calculated against results from CK equations. c Runtime
a Calculation of accuracy of mistuned nonlinear leak model for m = 5. (red) and L∞ error (blue) of finite difference simulations of the nonlin-
Monte Carlo simulations run with varied number of samples superim- ear model Eq. (6) under refinement of belief discretization, compared
posed (legend). b Runtime (red) and L2 error (blue) of Monte Carlo with the y = 10−4 case. Because our method is first-order accurate
simulations as a function of sample size. Runtime of CK equations in time and second-order accurate in belief, the method is first-order
(black dashed) superimposed for comparison. L2 error of Monte Carlo accurate when y is decreased and t is held constant
The method is exemplified here for the normative CK equation from Eq. (6), but a similar approach was used for the linear, cubic, and pulsatile equations. For stability purposes, our method uses centered differences in y and backward-Euler in t. This gives the following finite difference approximations of the functions and their derivatives in Eq. (6):

\frac{\partial p_s(y, t)}{\partial t} \approx \frac{p_s(y, t + \Delta t) - p_s(y, t)}{\Delta t}, \qquad p_s(y, t) \approx p_s(y, t + \Delta t),

\frac{\partial p_s(y, t)}{\partial y} \approx \frac{p_s(y + \Delta y, t + \Delta t) - p_s(y - \Delta y, t + \Delta t)}{2 \Delta y},

\frac{\partial^2 p_s(y, t)}{\partial y^2} \approx \frac{p_s(y - \Delta y, t + \Delta t) - 2 p_s(y, t + \Delta t) + p_s(y + \Delta y, t + \Delta t)}{(\Delta y)^2},

where Δt and Δy are the timestep and spacestep of the simulation, respectively. Substituting into Eq. (6) and solving for ps(y, t + Δt) at each point of the mesh in y gives the system of equations

A\, p_s(y, t + \Delta t) = p_s(y, t),   (33)

where A is tridiagonal, with elements along the main diagonal and the first off-diagonals. This system can be inverted at each timestep and used to calculate the updates ps(y, t + Δt).

For the boundary conditions, we imposed no-flux conditions at the mesh boundaries ±b. For a standard drift-diffusion equation with drift A(y) and diffusion constant B(y), this condition takes the form

J(\pm b, t) = A(\pm b)\, p(\pm b, t) - \frac{1}{2} \frac{\partial}{\partial y}\big[B(\pm b)^2\, p(\pm b, t)\big] = 0.   (34)

Producing Monte Carlo simulation results that are close to those from the CK equations takes much longer (Fig. 9b), as we can obtain good accuracy (error ≈ 10−2) from the CK equations in less than a second (Fig. 9c).
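To make the scheme concrete, the following Python sketch implements one version of the implicit update for Eq. (6). This is not the authors' code: the values of m, h̃/h, the mesh, and the timestep are purely illustrative, the state-switching term ps(−y, t) − ps(y, t) is treated explicitly so that the matrix A stays tridiagonal, and the no-flux condition is imposed with simple one-sided differences at ±b.

import numpy as np
from scipy.sparse import diags, csc_matrix
from scipy.sparse.linalg import spsolve

# Illustrative (hypothetical) parameters, not the values used in the paper.
m, r = 5.0, 1.0                    # evidence strength m and assumed hazard ratio r = h~/h
b, dt = 5.0, 1e-3                  # belief bound and timestep
y = np.linspace(-b, b, 1001)       # symmetric belief mesh
dy = y[1] - y[0]
nn = y.size

# Backward-Euler / centered-difference matrix for Eq. (6); switching term kept explicit.
main = np.full(nn, 1.0 + 2.0 * m * dt / dy**2)
upper = -dt * ((-m + 2.0 * r * np.sinh(y[1:])) / (2.0 * dy) + m / dy**2)
lower = -dt * ((m - 2.0 * r * np.sinh(y[:-1])) / (2.0 * dy) + m / dy**2)
A = diags([lower, main, upper], offsets=[-1, 0, 1], format="lil")

# Replace the first and last rows with discrete zero-flux conditions,
# J = m*p - 2*r*sinh(y)*p - m*dp/dy = 0, using one-sided differences.
A[0, 0], A[0, 1] = m - 2.0 * r * np.sinh(y[0]) + m / dy, -m / dy
A[nn - 1, nn - 1], A[nn - 1, nn - 2] = m - 2.0 * r * np.sinh(y[-1]) - m / dy, m / dy
A = csc_matrix(A)

p = np.exp(-y**2)                  # any initial belief density
p /= np.trapz(p, y)

for _ in range(int(1.0 / dt)):     # integrate one second of task time
    rhs = p + dt * (p[::-1] - p)   # explicit treatment of the state-switching term
    rhs[0] = rhs[-1] = 0.0         # boundary rows enforce zero flux
    p = spsolve(A, rhs)

print("mass after integration:", np.trapz(p, y))   # stays close to 1 on a fine mesh

Each update requires only a single sparse solve, which is why the CK approach reaches good accuracy in well under the time needed to generate a comparable Monte Carlo ensemble (Fig. 9b, c).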
Appendix D: Distinguishing linear discounting models using confidence

Can we use the distributions obtained from the CK equations to distinguish which model a subject uses? To illustrate an example, we considered distinguishing the two linear discounting models given by Eq. (7) and asked how many trials an experimentalist would need to run in order to tell if a subject was using λAcc or λKL. We sampled from the distributions obtained from Eq. (8) and performed a likelihood ratio test to find the number of samples needed for the experimentalist to have 90% confidence in the model being used. Assuming a symmetric prior P(λAcc) = P(λKL) = 0.5, and defining Yj = yj(T) as the belief report at the end (t = T) of trial j, the likelihood ratio is computed using Bayes' rule and independence after N trials as

\frac{P(\lambda_{\mathrm{Acc}} \mid Y_{1:N})}{P(\lambda_{\mathrm{KL}} \mid Y_{1:N})} = \prod_{j=1}^{N} \frac{P(Y_j \mid \lambda_{\mathrm{Acc}})}{P(Y_j \mid \lambda_{\mathrm{KL}})},
and we counted the number of trials required to obtain ≥ 90% confidence in either model (so P(λ | Y1:N) ≥ 0.9 for either λ ∈ {λAcc, λKL}). The results of this simulation are given in Fig. 10, showing that as the strength m of the evidence increases, the mean number of trials N(m) required to distinguish the models decreases precipitously. This is to be expected, as the belief distributions become more separated as m increases (Fig. 3e).
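The trial-counting procedure can be written as a short simulation. The Python sketch below is only a schematic of the bookkeeping: the belief densities P(Yj | λAcc) and P(Yj | λKL) are stood in by hypothetical Gaussians, whereas in the analysis above they are the distributions obtained from the CK equations via Eq. (8).

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the belief densities p(Y | lambda) at interrogation time T.
MU = {"acc": 1.0, "kl": 1.5}          # illustrative means, not values from the paper

def density(y, model):
    return np.exp(-(y - MU[model])**2 / 2.0) / np.sqrt(2.0 * np.pi)

def trials_to_decision(true_model="acc", conf=0.9, max_trials=10_000):
    """Accumulate the log-likelihood ratio over simulated trials until the posterior
    for either model (under a symmetric prior) exceeds `conf`; return the trial count."""
    log_lr = 0.0                                        # log P(Y_1:n | acc) - log P(Y_1:n | kl)
    for n in range(1, max_trials + 1):
        y_report = rng.normal(MU[true_model], 1.0)      # belief report on trial n
        log_lr += np.log(density(y_report, "acc")) - np.log(density(y_report, "kl"))
        post_acc = 1.0 / (1.0 + np.exp(-log_lr))        # P(acc | Y_1:n) with prior 0.5
        if post_acc >= conf or post_acc <= 1.0 - conf:
            return n
    return max_trials

print("mean trials to 90% confidence:",
      np.mean([trials_to_decision() for _ in range(500)]))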
Appendix E: Deriving differential CK equation with internal noise

Here we provide intuition for the form of the diffusion coefficient in Eq. (13) for the belief distribution of a normative observer strategy with additional internal noise of strength D. Starting with the SDE in Eq. (12), because √(2m) dWt and √(2D) dXt are increments of independent Wiener processes, we can define a new Wiener process A dZt = √(2m) dWt + √(2D) dXt that has the same statistics as the original summed Wiener processes (Gardiner 2004). To determine the appropriate effective diffusion constant A, we note that

\mathrm{Var}[A\, dZ_t] = A^2\, \mathrm{Var}[dZ_t] = A^2 \Delta t

and

\mathrm{Var}\big[\sqrt{2m}\, dW_t + \sqrt{2D}\, dX_t\big] = 2m\, \mathrm{Var}[dW_t] + 2D\, \mathrm{Var}[dX_t] = 2m \Delta t + 2D \Delta t.

This requires A = √(2(m + D)), and means Eq. (12) can be rewritten as

dy = x(t)\, m\, dt + \sqrt{2(m + D)}\, dZ_t - 2 \frac{\tilde{h}}{h} \sinh(y)\, dt,

which, following Gardiner (2004), has the differential CK equation given by Eq. (13).
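The variance-matching step can be checked numerically in a few lines; the values of m, D, and Δt below are arbitrary and serve only to confirm that the summed increments have variance 2(m + D)Δt, as required for A = √(2(m + D)).

import numpy as np

rng = np.random.default_rng(1)
m, D, dt, n = 5.0, 0.5, 1e-3, 1_000_000          # arbitrary illustrative values

dW = rng.normal(0.0, np.sqrt(dt), n)             # increments of two independent
dX = rng.normal(0.0, np.sqrt(dt), n)             # Wiener processes
summed = np.sqrt(2.0 * m) * dW + np.sqrt(2.0 * D) * dX

print("empirical variance:", np.var(summed))      # ~ 2*(m + D)*dt
print("A^2 * dt          :", 2.0 * (m + D) * dt)  # with A = sqrt(2*(m + D))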
Appendix F: Steady state solution of the bounded accumulator model

Steady state solutions of Eq. (16) are derived first by noting that ∂t p̄± = 0 implies

0 = \mp m\, \bar{p}_\pm'(y) + m\, \bar{p}_\pm''(y) + \bar{p}_\mp(y) - \bar{p}_\pm(y),   (35)

with boundary conditions p̄′±(±β) = ±p̄±(±β). Equation (35) has solutions of the form (p̄+(y), p̄−(y)) = (A, B) e^{αy}, with characteristic equation m²α⁴ − (m² + 2m)α² = 0. The characteristic roots are α = 0, ±q, where we define q = √(1 + 2/m). For α = 0, we have A = B, whereas for α = ±q, the symmetry p̄+(y) = p̄−(−y) implies B = (mq − (m + 1))A for α = +q and A = (mq − (m + 1))B for α = −q. Lastly, defining the sum p̄s(y) = p̄+(y) + p̄−(−y), we obtain

\bar{p}_s(y) = C_1 + C_2 \big[ e^{qy} + (mq - (m + 1))\, e^{-qy} \big].

The no-flux boundary conditions p̄s(±β) − p̄′s(±β) = 0, along with the normalization requirement ∫_{−β}^{β} p̄s(y) dy = 1, give explicit expressions for the constants

C_1 = \frac{1}{2\beta} - C_2\, \frac{m (q - 1) \sinh(q\beta)}{q\beta}

and

C_2 = \bigg[ 2\beta \Big( (q - 1) e^{q\beta} + \frac{m (q - 1) \sinh(q\beta)}{q\beta} - (q + 1)\big(mq - (m + 1)\big) e^{-q\beta} \Big) \bigg]^{-1}.
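As a quick numerical check of the closed-form solution (with illustrative values of m and β, not those used in the paper), the following sketch evaluates p̄s(y) with the constants above and verifies the normalization and the no-flux condition.

import numpy as np

m, beta = 5.0, 3.0                               # illustrative evidence strength and bound
q = np.sqrt(1.0 + 2.0 / m)                       # characteristic root from Appendix F
k = m * q - (m + 1.0)                            # eigenvector coefficient for alpha = +/- q

C2 = 1.0 / (2.0 * beta * ((q - 1.0) * np.exp(q * beta)
                          + m * (q - 1.0) * np.sinh(q * beta) / (q * beta)
                          - (q + 1.0) * k * np.exp(-q * beta)))
C1 = 1.0 / (2.0 * beta) - C2 * m * (q - 1.0) * np.sinh(q * beta) / (q * beta)

y = np.linspace(-beta, beta, 20001)
ps = C1 + C2 * (np.exp(q * y) + k * np.exp(-q * y))
dps_at_beta = C2 * q * (np.exp(q * beta) - k * np.exp(-q * beta))

print("normalization   :", np.trapz(ps, y))       # should be ~1
print("no-flux residual:", ps[-1] - dps_at_beta)  # p_s(beta) - p_s'(beta), should be ~0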
Appendix G: Steady state solution of the clicks-task bounded accumulator model

Considering Eq. (23), we look for stationary solutions of the form (p̄+(n), p̄−(n)) = (C1, C2) αⁿ, yielding the characteristic equation

\big( r_+ \alpha^{n-1} + r_- \alpha^{n+1} - r_s \alpha^{n} \big) \big( r_- \alpha^{n-1} + r_+ \alpha^{n+1} - r_s \alpha^{n} \big) - \alpha^{2n} = 0,   (36)

where rs = r+ + r− + 1. Solving Eq. (36) gives α = 1, with eigenfunction C1 = C2, and two roots α = q± of the quadratic α² − α(r+ + r+² + r− + r−²)/(r+ r−) + 1 = 0. Superimposing the eigenfunctions, redefining constants, and defining p̄s(n) = p̄+(n) + p̄−(−n) gives the general solution

\bar{p}_s(n) = C_1 + C_2\, q_+^{\,n} + C_3\, q_-^{\,n}.

The constants C1, C2, and C3 can be determined by the normalization \sum_{n=-\beta}^{\beta} \bar{p}_s(n) = 1 and the stationary boundary conditions

r_+\, \bar{p}_s(\beta - 1) - r_-\, \bar{p}_s(\beta) + \bar{p}_s(-\beta) - \bar{p}_s(\beta) = 0,

r_-\, \bar{p}_s(-\beta + 1) - r_+\, \bar{p}_s(-\beta) + \bar{p}_s(\beta) - \bar{p}_s(-\beta) = 0.

Long-term accuracy of the bounded accumulator is then determined by the weighted sum

\mathrm{Acc}_\infty(\beta) = \tfrac{1}{2}\, \bar{p}_s(0) + \sum_{n=1}^{\beta} \bar{p}_s(n).
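The stationary solution of the clicks-task accumulator can be evaluated by solving the 3-by-3 linear system defined by the normalization and the two boundary conditions above. The Python sketch below does this for hypothetical click rates r± and bound β (illustrative values only, not those of the paper) and then computes Acc∞(β).

import numpy as np

r_plus, r_minus, beta = 2.0, 0.5, 5        # hypothetical click rates and belief bound

# Roots q_pm of the quadratic associated with the characteristic equation (36)
c = (r_plus + r_plus**2 + r_minus + r_minus**2) / (r_plus * r_minus)
q_plus = (c + np.sqrt(c**2 - 4.0)) / 2.0
q_minus = (c - np.sqrt(c**2 - 4.0)) / 2.0

n = np.arange(-beta, beta + 1)
basis = [np.ones(n.size), q_plus**n, q_minus**n]   # eigenfunctions 1, q+^n, q-^n

# Linear system for (C1, C2, C3): normalization plus the two boundary conditions.
A = np.zeros((3, 3))
b = np.array([1.0, 0.0, 0.0])
for j, phi in enumerate(basis):
    A[0, j] = phi.sum()                                                # sum_n p_s(n) = 1
    A[1, j] = r_plus * phi[-2] - r_minus * phi[-1] + phi[0] - phi[-1]  # condition at n = +beta
    A[2, j] = r_minus * phi[1] - r_plus * phi[0] + phi[-1] - phi[0]    # condition at n = -beta
C1, C2, C3 = np.linalg.solve(A, b)

p_s = C1 + C2 * q_plus**n + C3 * q_minus**n
acc_inf = 0.5 * p_s[n == 0][0] + p_s[n > 0].sum()                      # Acc_infty(beta)
print("Acc_inf(beta) =", acc_inf)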
References

Bankó, É.M., Gál, V., Körtvélyes, J., Kovács, G., Vidnyánszky, Z. (2011). Dissociating the effect of noise on sensory processing and overall decision difficulty. Journal of Neuroscience, 31(7), 2663–2674.
Behrens, T.E., Woolrich, M.W., Walton, M.E., Rushworth, M.F. (2007). Learning the value of information in an uncertain world. Nature Neuroscience, 10(9), 1214.
Billingsley, P. (2008). Probability and measure. Wiley.
Bogacz, R., Brown, E., Moehlis, J., Holmes, P., Cohen, J.D. (2006). The physics of optimal decision making: a formal analysis of models of performance in two-alternative forced-choice tasks. Psychological Review, 113(4), 700.
Brea, J., Urbanczik, R., Senn, W. (2014). A normative theory of forgetting: lessons from the fruit fly. PLoS Computational Biology, 10(6), e1003640.
Brody, C.D., Romo, R., Kepecs, A. (2003). Basic mechanisms for graded persistent activity: discrete attractors, continuous attractors, and dynamic representations. Current Opinion in Neurobiology, 13(2), 204–211.
Brunton, B.W., Botvinick, M.M., Brody, C.D. (2013). Rats and humans can optimally accumulate evidence for decision-making. Science, 340(6128), 95–98.
Busemeyer, J.R., & Townsend, J.T. (1992). Fundamental derivations from decision field theory. Mathematical Social Sciences, 23(3), 255–282.
Droste, F., & Lindner, B. (2014). Integrate-and-fire neurons driven by asymmetric dichotomous noise. Biological Cybernetics, 108(6), 825–843.
Droste, F., & Lindner, B. (2017). Exact results for power spectrum and susceptibility of a leaky integrate-and-fire neuron with two-state noise. Physical Review E, 95(1), 012411.
Drugowitsch, J. (2016). Fast and accurate Monte Carlo sampling of first-passage times from Wiener diffusion models. Scientific Reports, 6, 20490.
Drugowitsch, J., Moreno-Bote, R., Churchland, A.K., Shadlen, M.N., Pouget, A. (2012). The cost of accumulating evidence in perceptual decision making. Journal of Neuroscience, 32(11), 3612–3628.
Eckhoff, P., Holmes, P., Law, C., Connolly, P., Gold, J. (2008). On diffusion processes with variable drift rates as models for decision making during learning. New Journal of Physics, 10(1), 015006.
Eissa, T.L., Barendregt, N.W., Gold, J.I., Josić, K., Kilpatrick, Z.P. (2019). Hierarchical inference interactions in dynamic environments. In Computational and Systems Neuroscience. Lisbon.
Erban, R., & Chapman, S.J. (2007). Reactive boundary conditions for stochastic simulations of reaction–diffusion processes. Physical Biology, 4(1), 16.
Faisal, A.A., Selen, L.P., Wolpert, D.M. (2008). Noise in the nervous system. Nature Reviews Neuroscience, 9(4), 292.
Friedman, J., Hastie, T., Tibshirani, R. (2001). The elements of statistical learning, Vol. 1. New York: Springer Series in Statistics. Chap. 7: Model Assessment and Selection.
Gardiner, C. (2004). Handbook of stochastic methods: for physics, chemistry & the natural sciences (Series in Synergetics, Vol. 13).
Geisler, W.S. (2003). Ideal observer analysis. The Visual Neurosciences, 10(7), 12–12.
Glaze, C.M., Kable, J.W., Gold, J.I. (2015). Normative evidence accumulation in unpredictable environments. Elife, 4, e08825.
Glaze, C.M., Filipowicz, A.L., Kable, J.W., Balasubramanian, V., Gold, J.I. (2018). A bias–variance trade-off governs individual differences in on-line learning in an unpredictable environment. Nature Human Behaviour, 2(3), 213.
Gold, J.I., & Shadlen, M.N. (2007). The neural basis of decision making. Annual Review of Neuroscience, 30.
Hanson, F.B. (2007). Applied stochastic processes and control for jump-diffusions: modeling, analysis, and computation, Vol. 13. SIAM.
Heath, R.A. (1992). A general nonstationary diffusion model for two-choice decision-making. Mathematical Social Sciences, 23(3), 283–309.
Horsthemke, W., & Lefever, R. (2006). Noise-induced transitions: theory and applications in physics, chemistry and biology. Springer Series in Synergetics. Berlin: Springer.
Kiani, R., & Shadlen, M.N. (2009). Representation of confidence associated with a decision by neurons in the parietal cortex. Science, 324(5928), 759–764.
Moehlis, J., Brown, E., Bogacz, R., Holmes, P., Cohen, J.D. (2004). Optimizing reward rate in two alternative choice tasks: mathematical formalism. Center for the Study of Brain, Mind and Behavior (pp. 04–01). Princeton University.
Ossmy, O., Moran, R., Pfeffer, T., Tsetsos, K., Usher, M., Donner, T.H. (2013). The timescale of perceptual evidence integration can be adapted to the environment. Current Biology, 23(11), 981–986.
Piet, A.T., El Hady, A., Brody, C.D. (2018). Rats adopt the optimal timescale for evidence integration in a dynamic environment. Nature Communications, 9(1), 4265.
Piet, A., Hady, A.E., Boyd-Meredith, T., Brody, C. (2019). Neural dynamics during changes of mind. In Computational and Systems Neuroscience (p. 2019). Lisbon.
Radillo, A.E., Veliz-Cuba, A., Josić, K., Kilpatrick, Z.P. (2017). Evidence accumulation and change rate inference in dynamic environments. Neural Computation, 29(6), 1561–1610.
Radillo, A.E., Veliz-Cuba, A., Josić, K. (2019). Performance of normative and approximate evidence accumulation on the dynamic clicks task. Neurons, Behavior, Data Analysis, and Theory. Submitted.
Rahnev, D., & Denison, R.N. (2018). Suboptimality in perceptual decision making. Behavioral and Brain Sciences, 41, e223. [Link]
Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 85(2), 59.
Ratcliff, R., & McKoon, G. (2008). The diffusion decision model: theory and data for two-choice decision tasks. Neural Computation, 20(4), 873–922.
Salinas, E., & Sejnowski, T.J. (2002). Integrate-and-fire neurons driven by correlated stochastic input. Neural Computation, 14(9), 2111–2155.
Skellam, J.G. (1946). The frequency distribution of the difference between two Poisson variates belonging to different populations. Journal of the Royal Statistical Society Series A (General), 109(Pt 3), 296–296.
Smith, P.L. (2010). From Poisson shot noise to the integrated Ornstein–Uhlenbeck process: neurally principled models of information accumulation in decision-making and response time. Journal of Mathematical Psychology, 54(2), 266–283.
Smith, P.L., & Ratcliff, R. (2004). Psychology and neurobiology of simple decisions. Trends in Neurosciences, 27(3), 161–168.
Urai, A.E., Braun, A., Donner, T.H. (2017). Pupil-linked arousal is driven by decision uncertainty and alters serial choice bias. Nature Communications, 8, 14637.
Van Den Berg, R., Anandalingam, K., Zylberberg, A., Kiani, R., Shadlen, M.N., Wolpert, D.M. (2016). A common mechanism underlies changes of mind about decisions and confidence. Elife, 5, e12192.
Veliz-Cuba, A., Kilpatrick, Z.P., Josic, K. (2016). Stochastic models of evidence accumulation in changing environments. SIAM Review, 58(2), 264–289.
Wilson, R.C., Nassar, M.R., Gold, J.I. (2010). Bayesian online learning of the hazard rate in change-point problems. Neural Computation, 22(9), 2452–2476.
Yu, A.J., & Cohen, J.D. (2008). Sequential effects: superstition or rational behavior? Advances in Neural Information Processing Systems, 21, 1873–1880.
Zhang, S., Lee, M.D., Vandekerckhove, J., Maris, G., Wagenmakers, E.J. (2014). Time-varying boundaries for diffusion models of decision making and response time. Frontiers in Psychology, 5, 1364.

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.