
TYPE Original Research
PUBLISHED 05 January 2024
DOI 10.3389/fnins.2023.1291051

An FPGA implementation of Bayesian inference with spiking neural networks

Haoran Li1, Bo Wan2,3, Ying Fang4,5*, Qifeng Li6*, Jian K. Liu7 and Lingling An1,2

1 Guangzhou Institute of Technology, Xidian University, Guangzhou, China, 2 School of Computer Science and Technology, Xidian University, Xi'an, China, 3 Key Laboratory of Smart Human Computer Interaction and Wearable Technology of Shaanxi Province, Xi'an, China, 4 College of Computer and Cyber Security, Fujian Normal University, Fuzhou, China, 5 Digital Fujian Internet-of-Thing Laboratory of Environmental Monitoring, Fujian Normal University, Fuzhou, China, 6 Research Center of Information Technology, Beijing Academy of Agriculture and Forestry Sciences, National Engineering Research Center for Information Technology in Agriculture, Beijing, China, 7 School of Computer Science, University of Birmingham, Birmingham, United Kingdom

OPEN ACCESS

EDITED BY
Priyadarshini Panda, Yale University, United States

REVIEWED BY
Garrick Orchard, Facebook Reality Labs Research, United States
Yujie Wu, Tsinghua University, China
Qi Xu, Dalian University of Technology, China

*CORRESPONDENCE
Ying Fang [email protected]
Qifeng Li [email protected]

RECEIVED 08 September 2023
ACCEPTED 06 December 2023
PUBLISHED 05 January 2024

CITATION
Li H, Wan B, Fang Y, Li Q, Liu JK and An L (2024) An FPGA implementation of Bayesian inference with spiking neural networks. Front. Neurosci. 17:1291051. doi: 10.3389/fnins.2023.1291051

COPYRIGHT
© 2024 Li, Wan, Fang, Li, Liu and An. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Spiking neural networks (SNNs), as brain-inspired neural network models based on spikes, have the advantage of processing information with low complexity and efficient energy consumption. Currently, there is a growing trend to design hardware accelerators for dedicated SNNs to overcome the limitation of running under the traditional von Neumann architecture. Probabilistic sampling is an effective modeling approach for implementing SNNs to simulate the brain to achieve Bayesian inference. However, sampling consumes considerable time. It is highly demanding for specific hardware implementation of SNN sampling models to accelerate inference operations. Hereby, we design a hardware accelerator based on FPGA to speed up the execution of SNN algorithms by parallelization. We use streaming pipelining and array partitioning operations to achieve model operation acceleration with the least possible resource consumption, and combine the Python productivity for Zynq (PYNQ) framework to implement the model migration to the FPGA while increasing the speed of model operations. We verify the functionality and performance of the hardware architecture on the Xilinx Zynq ZCU104. The experimental results show that the proposed hardware accelerator of the SNN sampling model can significantly improve the computing speed while ensuring the accuracy of inference. In addition, Bayesian inference for spiking neural networks through the PYNQ framework can fully exploit the high performance and low power consumption of FPGAs in embedded applications. Taken together, our proposed FPGA implementation of Bayesian inference with SNNs has great potential for a wide range of applications; it can be ideal for implementing complex probabilistic model inference in embedded systems.

KEYWORDS
spiking neural networks, probabilistic graphical models, Bayesian inference, importance sampling, FPGA

1 Introduction
Neuroscience research plays an increasingly important role in accelerating and inspiring
the development of artificial intelligence (Demis et al., 2017; Zador et al., 2022). Spikes are the
fundamental information units in the neural systems of the brain (Bialek et al., 1999; Yu et al.,
2020), which also play an important role in information transcoding and representation in
artificial systems (Zhang et al., 2020; Gallego et al., 2022; Xu et al., 2022). Spiking neural

networks (SNNs), which utilize spikes, are brain-inspired models proposed as a new generation of computational framework (Maass, 1997). SNNs have received extensive attention and can utilize many properties of artificial neural networks for deep learning in various tasks (Kim et al., 2018; Shen et al., 2021; Yang et al., 2022). Numerous neuroscience experiments (Ernst and Banks, 2002; Körding and Wolpert, 2004) have shown that the cognitive and perceptual processes of the brain can be expressed as a probabilistic reasoning process based on Bayesian reasoning. From the macroscopic perspective, Bayesian models have explained how the brain processes uncertain information and have been successfully applied in various fields of brain science (Shi et al., 2013; Chandrasekaran, 2017; Alais and Burr, 2019). In contrast, recent studies focus on implementing SNNs using probabilistic graphical models (PGMs) at the micro level (Yu et al., 2018a,b, 2019; Fang et al., 2019). However, the realization of PGMs is considerably slow due to the sampling process. Since probabilistic sampling on SNNs involves massive probabilistic computations that can consume a lot of time, and many computationally intensive operations are involved in processing the data in the neural network, the inference speed becomes even slower as the problem scales up. In practical application scenarios such as medical diagnosis, environmental monitoring, and intelligent surveillance, these problems lead to poor real-time performance. Therefore, we want to make accelerations and improvements to meet the demand for speed in real applications. At present, there are dedicated hardware designs for SNNs (Cai et al., 2018; Liu et al., 2019; Fang et al., 2020; Han et al., 2020; Zhu et al., 2022), and for PGMs based on conventional artificial neural networks (Cai et al., 2018; Liu et al., 2020; Fan et al., 2021; Ferianc et al., 2021). Yet, there are few studies of hardware platforms implementing PGM-based SNNs. Therefore, hardware acceleration of PGM-based SNNs is highly demanding and meaningful, not only for simulation speed-up but also for neuromorphic computing implementation (Christensen et al., 2022).

In this study, we address this question by utilizing FPGA hardware to implement a recently developed PGM-based SNN model, named the sampling-tree model (STM) (Yu et al., 2019). The STM is an implementation of spiking neural circuits for Bayesian inference using importance sampling. In particular, the STM is a typical probabilistic graphical model based on a hierarchical tree structure with a deep hierarchical structure of layer-on-layer iteration, and it uses a multi-sampling mode based on sampling coupled with population probability coding. Each node in the model contains a large number of spiking neurons that represent samples. The STM processes information based on spikes, where spiking neurons integrate input spikes over time and fire a spike when their membrane potential crosses a threshold. With these properties, the STM is a typical example of a PGM-based SNN for Bayesian inference. The software implementation of sampling-based SNNs is very time-consuming, and actual tasks are limited by the model running speed on the CPU. Therefore, to fulfill our requirements for the running speed of the model, it is necessary to choose a hardware platform for designing a hardware accelerator. Here we need to consider which hardware platform is best suited to implement the design of the accelerator.

ASIC-based design implementations: Compared with general integrated circuits, an ASIC has the advantages of smaller size, lower power consumption, improved reliability, improved performance, and enhanced confidentiality. ASICs can also reduce costs compared to general-purpose integrated circuits in mass production. Ma et al. (2017) designed a highly-configurable neuromorphic hardware coprocessor based on an SNN implemented with digital logic, called the Darwin neural processing unit (NPU), which was fabricated as an ASIC in SMIC's 180 nm process for resource-constrained embedded scenarios. Tung et al. (2023) proposed a design scheme for a spiking neural network ASIC chip and developed a built-in-self-calibration (BSIC) architecture based on the chip to realize high-precision network inference under a specified range of process parameter variations. Wang et al. (2023) proposed an ASIC learning engine consisting of a memristor and an analog computing module for implementing trace-based online learning in a spiking neural network, which significantly reduces energy consumption compared to existing ASIC products of the same type. However, an ASIC requires a long development cycle and is risky: once there is a problem, the whole piece must be discarded. Consequently, we do not consider the use of an ASIC for the design here.

FPGA-based design implementations: An FPGA has a shorter development cycle compared to an ASIC, is flexible in use, can be used repeatedly, and has abundant resources. Ferianc et al. (2021) proposed an FPGA-based hardware design to accelerate Bayesian recurrent neural networks (RNNs), which can achieve up to 10 times speedup compared with a GPU implementation. Wang (2022) implemented a hardware accelerator on FPGA for the training and inference process of the Bayesian confidence propagation neural network (BCPNN), and the computing speed of the accelerator can improve on the CPU counterpart by two orders of magnitude. However, the RNN and BCPNN in the above two designs are essentially traditional neural network architectures, which differ from the hardware implementation of the SNN architecture and cannot be directly applied to our SNN implementation.

In addition, Fan et al. (2021) proposed a novel FPGA-based hardware architecture to accelerate BNNs inferred through Monte Carlo, which can achieve up to nine times better compute efficiency compared with other state-of-the-art BNN accelerators. Awano and Hashimoto (2023) proposed a Bayesian neural network hardware accelerator called B2N2, i.e., a Bernoulli random number-based Bayesian neural network accelerator, which reduces resource consumption by 50% compared to the same type of FPGA implementation. Neither of these two hardware architectures can be used for the acceleration of the STM, because the variational inference model and the Monte Carlo inference model are not suitable for importance sampling, while the STM needs to be sampled through importance sampling. In other words, the hardware architecture differs with the model, so we cannot use these two hardware architectures to accelerate the STM on the FPGA.

In summary, many previous designs were implemented on FPGAs because an ASIC is less flexible than an FPGA and more complex to realize (Ju et al., 2020). GPUs often perform very well on applications that benefit from parallelism and are currently the most widely
used platform for implementing neural networks. However, GPUs are not able to handle spike communication well in real time, and the high energy consumption of GPUs leads to limitations in some embedded scenarios. Therefore, we chose the FPGA as a compromise solution, which provides reasonable cost, low power consumption, and flexibility for our design. Furthermore, because of the limitations of the traditional ANN architecture in some FPGA-based design implementations (Que et al., 2022), and because some inference models are not suitable for sampling (Fan et al., 2022), we also need to design a hardware implementation suitable for importance sampling (Shi and Griffiths, 2009). Based on the above design references and our previous work on the STM, a neural network model for Bayesian inference, we finally chose the FPGA to complete the design of the STM accelerator, and we complete the construction of the Bayesian-inference neural network model on the FPGA with the help of the PYNQ framework to achieve the acceleration of the STM. The overall design idea is as follows. Firstly, we optimize the model inference part of the algorithm to make full use of FPGA resources to improve program parallelism, thus reducing the computing delay, and complete the design of custom hardware IP cores. Secondly, the designed IP core is connected to the whole hardware system, and the overall hardware module control is realized according to the preset algorithm flow through the PYNQ framework.

The main contributions of this work are as follows:

• Ours is the first work targeting acceleration of the STM on an FPGA board, and the inference results of the STM implemented on the FPGA are similar to the inference results obtained on the CPU;
• We implemented the acceleration of the STM on a Xilinx Zynq ZCU104 FPGA board, and we also found that the acceleration on the FPGA increases with the problem size, such as the number of model layers, the number of neurons, and other factors;
• We demonstrate that the neural circuits we implemented on the FPGA board can be used to solve practical cognitive problems, such as multisensory integration, and that they can efficiently perform complex Bayesian reasoning tasks in embedded scenarios.

2 Related work

2.1 Bayesian inference with importance sampling

Existing neural networks using variational-based inference methods such as belief propagation (BP) (Yedidia et al., 2005) and Monte Carlo (MC) (Nagata and Watanabe, 2008) can obtain accurate inference results in some Bayesian models. However, most Bayesian models in the real world are more complex. When using BP (George and Hawkins, 2009) or MCMC (Buesing et al., 2011) to implement Bayesian model inference, each neuron or each group of neurons generally has to implement a different and complex computation in these neural networks. In addition, since spiking neural networks require multiple iterations to obtain optimal Bayesian inference results, they are more complicated to implement. Therefore, the STM employs the tree structure of Bayesian networks to convert global inference into local inference through network decomposition. Importance sampling is introduced to perform local inference, which ensures that each group of neurons works simply, making the model suitable for large-scale distributed computing.

Unlike the traditional method of sampling from the distribution of interest, we use importance sampling to implement Bayesian inference for spiking neural networks, which is a method of sampling from a simple distribution to estimate the value of a certain function. Given the variable y, the conditional expectation of a function f(x) is estimated by importance sampling as

$$E(f(x)\mid y)=\sum_x f(x)P(x\mid y)=\frac{\sum_x f(x)P(y\mid x)P(x)}{\sum_x P(y\mid x)P(x)}=\frac{E_{P(x)}\big(f(x)P(y\mid x)\big)}{E_{P(x)}\big(P(y\mid x)\big)}\approx\sum_i f(x^i)\,\frac{P(y\mid x^i)}{\sum_i P(y\mid x^i)},\qquad x^i\sim P(x),\tag{1}$$

where x^i follows the distribution P(x). This equation transforms the conditional expectation E(f(x)|y) into a weighted combination of normalized conditional probabilities P(y|x^i)/Σ_i P(y|x^i). Importance sampling can thus be used to draw a large number of samples from a simple prior and skillfully convert the posterior distribution into a ratio of likelihoods, thereby estimating the expectation under the posterior distribution.
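To make the estimator in Eq. (1) concrete, the following is a minimal NumPy sketch (not from the paper; the conjugate Gaussian model is chosen only so that the estimate can be checked against a closed-form answer):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative model: prior x ~ N(0, 1), likelihood y | x ~ N(x, 0.5^2),
# target f(x) = x, i.e., the posterior mean E(x | y).
sigma_p, sigma_l, y = 1.0, 0.5, 0.8

# Eq. (1): draw samples x^i from the simple prior P(x) only ...
x = rng.normal(0.0, sigma_p, size=2000)

# ... and weight f(x^i) by the normalized likelihoods P(y|x^i) / sum_i P(y|x^i).
w = np.exp(-0.5 * ((y - x) / sigma_l) ** 2)
w /= w.sum()
estimate = np.sum(x * w)

# Conjugate Gaussian pair, so the exact posterior mean is known.
exact = y * sigma_p**2 / (sigma_p**2 + sigma_l**2)
print(f"importance sampling: {estimate:.3f}  exact: {exact:.3f}")
```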
2.2 Sampling-tree model with spiking neural network

To build a general-purpose neural network for large-scale Bayesian models, the STM was proposed in previous work (Yu et al., 2019), as shown in Figure 1. As a spiking neural network model for Bayesian inference, the STM is also a probabilistic graphical model with an overall hierarchical structure. Each node in the graph has a large number of neurons as sample data.

The STM is used to explain how Bayesian inference algorithms can be implemented through neural networks in the brain, building large-scale Bayesian models for SNNs. In contrast to other Bayesian inference methods, the STM focuses on multiple sets of neurons to achieve probabilistic inference in PGMs with multiple nodes and edges. Performing neural sampling on deep tree-structured neural circuits can transform global inference problems into local inference tasks and achieve approximate inference. Furthermore, since the STM does not have neural circuits specifically designed for one task, it can be generalized to solve other inference problems. In summary, the STM is a general neural network model that can be used for distributed large-scale Bayesian inference.

In this model, the root node of the Bayesian network is the problem or cause that needs to be inferred in our experiment, the leaf nodes represent the information or evidence we receive from the outside world, and the branch nodes are the intermediate variables of the reasoning problem. From the macroscopic perspective, the STM is a probabilistic graphical
model with a hierarchical tree structure. At the neuron level, each node in the model contains a group of spiking neurons and multiple connections between these neurons. Each spiking neuron is regarded as a sample from a specific distribution, and the information transmission and probability calculation in the model are achieved through the connections between neurons.

FIGURE 1
Sampling-tree model. (A) An example of the STM in spiking neural networks. (B) A tree-structured Bayesian network corresponding to the STM in (A).

2.3 Hardware implementation using the PYNQ framework

PYNQ provides a Jupyter-based framework and Python API for designing programmable logic circuits using the Xilinx adaptive computing platform instead of ASIC-style design tools. PYNQ consists of three layers: the application layer, the software layer, and the hardware layer. The overall framework is shown in Figure 2.

FIGURE 2
Overall framework of using PYNQ to develop Zynq.

Many works before this one have implemented neural network acceleration on FPGAs with the help of the PYNQ framework. Tzanos et al. (2019) implemented the acceleration of the Naive Bayes algorithm on the Xilinx PYNQ-Z1 board, and the hardware accelerator was evaluated on Naive Bayes-based machine learning applications. Ju et al. (2020) proposed a hardware architecture to enable efficient implementation of SNNs and validated it on the Xilinx ZCU102. However, this design directly mapped each different computing stage to a hardware layer. Although this approach can improve the parallelism of the program, such direct mapping would consume a great deal of the hardware resources or even exceed them. Awano and Hashimoto (2020) proposed an efficient inference algorithm for BNNs, named BYNQNet, and its FPGA implementation. The Monte Carlo inference method that this design was based on belongs to variational inference, which is very complicated for implementing larger-scale spiking neural network models, and the Monte Carlo inference method is not suitable for sampling models.

In our work, we focus on ensuring the inference accuracy of the STM on FPGAs while improving performance. The PYNQ framework provides a Python environment that integrates the hardware overlay for easy porting. With the PYNQ framework, we can implement hardware execution in parallel while creating high-performance embedded applications and execute more complex analysis algorithms through Python programs, the performance of which can be close to desktop workstations. It also has the advantages of high integration, small size, and
low power consumption. When using the PYNQ framework, the tight coupling between the PS (processing system, i.e., the ARM processor) and the PL (programmable logic, i.e., the FPGA part) can achieve better responsiveness, higher reconfigurability, and richer interface functions than traditional methods. The simplicity and efficiency of the Python language and the acceleration provided by programmable logic are also fully utilized. Finally, Xilinx has simplified and improved the design of Zynq-based products with the PYNQ framework by combining a hybrid library that implements acceleration within Python and programmable logic. This is a significant advantage over traditional SoC approaches that cannot use programmable logic. Therefore, we implement the Bayesian neural network inference algorithm on the Xilinx ZCU104 with the help of the PYNQ framework.
help of the PYNQ framework. P(I1 , I2 |Bi1 , Bi2 )
≈P i i
.
i P(I1 , I2 |B1 , B2 )
(2)
3 System analysis
Then, for the neural implementation of posterior probability,
In this section, we first summarize the basis of our work
Shi and Griffiths (2009) have shown that divisive normalization
on implementing probabilistic inference algorithms for the brain P
E(ri / i ri ) is commonly found in the cerebral cortex by
through neural networks. We then analyze the difficulties of
neuroscience experiments, and Eq. (3) has been proved, where ri
accelerating the probabilistic inference algorithm for running
is the firing rate of the ith neuron.
neural network models and briefly describe how we address
these difficulties.
X P(I1 , I2 |Bi1 , Bi2 )
E(ri / ri ) = P i i
. (3)
3.1 Neural network implementation i i P(I1 , I2 |B1 , B2 )

In this subsection, we take the neural network shown in Next, we will describe the processes and mechanisms of
Figure 3A as an example, and we consider the following two probabilistic inference implemented in the neural network
aspects in the implementation of the neural network: First, for (adapted from Fang et al. 2019). First, for the process of
the stimulus encoding problem, it is important to know how to probabilistic inference, the neural network processes external
accomplish the activities of neurons from stimulus input. Second, stimulus inputs I1 and I2 together in a bottom-up manner, as
for the estimation of posterior probability, it is also necessary to shown in Figure 3B. Second for the process of generation, which
consider how the activities of neurons realize the estimation of is to generate sampling neurons and the opposite of the inference
posterior probability because our final inference result requires the process. Based on the generative model in Figure 3A, we can get
expectation over posterior distribution. sampling neurons Bi1 and Bi2 from P(B1 ) and P(B2 ), respectively. In
For the first problem, we convert stimulus input information other words, we can get that the sampling neurons follow B1 , B2 ∼
into the activities of neurons through probabilistic population N(0, σ 2 ).

Frontiers in Neuroscience 05 frontiersin.org
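The two steps above can be played through numerically. The sketch below encodes two stimulus inputs with a population of Poisson spiking neurons and reads the posterior out with divisive normalization, as in Eqs. (2) and (3); all parameters and the Gaussian tuning curve are illustrative stand-ins, not the paper's exact tuning model.

```python
import numpy as np

rng = np.random.default_rng(1)

# 1,000 sampling neurons whose preferred values B^i follow the prior
# P(B) = N(0, sigma^2); sigma and the tuning width are illustrative.
sigma, n = 2.0, 1000
b = rng.normal(0.0, sigma, size=n)

def tuning(stimulus, width=1.0):
    """Gaussian tuning curve, proportional to the likelihood P(I | B^i)."""
    return np.exp(-0.5 * ((stimulus - b) / width) ** 2)

I1, I2 = 0.5, 0.9
rate = tuning(I1) * tuning(I2)    # drives Poisson firing
spikes = rng.poisson(rate * 50)   # spike counts over an observation window

# Divisive normalization (Eq. 3): r_i / sum_i r_i approximates the
# normalized likelihood, i.e., the posterior weight of sample B^i in Eq. (2).
weights = spikes / spikes.sum()
print(f"posterior mean estimate: {np.sum(b * weights):.3f}")
```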


FIGURE 4
Data interaction architecture between PS and PL; here we use the m_axi interface for data transmission.

FIGURE 5
The design idea and overall computing architecture. (A) The program flow of the model on the ZCU104 board. (B) The hardware architecture of the model.

3.2 Difficulties in designing the accelerator

In this work, the communication settings between the PS and the PL should be considered first in the design of the accelerator. Since the design requires frequent data interactions during operation, the selection of a suitable data interface can ensure the stability of data transmission while reducing the time it requires. The second difficulty is the design of the PL part, which carries out the work of the FPGA and usually needs to achieve acceleration by reducing the latency of the design.

For the communication setting between the PS and the PL, since the BRAM in the PL part is not enough to store a large amount of data and parameters, it is necessary to exchange data frequently between the PL and PS parts. Therefore, in order to achieve high-speed read/write operations for large-scale data, we use the m_axi interface. Figure 4 shows the data interaction architecture between the PS and the PL. The m_axi interface has independent read and write channels, supports burst transfer mode, and its potential performance can reach 17 GB/s, which fully meets our data scale and transfer speed requirements.

Furthermore, for the design of the PL part, since each node in the model contains a large number of neurons, it takes up a lot of resources and clock cycles in the process of encoding, summing, multiplying, and normalizing neurons, in which loops may also be nested. Although pipelines can be added to the loops to improve the parallelism of the model operation, the optimization is not satisfactory due to the large number of bases. Therefore, we propose a highly parallelized structure by introducing an array division method that divides the array into blocks, which can further unroll the loop and make each loop iteration execute independently to improve the degree of program parallelization. In short, it is a method of exchanging space for time.

4 Software and hardware optimizations

The design idea and overall architecture of this work are shown in Figure 5, consisting of the ARM processor, the AXI interface, and a custom IP core designed with Vivado HLS. In the IP core part, we mainly use the structure of the streaming pipeline to reduce latency and thus improve the operation speed. As mentioned in the previous section, we use the AXI master interface provided by Xilinx for data transmission between the PS and the PL, and the prior distribution and sample data that are ready to participate
in inference will be allocated and stored in the on-chip BRAM. When the operation is finished, the result will also be returned to the off-chip DDR memory through the AXI master interface for subsequent processing.

In our work, we use the Vivado HLS tool provided by Xilinx to complete the design of the hardware IP core. This tool allows the synthesis of digital hardware directly from a high-level description developed in C/C++. With this tool we can convert C/C++ designs into RTL implementations for deployment on the FPGA, thereby significantly reducing the time required for FPGA development compared with traditional RTL descriptions. Therefore, the hardware architecture of the STM accelerator is designed in the C++ programming language.

4.1 IP-core optimization

As mentioned in the last section, while adding the PIPELINE directive to the loops, we also use the method of array division to further improve the parallelism of the operation.

Here we take the summation of an array as an example to illustrate how to improve parallelism. Under normal circumstances, the summation of an array iterates through each element and accumulates them in turn. Even if we use a pipeline structure here, the accumulated value needs to be continuously read and written during the accumulation; to prevent the emergence of dirty data, a time gap is required between two adjacent loop iterations, which slows down the operation. In contrast, after we divide the original large-scale array into 10 blocks through array division, the subscripts of the array elements are accumulated in steps of 10. In this way, two adjacent iterations of the accumulation do not read and write the same memory, thereby eliminating the time interval that would normally occur and achieving parallelized accumulation, as shown in Figure 6. Finally, adding up all the blocks gives the result of the array summation. The purpose of this manual expansion is to avoid memory access bottlenecks and increase the degree of parallelism while using the DSPs as much as possible.

FIGURE 6
Design optimization ideas consisting of on-chip BRAM and processing elements (PE) using array division.

TABLE 1 Comparison of resource consumption and latency between the normal case and the case using array division.

| | BRAM | DSP | FF | LUT | Latency |
| Normal | 14 | 172 | 25,934 | 38,817 | 11,170 |
| Array division | 14 | 179 | 28,142 | 43,849 | 6,698 |

Table 1 is based on the Bayesian network model shown in Figure 3A. With 1,000 neurons in each node, the resource consumption and latency with and without array segmentation are compared. It can be seen that resource consumption increases slightly with array segmentation, but latency decreases significantly.
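The space-for-time trade can be modeled in a few lines of plain Python. This behavioral sketch (the factor of 10 follows the text; in the HLS design the same structure comes from an ARRAY_PARTITION pragma and independent accumulators) shows why consecutive iterations no longer touch the same partial sum:

```python
import numpy as np

def blocked_sum(a, blocks=10):
    """Behavioral model of the array-division summation of Figure 6.

    Element i is routed to accumulator i % blocks, so two adjacent
    iterations never read-modify-write the same partial sum; in hardware
    this removes the loop-carried dependence that stalls the pipeline.
    """
    partial = [0.0] * blocks
    for i, v in enumerate(a):
        partial[i % blocks] += v
    # Final reduction: adding up all the blocks gives the array total.
    return sum(partial)

a = np.arange(1000, dtype=np.float64)
assert blocked_sum(a) == a.sum()
```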
In addition, to further reduce resource utilization and improve performance, we use a bit width of 32 bits for each operation through a simple quantization of the floating-point operations. This kind of quantization has a relatively low negative impact on accuracy and can improve the performance of each IP core without reducing the parameter and input accuracy. At the same time, to alleviate the difficulty of raising the maximum frequency caused by reusing the same hardware components, especially BRAM resources, we added input and output registers to each BRAM instance to meet the 10 ns clock cycle of each IP core. Algorithm 1 shows the pseudocode of the IP core design. By default, all nested loops are executed sequentially; during this process, Vivado HLS provides different pragmas to affect scheduling and resource allocation.

Require: sample data and prior distributions b1, b2, b3, a.
Ensure: posterior probability post.
1. Calculate the likelihood distribution based on the sample data and prior probabilities:
   for i in NumA do {Likelihood loop1}
      for j in NumB do {Likelihood loop2}
         la ← b1, b2, b3, a
      end for
   end for
2. Summation by array division.
3. Calculate the posterior probability post based on Eq. (2):
   for i in NumA do {Posterior loop1}
      for j in NumB do {Posterior loop2}
         post ← la, sum(la)
      end for
   end for
4. Return the calculation result.

Algorithm 1. IP-core design in pseudocode.
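Read as software, Algorithm 1 amounts to the following sketch (shapes and the likelihood function are stand-ins; on the board these loops run as pipelined hardware and step 2 uses the blocked summation shown above):

```python
import numpy as np

def ip_core(b1, b2, b3, a, likelihood):
    """Software mirror of Algorithm 1.

    b1, b2, b3: sampled leaf data; a: sampled cause values;
    likelihood: callable returning P(b1[j], b2[j], b3[j] | a[i]).
    """
    la = np.empty((len(a), len(b1)))
    for i in range(len(a)):          # Likelihood loop1
        for j in range(len(b1)):     # Likelihood loop2
            la[i, j] = likelihood(b1[j], b2[j], b3[j], a[i])
    total = la.sum()                 # step 2: summation (blocked on the FPGA)
    return la / total                # step 3: divide by the likelihood sum, as in Eq. (2)
```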
4.2 Interface signal control

When we compile the PL-side custom core, we need to set up the top-level file containing the formal parameters and return values. These parameters are mapped to the hardware circuitry to generate
interface signals, which can be controlled not only to help set better constraints but also to better control the input and output data flow according to the port timing. In addition, control logic needs to be extracted to form a state machine, so handshake signals such as ap_start and ap_done will be formed.

Common interface constraints can be divided into block-level protocols and port-level protocols. Here we mainly use the ap_ctrl_hs signal among the block-level protocols, which contains the four handshake signals ap_start, ap_idle, ap_ready, and ap_done. The ap_start signal is active high and indicates when the design starts working. The ap_idle signal indicates whether the design is idle. The ap_ready signal indicates whether the design is currently ready to receive new inputs. The ap_done signal indicates when the data on the output signal line is valid. The functional timing diagram is shown in Figure 7.

FIGURE 7
Timing diagram of the four ap_ctrl_hs handshake signals. We mainly use the ap_start interface to send read-data commands to the FPGA and monitor the ap_done interface in real time to determine whether the FPGA has completed the work.

According to the timing diagram, we only need to pull the ap_start signal high, and the design will automatically read or write data through the AXI bus while performing the inference operation. When the ap_done signal reads high, the design has completed its work, and the valid operation result can be obtained by reading the memory allocated for the return value.
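Continuing the hypothetical host sketch from Section 2.3, this handshake maps to a short polling loop over the IP's AXI-Lite control register; offset 0x00 with ap_start in bit 0 and ap_done in bit 1 is the usual HLS layout, but it should be checked against the generated register map.

```python
CTRL = 0x00                  # HLS block-level control register (assumed layout)
AP_START, AP_DONE = 0x1, 0x2

ip.write(CTRL, AP_START)     # pull ap_start high to launch the inference

while (ip.read(CTRL) & AP_DONE) == 0:
    pass                     # poll until the design raises ap_done

post.invalidate()            # refresh the cache before touching the results
result = np.array(post)      # valid once ap_done has been observed
```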
4.3 Hardware-software streaming architecture

After the IP core has been designed, it is added to the Zynq block design to create the complete hardware architecture, as shown in Figure 8. The axi_interconnect module ensures communication between the IP core, the PS system, and the AXI interface. The axi_intc module controls the interrupts of the interface.

FIGURE 8
Hardware streaming architecture block design targeting the SoC with the m_axi interface between the PL and PS.

Following the initialization of the design, the PS part will be used to implement the bitstream loading of the SNN. It also allows the PS to pass the values of external stimuli and SNN synaptic strengths to the PL part at runtime, which implements the specific neural network model. The main interface is used to connect the PL and PS parts of the SoC to ensure high-performance communication and data exchange between the IP core and the PS in the streaming architecture. At the same time, the interlayer pipeline inside each IP core is highly customized to build a co-design with reset and GPIO. Both the external stimulus values and the synaptic strength values are stored in the BRAM cache in the PL part to improve the data reading speed for STM inference.

5 Simulations

We use the Intel i7-10700 and i5-12500, two of the more capable CPUs currently available, as benchmarks to compare the performance of model inference implemented on FPGAs. We test the performance and accuracy of the STM on the FPGA board for Bayesian inference on two brain perception problems: causal inference and multisensory integration. The evaluation metrics include the inference effectiveness and the processing speed of the model. In terms of inference effectiveness, causal inference is evaluated by how the error rate varies with sample size, and multisensory integration is evaluated by comparing the inference results with the theoretical values.

5.1 Causal inference

Causal inference is the process by which the brain infers the causal effect between causes and outcomes when it receives
external information (Shams and Beierholm, 2010). The core problem of causal inference is to calculate the probability of the cause, which can be expressed as the expectation value defined on the posterior distribution. The calculation of the posterior probability is converted into the calculation of the prior probability and the likelihood through importance sampling, to realize the simulation of the causal inference process in the brain. In this experiment, we verify the accuracy and efficiency of Bayesian inference in the STM on the Xilinx ZCU104 FPGA board, because probabilistic sampling on SNNs involves a large number of probabilistic calculations that can consume a lot of time, the processing of the data in the inference process involves many computation-intensive operations, and the CPU is not able to handle these tasks very quickly.

In this paper, the validity of the model is verified from the accuracy of inference when different samples are input, and the STM is modeled by the Bayesian network shown in Figure 9A, where B1, B2, B3, and B4 represent the input stimuli in causal inference and A denotes the cause. The tuning curve of each spiking neuron can be represented as the state of the variable. We suppose that the prior and conditional distributions are known, the distributions of these spiking neurons follow the prior distribution P(B1, B2, B3, B4), and the tuning curve of neuron i is proportional to the likelihood distribution P(B1^i, B2^i, B3^i, B4^i | A). We can then normalize the output of the Poisson spiking neurons through shunt inhibition and synaptic inhibition. Here we use y_i to denote the individual firing rate of spiking neuron i and Y to denote the overall firing rate; then:

$$E\big(y_i/Y\big)=\frac{P(B_1^i,B_2^i,B_3^i,B_4^i\mid A)}{\sum_i P(B_1^i,B_2^i,B_3^i,B_4^i\mid A)}.\tag{4}$$

By multiplying and linearly combining the normalized results with the synaptic weights, the posterior probability can be calculated:

$$P(A=a\mid B_1,B_2,B_3,B_4)=\sum_l\sum_i I(A^l=a)\,\frac{P(B_1^i,B_2^i,B_3^i,B_4^i\mid A^l)}{\sum_l P(B_1^i,B_2^i,B_3^i,B_4^i\mid A^l)}.\tag{5}$$
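A small NumPy sketch of the readout in Eqs. (4)-(5) follows; the discrete cause values and the factorized Gaussian likelihood are illustrative stand-ins, not the paper's tuning model:

```python
import numpy as np

rng = np.random.default_rng(2)

causes = np.array([0.0, 1.0])         # possible values of the cause A
A = rng.choice(causes, size=2000)     # cause samples A^l ~ P(A)
B = rng.normal(1.0, 0.6, size=4)      # one observation of B1..B4

def lik(a):
    """Stand-in for P(B1..B4 | A = a): factorized Gaussian likelihood."""
    return np.prod(np.exp(-0.5 * ((B - a) / 0.6) ** 2))

w = np.array([lik(a) for a in A])
w /= w.sum()                          # normalized firing rates, Eq. (4)

# Eq. (5): the posterior mass of A = a is the total weight of the
# samples whose value A^l equals a (the indicator readout).
for a in causes:
    print(f"P(A={a:.0f} | B) = {w[A == a].sum():.3f}")
```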
The results of the accuracy test are shown in Figure 9B. The error rate of the stimulus estimation keeps decreasing as the sample size increases, and when there are 2,000 sampled neurons, the error rate of the stimulus estimation is already quite small. In addition, the inference accuracy of the implementation on the FPGA is similar to that on the PC. Therefore, the STM we run on the FPGA board can guarantee the accuracy of inference.

In terms of performance, we compare the design with multithreading and multiprogramming implementations on traditional computing platforms, and the results are shown in Table 2. It shows the processing time per neuron sample when the number of sampled neurons is 4,000. It can be seen from the results that multithreading and multiprogramming do not achieve the desired speedup but have the opposite effect. The possible reasons for this situation are as follows: (1) multithreaded execution is not strictly parallel, and the global interpreter lock (GIL) can prevent parallel execution of multiple threads, so it may not be possible to take full advantage of multicore CPUs; (2) in terms of multiprogramming, perhaps the problem did not reach a certain size, so process creation took longer than the runtime itself; in addition, communication between processes requires passing a large amount of sample data, which introduces some overhead. For the above reasons, we finally considered using vectorization operations to vectorize the
sample data to reduce the number of loops and avoid the speed limitations caused by nested loops.

FIGURE 9
Simulation of causal inference. (A) The neural network architecture of the basic Bayesian network. (B) Comparison of error rates under the PC and FPGA platforms.

TABLE 2 Results of sampling time and speed-up of each neuron in the two-layer model.

| Processing time/neuron (ms) | Intel i7-10700 2.90 GHz | Intel i5-12500 2.50 GHz | ARM | Xilinx ZCU104 |
| Normal | 8.556 | 4.315 | 53.814 | 0.389 |
| Multithreading, 2 threads | 12.217 | 6.091 | | |
| Multithreading, 4 threads | 13.098 | 6.907 | | |
| Multithreading, 10 threads | 13.355 | 7.578 | | |
| Multithreading, 20 threads | 13.778 | 8.386 | | |
| Multithreading, 50 threads | 16.085 | 10.631 | | |
| Multithreading, 100 threads | 20.323 | 14.772 | | |
| Multiprogramming, 2 processes | 344.00 | 250.88 | | |
| Multiprogramming, 4 processes | 394.43 | 278.46 | | |
| Multiprogramming, 8 processes | 564.03 | 454.81 | | |
| Multiprogramming, 16 processes | 948.47 | 844.73 | | |
| Vectorization | 3.662 | 2.993 | | |

Bold values represent the optimal time on the corresponding platform.

From Table 2, we can see that vectorization is significantly faster than serial execution, multithreading, and multiprogramming, while the processing speed of the model on the FPGA is significantly better than that of the PC.
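The vectorization referred to here is the standard NumPy idiom of replacing a per-neuron Python loop with one whole-array operation; a minimal illustration with a toy Gaussian likelihood:

```python
import numpy as np

samples = np.random.default_rng(4).normal(0, 1, 4000)
y = 0.3

# Nested-loop style: one Python-level likelihood evaluation per neuron.
slow = np.empty_like(samples)
for i in range(samples.size):
    slow[i] = np.exp(-0.5 * (y - samples[i]) ** 2)

# Vectorized style: a single whole-array operation.
fast = np.exp(-0.5 * (y - samples) ** 2)
assert np.allclose(slow, fast)
```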
5.2 Causal inference with a multi-layer neural network

The simulation in the previous section verified causal inference under a simple model. The inference speed on the CPU decreases exponentially as the problem size increases, so the need to shorten the inference time of the network model through improvements and optimizations becomes even more important. In this section, we use a multi-layer neural network model to test large-scale Bayesian inference based on the sampling tree on the FPGA board. The STM is modeled by the Bayesian network shown in Figure 10A, where I1, I2 and I3 denote the input stimuli in causal inference, A denotes the cause, and the rest are intermediate variables.

In this simulation, we use several spiking neurons to encode the variables C1, C2, and C3, respectively, and the distribution of these neurons follows the prior distributions P(C1, C2) and P(C3). In addition, the tuning curves of these neurons are proportional to
the distributions P(I1, I2 | C1^i, C2^i) and P(I3 | C3^j). We can obtain the average firing rates of the spiking neurons C1^i, C2^i, and C3^j, respectively:

$$E(C_1^i,C_2^i)=\frac{P(I_1,I_2\mid C_1^i,C_2^i)}{\sum_i P(I_1,I_2\mid C_1^i,C_2^i)},\tag{6}$$

$$E(C_3^j)=\frac{P(I_3\mid C_3^j)}{\sum_j P(I_3\mid C_3^j)}.\tag{7}$$

FIGURE 10
Simulation of causal inference with a multi-layer neural network. (A) The Bayesian model for the multi-layer network structure. (B) Comparison of error rates under the PC and FPGA platforms.

The firing rate calculation of the neurons in the other layers is similar. The firing rate of each layer is multiplied and fed to the next layer in the form of synaptic weights, and then the posterior probability can be calculated:

$$P(A=a\mid I_1,I_2,I_3)=\sum_l I(A^l=a)\sum_k\frac{P(B_1^k,B_2^k\mid A^l)}{\sum_l P(B_1^k,B_2^k\mid A^l)}\sum_{i,j}\frac{P(C_1^i,C_2^i,C_3^j\mid B_1^k,B_2^k)}{\sum_{i,j}P(C_1^i,C_2^i,C_3^j\mid B_1^k,B_2^k)}\,\frac{P(I_1,I_2\mid C_1^i,C_2^i)\,P(I_3\mid C_3^j)}{\sum_i P(I_1,I_2\mid C_1^i,C_2^i)\sum_j P(I_3\mid C_3^j)}.\tag{8}$$
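To see the layer-wise pattern of Eq. (8) in code, the following reduced sketch uses a single A → B → I chain with Gaussian stand-in distributions (one middle layer instead of two, purely for illustration): each layer normalizes its likelihoods, and the layers are combined multiplicatively before the indicator readout at the root.

```python
import numpy as np

rng = np.random.default_rng(3)

A = rng.choice([0.0, 1.0], size=500)    # root samples A^l
B = rng.normal(0.0, 1.0, size=500)      # middle-layer samples B^k
I = 0.7                                 # evidence observed at the leaf

g = lambda x, m, s: np.exp(-0.5 * ((x - m) / s) ** 2)

w_B = g(I, B, 0.5)
w_B /= w_B.sum()                        # leaf-layer weights, as in Eqs. (6)-(7)

lik_BA = g(B[None, :], A[:, None], 0.8)      # P(B^k | A^l), shape (l, k)
lik_BA /= lik_BA.sum(axis=0, keepdims=True)  # normalize over l, as in Eq. (8)

w_A = lik_BA @ w_B                      # multiply layers and pass the weights up
for a in (0.0, 1.0):
    print(f"P(A={a:.0f} | I) = {w_A[A == a].sum():.3f}")
```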
network shown in Figure 11A. Here S denotes the position of the
Similar to the simple model, the result of the STM under the object stimulus, SV , SH , and SA denote visual, auditory, and haptic
multi-layer neural network on the FPGA is shown in Figure 10B. cues, respectively. We suppose that P(S) is a uniform distribution,
From the figure, we can see that the model running on the P(SV |S), P(SH |S), and P(SA |S) are three Gaussian distributions,
FPGA can guarantee the accuracy of the inference. Moreover, the respectively. When given SV , SH , and SA , we can use importance
performance comparison is shown in Table 3, in the multilayer sampling to infer the posterior probability of S, as:
network model, multithreading and multiprogramming are equally
limited to achieve the desired results, so the same vectorization
X
operation is used to optimize the program. We can also see P(S = s|SV , SH , SA ) = I(S = s)P(S|SV , SH , SA )
the processing speed of the STM on FPGA is also improved S
compared with the traditional computing platform. In addition, X P(SV , SH , SA |Si )
= iI(Si = s) P , Si ∼ P(s).
we can find that due to the increase in the problem size of the iP(SV , SH , SA |Si )
multi-layer model, the acceleration of the model implemented on (9)

Frontiers in Neuroscience 11 frontiersin.org


TABLE 3 Results of sampling time and speed-up of each neuron in the multi-layer model.

| Processing time/neuron (ms) | Intel i7-10700 2.90 GHz | Intel i5-12500 2.50 GHz | ARM | Xilinx ZCU104 |
| Normal | 1.103 | 0.635 | 12.75 | 0.024 |
| Multithreading, 2 threads | 1.048 | 0.622 | | |
| Multithreading, 4 threads | 1.019 | 0.617 | | |
| Multithreading, 10 threads | 1.006 | 0.617 | | |
| Multithreading, 20 threads | 1.012 | 0.618 | | |
| Multithreading, 50 threads | 1.012 | 0.618 | | |
| Multithreading, 100 threads | 1.013 | 0.624 | | |
| Multiprogramming, 2 processes | 1.174 | 0.749 | | |
| Multiprogramming, 4 processes | 1.056 | 0.706 | | |
| Multiprogramming, 8 processes | 1.113 | 0.762 | | |
| Multiprogramming, 16 processes | 1.371 | 1.097 | | |
| Vectorization | 0.569 | 0.403 | | |

Bold values represent the optimal time on the corresponding platform.

FIGURE 11
Simulation of multisensory integration. (A) Left: the Bayesian model for visual-auditory-haptic integration; right: comparison of the model inference results and theoretical values on the FPGA. (B) Left: the Bayesian model for visual-haptic integration; right: comparison of the model inference results and theoretical values on the FPGA.


TABLE 4 Results of sampling time and speed-up of each neuron in the simulation of multisensory integration.

| Processing time/neuron (ms) | Intel i7-10700 2.90 GHz | Intel i5-12500 2.50 GHz | ARM | Xilinx ZCU104 |
| Normal | 7.632 | 5.169 | 94.608 | 0.328 |
| Vectorization | 3.882 | 2.160 | | |

Bold values represent the optimal time on the corresponding platform.

In our simulation, multisensory integration inference is achieved through neural circuits based on PPCs and normalization. We use 1,000 spiking neurons to encode stimuli whose states follow the prior distribution P(S). We suppose that the tuning curve of neuron i is proportional to the distribution P(SV, SH, SA | S^i), and we then use shunting inhibition and synaptic depression to normalize the output of the spiking neurons; the result is fed into the next spiking neuron with synaptic weights I(S^i = s). Figure 11A shows the simulation results, where the inference result obtained from the STM on the FPGA board is in good agreement with the theoretical values. Similar to the visual-auditory-haptic integration, we also add a simulation of visual-haptic integration for completeness, which is illustrated in Figure 11B. Furthermore, the performance comparison is shown in Table 4, which shows a significant improvement in the sampling speed of each neuron on the FPGA. Since the results of the multithreading and multiprogramming experiments were not ideal in the previous experiments, only the vectorization method is compared here. The results also show that the running speed on the FPGA is still better than that on the CPU.
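The cue-integration readout of Eq. (9) can be checked against the classical reliability-weighted prediction for Gaussian cues; the cue values and noise levels below are illustrative, not the values used in the paper's simulation:

```python
import numpy as np

rng = np.random.default_rng(5)

S = rng.uniform(-10, 10, size=1000)   # samples S^i from the uniform prior P(S)
sv, sh, sa = 1.2, 0.8, 1.5            # observed visual/auditory/haptic cues
sig_v, sig_h, sig_a = 1.0, 2.0, 1.5   # assumed cue noise levels

g = lambda cue, sig: np.exp(-0.5 * ((cue - S) / sig) ** 2)
w = g(sv, sig_v) * g(sh, sig_h) * g(sa, sig_a)
w /= w.sum()                          # normalized weights of Eq. (9)
estimate = np.sum(S * w)

# For Gaussian cues and a flat prior, the optimal combination is the
# precision-weighted average of the cues, which the estimate should match.
prec = np.array([sig_v, sig_h, sig_a]) ** -2.0
theory = np.dot([sv, sh, sa], prec) / prec.sum()
print(f"STM-style estimate: {estimate:.3f}  optimal: {theory:.3f}")
```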
design implementation can be used for other real-world cognitive
problems while guaranteeing the accuracy of reasoning and the
6 Conclusion acceleration effect.
Finally, the hardware acceleration method proposed in the
In this work, we design an FPGA-based hardware accelerator paper can simulate the working principle of biological neurons
for PGM-based SNNs with the help of the PYNQ framework. very well. Meanwhile, due to the characteristics of low power
Firstly, the STM, as a novel SNN simulation model for causal consumption and real-time response of FPGA, this method can
inference, can convert a global complex inference problem into have a wide range of applications in the embedded field. The
a local simple inference problem, thus realizing high-precision realized causal inference problems can be used in policy evaluation,
approximate inference. Furthermore, as a generalized neural financial decision-making and other fields, and the multisensory
network model, the STM does not formulate a neural network for a integration can be used in vehicle environment perception, medical
specific task and thus can be generalized to other problems. Our diagnosis and other fields. Specifically, in application scenarios
hardware implementation is based on this solid and innovative such as smart home application environments, causal inference
theoretical model, which solves the problem of slow model can be used to achieve reasoning about factors affecting health
computation based on its realization of large-scale multi-layer and provide personalized health advice. Sensory cues such as
complex model inference. vision and hearing are combined to provide a better perceive
Secondly, As the first work to realize the hardware acceleration the home environment and thus provide intelligent control. Our
of the STM, we chose the FPGA platform as the acceleration work provides a solution for such application scenarios and these
platform of the model. For CPUs and GPUs, both of them need to practical applications are expected to promote the progress of
go through operations such as fetching instructions, decoding, and the neuromorphic computing field and make it better meet the
various branch logic jumps, and the energy consumption of GPUs practical application requirements. In addition, so far the STM
is too high. In contrast, the function of each logic unit of an FPGA does not consider learning, which is an important aspect of
is determined at the time of reprogramming and does not require adaptation between tasks. All the results of our simulations are
these instruction operations, so FPGAs can enjoy lower latency based on inference with known prior probabilities and conditional
and energy consumption. Compared to hardware platform ASICs, probabilities. Therefore, in future work, we need to combine
FPGAs are more flexible. Although ASICs are superior to FPGAs in learning and inference into one framework and introduce some
terms of throughput, latency, and power consumption, their high learning mechanisms to make the model more complete and
cost and long cycle time cannot be ignored, and the design of an flexible for multiple tasks.

Frontiers in Neuroscience 13 frontiersin.org


Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Author contributions

HL: Methodology, Data curation, Investigation, Software, Validation, Writing – original draft. BW: Methodology, Conceptualization, Supervision, Writing – review & editing. QL: Methodology, Writing – review & editing, Project administration. YF: Methodology, Writing – review & editing, Conceptualization, Formal analysis, Software, Supervision, Writing – original draft. JL: Formal analysis, Supervision, Writing – review & editing, Project administration. LA: Supervision, Writing – review & editing, Conceptualization, Methodology.

Funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This work was partially supported by the National Natural Science Foundation of China (Grant No. 62072355), the Key Research and Development Program of Shaanxi Province of China (Grant No. 2022KWZ-10), the Natural Science Foundation of Guangdong Province of China (Grant No. 2022A1515011424), the Science and Technology Planning Project of Guangdong Province of China (Grant No. 2023A0505050126), the Outstanding Scientist Cultivation Program of Beijing Academy of Agriculture and Forestry Sciences (Grant No. JKZX202214), the Natural Science Foundation of Fujian Province of China (Grant No. 2022J01656), the Foundation of National Key Laboratory of Human Factors Engineering (Grant No. 6142222210101), and the Key Industry Innovation Chain Projects of Shaanxi, China (Grant No. 2021ZDLGY07-04).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Alais, D., and Burr, D. (2019). "Cue combination within a Bayesian framework," in Multisensory Processes (New York, NY: Springer), 9–31. doi: 10.1007/978-3-030-10461-0_2

Awano, H., and Hashimoto, M. (2020). "BYNQNET: Bayesian neural network with quadratic activations for sampling-free uncertainty estimation on FPGA," in 2020 Design, Automation and Test in Europe Conference and Exhibition (Grenoble: IEEE), 1402–1407. doi: 10.23919/DATE48585.2020.9116302

Awano, H., and Hashimoto, M. (2023). B2N2: resource efficient Bayesian neural network accelerator using Bernoulli sampler on FPGA. Integration 89, 1–8. doi: 10.1016/j.vlsi.2022.11.005

Bialek, W., Rieke, F., van Steveninck, R., and Warland, D. (1999). Spikes: Exploring the Neural Code (Computational Neuroscience). Cambridge, MA: The MIT Press.

Buesing, L., Bill, J., Nessler, B., and Maass, W. (2011). Neural dynamics as sampling: a model for stochastic computation in recurrent networks of spiking neurons. PLoS Comput. Biol. 7, 188–200. doi: 10.1371/journal.pcbi.1002211

Cai, R., Ren, A., Liu, N., Ding, C., Wang, L., Qian, X., et al. (2018). VIBNN: hardware acceleration of Bayesian neural networks. ACM SIGPLAN Notices 53, 476–488. doi: 10.1145/3296957.3173212

Chandrasekaran, C. (2017). Computational principles and models of multisensory integration. Curr. Opin. Neurobiol. 43, 25–34. doi: 10.1016/j.conb.2016.11.002

Christensen, D. V., Dittmann, R., Linares-Barranco, B., Sebastian, A., et al. (2022). 2022 roadmap on neuromorphic computing and engineering. Neuromorphic Comput. Eng. 2, 022501. doi: 10.1088/2634-4386/ac4a83

Demis, H., Dharshan, K., Christopher, S., and Matthew, B. (2017). Neuroscience-inspired artificial intelligence. Neuron 95, 245–258. doi: 10.1016/j.neuron.2017.06.011

Ernst, M. O., and Banks, M. S. (2002). Humans integrate visual and haptic information in a statistically optimal fashion. Nature 415, 429–433. doi: 10.1038/415429a

Fan, H., Ferianc, M., Que, Z., Liu, S., Niu, X., Rodrigues, M. R., et al. (2022). FPGA-based acceleration for Bayesian convolutional neural networks. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 41, 5343–5356. doi: 10.1109/TCAD.2022.3160948

Fan, H., Ferianc, M., Rodrigues, M., Zhou, H., Niu, X., Luk, W., et al. (2021). "High-performance FPGA-based accelerator for Bayesian neural networks," in 2021 58th ACM/IEEE Design Automation Conference (San Francisco, CA: IEEE), 1063–1068. doi: 10.1109/DAC18074.2021.9586137

Fang, H., Mei, Z., Shrestha, A., Zhao, Z., Li, Y., Qiu, Q., et al. (2020). "Encoding, model, and architecture: systematic optimization for spiking neural network in FPGAs," in Proceedings of the 39th International Conference on Computer-Aided Design (IEEE), 1–9. doi: 10.1145/3400302.3415608

Fang, Y., Yu, Z., Liu, J. K., and Chen, F. (2019). A unified neural circuit of causal inference and multisensory integration. Neurocomputing 358, 355–368. doi: 10.1016/j.neucom.2019.05.067

Ferianc, M., Que, Z., Fan, H., Luk, W., and Rodrigues, M. (2021). "Optimizing Bayesian recurrent neural networks on an FPGA-based accelerator," in 2021 International Conference on Field-Programmable Technology (Auckland: IEEE), 1–10. doi: 10.1109/ICFPT52863.2021.9609847

Gallego, G., Delbruck, T., Orchard, G., Bartolozzi, C., Taba, B., Censi, A., et al. (2022). Event-based vision: a survey. IEEE Trans. Pattern Anal. Mach. Intell. 44, 154–180. doi: 10.1109/TPAMI.2020.3008413

George, D., and Hawkins, J. (2009). Towards a mathematical theory of cortical micro-circuits. PLoS Comput. Biol. 5, e1000532. doi: 10.1371/journal.pcbi.1000532

Han, J., Li, Z., Zheng, W., and Zhang, Y. (2020). Hardware implementation of spiking neural networks on FPGA. Tsinghua Sci. Technol. 25, 479–486. doi: 10.26599/TST.2019.9010019

Ju, X., Fang, B., Yan, R., Xu, X., and Tang, H. (2020). An FPGA implementation of deep spiking neural networks for low-power and fast classification. Neural Comput. 32, 182–204. doi: 10.1162/neco_a_01245

Kim, J., Koo, J., Kim, T., and Kim, J.-J. (2018). Efficient synapse memory structure for reconfigurable digital neuromorphic hardware. Front. Neurosci. 12, 829. doi: 10.3389/fnins.2018.00829

Körding, K. P., and Wolpert, D. M. (2004). Bayesian integration in sensorimotor learning. Nature 427, 244–247. doi: 10.1038/nature02169

Liu, K., Cui, X., Zhong, Y., Kuang, Y., Wang, Y., Tang, H., et al. (2019). A hardware implementation of SNN-based spatio-temporal memory model. Front. Neurosci. 13, 835. doi: 10.3389/fnins.2019.00835

Liu, L., Wang, D., Wang, Y., Lansner, A., Hemani, A., Yang, Y., et al. (2020). "FPGA-based hardware accelerator for Bayesian confidence propagation neural network," in 2020 IEEE Nordic Circuits and Systems Conference (Oslo: IEEE), 1–6. doi: 10.1109/NorCAS51424.2020.9265129

Ma, D., Shen, J., Gu, Z., Zhang, M., Zhu, X., Xu, X., et al. (2017). Darwin: a

Wang, D. (2022). Design and Implementation of FPGA-based Hardware Accelerator for Bayesian Confidence [Master's Thesis]. Turku: University of Turku.

Wang, D., Xu, J., Li, F., Zhang, L., Cao, C., Stathis, D., et al. (2023). A memristor-based learning engine for synaptic trace-based online learning. IEEE Trans. Biomed. Circuits Syst. 17, 1153–1165. doi: 10.1109/TBCAS.2023.3291021

Wozny, D. R., Beierholm, U. R., and Shams, L. (2008). Human trimodal perception follows optimal statistical inference. J. Vis. 8, 24. doi: 10.1167/8.3.24
Ma, D., Shen, J., Gu, Z., Zhang, M., Zhu, X., Xu, X., et al. (2017). Darwin: a follows optimal statistical inference. J. Vis. 8, 24. doi: 10.1167/8.3.24
neuromorphic hardware co-processor based on spiking neural networks. J. Syst. Archit.
Xu, Q., Shen, J., Ran, X., Tang, H., Pan, G., Liu, J. K., et al. (2022). Robust transcoding
77, 43–51. doi: 10.1016/j.sysarc.2017.01.003
sensory information with neural spikes. IEEE Trans. Neural Netw. Learn. Syst. 33,
Ma, W., Beck, J. M., Latham, P. E., and Pouget, A. (2006). Bayesian inference with 1935–1946. doi: 10.1109/TNNLS.2021.3107449
probabilistic population codes. Nat. Neurosci. 9, 1432–1438. doi: 10.1038/nn1790
Yang, Z., Guo, S., Fang, Y., and Liu, J. K. (2022). “Biologically plausible variational
Ma, W., and Jazayeri, M. (2014). Neural coding of uncertainty and probability. Ann. policy gradient with spiking recurrent winner-take-all networks," in 33rd British
Rev. Neurosci. 37, 205–220. doi: 10.1146/annurev-neuro-071013-014017 Machine Vision Conference 2022 (London: BMVA Press), 21–24.
Maass, W. (1997). Networks of spiking neurons: the third generation of neural Yedidia, J. S., Freeman, W. T., and Weiss, Y. (2005). Constructing free-energy
network models. Neural Netw. 10, 1659–1671. doi: 10.1016/S0893-6080(97)00011-7 approximations and generalized belief propagation algorithms. IEEE Trans. Inf. Theory
51, 2282–2312. doi: 10.1109/TIT.2005.850085
Nagata, K., and Watanabe, S. (2008). Exchange Monte Carlo sampling from
Bayesian posterior for singular learning machines. IEEE Trans. Neural Netw. 19, Yu, Z., Chen, F., and Liu, J. K. (2019). Sampling-tree model: efficient
1253–1266. doi: 10.1109/TNN.2008.2000202 implementation of distributed Bayesian inference in neural networks. IEEE Trans.
Cogn. Dev. Syst. 12, 497–510. doi: 10.1109/TCDS.2019.2927808
Que, Z., Nakahara, H., Fan, H., Li, H., Meng, J., Tsoi, K. H., et al. (2022). “Remarn: a
reconfigurable multi-threaded multi-core accelerator for recurrent neural networks," in Yu, Z., Deng, F., Guo, S., Yan, Q., Liu, J. K., Chen, F., et al. (2018a). Emergent
ACM Transactions on Reconfigurable Technology and Systems (New York, NY: ACM). inference of hidden Markov models in spiking winner-take-all neural networks. IEEE
doi: 10.1145/3534969 Trans. Cybern. 50, 1347–1354. doi: 10.1109/TCYB.2018.2871144
Shams, L., and Beierholm, U. R. (2010). Causal inference in perception. Trends Yu, Z., Liu, J. K., Jia, S., Zhang, Y., Zheng, Y., Tian, Y., et al. (2020). Toward the next
Cogn. Sci. 14, 425–432. doi: 10.1016/j.tics.2010.07.001 generation of retinal neuroprosthesis: visual computation with spikes. Engineering 6,
Shen, J., Liu, J. K., and Wang, Y. (2021). Dynamic spatiotemporal pattern 449–461. doi: 10.1016/j.eng.2020.02.004
recognition with recurrent spiking neural network. Neural Comput. 33, 2971–2995. Yu, Z., Tian, Y., Huang, T., and Liu, J. K. (2018b). Winner-take-all
doi: 10.1162/neco_a_01432 as basic probabilistic inference unit of neuronal circuits. arXiv [preprint].
Shi, L., and Griffiths, T. (2009). Neural implementation of hierarchical Bayesian 10.48550/arXiv.1808.00675
inference by importance sampling. Adv. Neural. Inf. Process Syst. 22. Zador, A., Escola, S., Richards, B., Ölveczky, B., Bengio, Y., Boahen, K., et al. (2022).
Shi, Z., Church, R. M., and Meck, W. H. (2013). Bayesian optimization of time Toward next-generation artificial intelligence: catalyzing the NeuroAI revolution.
perception. Trends Cogn. Sci. 17, 556–564. doi: 10.1016/j.tics.2013.09.009 arXiv [preprint]. doi: 10.1038/s41467-023-37180-x

Tung, C., Hou, K.-W., and Wu, C.-W. (2023). “A built-in self-calibration scheme Zhang, Y., Jia, S., Zheng, Y., Yu, Z., Tian, Y., Ma, S., et al. (2020).
for memristor-based spiking neural networks," in 2023 International VLSI Symposium Reconstruction of natural visual scenes from neural spikes with deep
on Technology, Systems and Applications (VLSI-TSA/VLSI-DAT) (HsinChu: IEEE), 1–4. neural networks. Neural Netw. 125, 19–30. doi: 10.1016/j.neunet.2020.
doi: 10.1109/VLSI-TSA/VLSI-DAT57221.2023.10134261 01.033
Tzanos, G., Kachris, C., and Soudris, D. (2019). “Hardware acceleration on Zhu, Y., Zhang, Y., Xie, X., and Huang, T. (2022). An FPGA
gaussian naive bayes machine learning algorithm," in 2019 8th International accelerator for high-speed moving objects detection and tracking with
Conference on Modern Circuits and Systems Technologies (Thessaloniki: IEEE), 1–5. a spike camera. Neural Comput. 34, 1812–1839. doi: 10.1162/neco_
doi: 10.1109/MOCAST.2019.8741875 a_01507

Frontiers in Neuroscience 15 frontiersin.org
