An FPGA implementation of Bayesian inference with spiking neural networks

Haoran Li 1, Bo Wan 2,3, Ying Fang 4,5*, Qifeng Li 6*, Jian K. Liu 7 and Lingling An 1,2

1 Guangzhou Institute of Technology, Xidian University, Guangzhou, China
2 School of Computer Science and Technology, Xidian University, Xi'an, China
3 Key Laboratory of Smart Human Computer Interaction and Wearable Technology of Shaanxi Province, Xi'an, China
4 College of Computer and Cyber Security, Fujian Normal University, Fuzhou, China
5 Digital Fujian Internet-of-Thing Laboratory of Environmental Monitoring, Fujian Normal University, Fuzhou, China
6 Research Center of Information Technology, Beijing Academy of Agriculture and Forestry Sciences, National Engineering Research Center for Information Technology in Agriculture, Beijing, China
7 School of Computer Science, University of Birmingham, Birmingham, United Kingdom

*Correspondence: Ying Fang, [email protected]; Qifeng Li, [email protected]

Received 08 September 2023; accepted 06 December 2023; published 05 January 2024.
Front. Neurosci. 17:1291051. doi: 10.3389/fnins.2023.1291051

Abstract

Spiking neural networks (SNNs), as brain-inspired neural network models based on spikes, have the advantage of processing information with low complexity and efficient energy consumption. Currently, there is a growing trend to design hardware accelerators for dedicated SNNs to overcome the limitations of running under the traditional von Neumann architecture. Probabilistic sampling is an effective modeling approach for implementing SNNs that simulate the brain to achieve Bayesian inference. However, sampling consumes considerable time, so a dedicated hardware implementation of SNN sampling models is highly demanded to accelerate inference operations. Hence, we design a hardware accelerator based on FPGA to speed up the execution of SNN algorithms by parallelization. We use streaming pipelining and array partitioning operations to achieve model operation acceleration with the least possible resource consumption, and combine the Python productivity for Zynq (PYNQ) framework to implement the model migration to the FPGA while increasing the speed of model operations. We verify the functionality and performance of the hardware architecture on the Xilinx Zynq ZCU104. The experimental results show that the proposed hardware accelerator of the SNN sampling model can significantly improve the computing speed while ensuring the accuracy of inference. In addition, Bayesian inference for spiking neural networks through the PYNQ framework can fully exploit the high performance and low power consumption of FPGAs in embedded applications. Taken together, our proposed FPGA implementation of Bayesian inference with SNNs has great potential for a wide range of applications; it can be ideal for implementing complex probabilistic model inference in embedded systems.
1 Introduction
Neuroscience research plays an increasingly important role in accelerating and inspiring
the development of artificial intelligence (Demis et al., 2017; Zador et al., 2022). Spikes are the
fundamental information units in the neural systems of the brain (Bialek et al., 1999; Yu et al.,
2020), which also play an important role in information transcoding and representation in
artificial systems (Zhang et al., 2020; Gallego et al., 2022; Xu et al., 2022). Spiking neural networks (SNNs), which utilize spikes as brain-inspired models, have been proposed as a new generation of computational framework (Maass, 1997). SNNs have received extensive attention and can utilize many properties of artificial neural networks for deep learning in various tasks (Kim et al., 2018; Shen et al., 2021; Yang et al., 2022).

Numerous neuroscience experiments (Ernst and Banks, 2002; Körding and Wolpert, 2004) have shown that the cognitive and perceptual processes of the brain can be expressed as probabilistic reasoning based on Bayesian inference. From the macroscopic perspective, Bayesian models have explained how the brain processes uncertain information and have been successfully applied in various fields of brain science (Shi et al., 2013; Chandrasekaran, 2017; Alais and Burr, 2019). At the micro level, in contrast, recent studies focus on implementing SNNs using probabilistic graphical models (PGMs) (Yu et al., 2018a,b, 2019; Fang et al., 2019). However, the realization of PGMs is considerably slow due to the sampling process. Since probabilistic sampling on SNNs involves massive probabilistic computations that consume a lot of time, and many computation-intensive operations are involved in processing the data in the neural network, the inference speed becomes even slower as the problem scales up. In practical application scenarios such as medical diagnosis, environmental monitoring, and intelligent surveillance, these problems lead to poor real-time performance. Therefore, we want to accelerate and improve the model to meet the speed demands of real applications. At present, there are dedicated hardware designs for SNNs (Cai et al., 2018; Liu et al., 2019; Fang et al., 2020; Han et al., 2020; Zhu et al., 2022) and for PGMs based on conventional artificial neural networks (Cai et al., 2018; Liu et al., 2020; Fan et al., 2021; Ferianc et al., 2021). Yet, there are few studies of hardware platforms implementing PGM-based SNNs. Therefore, hardware acceleration of PGM-based SNNs is highly demanded and meaningful, not only for simulation speed-up but also for neuromorphic computing implementation (Christensen et al., 2022).

In this study, we address this question by utilizing FPGA hardware to implement a recently developed PGM-based SNN model, named the sampling-tree model (STM) (Yu et al., 2019). The STM is an implementation of spiking neural circuits for Bayesian inference using importance sampling. In particular, the STM is a typical probabilistic graphical model based on a hierarchical tree structure with layer-on-layer iteration, and it uses a multi-sampling mode based on sampling coupled with population probability coding. Each node in the model contains a large number of spiking neurons that represent samples. The STM processes information based on spikes, where spiking neurons integrate input spikes over time and fire a spike when their membrane potential crosses a threshold. With these properties, the STM is a typical example of a PGM-based SNN for Bayesian inference. The software implementation of sampling-based SNNs is very time-consuming, and actual tasks are limited by the model's running speed on the CPU. Therefore, to fulfill our requirements for the running speed of the model, it is necessary to choose a hardware platform and design a hardware accelerator. Here we need to consider which hardware platform best suits the design of the accelerator.

ASIC-based design implementations: compared with general integrated circuits, ASICs have the advantages of smaller size, lower power consumption, improved reliability, improved performance, and enhanced confidentiality. ASICs can also reduce costs compared to general-purpose integrated circuits in mass production. Ma et al. (2017) designed a highly configurable neuromorphic hardware coprocessor based on SNNs implemented with digital logic, called the Darwin neural processing unit (NPU), which was fabricated as an ASIC in SMIC's 180 nm process for resource-constrained embedded scenarios. Tung et al. (2023) proposed a design scheme for a spiking neural network ASIC chip and developed a built-in self-calibration (BISC) architecture based on the chip, enabling the network to perform high-precision inference under a specified range of process parameter variations. Wang et al. (2023) proposed an ASIC learning engine consisting of a memristor and an analog computing module for implementing trace-based online learning in a spiking neural network, which significantly reduces energy consumption compared to existing ASIC products of the same type. However, an ASIC requires a long development cycle and carries risk: once a problem appears, the whole chip must be discarded. Consequently, we do not consider the use of an ASIC for our design.

FPGA-based design implementations: an FPGA has a shorter development cycle than an ASIC, is flexible in use, can be reprogrammed repeatedly, and offers abundant resources. Ferianc et al. (2021) proposed an FPGA-based hardware design to accelerate Bayesian recurrent neural networks (RNNs); it can achieve up to 10 times speedup compared with a GPU implementation. Wang (2022) implemented a hardware accelerator on FPGA for the training and inference of the Bayesian confidence propagation neural network (BCPNN), and the computing speed of the accelerator improves on its CPU counterpart by two orders of magnitude. However, the RNN and BCPNN in the above two designs are essentially traditional neural network architectures, which differ from the hardware implementation of the SNN architecture and cannot be directly applied to our SNN implementation.

In addition, Fan et al. (2021) proposed a novel FPGA-based hardware architecture to accelerate BNNs inferred through Monte Carlo; it can achieve up to nine times better compute efficiency compared with other state-of-the-art BNN accelerators. Awano and Hashimoto (2023) proposed a Bayesian neural network hardware accelerator called B2N2, i.e., a Bernoulli random number-based Bayesian neural network accelerator, which reduces resource consumption by 50% compared to FPGA implementations of the same type. Neither of these two hardware architectures can be used to accelerate the STM, because the variational inference model and the Monte Carlo inference model are not suitable for importance sampling, whereas the STM must be sampled through importance sampling. In other words, the hardware architecture differs with the model, so we cannot use these two architectures to accelerate the STM on the FPGA.

In summary, many previous designs were implemented on FPGAs because ASICs are less flexible than FPGAs (Ju et al., 2020). GPUs often perform very well on applications that benefit from parallelism and are currently the most widely
used platform for implementing neural networks. However, GPUs cannot handle spike communication well in real time, and their high energy consumption limits their use in some embedded scenarios. Therefore, we chose the FPGA as a compromise solution, which provides reasonable cost, low power consumption, and flexibility for our design. Furthermore, because some FPGA-based design implementations are restricted by the traditional ANN architecture (Que et al., 2022) and some inference models are not suitable for sampling (Fan et al., 2022), we also need to design a hardware implementation suitable for importance sampling (Shi and Griffiths, 2009). Based on the above design references and our previous work on the STM, a neural network model for Bayesian inference, we finally chose the FPGA to complete the design of the STM accelerator, and we also complete the construction of the Bayesian inference neural network model on the FPGA with the help of the PYNQ framework to achieve the acceleration of the STM. The overall design idea is as follows. Firstly, we optimize the model inference part of the algorithm to make full use of FPGA resources and improve program parallelism, thus reducing the computing delay, and we complete the design of custom hardware IP cores. Secondly, the designed IP core is connected to the whole hardware system, and the overall hardware module control is realized according to the preset algorithm flow through the PYNQ framework.

The main contributions of this work are as follows:

• We are the first work targeting acceleration of the STM on an FPGA board, and the inference results of the STM implemented on the FPGA are similar to the inference results implemented on the CPU;
• We implemented the acceleration of the STM on a Xilinx Zynq ZCU104 FPGA board, and we also found that the acceleration on the FPGA increases with the problem size, such as the number of model layers, the number of neurons, and other factors;
• We demonstrate that the neural circuits we implemented on the FPGA board can be used to solve practical cognitive problems, such as multisensory integration, and can efficiently perform complex Bayesian reasoning tasks in embedded scenarios.

2 Related work

2.1 Bayesian inference with importance sampling

Existing neural networks using variational-based inference methods such as belief propagation (BP) (Yedidia et al., 2005) and Monte Carlo (MC) (Nagata and Watanabe, 2008) can obtain accurate inference results in some Bayesian models. However, most Bayesian models in the real world are more complex. When using BP (George and Hawkins, 2009) or MCMC (Buesing et al., 2011) to implement Bayesian model inference, each neuron or group of neurons generally has to implement a different and complex computation, and since spiking neural networks require multiple iterations to obtain optimal Bayesian inference results, these networks are complicated to implement. Therefore, the STM employs the tree structure of Bayesian networks to convert global inference into local inference through network decomposition. Importance sampling is introduced to perform the local inference, which ensures that each group of neurons performs simple work, making the model suitable for large-scale distributed computing.

Unlike the traditional method of sampling from the distribution of interest, we use importance sampling to implement Bayesian inference for spiking neural networks; it samples from a simple distribution to estimate the value of a certain function. Given the variable y, the conditional expectation of a function f(x) is estimated by importance sampling as

$$E(f(x) \mid y) = \sum_x f(x)\,P(x \mid y) = \frac{\sum_x f(x)\,P(y \mid x)\,P(x)}{\sum_x P(y \mid x)\,P(x)} = \frac{E_{P(x)}\big(f(x)\,P(y \mid x)\big)}{E_{P(x)}\big(P(y \mid x)\big)} \approx \sum_i f(x^i)\,\frac{P(y \mid x^i)}{\sum_{x^i} P(y \mid x^i)}, \quad x^i \sim P(x), \tag{1}$$

where $x^i$ follows the distribution P(x). This equation transforms the conditional expectation E(f(x) | y) into a weighted combination of the normalized conditional probabilities $P(y \mid x^i) / \sum_{x^i} P(y \mid x^i)$. Importance sampling can therefore draw a large number of samples from a simple prior and skillfully convert the posterior distribution into a ratio of likelihoods, thereby estimating the expectation of the posterior distribution.
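To make Eq. (1) concrete, the minimal Python sketch below estimates a posterior expectation by importance sampling; the Gaussian prior and likelihood and all constants are assumptions chosen purely for illustration, not the model used later in this paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def likelihood(y, x, sigma_y=1.0):
    # P(y | x): assumed Gaussian likelihood around the latent sample x
    return np.exp(-0.5 * ((y - x) / sigma_y) ** 2)

def importance_estimate(f, y, n_samples=2000, sigma_x=1.0):
    # Draw samples x^i from the simple prior P(x) = N(0, sigma_x^2)
    xs = rng.normal(0.0, sigma_x, n_samples)
    # Normalized weights P(y | x^i) / sum_i P(y | x^i), as in Eq. (1)
    w = likelihood(y, xs)
    w = w / w.sum()
    # The weighted combination approximates E(f(x) | y)
    return np.sum(f(xs) * w)

# Example: posterior mean of x given the observation y = 0.8; for this toy
# Gaussian setting the analytic answer is y/2 = 0.4
print(importance_estimate(lambda x: x, y=0.8))
```

With a few thousand samples the estimate already lands near the analytic value, which mirrors the sample-size behavior reported later in Section 5.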
2.2 Sampling-tree model with spiking neural network

To build a general-purpose neural network for large-scale Bayesian models, the STM was proposed in previous work (Yu et al., 2019), as shown in Figure 1. As a spiking neural network model for Bayesian inference, the STM is also a probabilistic graphical model with an overall hierarchical structure, and each node in the graph holds a large number of neurons as sample data.

The STM is used to explain how Bayesian inference algorithms can be implemented through neural networks in the brain, building large-scale Bayesian models for SNNs. In contrast to other Bayesian inference methods, the STM relies on multiple sets of neurons to achieve probabilistic inference in PGMs with multiple nodes and edges. Performing neural sampling on deep tree-structured neural circuits can transform global inference problems into local inference tasks and achieve approximate inference. Furthermore, since the STM does not have neural circuits designed for one specific task, it can be generalized to solve other inference problems. In summary, the STM is a general neural network model that can be used for distributed large-scale Bayesian inference.

In this model, the root node of the Bayesian network is the problem or cause that needs to be inferred in our experiment, the leaf nodes represent the information or evidence we receive from the outside world, and the branch nodes are the intermediate variables of the reasoning problem. From the macroscopic perspective, the STM is a probabilistic graphical model.
FIGURE 1
Sampling-tree model. (A) An example of the STM in spiking neural networks. (B) A tree-structured Bayesian network corresponding to the STM in (A).
FIGURE 3
An example of a Bayesian network. (A) A simple Bayesian network model. (B) The neural network architecture of the STM for the basic network in (A).
When using the PYNQ framework, the tight coupling between the PS (Processing System, i.e., the ARM processor) and the PL (Programmable Logic, i.e., the FPGA fabric) can achieve better responsiveness, higher reconfigurability, and richer interface functions than traditional methods. The simplicity and efficiency of the Python language and the acceleration provided by programmable logic are also fully utilized. Finally, Xilinx has simplified and improved the design of Zynq-based products on the PYNQ framework by combining a hybrid library that implements acceleration within Python and programmable logic. This is a significant advantage over traditional SoC approaches that cannot use programmable logic. Therefore, we implement the Bayesian neural network inference algorithm on the Xilinx ZCU104 with the help of the PYNQ framework.

3 System analysis

In this section, we first summarize the basis of our work on implementing the brain's probabilistic inference algorithms through neural networks. We then analyze the difficulties of accelerating the probabilistic inference algorithm when running neural network models and briefly describe how we address these difficulties.

3.1 Neural network implementation

In this subsection, we take the neural network shown in Figure 3A as an example and consider the following two aspects of its implementation. First, for the stimulus encoding problem, it is important to know how to obtain the activities of neurons from the stimulus input. Second, for the estimation of the posterior probability, it is also necessary to consider how the activities of neurons realize this estimation, because our final inference result requires the expectation over the posterior distribution.

For the first problem, we convert the stimulus input information into the activities of neurons through probabilistic population codes (PPCs) (Ma et al., 2006, 2014). According to PPCs, from the activities of the neurons encoding the stimulus inputs I1, I2, and others, the neuronal activity of the root node A can be obtained. For the second problem, we divide it into two steps: one is the calculation of the posterior probability, and the other is the neural implementation of the posterior probability. Based on importance sampling, we can estimate the posterior probability by the ratio approximation of the likelihood function, as shown in Eq. (2):

$$P(B_1 = B_1^i, B_2 = B_2^i \mid I_1, I_2) = \frac{P(I_1, I_2 \mid B_1^i, B_2^i)\,P(B_1^i, B_2^i)}{\int P(I_1, I_2 \mid B_1, B_2)\,P(B_1, B_2)\,\mathrm{d}B_1\,\mathrm{d}B_2} \approx \frac{P(I_1, I_2 \mid B_1^i, B_2^i)}{\sum_i P(I_1, I_2 \mid B_1^i, B_2^i)}. \tag{2}$$

Then, for the neural implementation of the posterior probability, Shi and Griffiths (2009) have shown that the divisive normalization $E(r_i / \sum_i r_i)$ is commonly found in the cerebral cortex in neuroscience experiments, and Eq. (3) has been proved, where $r_i$ is the firing rate of the $i$th neuron:

$$E\!\left(r_i \Big/ \sum_i r_i\right) = \frac{P(I_1, I_2 \mid B_1^i, B_2^i)}{\sum_i P(I_1, I_2 \mid B_1^i, B_2^i)}. \tag{3}$$

Next, we describe the processes and mechanisms of probabilistic inference implemented in the neural network (adapted from Fang et al., 2019). First, for the process of probabilistic inference, the neural network processes the external stimulus inputs I1 and I2 together in a bottom-up manner, as shown in Figure 3B. Second, the process of generation, which produces the sampling neurons, is the opposite of the inference process. Based on the generative model in Figure 3A, we can obtain sampling neurons $B_1^i$ and $B_2^i$ from P(B1) and P(B2), respectively. In other words, the sampling neurons follow $B_1, B_2 \sim N(0, \sigma^2)$.
FIGURE 4
Data interaction architecture between PS and PL; we use the m_axi interface for data transmission.
FIGURE 5
The design idea and overall computing architecture. (A) The program flow of the model on the ZCU104 board. (B) The hardware architecture of the
model.
3.2 Difficulties in designing the accelerator

In this work, the communication settings between PS and PL should be considered first in the design of the accelerator. Since the design requires frequent data interactions during operation, the selection of a suitable data interface can ensure the stability of data transmission while reducing the time required for it. The second consideration is the design of the PL part, which mainly carries out the work of the FPGA and usually needs to achieve acceleration by reducing the latency of the design.

For the communication setting between PS and PL, since the BRAM in the PL part is not enough to store a large amount of data and parameters, it is necessary to exchange data frequently between the PL and PS parts. Therefore, in order to achieve high-speed read/write operations for large-scale data, we use the m_axi interface. Figure 4 shows the data interaction architecture between PS and PL. The m_axi interface has independent read and write channels, supports burst transfer mode, and its potential performance can reach 17 GB/s, which fully meets our data scale and transfer speed requirements.
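On the PS side, the PYNQ framework drives such an m_axi design from Python. The snippet below is only a host-side sketch: the bitstream name, IP instance name, and register offsets are placeholders that depend on the actual Vivado/HLS project, not values taken from our design files.

```python
import numpy as np
from pynq import Overlay, allocate

overlay = Overlay("stm_accel.bit")     # hypothetical bitstream name
ip = overlay.stm_top_0                 # hypothetical HLS IP instance

# Buffers from pynq.allocate are physically contiguous, so the IP core can
# fetch them directly through its AXI master (m_axi) interface
samples = allocate(shape=(4000,), dtype=np.float32)
result = allocate(shape=(4000,), dtype=np.float32)
samples[:] = np.random.randn(4000).astype(np.float32)

# Pass buffer physical addresses to the IP and start it; 0x10/0x18 are
# typical HLS-generated offsets but are design-specific
ip.write(0x10, samples.physical_address)
ip.write(0x18, result.physical_address)
ip.write(0x00, 1)                      # set AP_START
while (ip.read(0x00) & 0x2) == 0:      # poll AP_DONE
    pass
print(result[:8])
```

In a real deployment, the offsets would come from the HLS-generated driver rather than being hard-coded.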
Furthermore, for the design of the PL part: since each node in the model contains a large number of neurons, the process of encoding, summing, multiplying, and normalizing the neurons takes up a lot of resources and clock cycles, and loops may also be nested in it. Although pipelines can be added to the loops to improve the parallelism of the model operation, the optimization is not satisfactory due to the large loop bounds. Therefore, we propose a highly parallelized structure by introducing an array partitioning method that divides each array into blocks, which can further unroll the loops and let each loop iteration execute independently, improving the degree of program parallelization. In short, it is a method of exchanging space for time.

4 Software and hardware optimizations

The design idea and overall architecture of this work are shown in Figure 5; the system consists of the ARM processor, the AXI interface, and a custom IP core designed with Vivado HLS. In the IP core, we mainly use a streaming pipeline structure to reduce latency and thus improve the operation speed. As mentioned in the previous section, we use the AXI master interface provided by Xilinx for data transmission between PS and PL, and the prior distribution and sample data that are ready to participate in the computation are transferred from the PS to the PL.
FIGURE 6
Design optimization ideas consisting of on-chip BRAM and processing elements (PE) using array division.
FIGURE 8
Hardware streaming architecture block design targeting the SoC with the m_axi interface between the PL and PS.
5 Experiments and results

5.1 Causal inference with two-layer neural network

Causal inference is the process of inferring the cause of received external information (Shams and Beierholm, 2010). The core problem of causal inference is to calculate the probability of the cause, which can be expressed as an expectation value defined on the posterior distribution. The calculation of the posterior probability is converted into the calculation of the prior probability and the likelihood through importance sampling, to realize the simulation of the causal inference process in the brain. In this experiment, we verify the accuracy and efficiency of Bayesian inference in the STM on the Xilinx ZCU104 FPGA board, because probabilistic sampling on SNNs involves a large number of probabilistic calculations that consume a lot of time, the processing of the data in the inference process involves many computation-intensive operations, and the CPU cannot handle these tasks very quickly.

In this paper, the validity of the model is verified by the accuracy of inference when different samples are input, and the STM is modeled by the Bayesian network shown in Figure 9A, where B1, B2, B3, and B4 represent the input stimuli in causal inference and A denotes the cause. The tuning curve of each spiking neuron can be represented as the state of the variable. We suppose that the prior and conditional distributions are known, the distributions of these spiking neurons follow the prior distribution P(B1, B2, B3, B4), and the tuning curve of neuron i is proportional to the likelihood distribution P(B1^i, B2^i, B3^i, B4^i | A). We can then normalize the output of the Poisson spiking neurons through shunting inhibition and synaptic inhibition. Here we use y_i to denote the individual firing rate of spiking neuron i and Y to denote the overall firing rate; then:

$$E\!\left(\frac{y_i}{Y} \,\Big|\, Y = n\right) = \frac{P(B_1^i, B_2^i, B_3^i, B_4^i \mid A)}{\sum_i P(B_1^i, B_2^i, B_3^i, B_4^i \mid A)}. \tag{4}$$

By multiplying and linearly combining the normalized results with the synaptic weights, the posterior probability can be calculated:

$$P(A = a \mid B_1, B_2, B_3, B_4) = \sum_l \sum_i I(A^l = a)\, \frac{P(B_1^i, B_2^i, B_3^i, B_4^i \mid A^l)}{\sum_l P(B_1^i, B_2^i, B_3^i, B_4^i \mid A^l)}. \tag{5}$$
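A loose numerical sketch of this normalize-then-weight computation is given below; the Gaussian tuning curves and the discretization of the cause A are assumptions made for illustration, so the code mirrors Eqs. (4) and (5) rather than reproducing the exact neural circuit of Figure 9A:

```python
import numpy as np

A_states = np.linspace(-2.0, 2.0, 9)     # hypothetical discrete states A^l
B_obs = np.array([0.9, 1.1, 1.0, 0.8])   # observed stimuli B1..B4

# Firing rates: likelihood of the stimuli under each candidate cause A^l,
# here a product of assumed Gaussian tuning curves (cf. Eq. 4)
rates = np.exp(-0.5 * np.sum((B_obs[None, :] - A_states[:, None]) ** 2, axis=1))

# Divisive normalization (shunting inhibition) over the population
weights = rates / rates.sum()

# Eq. (5): the synaptic weights I(A^l = a) pick out each candidate state,
# so the posterior over A is the normalized weight mass at each state
posterior = weights                       # approximate P(A = A^l | B1..B4)
print(A_states[np.argmax(posterior)])     # MAP estimate of the cause
```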
The results of the accuracy test are shown in Figure 9B. The error rate of the stimulus estimation keeps decreasing as the sample size increases, and when there are 2,000 sampled neurons, the error rate of stimulus estimation is already quite small. In addition, the inference accuracy of the implementation on the FPGA is similar to that on the PC. Therefore, the STM we run on the FPGA board can guarantee the accuracy of inference.

In terms of performance, we compare the design with multithreading and multiprogramming implementations on traditional computing platforms, and the results are shown in Table 2. It shows the processing time of each neuron's sampling when the number of sampled neurons is 4,000. It can be seen from the results that multithreading and multiprogramming do not achieve the desired speedup but have the opposite effect. The possible reasons are analyzed as follows: (1) multithreaded execution is not strictly parallel, and the global interpreter lock (GIL) can prevent parallel execution of multiple threads, so it may not be possible to take full advantage of multicore CPUs; (2) in terms of multiprogramming, the problem perhaps did not reach a sufficient size, so that process creation took longer than the runtime itself. In addition, communication between multiple processes requires passing a large amount of sample data, which introduces some overhead. For the above reasons, we finally considered using vectorization operations.
FIGURE 9
Simulation of causal inference. (A) The neural network architecture of the basic Bayesian network. (B) Comparison of error rates under PC and FPGA
platforms.
TABLE 2 Results of sampling time and speed-up of each neuron in the two-layer model. Times are the processing time per neuron (ms); the grouped rows correspond to the multithreading and multiprogramming settings discussed in the text.

Setting                          Intel i7-10700 (2.90 GHz)   Intel i5-12500 (2.50 GHz)   ARM      Xilinx ZCU104
Normal                           8.556                       4.315                       53.814   0.389
Multithreading, 4 threads        13.098                      6.907                       –        –
Multithreading, 10 threads       13.355                      7.578                       –        –
Multithreading, 20 threads       13.778                      8.386                       –        –
Multithreading, 50 threads       16.085                      10.631                      –        –
Multiprogramming, 4 processes    394.43                      278.46                      –        –
Multiprogramming, 8 processes    564.03                      454.81                      –        –
Multiprogramming, 16 processes   948.47                      844.73                      –        –
We vectorize the sample data to reduce the number of loops and avoid the speed limitations caused by nested loops. From Table 2, we can see that vectorization is significantly faster than serial execution, multithreading, and multiprogramming, while the processing speed of the model on the FPGA is significantly better than that of the PC.
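As a toy illustration of this point (with illustrative sizes, not our benchmark configuration), compare a per-neuron Python loop with a single vectorized NumPy pass over the same sample data:

```python
import time
import numpy as np

x = np.random.randn(4000)              # stand-in for the sample data

t0 = time.perf_counter()
loop_out = np.empty_like(x)
for i in range(x.size):                # nested-loop style: one neuron at a time
    loop_out[i] = np.exp(-0.5 * x[i] ** 2)
t1 = time.perf_counter()

vec_out = np.exp(-0.5 * x ** 2)        # one vectorized sweep over all samples
t2 = time.perf_counter()

assert np.allclose(loop_out, vec_out)
print(f"loop: {t1 - t0:.5f} s, vectorized: {t2 - t1:.5f} s")
```

Unlike multithreading, this avoids the GIL entirely: the work moves into a single optimized loop inside NumPy.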
5.2 Causal inference with multi-layer neural network

The simulation in the previous section verified causal inference under a simple model. The inference speed on the CPU decreases sharply as the problem size increases, so the need to shorten the inference time of the network model through improvements and optimizations becomes even more important. In this section, we use a multi-layer neural network model to test large-scale Bayesian inference based on the sampling tree on the FPGA board. The STM is modeled by the Bayesian network shown in Figure 10A, where I1, I2, and I3 denote the input stimuli in causal inference, A denotes the cause, and the rest are intermediate variables.

In this simulation, we use several spiking neurons to encode the variables C1, C2, and C3, respectively, and the distribution of these neurons follows the prior distributions P(C1, C2) and P(C3). In addition, the tuning curves of these neurons are proportional to the distributions P(I1, I2 | C1^i, C2^i) and P(I3 | C3^j).
FIGURE 10
Simulation of causal inference with a multi-layer neural network. (A) The Bayesian model for the multi-layer network structure. (B) Comparison of error rates under PC and FPGA platforms.
We can obtain the average firing rates of the spiking neurons C1^i, C2^i, and C3^j, respectively, by the same divisive normalization as in Eq. (4). As shown in Table 3, the acceleration effect of each neuron's sampling on the FPGA is more pronounced than in the two-layer model, even more than doubling.
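For intuition, the following sketch performs the corresponding hierarchical inference by plain ancestral importance sampling; the Gaussian conditionals stand in for the model's actual distributions and are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000

I_obs = np.array([0.9, 1.1, 0.3])      # observed stimuli I1, I2, I3

A = rng.normal(0.0, 1.0, n)            # samples of the cause A
C = A[:, None] + rng.normal(0.0, 0.5, (n, 3))   # intermediate variables C1..C3 | A

# Importance weights: likelihood of the observed stimuli given C1..C3
w = np.exp(-0.5 * np.sum((I_obs[None, :] - C) ** 2, axis=1))
w /= w.sum()

print(np.sum(A * w))                   # posterior mean estimate of the cause A
```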
TABLE 3 Results of sampling time and speed-up of each neuron in the multi-layer model. Times are the processing time per neuron (ms); the grouped rows correspond to the multithreading and multiprogramming settings discussed in the text.

Setting                          Intel i7-10700 (2.90 GHz)   Intel i5-12500 (2.50 GHz)   ARM     Xilinx ZCU104
Normal                           1.103                       0.635                       12.75   0.024
Multithreading, 4 threads        1.019                       0.617                       –       –
Multithreading, 10 threads       1.006                       0.617                       –       –
Multithreading, 20 threads       1.012                       0.618                       –       –
Multithreading, 50 threads       1.012                       0.618                       –       –
Multiprogramming, 4 processes    1.056                       0.706                       –       –
Multiprogramming, 8 processes    1.113                       0.762                       –       –
Multiprogramming, 16 processes   1.371                       1.097                       –       –
FIGURE 11
Simulation of multisensory integration. (A) Left: the Bayesian model for visual-auditory-haptic integration; right: comparison of model inference results and theoretical values on FPGA. (B) Left: the Bayesian model for visual-haptic integration; right: comparison of model inference results and theoretical values on FPGA.
TABLE 4 Results of sampling time and speed-up of each neuron in the simulation of multisensory integration. Times are the processing time per neuron (ms).

Setting   Intel i7-10700 (2.90 GHz)   Intel i5-12500 (2.50 GHz)   ARM      Xilinx ZCU104
Normal    7.632                       5.169                       94.608   0.328
5.3 Multisensory integration

In our simulation, multisensory integration inference is achieved through neural circuits based on PPCs and normalization. We use 1,000 spiking neurons to encode the stimuli, whose states follow the prior distribution P(S). We suppose that the tuning curve of neuron i is proportional to the distribution P(SV, SH, SA | S^i), and then use shunting inhibition and synaptic depression to normalize the output of the spiking neurons; the result is fed into the next spiking neuron with synaptic weights I(S^i = s). Figure 11A shows the simulation results, where the inference result obtained from the STM on the FPGA board is in good agreement with the theoretical values. Similar to the visual-auditory-haptic integration, we also add a simulation of visual-haptic integration for completeness, which is illustrated in Figure 11B. Furthermore, the performance comparison is shown in Table 4, which shows a significant improvement in the sampling speed of each neuron on the FPGA. Since the results of the multithreading and multiprocessing experiments were not ideal in the previous experiments, only the vectorization method is compared here. The results also show that the running speed on the FPGA is still better than that on the CPU.
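The sketch below illustrates this PPC-based integration scheme in Python; the Gaussian tuning widths, cue values, and prior are assumptions for illustration, and the code is a schematic of the computation rather than the FPGA implementation:

```python
import numpy as np

rng = np.random.default_rng(2)

S = rng.normal(0.0, 2.0, 1000)         # 1,000 sampled neurons encoding P(S)
cues = {"S_V": 0.6, "S_H": 0.4, "S_A": 0.5}   # visual/haptic/auditory cues
sigma = {"S_V": 0.5, "S_H": 0.8, "S_A": 1.0}  # assumed reliability per cue

# Tuning of neuron i is proportional to P(S_V, S_H, S_A | S^i)
log_rate = np.zeros_like(S)
for name, value in cues.items():
    log_rate += -0.5 * ((value - S) / sigma[name]) ** 2
rates = np.exp(log_rate)

# Shunting inhibition / synaptic depression as divisive normalization
weights = rates / rates.sum()

# Posterior mean of the integrated stimulus estimate
print(np.sum(S * weights))
```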
6 Conclusion

In this work, we design an FPGA-based hardware accelerator for PGM-based SNNs with the help of the PYNQ framework. Firstly, the STM, as a novel SNN simulation model for causal inference, can convert a global complex inference problem into local simple inference problems, thus realizing high-precision approximate inference. Furthermore, as a generalized neural network model, the STM does not formulate a neural network for a specific task and thus can be generalized to other problems. Our hardware implementation is based on this solid and innovative theoretical model, and it solves the problem of slow model computation in the realization of large-scale multi-layer complex model inference.

Secondly, as the first work to realize hardware acceleration of the STM, we chose the FPGA platform as the acceleration platform for the model. CPUs and GPUs both need to go through operations such as instruction fetching, decoding, and various branch jumps, and the energy consumption of GPUs is too high. In contrast, the function of each logic unit of an FPGA is determined at reprogramming time and does not require these instruction operations, so FPGAs enjoy lower latency and energy consumption. Compared to ASICs, FPGAs are more flexible. Although ASICs are superior to FPGAs in terms of throughput, latency, and power consumption, their high cost and long development cycle cannot be ignored, and the design of an ASIC cannot be easily changed once it is completed. In contrast, FPGAs are programmable hardware that can be changed at any time according to demand without remanufacturing the hardware, and this flexibility is the reason why we ultimately chose FPGAs. The FPGA is a compromise between the two platforms: although some aspects of its performance do not match either of them, it combines the advantages of both. It also provides reasonable cost, low power consumption, and reconfigurability for neuromorphic computing acceleration.

Thirdly, the experimental results and data on causal inference validate our conclusion: in the two-layer model, the inference accuracy of the implementation on the FPGA approximates that of the implementation on the CPU, with an accuracy of up to 98%, while achieving a multifold speedup. The acceleration effect becomes more obvious as the problem size increases, which is proved in the multi-layer model: from the results, we can see that the acceleration effect in the multi-layer model is more than twice that in the two-layer model. Moreover, in the experiments on multisensory integration, the results also demonstrate that our design can be used for other real-world cognitive problems while guaranteeing both the accuracy of reasoning and the acceleration effect.

Finally, the hardware acceleration method proposed in this paper can simulate the working principles of biological neurons very well. Meanwhile, due to the low power consumption and real-time response of FPGAs, this method can have a wide range of applications in the embedded field. The realized causal inference can be used in policy evaluation, financial decision-making, and other fields, and the multisensory integration can be used in vehicle environment perception, medical diagnosis, and other fields. Specifically, in application scenarios such as smart home environments, causal inference can be used to reason about factors affecting health and provide personalized health advice, and sensory cues such as vision and hearing can be combined to better perceive the home environment and thus provide intelligent control. Our work provides a solution for such application scenarios, and these practical applications are expected to promote the progress of the neuromorphic computing field and make it better meet practical application requirements. In addition, so far the STM does not consider learning, which is an important aspect of adaptation between tasks. All the results of our simulations are based on inference with known prior and conditional probabilities. Therefore, in future work, we need to combine learning and inference into one framework and introduce learning mechanisms to make the model more complete and flexible.
References

Alais, D., and Burr, D. (2019). "Cue combination within a Bayesian framework," in Multisensory Processes (New York, NY: Springer), 9–31. doi: 10.1007/978-3-030-10461-0_2

Awano, H., and Hashimoto, M. (2020). "BYNQNET: Bayesian neural network with quadratic activations for sampling-free uncertainty estimation on FPGA," in 2020 Design, Automation and Test in Europe Conference and Exhibition (Grenoble: IEEE), 1402–1407. doi: 10.23919/DATE48585.2020.9116302

Awano, H., and Hashimoto, M. (2023). B2N2: resource efficient Bayesian neural network accelerator using Bernoulli sampler on FPGA. Integration 89, 1–8. doi: 10.1016/j.vlsi.2022.11.005

Bialek, W., Rieke, F., van Steveninck, R., and Warland, D. (1999). Spikes: Exploring the Neural Code (Computational Neuroscience). Cambridge, MA: The MIT Press.

Buesing, L., Bill, J., Nessler, B., and Maass, W. (2011). Neural dynamics as sampling: a model for stochastic computation in recurrent networks of spiking neurons. PLoS Comput. Biol. 7, 188–200. doi: 10.1371/journal.pcbi.1002211

Cai, R., Ren, A., Liu, N., Ding, C., Wang, L., Qian, X., et al. (2018). VIBNN: hardware acceleration of Bayesian neural networks. ACM SIGPLAN Notices 53, 476–488. doi: 10.1145/3296957.3173212

Chandrasekaran, C. (2017). Computational principles and models of multisensory integration. Curr. Opin. Neurobiol. 43, 25–34. doi: 10.1016/j.conb.2016.11.002

Christensen, D. V., Dittmann, R., Linares-Barranco, B., Sebastian, A., et al. (2022). 2022 roadmap on neuromorphic computing and engineering. Neuromorphic Comput. Eng. 2, 022501. doi: 10.1088/2634-4386/ac4a83

Demis, H., Dharshan, K., Christopher, S., and Matthew, B. (2017). Neuroscience-inspired artificial intelligence. Neuron 95, 245–258. doi: 10.1016/j.neuron.2017.06.011

Ernst, M. O., and Banks, M. S. (2002). Humans integrate visual and haptic information in a statistically optimal fashion. Nature 415, 429–433. doi: 10.1038/415429a

Fan, H., Ferianc, M., Que, Z., Liu, S., Niu, X., Rodrigues, M. R., et al. (2022). FPGA-based acceleration for Bayesian convolutional neural networks. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 41, 5343–5356. doi: 10.1109/TCAD.2022.3160948

Fan, H., Ferianc, M., Rodrigues, M., Zhou, H., Niu, X., Luk, W., et al. (2021). "High-performance FPGA-based accelerator for Bayesian neural networks," in 2021 58th ACM/IEEE Design Automation Conference (San Francisco, CA: IEEE), 1063–1068. doi: 10.1109/DAC18074.2021.9586137

Fang, H., Mei, Z., Shrestha, A., Zhao, Z., Li, Y., Qiu, Q., et al. (2020). "Encoding, model, and architecture: systematic optimization for spiking neural network in FPGAs," in Proceedings of the 39th International Conference on Computer-Aided Design (IEEE), 1–9. doi: 10.1145/3400302.3415608

Fang, Y., Yu, Z., Liu, J. K., and Chen, F. (2019). A unified neural circuit of causal inference and multisensory integration. Neurocomputing 358, 355–368. doi: 10.1016/j.neucom.2019.05.067

Ferianc, M., Que, Z., Fan, H., Luk, W., and Rodrigues, M. (2021). "Optimizing Bayesian recurrent neural networks on an FPGA-based accelerator," in 2021 International Conference on Field-Programmable Technology (Auckland: IEEE), 1–10. doi: 10.1109/ICFPT52863.2021.9609847

Gallego, G., Delbruck, T., Orchard, G., Bartolozzi, C., Taba, B., Censi, A., et al. (2022). Event-based vision: a survey. IEEE Trans. Pattern Anal. Mach. Intell. 44, 154–180. doi: 10.1109/TPAMI.2020.3008413

George, D., and Hawkins, J. (2009). Towards a mathematical theory of cortical micro-circuits. PLoS Comput. Biol. 5, e1000532. doi: 10.1371/journal.pcbi.1000532

Han, J., Li, Z., Zheng, W., and Zhang, Y. (2020). Hardware implementation of spiking neural networks on FPGA. Tsinghua Sci. Technol. 25, 479–486. doi: 10.26599/TST.2019.9010019

Ju, X., Fang, B., Yan, R., Xu, X., and Tang, H. (2020). An FPGA implementation of deep spiking neural networks for low-power and fast classification. Neural Comput. 32, 182–204. doi: 10.1162/neco_a_01245

Kim, J., Koo, J., Kim, T., and Kim, J.-J. (2018). Efficient synapse memory structure for reconfigurable digital neuromorphic hardware. Front. Neurosci. 12, 829. doi: 10.3389/fnins.2018.00829

Körding, K. P., and Wolpert, D. M. (2004). Bayesian integration in sensorimotor learning. Nature 427, 244–247. doi: 10.1038/nature02169

Liu, K., Cui, X., Zhong, Y., Kuang, Y., Wang, Y., Tang, H., et al. (2019). A hardware implementation of SNN-based spatio-temporal memory model. Front. Neurosci. 13, 835. doi: 10.3389/fnins.2019.00835

Liu, L., Wang, D., Wang, Y., Lansner, A., Hemani, A., Yang, Y., et al. (2020). "FPGA-based hardware accelerator for Bayesian confidence propagation neural network," in 2020 IEEE Nordic Circuits and Systems Conference (Oslo: IEEE), 1–6. doi: 10.1109/NorCAS51424.2020.9265129

Ma, D., Shen, J., Gu, Z., Zhang, M., Zhu, X., Xu, X., et al. (2017). Darwin: a neuromorphic hardware co-processor based on spiking neural networks. J. Syst. Archit. 77, 43–51. doi: 10.1016/j.sysarc.2017.01.003

Ma, W., Beck, J. M., Latham, P. E., and Pouget, A. (2006). Bayesian inference with probabilistic population codes. Nat. Neurosci. 9, 1432–1438. doi: 10.1038/nn1790

Ma, W., and Jazayeri, M. (2014). Neural coding of uncertainty and probability. Ann. Rev. Neurosci. 37, 205–220. doi: 10.1146/annurev-neuro-071013-014017

Maass, W. (1997). Networks of spiking neurons: the third generation of neural network models. Neural Netw. 10, 1659–1671. doi: 10.1016/S0893-6080(97)00011-7

Nagata, K., and Watanabe, S. (2008). Exchange Monte Carlo sampling from Bayesian posterior for singular learning machines. IEEE Trans. Neural Netw. 19, 1253–1266. doi: 10.1109/TNN.2008.2000202

Que, Z., Nakahara, H., Fan, H., Li, H., Meng, J., Tsoi, K. H., et al. (2022). "Remarn: a reconfigurable multi-threaded multi-core accelerator for recurrent neural networks," in ACM Transactions on Reconfigurable Technology and Systems (New York, NY: ACM). doi: 10.1145/3534969

Shams, L., and Beierholm, U. R. (2010). Causal inference in perception. Trends Cogn. Sci. 14, 425–432. doi: 10.1016/j.tics.2010.07.001

Shen, J., Liu, J. K., and Wang, Y. (2021). Dynamic spatiotemporal pattern recognition with recurrent spiking neural network. Neural Comput. 33, 2971–2995. doi: 10.1162/neco_a_01432

Shi, L., and Griffiths, T. (2009). Neural implementation of hierarchical Bayesian inference by importance sampling. Adv. Neural Inf. Process. Syst. 22.

Shi, Z., Church, R. M., and Meck, W. H. (2013). Bayesian optimization of time perception. Trends Cogn. Sci. 17, 556–564. doi: 10.1016/j.tics.2013.09.009

Tung, C., Hou, K.-W., and Wu, C.-W. (2023). "A built-in self-calibration scheme for memristor-based spiking neural networks," in 2023 International VLSI Symposium on Technology, Systems and Applications (VLSI-TSA/VLSI-DAT) (HsinChu: IEEE), 1–4. doi: 10.1109/VLSI-TSA/VLSI-DAT57221.2023.10134261

Tzanos, G., Kachris, C., and Soudris, D. (2019). "Hardware acceleration on Gaussian naive Bayes machine learning algorithm," in 2019 8th International Conference on Modern Circuits and Systems Technologies (Thessaloniki: IEEE), 1–5. doi: 10.1109/MOCAST.2019.8741875

Wang, D. (2022). Design and Implementation of FPGA-Based Hardware Accelerator for Bayesian Confidence [Master's thesis]. Turku: University of Turku.

Wang, D., Xu, J., Li, F., Zhang, L., Cao, C., Stathis, D., et al. (2023). A memristor-based learning engine for synaptic trace-based online learning. IEEE Trans. Biomed. Circuits Syst. 17, 1153–1165. doi: 10.1109/TBCAS.2023.3291021

Wozny, D. R., Beierholm, U. R., and Shams, L. (2008). Human trimodal perception follows optimal statistical inference. J. Vis. 8, 24. doi: 10.1167/8.3.24

Xu, Q., Shen, J., Ran, X., Tang, H., Pan, G., Liu, J. K., et al. (2022). Robust transcoding sensory information with neural spikes. IEEE Trans. Neural Netw. Learn. Syst. 33, 1935–1946. doi: 10.1109/TNNLS.2021.3107449

Yang, Z., Guo, S., Fang, Y., and Liu, J. K. (2022). "Biologically plausible variational policy gradient with spiking recurrent winner-take-all networks," in 33rd British Machine Vision Conference 2022 (London: BMVA Press), 21–24.

Yedidia, J. S., Freeman, W. T., and Weiss, Y. (2005). Constructing free-energy approximations and generalized belief propagation algorithms. IEEE Trans. Inf. Theory 51, 2282–2312. doi: 10.1109/TIT.2005.850085

Yu, Z., Chen, F., and Liu, J. K. (2019). Sampling-tree model: efficient implementation of distributed Bayesian inference in neural networks. IEEE Trans. Cogn. Dev. Syst. 12, 497–510. doi: 10.1109/TCDS.2019.2927808

Yu, Z., Deng, F., Guo, S., Yan, Q., Liu, J. K., Chen, F., et al. (2018a). Emergent inference of hidden Markov models in spiking winner-take-all neural networks. IEEE Trans. Cybern. 50, 1347–1354. doi: 10.1109/TCYB.2018.2871144

Yu, Z., Liu, J. K., Jia, S., Zhang, Y., Zheng, Y., Tian, Y., et al. (2020). Toward the next generation of retinal neuroprosthesis: visual computation with spikes. Engineering 6, 449–461. doi: 10.1016/j.eng.2020.02.004

Yu, Z., Tian, Y., Huang, T., and Liu, J. K. (2018b). Winner-take-all as basic probabilistic inference unit of neuronal circuits. arXiv [preprint]. doi: 10.48550/arXiv.1808.00675

Zador, A., Escola, S., Richards, B., Ölveczky, B., Bengio, Y., Boahen, K., et al. (2022). Toward next-generation artificial intelligence: catalyzing the NeuroAI revolution. arXiv [preprint]. doi: 10.1038/s41467-023-37180-x

Zhang, Y., Jia, S., Zheng, Y., Yu, Z., Tian, Y., Ma, S., et al. (2020). Reconstruction of natural visual scenes from neural spikes with deep neural networks. Neural Netw. 125, 19–30. doi: 10.1016/j.neunet.2020.01.033

Zhu, Y., Zhang, Y., Xie, X., and Huang, T. (2022). An FPGA accelerator for high-speed moving objects detection and tracking with a spike camera. Neural Comput. 34, 1812–1839. doi: 10.1162/neco_a_01507