Physics-Informed Neural Networks For Encoding Dynamics in Real Physical Systems
Department of Engineering
University of Cambridge
I hereby declare that except where specific reference is made to the work of others, the
contents of this dissertation are original and have not been submitted in whole or in part
for consideration for any other degree or qualification in this, or any other university. This
dissertation is my own work and contains nothing which is the outcome of work done in
collaboration with others, except as specified in the text and Acknowledgements. This
dissertation contains fewer than 15,000 words.
I would like to thank my supervisor Professor Phillip Stanley-Marbell for having faith in me
and accepting me into his group, providing academic guidance, support, and encouragement
to pursue exciting work. Phillip has been an excellent supervisor and mentor who has given
me the freedom to explore and work on topics that interest me whilst also advising me on the
most optimal paths to take to achieve positive outcomes. I would also like to thank Professor
Suhaib Fahmy for his eagerness in co-supervising my project, for the valuable discussions
we’ve had on FPGAs and hardware, and for providing me with collaboration opportunities
that I otherwise would not have had. I would like to thank James for his advice on setting up
the parallel heating experiment, Janith for his advice on machine learning, and Chatura for
his help in getting started at the early stages of my project. I would like to
thank Vasilis and Orestis for being the first people I got in touch with in the group, and for
getting me excited to pursue work here. I would like to thank Hamid and Divya for being
the friendly faces from the group who I’d frequently see in the lab/department. The custom
copper nib used as the soldering iron heat source was designed by Ady Ginn. The ethernet
switch for the parallel heating experiment was provided by Barlow McLeod.
I would like to acknowledge the Saudi Arabian Cultural Bureau for funding my studies
here at Cambridge.
Special thanks to the department crew — Ismail, Alkausil, Ibtisam, Sohail, and Adil for
the chill conversations in the department and for the banter over Wednesday cakes. Thank
you to Yunwoo for always being around to talk and have dinner with during the late working
hours in the department. Special thanks to Faris for our daily lunches and for putting up
with me every day. Big thank you to the Shbeeba/Shabashib especially Alwaleed, Hallamund,
Bunyamin, Marwan, Abdulkarim, Moez, Muheeb, Radwan, Ahmad, and Ahmad. You guys
made my experience here in Cambridge special so thank you bros. Big special thank you to
Omar, my brother away from home. Thank you for being there for the good and hard times,
God knows we’ve struggled through our degrees together. Thank you to Anas, my KFUPM
roommate and now brother-in-law for always asking about me, for being there to talk to, and
for helping me stay grounded. You’ve always been by my side as a dear friend, and now
you’re family. Thank you to my oldest friend Ahmad for always being there for me despite
the distance, and for our late night conversations. God knows you’ve always been there for
me from the very beginning brother.
Thank you to my sister Jumana and my brothers Mohammed and Faisal, for your
continuous support and encouragement through all of this and for always praying for my
success.
Most of all, thank you to my parents for giving me everything I have in my life, for
your consistent support, for the prayers and encouragement, and for your endless love. This
dissertation is dedicated to you and I hope I can always make you proud.
Most importantly and before all of this, praise be to God, the Most Gracious, Most
Merciful, and Most Beneficent. All my successes, blessings, and good fortunes are from Him,
and without His mercy, guidance, and support I am nothing.
Abstract
Predictive data-driven models are gaining widespread attention and are being deployed in
embedded systems within physical environments across a wide variety of modern technolo-
gies such as robotics, autonomous vehicles, smart manufacturing, and industrial controllers.
However, these models have no notion or awareness of the underlying physical principles
that govern the dynamics of the physical systems that they exist within. This dissertation
studies the encoding of governing differential equations that explain system dynamics, within
predictive models that are to be deployed within real physical systems. Based on this, we
investigate physics-informed neural networks (PINNs) as candidate models for encoding
governing equations, and assess their performance on experimental data from two different
systems. The first system is a simple nonlinear pendulum, and the second is 2D heat diffusion
across the surface of a metal block. We show that for the pendulum system the PINNs
outperformed equivalent uninformed neural networks (NNs) in the ideal data case, with
accuracy improvements of 18× and 6× for 10 linearly-spaced and 10 uniformly-distributed
random training points respectively. In similar test cases with real data collected from an
experiment, PINNs outperformed NNs with 9.3× and 9.1× accuracy improvements for
67 linearly-spaced and uniformly-distributed random points respectively. For the 2D heat
diffusion, we show that both PINNs and NNs do not fare very well in reconstructing the
heating regime due to difficulties in optimizing the network parameters over a large domain
in both time and space. We highlight that data denoising and smoothing, reducing the size
of the optimization problem, and using LBFGS as the optimizer are all ways to improve
the accuracy of the predicted solution for both PINNs and NNs. Additionally, we address
the viability of deploying physics-informed models within physical systems, and we choose
FPGAs as the compute substrate for deployment. In light of this, we perform our experiments
using a PYNQ-Z1 FPGA and identify issues related to time-coherent sensing and spatial data
alignment. We discuss the insights gained from this work and list future work items based on
the proposed architecture for the system that our methods work to develop.
Table of contents
Nomenclature
7 Conclusion
References
List of figures
2.1 Example of a PINN architecture based on the 2D heat equation using trainable parameters θn. The left dashed box shows the neural network which predicts the value of u given the training points to produce the data loss term. The right dashed box shows the PDE residual corresponding to the heat equation, composed from the differential terms. The differential terms are obtained using automatic differentiation. The PDE residual forms the physics loss, which is the distinguishing component of PINNs.
3.7 PINN and NN predictions on the synthetic data pendulum using 10 training points for uniformly-distributed random data. The NN is trained for 150 iterations — its final state before predictions became unstable. The PINN is trained for 2000 iterations. The PINN maintains a reasonable fit of the data while the NN struggles due to the data's irregularity.
3.8 NN prediction when trained with 1000 adjacent points. The NN fails to extrapolate the accurately predicted solution on the training points to the last 500 test points.
3.9 PINN predicted solution based on the first 5 points of the numerical solution. The PINN consists of two hidden layers with 12 units in the first and 9 in the second — corresponding to the 9th entry in Table 3.4a. Remarkably, the PINN is able to accurately predict the solution despite being trained with only the first 5 points.
3.10 PINN vs NN predictions on 100 linearly-spaced points with added Gaussian noise with a mean of 0 and a standard deviation of 0.5. The PINN and NN solutions are similar, although the PINN is slightly less impacted by the noise.
3.11 AXI IIC block design that we use for our experiments. The ZYNQ7 processing system is the operating system side of the system-on-chip (SoC) FPGA, on which the Python layer runs. The AXI IIC block is the direct interface with the sensors through I2C. We configure the I2C clock frequency to be 1000 kHz. The AXI IIC block interfaces with the processing system through the AXI interconnect block. Read and write commands are issued to the AXI interconnect through the Python driver API.
3.12 Pendulum oscillation data captured from the experimental setup shown in Figure 3.13. The data has a much higher frequency than the numerically-generated solution in Figure 3.3, although the sinusoidal nature is similar enough for making comparisons.
3.13 Experimental setup for the pendulum system. The BNO055 [90] is attached to the pendulum mass. The PYNQ-Z1 board (on top of the desktop computer) interfaces with the BNO055 through I2C [79].
3.14 PINN predictions over the entire domain of the sampled training data. The PINN fails to arrive at a valid solution due to the difficulty of optimizing over a large domain.
3.15 NN predictions over the entire domain of the sampled training data. Similarly to the PINN the NN also fails to converge, although it is more flexible in its predictive capability due to not being constrained by the physics loss term.
3.16 NN predictions over the entire domain of the sampled training data, after 34380 iterations. The main difference from the predictions shown in Figure 3.15 is that we do not enforce any termination conditions, and instead allow the training to run indefinitely.
3.17 NN predictions based on training with 50 linearly-spaced points. The NN solution misses the majority of the sinusoids as it is only able to fit data.
3.18 PINN predictions based on training with 50 linearly-spaced points. In contrast to the NN prediction in Figure 3.17, the PINN is able to capture the sinusoids correctly due to the physics loss term.
3.19 NN predictions based on training with 50 uniformly-distributed points. The NN fails to make reasonable predictions in areas with no training points.
3.20 PINN predictions based on training with 50 uniformly-distributed points. The PINN maintains the trend of making valid predictions.
3.21 NN and PINN RMSE values for linearly-spaced and uniformly-distributed data, reported in Table 3.7. The PINN maintains a constant accuracy irrespective of the number of training points, while the NN fails as the number of points decreases.
3.22 NN predictions based on adjacent points that comprise 40% of the problem domain. The predictions become unstable outside of the training data region.
3.23 PINN predictions based on adjacent points that comprise 40% of the problem domain. The predictions maintain stability just outside of the training data, but fail to extrapolate for the rest of the domain.
4.1 MLX90640 pixel RAM chess reading pattern configuration, borrowed from the datasheet [71]. The highlighted cells correspond to a subpage and we read one with each I2C transaction. The subpages get updated with new data after each read.
4.2 Custom copper tip to improve surface contact for conduction.
4.3 Block heating experimental setup. The MLX90640 is held with an alligator clip attached to a flexible helping hand. The custom solder tip is inserted into the block from the side into a hole so that it fits in place during heating. The sensor is directly connected to the FPGA.
4.4 Converted temperature measurements over time for 8 randomly-selected pixels. The measurements are noisy and are also dominated by noise spikes.
4.5 3D plots of the heating profile at different instances in time. The plots show valid temperature gradients over time.
4.6 Frame visualisation of one of the temperature spikes shown in Figure 4.4. These correspond to instances when the camera fails to capture valid frames.
4.7 Spike-filtered temperature time-series for four randomly-chosen pixels. The regions near 340 and 470 seconds displayed high concentrations of spikes across the pixels so were filtered out completely.
4.8 Data for 4 random pixels after applying the Savitzky-Golay filter [93]. The data retains small amounts of noise, although most of it has been smoothed out.
4.9 RMSE graph for a 2-layer 32-unit PINN and NN trained with Adam [52]. The training data is shown based on the raw and denoised data, and is sampled from the full 768-pixel frames. Denoising certainly helps achieve better training results, although there is a negligible difference between PINN and NN training performance.
4.10 Reduced size frames at t = 433.43. Smaller frames are easier to train with than larger ones.
4.11 Reduced frame vs full frame training comparison. The minimum RMSE is 23.67. Both denoising and reducing the frame size improve the training performance, although the training accuracy remains unsatisfactory.
4.12 LBFGS training evaluation for a 3-layer network with 64-32-32 units. The minimum RMSE is 9.42. Using LBFGS improves training performance considerably.
4.13 Comparison between the test data and the predictions of NNs and PINNs after training with 832 points. LS denotes linearly-spaced data and UD denotes uniformly-distributed random data. i refers to the time sample index. The NNs capture slight heating gradients in space, whereas the PINNs predict almost constant temperatures for specific frames.
5.1 Experimental setup for the parallel heating experiment. We use 5 magnetic alligator clamps to hold 5 MLX90640 thermal cameras, which are connected to 5 PYNQ-Z1 FPGA boards. The cameras are pointed at the block so that the block surface takes up the most area in the camera FOVs.
5.2 Time sample difference over the experiment duration for different cameras. The time coherence between each camera reduces over time, and in the worst case the difference is 6 seconds (cameras 2 and 4).
5.3 A comparison of the rectangular patch which we attribute as focusing in on the same area between the two cameras. The temperature ranges and the temperature grid are visually similar.
5.4 Frame visualisations at time sample 500.
6.1 Architectural diagram for our proposed system. A user encodes the differential equation for a system using a description language such as Newton [64]. A back-end compiler performs static analysis on the Newton description to generate a PINN architecture, which can be trained offline using experimental measurements taken from the system. The trained PINN can then be synthesized onto an FPGA using high-level synthesis (HLS) tools. Finally, the user can then integrate the FPGA with the synthesized model into the system for real-time inference, and by extension control.
List of tables
3.1 PINN RMSE values for different variations of hidden layers and units in each layer.
3.2 NN RMSE values for different variations of hidden layers and units in each layer.
3.3 RMSE values for variations of numbers of variably-spaced training points.
3.5 RMSE values based on varying sizes of the domain. The time column is the size of the domain in seconds. Nd and Nt are the number of train and test points respectively. b is the learned value of the friction coefficient. For the NN and PINN entries we report the RMSE on the first line and the iteration number on the second line. We stop the training early if we observe the RMSE value remaining constant for an extended number of iterations.
3.6 PINN prediction RMSE values for the last three domain proportions shown in Table 3.5, but with less data. Decreasing the amount of data enables more accurate PINN models.
3.7 RMSE values for variations of numbers of training points. The PINN predictions for both cases stay relatively consistent, whereas the NN predictions fail as the number of points decreases.
3.8 RMSE values for percentages of adjacent points starting from t = 0. Both the PINN and NN fail at predicting accurate solutions but the NN fails harder.
4.1 RMSE values based on varying the frame size. The first line in the NN and PINN entries corresponds to the RMSE, and the second line corresponds to the iteration number.
4.2 RMSE values based on a variation of the number of linearly-spaced frames. Nfr is the number of frames and Nd is the corresponding number of points.
4.3 RMSE values based on a variation of the number of uniformly-distributed points Nd.
Nomenclature
Roman Symbols
a Linear acceleration
b Damping coefficient
c Specific heat
f Arbitrary function
g Gravitational acceleration
k Thermal conductivity
L Loss
L Length
m Mass
n Count specifier
S Surface
s Arc length
T Time
t Temporal variable
Greek Symbols
ρ Material density
σ Standard deviation
ϕ Angle of oscillation
Acronyms / Abbreviations
BU Blow-up
ET Early Termination
FC Fully-connected (layer)
FOV Field-of-view
IP Intellectual Property
IR Infra-red
LS Linearly-spaced
ML Machine Learning
NN Neural Network
PL Programmable Logic
PR Partial Reconfiguration
SoC System-on-chip
UD Uniformly-distributed
1.1 Introduction
Physical computation refers to computation that affects and is affected by physical quantities
in our natural world, such as temperature, pressure, velocity, etc. This type of computation
is prevalent within embedded systems (also referred to as cyber-physical systems) — self-
contained digital devices that are interconnected on a smaller scale than large workstations
and servers, and typically process information from real environments. Embedded systems
have become ubiquitous in modern society with computers being integrated into objects that
we interact with on a daily basis, as well as in modern applications that are increasing in
adoption such as autonomous vehicles, or digital manufacturing. However, the computers in
these technologies lack a fundamental understanding of the physical nature of the systems and
signals that they interact with. In other words, they lack computational abstractions [89] that
can be used to describe the environments that they exist within. The lack of computational
abstractions for physical systems can be classified under four different categories of relevance
to this dissertation:
2. Information on the physical dimensions and units associated with signals [108].
4. Knowledge of the physical laws and relationships that govern physical quantities [64].
We discuss these classifications in greater detail in Section 1.3. The focus of this
MPhil dissertation is on the fourth item. Specifically, we are interested in investigating and
developing methods for incorporating differential equations that govern dynamical systems,
into predictive models for the purpose of deployment within real physical systems. There
are tangible benefits to incorporating physics knowledge into predictive models deployed
at the edge. The main benefit is the exploitation of the wealth of information that can be
extracted from physical signals captured from different sensors. However, there are numerous
challenges associated with developing physics-aware compute systems. This dissertation
focuses on two of them.
The first is that it is more difficult to work with real-world data, as opposed to simulated
or idealized data, due to the aleatoric uncertainty [109] arising from noise. In a real-world
setting, noise can come from rapid fluctuations in the measurand, disturbances from the
measurement environment, or as a characteristic of the measurement instrument. The amount
of noise from each source varies depending on the system under investigation, and the effect
it has on model predictions must be considered.
The second challenge relates to the viability of deploying machine learning (ML) models
on the edge for embedded inference. ML models, and neural networks (NNs) in particular,
tend to have thousands to millions of parameters. AlexNet [56] for example has 60 million
parameters and 650,000 neurons. In most cases it is infeasible to store such large models in
embedded devices with limited amounts of memory. Additionally, the extensive amounts of
processing required for inference raises the issue of power consumption, an important factor
to consider for resource-constrained embedded systems running off of batteries.
These two challenges motivate the central research questions of this dissertation, which
are as follows:
1. How well do physics-informed models perform on data captured from real physical setups in terms of predictive accuracy?
2. How viable is it to deploy such physics-informed models on resource-constrained compute platforms embedded within real physical systems?
To address the first question, the candidate model that we assess for this dissertation are
physics-informed neural networks (PINNs) [88]. For the second question, we investigate the
use of field-programmable gate arrays (FPGAs) as compute substrates for the models that we
develop.
1.2 Dynamical Systems and Differential Equations
\[ \frac{\partial u}{\partial x_1} + \frac{\partial u}{\partial x_2} + \dots + \frac{\partial u}{\partial x_n} = f(x_1, x_2, \dots, x_n) \tag{1.2} \]
Finding solutions to differential equations implies finding the unknown function u in terms of its independent variables. In physical contexts, this means finding the expression that describes the relationship between a set of physical quantities.
Differential equations can be classified as either linear or nonlinear. This classification
changes how the equation is solved. Let {a1 (x), a2 (x), ..., an (x)} be a set of coefficient
functions of x. We define the general form of a linear differential equation as follows [17]:
Specifically, it must be the case that u and all of its derivatives have a power of 1, and that
the coefficients a1 (x), a2 (x), ..., an (x), and the function f (x) depend only on the independent
variable x. On the other hand, if a differential equation does not satisfy these conditions it
is considered to be nonlinear. For given initial and boundary conditions, linear differential
equations have known methods for finding analytical solutions such as separation of variables,
integrating factors, or trial solutions. Nonlinear equations, are much more difficult to deal
with and in most cases analytical solutions do not exist for them. In this case, one can only
resort to numerical methods for finding solutions.
The following sections list examples of prominent linear and nonlinear differential equations in one dimension, introducing each equation with a brief mention of its physical context and defining its variables. For all of the equations, x and t represent position and time respectively.
The wave equation, which describes the propagation of waves such as the vibrations of a string:
\[ \frac{\partial^2 u}{\partial t^2} = c^2 \frac{\partial^2 u}{\partial x^2} \tag{1.4} \]
u is the wave amplitude, and c is a constant representing the wave speed.
The diffusion equation [25] for diffusive processes such as heat conduction or Brownian
motion:
\[ \frac{\partial u}{\partial t} = \alpha \frac{\partial^2 u}{\partial x^2} \tag{1.5} \]
u is the concentration of the diffusing quantity, and α is the diffusivity coefficient.
The Laplace equation [35] for equilibrium processes and potential field distributions:
\[ \nabla^2 u = 0 \tag{1.6} \]
The Poisson equation, which extends the Laplace equation with a source term σ(x):
\[ \nabla^2 u = \sigma(x) \tag{1.7} \]
The nonlinear Schrödinger equation, a model for wave propagation in nonlinear dispersive media:
\[ i\frac{\partial u}{\partial t} + \Delta u + |u|^2 u = 0 \tag{1.8} \]
u is the physical quantity under investigation, and i is the imaginary unit where i2 = −1.
The viscous Burgers’ equation [19], a simplified model for viscous fluid flow:
\[ \frac{\partial u}{\partial t} + u\frac{\partial u}{\partial x} = \nu \frac{\partial^2 u}{\partial x^2} \tag{1.9} \]
u is the fluid velocity, and ν is a constant representing the diffusivity coefficient.
The Korteweg–De Vries (KdV) equation [54] for shallow water waves and some other
dispersive wave systems:
\[ \frac{\partial u}{\partial t} + 6u\frac{\partial u}{\partial x} + \frac{\partial^3 u}{\partial x^3} = 0 \tag{1.10} \]
u is the wave amplitude/displacement.
For predictive models that are deployed within dynamical systems, there is currently
no support for programming constructs and abstractions for injecting knowledge about
governing differential equations. Developing such methods would be a significant step
towards computers that are more aware of the environments that they exist within, making
them more robust, adaptable, and reliable.
2.1 Introduction
The topic of this dissertation lies at the intersection of two emerging sub-domains: scientific
machine learning (SciML) [11] and reconfigurable computing [101]. For the first we focus
on Physics-informed Neural Networks (PINNs), and for the second on FPGAs as a compute
substrate for model deployment. The following sections provide outlines on the focus areas,
and shed light on some prominent related works.
Fig. 2.1 Example of a PINN architecture based on the 2D heat equation using trainable
parameters θn . The left dashed box shows the neural network which predicts the value of
u given the training points to produce the data loss term. The right dashed box shows the
PDE residual corresponding to the heat equation, composed from the differential terms. The
differential terms are obtained using automatic differentiation. The PDE residual forms the
physics loss, which is the distinguishing component of PINNs.
engines (GradientTape and Autograd), as well as the rapid improvement in modern compute
infrastructure for network training.
Aside from the promise of generalisability [50] that they offer through the incorporation
of physical laws, there are additional reasons that motivate our decision to investigate PINNs
as a candidate architecture. These are as follows:
1. PINNs are resilient to noise [59]. This makes them a promising choice for deployment
within physical environments which are dominated by noise from different sources.
2. PINNs can be trained with less and sparser data [8], and in some cases no data at
all, with the exception of initial and boundary points [48]. This is advantageous when
the sensing capabilities or the amount of data that can be gathered is limited.
3. PINNs are more computationally efficient than traditional numerical solvers, such as
finite differences or finite elements, due to not requiring a computational mesh [91].
They are also often more efficient than ordinary feed-forward neural networks since
they restrict the solution space to a subset of physically-plausible ones [50].
4. PINNs are convenient to implement and flexible, offering the capability of solving forward and inverse problems with the same problem formulation and almost the same code implementation [27].
\[ u_t + \mathcal{N}(u) = 0 \tag{2.2} \]
\[ f_t + \mathcal{N}(f) = 0 \tag{2.3} \]
The network loss L for a PINN can be found using the following equation:
\[ \mathcal{L}(\theta) = \mathcal{L}_d(\theta) + \mathcal{L}_p(\theta) \tag{2.4} \]
L_d is the data loss, which drives the network to fit a set of data points that correspond to the true solution, usually at the initial or boundary conditions. It is traditionally the only loss term used in neural networks. L_p is the physics loss, which places a soft constraint on the network optimisation to obey the governing equation, and consequently comprises the equation's differential terms. Let {x_d, t_d} be a set of N_d data input points with known output values {u_d}. Additionally, let {x_p, t_p} be a set of N_p collocation points within the problem domain that are used to evaluate L_p. L_d and L_p are then the mean squared error losses given by Equations 2.5 and 2.6.
\[ \mathcal{L}_d(\theta) = \frac{1}{N_d}\sum_{i=1}^{N_d} \left| f(x_d^i, t_d^i; \theta) - u_d^i \right|^2 \tag{2.5} \]
\[ \mathcal{L}_p(\theta) = \frac{1}{N_p}\sum_{i=1}^{N_p} \left| f_t(x_p^i, t_p^i; \theta) + \mathcal{N}\big(f(x_p^i, t_p^i; \theta)\big) \right|^2 \tag{2.6} \]
The squared terms in Equation 2.6 correspond to the left-hand side of Equation 2.3. The
differential terms in Equation 2.6 are computed using automatic differentiation.
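To make Equations 2.4-2.6 concrete, the following minimal PyTorch sketch assembles the two loss terms for the 1D diffusion equation (Equation 1.5), where the PDE residual is u_t − αu_xx. The network f, the coefficient alpha, and the tensor shapes are illustrative assumptions rather than the exact implementation used later in this dissertation.

import torch

def pinn_loss(f, x_d, t_d, u_d, x_p, t_p, alpha=0.1):
    """Total PINN loss (Equation 2.4) for the 1D diffusion equation u_t = alpha * u_xx."""
    # Data loss L_d (Equation 2.5): fit the network to the known solution values.
    u_pred = f(torch.cat([x_d, t_d], dim=1))
    loss_data = torch.mean((u_pred - u_d) ** 2)

    # Physics loss L_p (Equation 2.6): penalise the PDE residual at the collocation
    # points, with the derivatives obtained through automatic differentiation.
    x_p = x_p.clone().requires_grad_(True)
    t_p = t_p.clone().requires_grad_(True)
    u = f(torch.cat([x_p, t_p], dim=1))
    u_t = torch.autograd.grad(u, t_p, torch.ones_like(u), create_graph=True)[0]
    u_x = torch.autograd.grad(u, x_p, torch.ones_like(u), create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x, x_p, torch.ones_like(u_x), create_graph=True)[0]
    loss_physics = torch.mean((u_t - alpha * u_xx) ** 2)

    return loss_data + loss_physics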
1. Using a variational autoencoder (VAE) [53] to find the physical factors that relate to
lunar thermodynamics from temperature measurements of the moon’s surface.
4. Investigating the scalability of PINNs to problems with large domains and high-
frequency components.
For these tasks, Moseley shows that the fine-tuned PIML techniques generally perform
well in terms of their ability to learn physical processes and solve complicated scientific
problems. There are still challenges, especially relating to scaling the methods to physical
systems with high-frequency components. To alleviate this issue, Moseley et al. propose finite-basis
PINNs (FBPINN) [76], a domain decomposition approach for solving large-scale differential
equation problems.
1. Design of high throughput architectures, due to the inherent parallelism from synthe-
sizing processing elements across the space of the FPGA’s hardware resources.
2. Reconfigurability which allows for rapid design prototyping, enables future hardware
design updates, and provides the feature of switching between hardware applications
(through partial reconfiguration [105]).
3. Low processing latency from running directly on hardware, rather than passing through
software abstraction levels such as an operating system.
In recent years there has been a shift away from general-purpose computing towards
domain-specific architectures [83], with prominent examples such as Google's Tensor Pro-
cessing Unit (TPU) for accelerating deep neural networks (DNNs) [49]. This is mainly
attributed to the struggle of modern transistors to keep up with Moore’s law [33] and the
end of conventional Dennard Scaling [57]. Therefore, the new approach is to design acceler-
ator architectures for performing specialised tasks rather than relying on general-purpose
CPUs. FPGAs thrive in this new specialisation-focused compute paradigm, and are therefore
becoming more mainstream both in research and in commercial settings.
The following sections highlight two techniques for optimizing NN architectures for
FPGAs: quantization and weight pruning. The final section presents FINN [16], an end-to-
end framework for NN deployment on FPGAs.
Quantization
Quantization aims at reducing the size of the computation units, as well as the memory
and bandwidth requirements by narrowing the bit-width of the data, i.e. the weights and
activations. The trade-off here is between the accuracy degradation due to the loss of
precision, and the performance gain due to the quantization scheme.
Quantization is used in conjunction with fixed-point rather than floating-point data repre-
sentations. Quantized NN architectures often use fully 16-bit layers [40, 95, 104, 110, 112],
fully 8-bit layers [32, 41, 46], or a mix of 8-bit and 16-bit layers [66, 68, 99]. Additionally,
many accelerators [62, 77, 78, 113] implement Binarized NN (BNN) architectures with 1-bit
representations for all of the layers [45].
Qiu et al. introduce a dynamic approach to quantization where different layers and
feature map sets can have different fractional bit-lengths, based on an optimal quantization
configuration strategy [86]. However, for a given configuration the bit-widths are fixed, and
in their experiments they use 16-bit, 8-bit, and a mix of 8 and 4 bits for the weights in the
convolutional (CONV) and fully-connected (FC) layers respectively. They show that using
dynamic precision with 8-bits can restore the top-1 and top-5 accuracies to values marginally
less than the single-precision floating-point benchmark (1.52% loss for top-1, and 0.62% for
top-5 accuracies), as opposed to static precision 8-bits which suffered from high accuracy
degradation.
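As a generic illustration of fixed-point weight quantization (not the specific scheme used by Qiu et al.), the following NumPy sketch rounds a weight tensor onto a signed fixed-point grid with a chosen fractional bit-length:

import numpy as np

def quantize_fixed_point(weights, total_bits=8, frac_bits=6):
    """Round weights onto a signed fixed-point grid with frac_bits fractional bits."""
    scale = 2.0 ** frac_bits
    q_min = -(2 ** (total_bits - 1))          # most negative representable code word
    q_max = 2 ** (total_bits - 1) - 1         # most positive representable code word
    codes = np.clip(np.round(weights * scale), q_min, q_max)
    return codes / scale                      # de-quantized (reconstructed) weights

w = 0.1 * np.random.randn(4, 4)
w_q = quantize_fixed_point(w, total_bits=8, frac_bits=6)
# Worst-case rounding error is 2 ** -(frac_bits + 1) when no clipping occurs.
print(np.max(np.abs(w - w_q)))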
Abd El-Maksoud et al. use a 4-bit weight quantization in their GoogLeNet FPGA
accelerator [34]. They use incremental network quantization (INQ) [114], a post-training
quantization method which partitions the weights into two groups, one to be quantized and
the other to be retrained. The two groups are switched afterwards. This is done iteratively
until the accuracy requirement is met. Abd El-Maksoud et al. show that using INQ in addition
to weight pruning allows them to reduce the CNN model size by 57.6×, with their accelerator
achieving a classification rate of 25.1 FPS with 3.92 W of power [34].
Weight pruning
Denil et al. have shown that NN models are over-parameterized, and that in many cases only
a few of the weights are required to predict all the rest [30]. For NN models that are to be
deployed in embedded systems, this redundancy results in a waste of storage and computation
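A minimal PyTorch sketch of magnitude-based weight pruning, which exploits this redundancy by zeroing the smallest-magnitude weights, is shown below; the sparsity level is an arbitrary example rather than a value from the cited works.

import torch

def prune_by_magnitude(weight, sparsity=0.9):
    """Zero out the smallest-magnitude entries so that `sparsity` of them are removed."""
    k = int(sparsity * weight.numel())
    threshold = weight.abs().flatten().kthvalue(k).values   # k-th smallest magnitude
    mask = (weight.abs() > threshold).float()
    return weight * mask, mask

w = torch.randn(64, 64)
w_pruned, mask = prune_by_magnitude(w, sparsity=0.9)
print(mask.mean())   # fraction of weights kept, roughly 0.1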
FINN framework
3.1 Introduction
This chapter investigates the predictive capability of PINNs for a simple pendulum system.
First it introduces the dynamics of the nonlinear pendulum. Then it covers tests on the
predictive performance on an idealized version of the system using numerically-generated
data. This idealized case acts as a reference for the best-case accuracy. Then it outlines a real
experimental setup of the system, and discusses results obtained from it. For both of these
cases a PINN is benchmarked against a standard uninformed NN. Finally it closes with a
discussion on the results and insights gained from them.
\[ F = ma \]
\[ -mg\sin\phi = ma \]
\[ a = -g\sin\phi \tag{3.1} \]
The negative sign in Equation 3.1 indicates that the pendulum is decelerating as it moves
towards the top of the arc. By using the equation for the arc length s we get:
\[ s = L\phi \]
\[ a = \frac{d^2 s}{dt^2} \]
\[ a = L\frac{d^2\phi}{dt^2} \tag{3.2} \]
Substituting Equation 3.2 into Equation 3.1 gives us the differential equation for the
simple nonlinear pendulum, Equation 3.3:
\[ L\frac{d^2\phi}{dt^2} = -g\sin\phi \]
\[ \frac{d^2\phi}{dt^2} + \frac{g}{L}\sin\phi = 0 \tag{3.3} \]
\[ \frac{\dot{\phi}_{i+1} - \dot{\phi}_i}{\Delta t} = -\frac{g}{L}\sin\phi_i \]
\[ \dot{\phi}_{i+1} = \dot{\phi}_i - \frac{g}{L}\sin\phi_i\,\Delta t \tag{3.4} \]
Equation 3.4 is the approximate discretized solution for the angular velocity. A more
exact expression would necessitate that we take the sine of ϕi+1 , but at this stage we would
not have computed it yet. Fortunately for a small enough ∆t, ϕi+1 ≈ ϕi .
To get the angular displacement, we follow a similar approach based on discretization of
the angular velocity:
\[ \dot{\phi}_{i+1} = \frac{\phi_{i+1} - \phi_i}{\Delta t} \]
\[ \phi_{i+1} = \phi_i + \dot{\phi}_{i+1}\,\Delta t \tag{3.5} \]
Equation 3.5 is the approximate discretized equation for the angular displacement. We
apply the Euler-Cromer method [26] through the usage of ϕ̇i+1 in Equation 3.5 instead of ϕ̇i ,
which allows the solution to maintain energy conservation.
To generate a solution that is similar to a real experiment, we generate 1500 linearly
spaced time points in the interval [0, 6] (∆t = 0.004 s), set the initial conditions to be
ϕ1 = − π2 rad and ϕ̇1 = 0 rad/s, and set g = 9.8 m/s2 and L = 0.325 m — the length of the
rod that we use for our experiment in Section 3.4. Figure 3.2 shows the numerical solution of
the angular displacement based on this setup.
Fig. 3.2 Numerical solution of a pendulum system generated using Equations 3.4 and 3.5.
The next step is to account for air resistance. The exact amount of air resistance acting
on the pendulum mass and the rod is dependent on many factors such as speed, surface
roughness, air density, and the object’s geometry. In our case, we use a simple air resistance
model where we assume that the drag force is linearly proportional to the object’s speed with
a constant of proportionality b. Therefore the model becomes:
\[ \frac{d^2\phi}{dt^2} + b\frac{d\phi}{dt} + \frac{g}{L}\sin\phi = 0 \tag{3.6} \]
Additionally, Equation 3.4 becomes:
\[ \dot{\phi}_{i+1} = \dot{\phi}_i - \left( b\,\dot{\phi}_i + \frac{g}{L}\sin\phi_i \right)\Delta t \tag{3.7} \]
We arbitrarily take b to be 0.001 since we do not have an exact value to rely on. This
gives us the data that we will use for training, shown in Figure 3.3.
Fig. 3.3 Numerical solution of a pendulum system generated using Equations 3.7 and 3.5,
taking air resistance into consideration. A more realistic solution would consider a smaller
amount of damping over a longer interval, but for our purposes this solution is sufficient.
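A minimal NumPy sketch of this data-generation procedure, following Equations 3.5 and 3.7 with the constants stated above, is given below; the variable names are illustrative, and the grouping of the damping term follows the reconstruction of Equation 3.7.

import numpy as np

g, L, b = 9.8, 0.325, 0.001          # gravity, rod length, damping coefficient
t = np.linspace(0, 6, 1500)          # 1500 linearly-spaced time points
dt = t[1] - t[0]

phi = np.zeros_like(t)
phi_dot = np.zeros_like(t)
phi[0] = -np.pi / 2                  # initial angular displacement (rad)
phi_dot[0] = 0.0                     # initial angular velocity (rad/s)

for i in range(len(t) - 1):
    # Equation 3.7: Euler update of the angular velocity with linear damping.
    phi_dot[i + 1] = phi_dot[i] - (b * phi_dot[i] + (g / L) * np.sin(phi[i])) * dt
    # Equation 3.5: Euler-Cromer update of the angular displacement, using the
    # *new* velocity to preserve the energy behaviour of the oscillation.
    phi[i + 1] = phi[i] + phi_dot[i + 1] * dt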
\[ \mathcal{L} = \frac{1}{N_d}\sum_{i=1}^{N_d}\left| \phi_f^i - \phi_d^i \right|^2 + \frac{\lambda_p}{N_p}\sum_{i=1}^{N_p}\left| \ddot{\phi}_f^i + b\,\dot{\phi}_f^i + \frac{g}{L}\sin\phi_f^i \right|^2 \tag{3.8} \]
We compare PINNs against ordinary NNs that do not include the physics loss component.
For the NN, we eliminate the second term in Equation 3.8. The code implementation
we develop for our evaluations is based partly on open-source repositories developed by
Moseley [74] and Bhustali [13], although we adapt the implementation to suit our particular
problem. For all our training cases we use a multi-layer perceptron (MLP) architecture [15]
with different network parameters, and we assess the predictive performance based on a
variation of these parameters. We run the training on a workstation running an Intel i7-7820X
16-core CPU, and an NVIDIA Quadro P1000 4 GB GPU.
Based on tests with different activation functions, we found that the sine function per-
formed the best for the pendulum system and so we use it for all of our training cases. Addi-
tionally, we found that the limited-memory Broyden–Fletcher–Goldfarb–Shanno (LBFGS)
algorithm [65] performed best for PINNs, and so we use it as our default optimizer. This
is corroborated by the usage of LBFGS in the PINN paper by Raissi et al. [88]. We use
the default tolerance values in PyTorch for the termination, so LBFGS stops when there is
nothing left to learn based on these tolerances. We fix the learning rate at 0.01.
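For illustration, a minimal sketch of this training configuration (sine activations, a small fully-connected network, and LBFGS with a learning rate of 0.01) is shown below. The placeholder data and the pure data-fitting loss are stand-ins; in our evaluations the full loss of Equation 3.8 is minimised instead.

import torch
import torch.nn as nn

class Sine(nn.Module):
    """Sine activation, which we found to work best for the pendulum system."""
    def forward(self, x):
        return torch.sin(x)

# Small MLP: 1 input (time), 3 hidden layers of 5 units each, 1 output (angle).
model = nn.Sequential(
    nn.Linear(1, 5), Sine(),
    nn.Linear(5, 5), Sine(),
    nn.Linear(5, 5), Sine(),
    nn.Linear(5, 1),
)

# Placeholder training data; a pure data-fitting loss stands in for Equation 3.8.
t_train = torch.linspace(0, 6, 150).reshape(-1, 1)
phi_train = torch.sin(2.0 * t_train)

optimizer = torch.optim.LBFGS(model.parameters(), lr=0.01)

def closure():
    # LBFGS re-evaluates the objective several times per step, so the loss
    # computation must be wrapped in a closure.
    optimizer.zero_grad()
    loss = torch.mean((model(t_train) - phi_train) ** 2)
    loss.backward()
    return loss

for iteration in range(100):
    optimizer.step(closure)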
For the training data, we test four different training variations: linearly-spaced, uniformly-
distributed random, adjacent, and noisy data. For the first three cases we vary the number of
training points, and in the fourth we vary the amount of noise.
Figure 3.4 shows the predictions of a 3-layer PINN with 32 neurons in each layer,
given 150 linearly spaced training points (Nd = 150) and 100 linearly spaced collocation
points (N p = 100). In the best possible case, one where we have enough data available
and a sufficiently expressive network architecture, the PINN is able to predict the entire
pendulum solution with high accuracy (RMSE = 6.8 × 10−3 ). Figure 3.5 shows the test
RMSE plotted against the training iterations for the PINN and the NN. We can see that the
NN also has no trouble fitting the data due to its abundance, with an RMSE of 4.0 × 10−3 . In
the coming sections we gradually make the training setup more difficult and compare the
PINN’s performance against that of an equivalent NN.
3.3.3 Results
The first thing to determine for both a PINN and an NN is the smallest possible network
architecture size so that we can fix it for training. Tables 3.1 and 3.2 show the test RMSE
Fig. 3.4 PINN predictions on the synthetic data pendulum given 150 training points. PINN
architecture: 3 FC hidden layers with 32 neurons each. RMSE = 0.0068 as the PINN has no
trouble fitting the data given a perfect setup.
values for the PINN and NN respectively based on a variation of the number of hidden layers
and number of units in each layer, after 2000 iterations. The blow-up (BU) entries correspond
to instances where the training failed due to exploding gradients, and the early termination
(ET) entries correspond to instances where the training ends due to the termination condition
of LBFGS. The first number in those entries is the last valid training iteration before blow-up,
and the second number is the final reported RMSE value. For both the PINN and the NN, the
network maintains a high level of accuracy even with small architectures. Therefore we fix
the architecture to be 3 layers with 5 units each, to keep the network small whilst maintaining
expressiveness for more difficult training cases.
Linearly-spaced data
Table 3.3a shows the RMSE values for different numbers of linearly spaced data points based
on the training configuration. The NN maintains good accuracy down to 25 points. Lower
than that, the RMSE values begin to suffer considerably. In contrast, the PINN maintains
good accuracy even with only 5 linearly-spaced training points. Figure 3.6 shows that the
physics loss term allows the network to accurately generalise across the entire domain, whilst
the uninformed NN is only able to fit the data.
Fig. 3.5 Test RMSE values against training iterations of a PINN and an equivalent NN, given
150 linearly spaced training points. Both models converge to an accurate solution after
approximately 1250 iterations, although the PINN solution faces a spike that it overcomes
during the optimization near the 300 iteration mark.
Uniformly-distributed random data
Here we randomly sample training points from a uniform distribution. We evaluate training
in a similar fashion to the linearly-spaced data case. Table 3.3b shows the results. The NN
accuracy drops significantly below the 50 point mark, and for 10 points and less the optimizer
stops early as the network is unable to learn anything. The equivalent PINN maintains good
accuracy for 2000 iterations down to 15 points. We repeated the PINN training runs for 10
and 5 points but this time allowing them to stop based on the LBFGS termination condition. We
found that for 10 points the training stops after 5161 iterations with an RMSE of 0.1056, and
for 5 points it stops at 4083 iterations with an RMSE of 0.4432. By comparing Figure 3.6
with Figure 3.7, we can see that the data irregularity degrades the accuracy of both models,
although the PINN is still able to maintain a reasonable prediction of the true solution.
Adjacent points
For adjacent training points, we take a different approach to PINN evaluation. We do not
compare against an uninformed NN, as we have already shown that they fail in data-absent
Units \ Layers |          1          |   2    |   3    |          4           |          5
32             | 0.2914              | 0.0126 | 0.0066 | 0.0067               | 0.0076
16             | 0.6960              | 0.0134 | 0.0067 | 0.0067               | 0.0067
8              | 0.6987              | 0.0233 | 0.0133 | 0.0072               | 0.0070
5              | 0.6536              | 0.2190 | 0.0184 | BU: 435, L: 0.7278   | BU: 300, L: 0.7732
4              | 0.6957              | 0.0361 | 0.0086 | 0.2391               | BU: 784, L: 0.6862
3              | 0.7448              | 0.0256 | 0.0286 | 0.0333               | BU: 835, L: 0.6757
Table 3.1 PINN RMSE values for different variations of hidden layers and units in each layer.
Units \ Layers |          1          |   2    |   3    |          4           |          5
32             | BU: 636, L: 0.7376  | 0.0059 | 0.0037 | ET: 1914, L: 0.0066  | ET: 1423, L: 0.0037
16             | 0.6615              | 0.0087 | 0.0033 | 0.0071               | 0.0054
8              | 0.6610              | 0.0101 | 0.0173 | 0.0067               | 0.0056
5              | ET: 1681, L: 0.6626 | 0.0227 | 0.0084 | 0.0044               | ET: 1763, L: 0.0031
4              | ET: 1560, L: 0.7240 | 0.0168 | 0.0125 | 0.00158              | 0.0092
3              | ET: 1915, L: 0.7692 | 0.0473 | 0.0141 | BU: 273, L: 0.7277   | BU: 266, L: 0.7786
Table 3.2 NN RMSE values for different variations of hidden layers and units in each layer.
regions. Figure 3.8 shows a further example for this. Instead, we focus on the PINN for
the test cases. Due to the difficulty of predicting outside of the training distribution even
with the aid of physics knowledge, we deviate from the default training configuration.
Instead we configure the network parameters to show that given the right conditions it is
possible for a PINN to predict outside of the training distribution. Additionally, for all of
the configurations we either allow the network to run until the termination condition, or
stop the training early once a satisfactory test accuracy is achieved. Table 3.4a reports the
configurations, losses, and iteration numbers. Based on the data, we observe that given the
right architecture it is possible for PINNs to make predictions outside of the training set. The
accuracy of the prediction shown in Figure 3.9 further emphasizes this point. Additionally,
all of the successful architectures have a low memory footprint, consisting of 2-3 hidden layers with fewer than 10 units each in most cases.
(a) Linearly-spaced points.
Nd  |          NN          |  PINN
100 | 0.0051               | 0.0177
50  | 0.0044               | 0.0246
25  | 0.0136               | 0.0084
15  | ET: 1131, L: 0.2904  | 0.0109
10  | ET: 1021, L: 1.3906  | 0.0756
5   | ET: 612, L: 1.1184   | 0.0470

(b) Uniformly-distributed points.
Nd  |          NN          |  PINN
100 | 0.0093               | 0.0118
50  | 0.0181               | 0.0084
25  | 0.5434               | 0.0424
15  | ET: 1468, L: 0.5097  | 0.0292
10  | ET: 152, L: 1.0219   | 0.1488
5   | ET: 956, L: 0.9285   | 0.4922
Table 3.3 RMSE values for variations of numbers of variably-spaced training points.
Noisy data
This training case involves taking 100 linearly-spaced points and adding Gaussian noise to
them. The Gaussian has a mean of 0 and variable standard deviation. We fix the architecture
at 3 hidden layers with 5 units each, as per the default configuration. Table 3.4b reports
the RMSEs and iteration numbers. The PINN outperforms the NN in all cases, but only
marginally so at lower noise levels. In general, we can see that both the PINN and the NN are impacted by large amounts of noise, although the PINN is better at
adapting to it. The predictions in Figure 3.10 show this to be the case, where the shape of the
solution for the NN deviates away from a sinusoid, whereas the PINN solution maintains
its sinusoidal behaviour. The difficulty of making accurate predictions in the presence of
noise puts forward a strong case for finding efficient ways to denoise data or to model the
uncertainty arising from noise.
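A short NumPy sketch of how such a noisy training set can be constructed is shown below; the closed-form stand-in for the numerical solution and the array names are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the numerical pendulum solution of Figure 3.3.
t = np.linspace(0, 6, 1500)
phi = -(np.pi / 2) * np.cos(np.sqrt(9.8 / 0.325) * t)

# 100 linearly-spaced samples, corrupted with zero-mean Gaussian noise (sigma = 0.5).
idx = np.linspace(0, len(t) - 1, 100).astype(int)
sigma = 0.5
t_train = t[idx]
phi_train = phi[idx] + rng.normal(loc=0.0, scale=sigma, size=idx.shape)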
Fig. 3.6 PINN and NN predictions on the data for the idealized pendulum using 5 linearly-
spaced training points. The PINN is able to predict the correct solution based on the physics
loss, whereas the NN is only able to fit the training data.
Fig. 3.7 PINN and NN predictions on the synthetic data pendulum using 10 training points
for uniformly-distributed random data. The NN is trained for 150 iterations — its final state
before predictions became unstable. The PINN is trained for 2000 iterations. The PINN
maintains a reasonable fit of the data while the NN struggles due to the data’s irregularity.
Fig. 3.8 NN prediction when trained with 1000 adjacent points. The NN fails to extrapolate
the accurately predicted solution on the training points to the last 500 test points.
Fig. 3.9 PINN predicted solution based on the first 5 points of the numerical solution.
The PINN consists of two hidden layers with 12 units in the first and 9 in the second —
corresponding to the 9th entry in Table 3.4a. Remarkably, the PINN is able to accurately
predict the solution despite being trained with only the first 5 points.
Fig. 3.10 PINN vs NN predictions on 100 linearly-spaced points with added Gaussian noise
with a mean of 0 and a standard deviation of 0.5. The PINN and NN solutions are similar,
although the PINN is slightly less impacted by the noise.
3.4 Real Pendulum Experiment
Fig. 3.11 AXI IIC block design that we use for our experiments. The ZYNQ7 processing
system is the operating system side of the system-on-chip (SoC) FPGA, on which the Python
layer runs. The AXI IIC block is the direct interface with the sensors through I2C. We
configure the I2C clock frequency to be 1000 kHz. The AXI IIC block interfaces with the
processing system through the AXI interconnect block. Read and write commands are issued
to the AXI interconnect through the Python driver API.
Fig. 3.12 Pendulum oscillation data captured from the experimental setup shown in Fig-
ure 3.13. The data has a much higher frequency than the numerically-generated solution in
Figure 3.3, although the sinusoidal nature is similar enough for making comparisons.
Fig. 3.13 Experimental setup for the pendulum system. The BNO055 [90] is attached to the
pendulum mass. The PYNQ-Z1 board (on top of the desktop computer) interfaces with the
BNO055 through I2C [79].
as many iterations as is necessary and report this value. This is to overcome the spectral bias
issue that we discuss in Section 3.5.3. Once again, we use sine as the activation function. We
use a λ p value of 0.1 and a learning rate of 0.05.
3.5.3 Results
We evaluate training cases similar to the ones in Section 3.3.3. First we outline a problem
encountered when training over large domains, and find an appropriate domain size for
the training cases. Then we evaluate network performances after 50000 iterations using
linearly-spaced, random uniformly-distributed, and adjacent data.
In the initial attempt to train on the data, the networks failed to converge to an accurate solution
when the data was sampled over the entire time domain of oscillation. Figures 3.14 and 3.15
show this problem for the PINN and NN respectively. Both networks meet their termination
conditions at just over 2000 iterations, as the optimization algorithm is not able to learn
anything further. This relates to a common problem that NNs, and particularly PINNs, face.
They are difficult to train for high-frequency features or solutions [107]. This is due to the
spectral bias of NNs, where the rate of convergence for low-frequency loss components
is much faster than that of high-frequency ones. Moseley et al. proposed FBPINNs as a
solution to this problem [76], but for our purposes we choose a simpler approach. Increasing
the size of the domain is equivalent to increasing the solution’s frequency. Therefore we
gradually decrease the domain size to find an appropriately-sized problem that we can fix for
the training cases.
Fig. 3.14 PINN predictions over the entire domain of the sampled training data. The PINN
fails to arrive at a valid solution due to the difficulty of optimizing over a large domain.
We keep the network structures fixed and vary the domain proportion of the data that we
use for training. For all of the data proportions, we use linearly-spaced data with a spacing of
7 points for the training set and 23 points for the test set. In contrast to the train and test points,
we do not constrain the collocation points to be within the domain proportion that we evaluate.
This yielded the best results, and is a reasonable choice given that collocation points can be
Fig. 3.15 NN predictions over the entire domain of the sampled training data. Similarly to the
PINN the NN also fails to converge, although it is more flexible in its predictive capability
due to not being constrained by the physics loss term.
evaluated outside of the problem scope if the governing equation is known. Additionally we
set both the gradient and function value/parameter tolerance parameters (tolerance_grad
and tolerance_change) to 0, thereby eliminating the termination condition and allowing
LBFGS to train indefinitely until the maximum number of iterations. This allowed the
optimizer to overcome a training rut where it mistakenly assumes that there is nothing left
to learn. We run the optimizer for as many iterations as required and report the findings in
Table 3.5.
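In PyTorch, this corresponds to constructing the optimizer roughly as follows (the stand-in network is illustrative; the learning rate matches the value stated earlier):

import torch

# Stand-in network; in our experiments this is the PINN/NN under training.
model = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))

optimizer = torch.optim.LBFGS(
    model.parameters(),
    lr=0.05,
    tolerance_grad=0.0,      # disable the gradient-norm termination condition
    tolerance_change=0.0,    # disable the loss/parameter-change termination condition
)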
Fig. 3.16 NN predictions over the entire domain of the sampled training data, after 34380
iterations. The main difference from the predictions shown in Figure 3.15 is that we do not
enforce any termination conditions, and instead allow the training to run indefinitely.
The results show that the NN is eventually able to overcome the spectral bias issue for all
domain proportions if it is allowed to run indefinitely. We re-ran the training case with the
entire domain for the NN, but with indefinite training, and found that it is able to converge to
an accurate solution after 34380 iterations with an RMSE of 0.0538. Figure 3.16 shows this
solution. The PINN on the other hand does not perform as well, and is only able to converge
to a reasonable solution for 20% of the domain size.
After a careful investigation of this problem, we found that the cause of it is the large Nd
values for each training case. PINNs can often make more accurate predictions if they are
provided with less training data. This is because of the data loss term Ld dominating over the
physics loss term L p , causing it to become less flexible at adapting to the underlying physics
of the problem. Therefore we re-ran the last three PINN evaluations in Table 3.5 with less
data. The sub-tables of Table 3.6 show the results based on a variation of the number of training points.
By comparing the RMSE values in Table 3.5 with the ones in Table 3.6, we observe that using
fewer training points allows the PINN to make more accurate predictions after training. More
expressive architectures and an extensive hyperparameter grid search would be required to
find models that show significant increases in accuracy. For our purposes, we fix the domain
size at 20% to allow for training flexibility for the test cases.
Linearly-spaced data
Table 3.7a shows the training results based on a variation of the number of linearly-spaced
training points. The PINN maintains its accuracy even with the reduction in Nd, while the NN
suffers considerably after 167 points. Figure 3.21 shows this trend visualized, where we
see that the NN predictions fail for less data points and the PINN predictions stay relatively
consistent. A closer inspection into the predicted solutions in Figures 3.17 and 3.18, allows us
to see that similar to the ideal case predictions in Section 3.3.3, the PINN is able to regularise
the solution according to physics whereas the NN is only able to fit the data points.
Uniformly-distributed random data
Table 3.7b shows the training results based on a variation of the number of uniformly-
distributed training points. Similar to the case with linearly-spaced points, the PINN maintains
its accuracy regardless of Nd whereas the NN does not. However the NN suffers more in the
case of uniformly-distributed points than it does for linearly-spaced points, as we show in
Figure 3.21. By comparing Figures 3.19 and 3.20, we observe the continuing trend of NNs
fitting data points in contrast with PINNs that are able to regularise based on the governing
equation.
Adjacent points
Table 3.8 shows the results for training based on adjacent points, taken as percentages of
the problem domain. In contrast to the previous evaluations, here both the PINN and the
NN fail at predicting the solution outside of the training data, even though the NN suffers
more significantly. Figure 3.23 shows that the PINN predictions maintain a semblance of the
physical behaviour just outside of the training points, but then fall to 0 after that. The NN
predictions on the other hand, shown in Figure 3.22, maintain no semblance of the governing
physics and are in ranges that are entirely outside of physical plausibility.
(a) Linearly-spaced points.
Nd   |   NN   |  PINN  |   b
1000 | 0.0245 | 0.1357 | 0.0346
500  | 0.0231 | 0.1427 | 0.0284
334  | 0.0253 | 0.1636 | 0.0042
250  | 0.2142 | 0.1344 | 0.0288
200  | 0.1110 | 0.1381 | 0.0336
167  | 0.0408 | 0.1772 | 0.0374
143  | 0.7083 | 0.3675 | 0.0468
125  | 0.8337 | 0.1718 | 0.0362
100  | 1.3689 | 0.1661 | 0.0334
84   | 1.3930 | 0.1838 | 0.0342
67   | 2.1644 | 0.2325 | 0.0350
50   | 1.3206 | 0.1828 | 0.0350

(b) Uniformly-distributed points.
Nd   |   NN   |  PINN  |   b
1000 | 0.0319 | 0.1394 | 0.0292
500  | 0.0512 | 0.1428 | 0.0273
334  | 0.1596 | 0.1491 | 0.0293
250  | 0.1012 | 0.1514 | 0.0237
200  | 0.6762 | 0.1724 | 0.0204
167  | 1.0134 | 0.2131 | 0.0198
143  | 0.7937 | 0.1942 | 0.0182
125  | 1.3762 | 0.2154 | 0.0157
100  | 2.3499 | 0.2020 | 0.0165
84   | 2.0139 | 0.2046 | 0.0200
67   | 3.3333 | 0.3644 | 0.0142
50   | 2.9213 | 0.2221 | 0.0170
Table 3.7 RMSE values for variations of numbers of training points. The PINN predictions
for both cases stay relatively consistent, whereas the NN predictions fail as the number of
points decreases.
Fig. 3.17 NN predictions based on training with 50 linearly-spaced points. The NN solution
misses the majority of the sinusoids as it is only able to fit data.
The pendulum system has served as a simple but effective illustration of the benefits of incorporating physics knowledge into deep learning for physical systems.
Fig. 3.18 PINN predictions based on training with 50 linearly-spaced points. In contrast to
the NN prediction in Figure 3.17, the PINN is able to capture the sinusoids correctly due to
the physics loss term.
Fig. 3.20 PINN predictions based on training with 50 uniformly-distributed points. The
PINN maintains the trend of making valid predictions.
Fig. 3.21 NN and PINN RMSE values for linearly-spaced and uniformly-distributed data,
reported in Table 3.7. The PINN maintains a constant accuracy irrespective of the number of
training points, while the NN fails as the number of points decreases.
Fig. 3.22 NN predictions based on adjacent points that comprise 40% of the problem domain.
The predictions become unstable outside of the training data region.
Fig. 3.23 PINN predictions based on adjacent points that comprise 40% of the problem
domain. The predictions maintain stability just outside of the training data, but fail to
extrapolate for the rest of the domain.
Chapter 4
Predicting the Surface Temperatures Across a Metal Block During Heating
4.1 Introduction
In this chapter, we study the performance of PINNs for a slightly more complicated dynamical
system — heat diffusion across the 2D surface of a metal block. The setup that we use
for our experiment is irregular and does not necessarily adhere to a perfect physical model
or governing equation. Additionally, we use a non-uniform scrap metal block for which we do
not have ground-truth reference values for its physical parameters (thermal conductivity,
density, emissivity, etc.), so we use best-guess values for the coefficients. The sensor that
we use is inexpensive and highly susceptible to noise. Therefore, instead of running an
idealized simulation as a benchmark, we immediately start by analysing the experimental
data. As a result, we are evaluating PINNs on real data in a situation that is dominated by
significant amounts of noise, and where the exact physics is unknown. First we go over the
dynamics of heat diffusion by showing a derivation of the heat equation. Then we outline
our experimental procedure for data collection, denoising, and training. Finally we end the
chapter with some closing insights based on the results.
We outline the derivation of the heat equation, which we borrow jointly from Strauss [98] and
a PDEs course handout from Stanford University [60].
First we consider a region D ⊂ R^n, where n is the number of dimensions. Let x =
[x_1, ..., x_n]^T be a spatial vector in R^n, and let u(x, t) be the temperature at point x and time
t. Additionally, let c be the specific heat of the material of region D and ρ its density. We
express H(t), the total amount of heat in calories contained in D, as follows:

H(t) = ∫_D c ρ u(x, t) dx
By considering the change in heat we get the following (note the time derivative of u):
dH/dt = ∫_D c ρ u_t(x, t) dx    (4.1)
Fourier’s law states that the rate of heat transfer is proportional to the negative temperature
gradient, meaning that heat can only flow from hot to cold regions at a rate proportional to
the thermal conductivity k. Mathematically, this is expressed as follows:
dH/dt = ∫_{∂D} k ∇u · n̂ dS    (4.2)
∂ D is the boundary of D, n̂ is the outward normal unit vector to ∂ D, and dS is the surface
measure over ∂ D. By equating Equations 4.1 and 4.2 we obtain the following:
∫_D c ρ u_t(x, t) dx = ∫_{∂D} k ∇u · n̂ dS    (4.3)
The divergence theorem states that the integral of the divergence of a vector field over an
enclosed volume is equal to the flux of that field through the boundary of the volume. For a
vector field F this is represented as follows:
∫_{∂D} F · n̂ dS = ∫_D ∇ · F dx
Therefore, we simplify Equation 4.3 to get:
∫_D c ρ u_t(x, t) dx = ∫_D ∇ · (k ∇u) dx
Since this equality holds for any region D, the integrands must be equal, and we obtain the following PDE:
c ρ u_t = ∇ · (k ∇u)
Since c, ρ, and k are constants, dividing through by c ρ gives the heat equation:

u_t = α Δu    (4.4)

where α = k/(c ρ) is the thermal diffusivity.
Fig. 4.1 MLX90640 pixel RAM chess reading pattern configuration, borrowed from the
datasheet [71]. The highlighted cells correspond to a subpage and we read one with each I2C
transaction. The subpages get updated with new data after each read.
4.3 Block Heating Experiment
In devising a setup for the heat diffusion experiment, we needed to find a way to conveniently
collect data without requiring complicated equipment. Therefore, we use a scrap
aluminium alloy block which we heat using a soldering iron. We chose to use an
aluminium block because we needed a metal with a medium level of conductivity so that
the observed temperature gradients would be apparent over time. Using a metal that is too
thermally conductive, like copper for example, would result in temperature gradients that are
not very pronounced.
We use the PYNQ-Z1 FPGA as the sensing platform, and record surface temperatures
across the block using the MLX90640 infra-red (IR) thermal camera [71]. The MLX90640
has a resolution of 32x24 pixels, a field-of-view (FOV) of 55°x35°, and a temperature range
of -40°C – 300°C. The pixels are split into two subpages within the RAM of the sensor, for
the odd and even pixels. These pixels are arranged in a chess-like pattern as Figure 4.1 shows,
and we read the RAM twice and compile the subpages together to get a valid frame. The
FPGA issues bulk I2C commands to read the RAM all at once rather than individual I2C
reads for each pixel, as we have found that this method is faster. We use the same AXI IIC IP
block design shown in Figure 3.11 to communicate with the sensor. We implement a custom
driver for interfacing with the MLX90640 through the Python PYNQ API, based on the
manufacturer’s device driver library [70]. The main difference in our implementation is that
we obtain the raw frames during the heating and only perform the conversion routines after
the experiment is over and we have collected all of the data. The conversion routines require
the specification of an emissivity parameter, and based on our aluminium block we use an
estimated value of 0.05 for this. For the heat source we use a WES51 soldering iron, but we
replace the conical nib with a custom cylindrical copper tip for better surface contact for the
conduction. Figure 4.2 shows the custom tip and Figure 4.3 shows the entire experimental
setup.
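To make the data-collection flow concrete, the sketch below shows how a raw frame could be assembled from the two chess-pattern subpages before any temperature conversion is applied. It is a minimal illustration rather than our actual driver: the read_subpage_raw helper is a placeholder for the bulk I2C read, and the checkerboard mapping of pixels to subpages is an assumption here (the exact mapping follows the datasheet [71]).

```python
import numpy as np

ROWS, COLS = 24, 32  # MLX90640 resolution

def read_subpage_raw(subpage: int) -> np.ndarray:
    """Placeholder for the bulk I2C read of one subpage of the pixel RAM.
    In the real setup this would be a single AXI IIC transaction; here we
    return dummy data so the sketch runs on its own."""
    return np.random.randint(0, 2**16, size=(ROWS, COLS), dtype=np.uint16)

def chess_mask(subpage: int) -> np.ndarray:
    """Boolean mask of the pixels assumed to belong to the given subpage in
    the chess reading pattern (alternating checkerboard pixels)."""
    r, c = np.indices((ROWS, COLS))
    return ((r + c) % 2) == subpage

def read_frame() -> np.ndarray:
    """Compile one valid frame by reading the RAM twice, once per subpage."""
    frame = np.zeros((ROWS, COLS), dtype=np.uint16)
    for sp in (0, 1):
        raw = read_subpage_raw(sp)
        mask = chess_mask(sp)
        frame[mask] = raw[mask]
    return frame

# We store only raw frames during heating; the manufacturer's conversion
# routines (with emissivity 0.05) are applied after the experiment is over.
raw_frames = [read_frame() for _ in range(10)]
```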
Fig. 4.2 Custom copper tip to improve surface contact for conduction.
Before we start the experiment, we configure the sensor by setting the analogue-to-digital
converter (ADC) resolution to 19 bits, and the frame refresh rate to 64 Hz. Next we ensure
that the camera is positioned correctly so that it covers the entire area of the block. To do
this we implement real-time thermal imaging so that we can observe what the camera is
capturing. We heat the block slightly so that it is visually apparent on the thermal imager.
Using this setup, we move the camera and the block accordingly until both are positioned
in place with the entire block surface appearing in frame. Then we leave the block to cool
down and after that we turn the temperature up to approximately 298°C. We insert the iron
into the hole once the temperature has stabilized and begin recording.
We record the raw sensor readings for 15 minutes. After that we convert the readings
into temperature measurement values using conversion routines specified by the sensor
Fig. 4.3 Block heating experimental setup. The MLX90640 is held with an alligator clip
attached to a flexible helping hand. The custom solder tip is inserted into the block from the
side into a hole so that it fits in place during heating. The sensor is directly connected to the
FPGA.
manufacturer. We save the readings and then move them onto the workstation to process the
data.
Fig. 4.4 Converted temperature measurements over time for 8 randomly-selected pixels. The
measurements are noisy and are also dominated by noise spikes.
The existence of high amplitude spikes, as well as the high fluctuations in the temper-
ature signal will cause training difficulties in the optimization problem that we formulate.
Therefore, it was necessary for us to implement a denoising strategy to eliminate the spikes
and smooth out the data over time. Later on, we investigate the training results with and
without denoising.
The first step was to filter out the spikes in the data. The main issue that we faced here
was that different pixels exhibit spikes at different instances in time. Therefore, we adopt a
strategy where we remove the entire frame from the data if any given pixel exhibits spiky
behaviour. A better alternative to this would be to replace spike pixels with the average of
their neighbouring pixels, however this introduces an additional level of complexity given that
neighbouring pixels in both space and time may also be spiky. For our purposes, our filtering
method has acceptable performance. Algorithm 1 outlines the spike filtering algorithm. We
iterate over all of the pixels and their respective measurements over time, and compare their
values against their neighbours, adding the spike value indices to the spiky_indices array.
Fig. 4.5 3D plots of the heating profile at different instances in time. The plots show valid
temperature gradients over time.
We delete the spiky indices from the data to obtain the spike-filtered data. We use a threshold
value of 100, and based on this we find that the spikes comprise 8.53% of the data. Figure 4.7
shows a plot of the spike-filtered data for 4 randomly chosen pixels. We notice that two
regions of the data have been filtered out completely as they correspond to durations with
high spike concentrations across many pixels. Additionally, we notice that a few downward
spikes are still present. These will be smoothed out with our next denoising step.
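A minimal sketch of this frame-dropping strategy is shown below. It is our reading of Algorithm 1 rather than a copy of it: the exact neighbour comparison and the names used here (filter_spikes, spiky_indices) are illustrative, and only the threshold of 100 is taken from the text.

```python
import numpy as np

def filter_spikes(data: np.ndarray, threshold: float = 100.0) -> np.ndarray:
    """Drop every frame in which any pixel exhibits spiky behaviour.

    `data` has shape (num_frames, num_pixels). A sample is flagged as a
    spike when it differs from both of its temporal neighbours by more
    than `threshold`.
    """
    spiky_indices = set()
    for p in range(data.shape[1]):                 # iterate over pixels
        series = data[:, p]
        for t in range(1, data.shape[0] - 1):      # compare against neighbours in time
            if (abs(series[t] - series[t - 1]) > threshold and
                    abs(series[t] - series[t + 1]) > threshold):
                spiky_indices.add(t)
    keep = [t for t in range(data.shape[0]) if t not in spiky_indices]
    return data[keep]

# Example: 1000 frames of 768 pixels with one injected spike.
frames = np.full((1000, 768), 30.0)
frames[100, 5] += 300.0                            # an upward spike in one pixel
filtered = filter_spikes(frames)
print(frames.shape, filtered.shape)                # (1000, 768) (999, 768)
```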
Data smoothing
After filtering out spikes, the data still exhibits fluctuations in temperature that are not
physically possible. Therefore, we perform an additional step to smooth out the data
fluctuations and capture the true behaviour of the temperature signals. For this we apply a
Savitzky-Golay filter [93] to the time-series for each pixel, using the SciPy library [106]. The
Savitzky-Golay filter is a data-smoothing algorithm based on fitting low-order polynomials
using the linear least squares method. It is a simple and effective method that is suitable
for our purposes since the temperature measurements maintain a relatively linear upwards
Fig. 4.6 Frame visualisation of one of the temperature spikes shown in Figure 4.4. These
correspond to instances when the camera fails to capture valid frames.
trend. We use a polynomial order of 3, and a window size of 400. Figure 4.8 shows the
measurements after smoothing with the filter.
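Assuming the spike-filtered data is stored as a (frames × pixels) array, the smoothing step can be reproduced with SciPy as in the sketch below; the array name and placeholder data are ours, while the window size of 400 and polynomial order of 3 are the values quoted above.

```python
import numpy as np
from scipy.signal import savgol_filter

# spike_filtered: (num_frames, num_pixels) array of spike-filtered temperatures.
# Placeholder data standing in for our measurements.
spike_filtered = 30.0 + np.cumsum(np.random.randn(17195, 768) * 0.1, axis=0)

# Smooth each pixel's time-series along the time axis.
smoothed = savgol_filter(spike_filtered, window_length=400, polyorder=3, axis=0)
```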
• Optimizer: LBFGS.
• λ_p: 0.5.
The loss function we use is based on a substitution of the 2D version of Equation 4.4 in
Equation 2.4. Additionally we solve an inverse problem to discover the α parameter, although
we assume that the value might differ slightly for the x and y dimensions so we use the
coefficients α and β for x and y respectively. Thus, the result is Equation 4.5.
L = (1/N_d) Σ_{i=1}^{N_d} |u_f^i − u_d^i|^2 + (λ_p/N_p) Σ_{i=1}^{N_p} |u_t^i − α u_xx^i − β u_yy^i|^2    (4.5)
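A compact PyTorch sketch of this loss is given below, assuming the network maps (x, y, t) coordinates to a temperature u. The function and variable names are ours, not taken from the experiment code, and α and β would be registered as trainable parameters when the problem is posed as an inverse problem.

```python
import torch

def pinn_loss(model, xyt_data, u_data, xyt_phys, alpha, beta, lam_p=0.5):
    """Sketch of the loss in Equation 4.5 (illustrative names and structure)."""
    # Data loss: mean squared error against the measured temperatures.
    data_loss = torch.mean((model(xyt_data) - u_data) ** 2)

    # Physics loss: residual of u_t - alpha*u_xx - beta*u_yy at collocation points.
    xyt = xyt_phys.clone().requires_grad_(True)
    u = model(xyt)
    grads = torch.autograd.grad(u, xyt, torch.ones_like(u), create_graph=True)[0]
    u_x, u_y, u_t = grads[:, 0:1], grads[:, 1:2], grads[:, 2:3]
    u_xx = torch.autograd.grad(u_x, xyt, torch.ones_like(u_x), create_graph=True)[0][:, 0:1]
    u_yy = torch.autograd.grad(u_y, xyt, torch.ones_like(u_y), create_graph=True)[0][:, 1:2]
    phys_loss = torch.mean((u_t - alpha * u_xx - beta * u_yy) ** 2)

    return data_loss + lam_p * phys_loss

# Example usage with a small placeholder network and coefficients near 10.
model = torch.nn.Sequential(torch.nn.Linear(3, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
alpha = torch.nn.Parameter(torch.tensor(10.0))
beta = torch.nn.Parameter(torch.tensor(10.0))
loss = pinn_loss(model, torch.rand(16, 3), torch.rand(16, 1), torch.rand(32, 3), alpha, beta)
```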
Our initial attempt at training involved using LBFGS with full-batch training, similar to
the approach we took in Chapter 3, but with 3 dimensions for the x, y, and t coordinates.
The current LBFGS PyTorch implementation only supports full-batch training. The full
dataset collected from our experiment consists of 18,798 frames where each frame consists
of 768 2-byte pixels, and the denoised data similarly consists of 17,195 frames. Each pixel
in each frame corresponds to a training point, which makes full-batch training over the
complete dataset impractical.
Fig. 4.7 Spike-filtered temperature time-series for four randomly-chosen pixels. The regions
near 340 and 470 seconds displayed high concentrations of spikes across the pixels so were
filtered out completely.
Fig. 4.8 Data for 4 random pixels after applying the Savitzky-Golay filter [93]. The data
retains small amounts of noise, although most of it has been smoothed out.
We reduce our problem size so that we focus on a central square of the frame. Figure 4.10
shows contour plots for different reduced-size frames after 433.43 seconds of heating. After
frame reduction, the memory taken up by our training data is reduced, and we are now
also able to use LBFGS with full-batch training. Figure 4.11 shows the 8x8 reduced frame
training cases compared against the full-frame ones using Adam, and Figure 4.12 shows a
similar evaluation but only for the 8x8 reduced frame using LBFGS. For LBFGS we used
a 3-layer network with 64 units in the first layer as the 2-layer architecture caused training
instabilities. Additionally we trained the LBFGS case for 500 iterations as the network was
still able to learn more, as opposed to the Adam case where the RMSE plateaued after 20
iterations. The best case RMSE for Figure 4.11 was 23.67 while the best case RMSE for
Figure 4.12 was 9.42. We can see that LBFGS performs much better than Adam so we fix it
for the upcoming evaluations. Additionally we will only be training with the denoised data.
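The LBFGS training loop follows the standard PyTorch closure pattern, sketched below for the 3-layer 64-32-32 network mentioned above; the placeholder data, the data-only loss, and the iteration count are illustrative assumptions rather than our exact configuration.

```python
import torch

# 3 inputs (x, y, t) -> 1 output (temperature); 64-32-32 hidden units.
model = torch.nn.Sequential(
    torch.nn.Linear(3, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)
optimizer = torch.optim.LBFGS(model.parameters())

xyt = torch.rand(4096, 3)   # placeholder training coordinates
u = torch.rand(4096, 1)     # placeholder temperatures

def closure():
    # PyTorch's LBFGS re-evaluates the loss several times per step, so the
    # full-batch loss must be computed inside a closure.
    optimizer.zero_grad()
    loss = torch.mean((model(xyt) - u) ** 2)
    loss.backward()
    return loss

for _ in range(500):
    optimizer.step(closure)
```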
4.3.5 Results
Frame size variation
Here we investigate the training performance based on varying the frame size. We train with 819
linearly-spaced frames and test with 441 linearly-spaced frames. We train until convergence
Fig. 4.9 RMSE graph for a 2-layer 32-unit PINN and NN trained with Adam [52]. The
training data is shown based on the raw and denoised data, and is sampled from the full
768-pixel frames. Denoising certainly helps achieve better training results, although there is
a negligible difference between PINN and NN training performance.
and report on the number of iterations needed. Table 4.1 shows the training results for
different frame dimensions. We notice a predictable pattern in the PINN RMSE column, in
that the values decrease with decreasing frame size. The NN RMSEs do not adhere to this
pattern as strictly; for example, the RMSE for the 6x6 pixel frame is greater than that of the
10x10 frame.
Fig. 4.10 Reduced size frames at t = 433.43. Smaller frames are easier to train with than
larger ones.
Linearly-spaced frames
We fix the frame size to be 8x8 pixels and investigate predictive performance for a variation
of the number of linearly-spaced frames. Table 4.2 shows the results, and the second and third
rows of Figure 4.13 show visualisations of the predictions. Contrary to our expectation, the
PINN results do not appear to be better than the NN results as the amount of data decreases,
except for the 36-frame and 6-frame cases. Additionally, we generally see from Figure 4.13
that the PINNs failed to capture spatial temperature gradients compared to NNs, as they
made near-constant temperature predictions for their entire frames. This may be because the
α and β parameters are not optimized, or because the data does not strongly
adhere to the physics. The latter is most likely due to the large amounts of noise in the system
Fig. 4.11 Reduced frame vs full frame training comparison. The minimum RMSE is 23.67.
Both denoising and reducing the frame size improve the training performance, although the
training accuracy remains unsatisfactory.
which influences what the physics looks like for the observed data. It may be possible to get
results for the PINN that are better than the NN by tweaking the hyperparameters since the
optimization process is stochastic in nature.
Uniformly-distributed points
Similarly to the case for linearly-spaced frames, we vary the number of training points
within the domain but this time based on a random choice of uniformly-distributed points.
Table 4.3 shows the training performance for these evaluations, and the fourth and fifth rows
of Figure 4.13 show visualisations for their predictions. With the exception of a few outlier
training cases, such as Nd = 73408 and Nd = 384, the PINN and NN have similar RMSEs.
In general, the concentration of points at different spatiotemporal regions would affect how
well the networks predict for those regions. However, based on the constant RMSE trend at
values close to 9 irrespective of Nd , it may be the case that for this problem the networks are
converging to similar solutions regardless of the training data. The similarity between the
predicted solutions in the second and fourth rows of Figure 4.13 further supports this idea.
Fig. 4.12 LBFGS training evaluation for a 3-layer network with 64-32-32 units. The minimum
RMSE is 9.42. Using LBFGS improves training performance considerably.
4.4 Closing Remarks
In this chapter, we studied the performance of PINNs in predicting the surface temperatures
across a block as it is being heated, based on an experiment that we set up. We also looked into
issues related to sensor denoising and its effect on training. Unfortunately, we have not
been able to reconstruct a sufficiently accurate solution with PINNs nor with NNs, although
we have found ways of achieving better accuracy. These include denoising the data, using
a smaller frame size to reduce the difficulty of the optimization problem, and using the
LBFGS optimizer instead of Adam. We hypothesize that, similarly to what we discussed in
Section 3.5.3, the training domain in time is too large, resulting in stiffness in the optimization
due to the difficulty of traversing the loss landscape. This is especially the case when we
are trying to reconstruct 2D temperature grids rather than single angle measurements as in
Section 3.5.2. In Section 3.5.3 we solved this issue by reducing the domain of our problem
to make the optimization easier. This was applicable for the case of the pendulum since it
had a predictable sinusoidal pattern in a single dimension. The complexity of the 2D heat
diffusion problem on the other hand is something that we want to fully investigate for the
entire duration of our experiment, so we are interested in solving it for the entire domain.
One possible approach to this is to use sequence-to-sequence training, as was proposed by
Krishnapriyan et al. [55]. Sequence-to-sequence training involves splitting the time domain
into time-steps and training on the consecutive time-steps, one after the other. Krishnapriyan
et al. show that using it enables them to achieve losses that are 1–2 orders of magnitude lower
for a simulation of a 1D reaction-diffusion problem. We believe that applying it to our
problem could allow us to arrive at accurate solutions for both NNs and PINNs. Additionally,
by observing the coefficient values in Tables 4.1, 4.2, and 4.3 we notice that the α and β
coefficients do not change much from their initial values. It is generally the case that inverse
(a) Test data, i = 35. (b) Test data, i = 200. (c) Test data, i = 400.
(d) LS, NN, i = 35. (e) LS, NN, i = 200. (f) LS, NN, i = 400.
(g) LS, PINN, i = 35. (h) LS, PINN, i = 200. (i) LS, PINN, i = 400.
(j) UD, NN, i = 35. (k) UD, NN, i = 200. (l) UD, NN, i = 400.
(m) UD, PINN, i = 35. (n) UD, PINN, i = 200. (o) UD, PINN, i = 400.
Fig. 4.13 Comparison between the test data and the predictions of NNs and PINNs after
training with 832 points. LS denotes linearly-spaced data and UD denotes uniformly-
distributed random data. i refers to the time sample index. The NNs capture slight heating
gradients in space, whereas the PINNs predict almost constant temperatures for specific
frames.
Nfr    Nd       NN RMSE (iters)    PINN RMSE (iters)    α         β
3439   220096   8.344  (5000)      9.537  (5000)        10.0382   9.8948
1147   73408    8.701  (5000)      17.526 (1901)        9.9853    9.9753
574    36736    9.805  (5000)      9.426  (4987)        10.0397   10.0631
287    18368    9.272  (4999)      9.462  (4994)        9.9517    9.9112
144    9216     8.952  (5000)      9.421  (5000)        9.9574    9.8677
72     4608     8.828  (2898)      9.443  (5000)        10.0699   9.7978
36     2304     11.381 (4388)      9.436  (5000)        9.7256    9.8428
28     1792     9.579  (1943)      23.606 (461)         10.0837   9.9646
18     1152     9.508  (1859)      15.905 (819)         9.9908    9.8933
13     832      7.915  (5000)      9.578  (5000)        9.7833    9.8016
11     704      9.683  (1633)      14.848 (4820)        9.8494    9.9277
6      384      13.394 (823)       11.281 (3547)        9.9824    9.7317

Table 4.2 RMSE values based on a variation of the number of linearly-spaced frames. Nfr is
the number of frames and Nd is the corresponding number of points. The value in parentheses
next to each RMSE is the number of training iterations.
problems suffer from ill-posedness of optimization, require large numbers of network forward
passes, and are highly susceptible to noise [75]. Instead of posing their optimization as an
inverse problem, Krishnapriyan et al. introduce a curriculum learning approach where the
PINN is initially trained on small coefficient values, and then gradually retrained with larger
coefficients [55]. This may be a promising approach to look into for our problem, since the
abundance of noise in our setup may have affected the optimal values for the coefficients. A
curriculum learning approach may be promising in terms of searching the solution space of
the coefficients.
Nd       NN RMSE (iters)    PINN RMSE (iters)    α         β
220096   9.475   (3557)     9.429   (2542)       9.9781    9.9761
73408    7.016   (3500)     15.436  (4562)       -4e5      -8e5
36736    9.2324  (5000)     9.5814  (5000)       9.9888    9.7793
18368    8.792   (4341)     10.384  (5000)       10.0844   9.8509
9216     8.453   (2605)     9.848   (5000)       9.8036    9.8602
4608     14.030  (1163)     14.599  (4506)       9.9838    9.9897
2304     9.482   (4468)     10.517  (1311)       9.9933    10.0076
1792     20.813  (155)      23.072  (5000)       -1e13     -9e12
1152     11.198  (2666)     9.756   (5000)       9.8414    9.8285
832      7.819   (5000)     9.467   (3139)       10.0405   9.932
704      9.241   (5000)     9.470   (2458)       10.1434   10.0899
384      9.637   (3564)     33.971  (225)        10.0501   10.0402

Table 4.3 RMSE values based on a variation of the number of uniformly-distributed points
Nd. The value in parentheses next to each RMSE is the number of training iterations.
Chapter 5
Parallel Hardware and Time-coherent Sensing
5.1 Introduction
This chapter studies sensing issues related to deployment within hardware for real-time
inference, and focuses particularly on time coherence. Time coherence is the degree to which
two or more n-dimensional data inputs that occur at the same time instance are captured
with minimal latency between them, for an arbitrary value of n. The inputs to deployed
predictive models will commonly arrive from digital interfaces of sensors, and in many cases
they could be multi-dimensional inputs arriving from many different sensors. Common
embedded microcontrollers face difficulties in maintaining time coherence for many sensors
due to the sequential nature of their programs. FPGAs on the other hand are inherently
parallel, and so are an appropriate choice for this type of sensing architecture as they are able
to sample independently from parallel interfaces. Based on this, we present an experiment to
shed light on the issues of time and space coherence for digital sensing applications.
Time
During the experiment, we captured timestamps for each measurement for all of the cameras
using the Python time.time() function which returns the time elapsed in seconds since the
1st of January 1970. Therefore, all of the measurements have an absolute reference of time
that we can compare against. First, we subtract the first timestamp for each of the cameras
from their respective timestamp arrays so that time starts from 0 for each of them. To compare
the time-coherence of the data between different cameras, we subtract their timestamps from
each other and observe the time difference. Figure 5.2 shows a plot of the differences. We
can see that over time, the ∆T values between the cameras increase, indicating that the data
misalignment is growing. In the worst case, ∆T between cameras 2 and 4 drifts to a 6-second
difference by the end of the experiment duration, which is significant.
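A small sketch of how this comparison can be carried out is given below; the function names are ours, and the arrays stand in for the per-camera timestamp logs recorded during the experiment.

```python
import numpy as np

def relative_times(timestamps):
    """Zero each camera's clock at its first sample so all recordings start at t = 0."""
    return [t - t[0] for t in timestamps]

def pairwise_drift(t_a, t_b):
    """Sample-by-sample time difference between two cameras, truncated to the
    shorter recording; growing values indicate loss of time coherence."""
    n = min(len(t_a), len(t_b))
    return t_a[:n] - t_b[:n]

# Example with two synthetic recordings at slightly different effective rates.
cam_a = np.arange(0, 900, 0.0157)        # roughly 64 Hz
cam_b = np.arange(0, 900, 0.0158)        # a slightly slower camera
drift = pairwise_drift(*relative_times([cam_a, cam_b]))
print(drift[-1])                          # accumulated misalignment in seconds
```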
Fig. 5.1 Experimental setup for the parallel heating experiment. We use 5 magnetic alligator
clamps to hold 5 MLX90640 thermal cameras, which are connected to 5 PYNQ-Z1 FPGA
boards. The cameras are pointed at the block so that the block surface takes up the most area
in the camera FOVs.
To ensure data validity, the frames for all of the cameras would have to be aligned in time,
but this unfortunately is not the case with the data. There are two issues that are apparent
here. The first is that the cameras have different levels of delays between each other, so a
proper reference point would have to be found which is not a trivial matter. The second is
that between any two given cameras the amount of delay is not constant, nor is it constant
across the measurements of any single camera. Therefore, shifting the measurements for the
cameras by fixed amounts would still not lead to correct alignment. The proper course of
action in this case is to ensure that samples are taken on a single device using 5 independent
interfaces with dedicated AXI IIC blocks.
Fig. 5.2 Time sample difference over the experiment duration for different cameras. The time
coherence between the cameras degrades over time, and in the worst case the difference is 6
seconds (cameras 2 and 4).
Space
From right to left in Figure 5.1, the cameras are at angles 49.0°, 71.5°, 90.0°, 114.0°, and
144.0°. Since the vertical camera (third camera) did not work, we will call the next two
cameras, from right to left, cameras 3 and 4. Figures 5.4, 5.5, 5.6, and 5.7 show the raw
frames obtained from the four cameras at different time samples. These samples are delayed
in time for the different cameras based on the sample differences that Figure 5.2 shows. One
of the first things we notice from the camera data is the spatial misalignment of the frame
views between the different cameras. Camera 1’s view is positioned slightly to the right of
camera 2’s, as we can see from the solder end which appears in frame. Cameras 3 and 4
on the other hand do not have the solder end in the frame. We also have reason to believe
that camera 3’s lens is not at the same orientation as that of the rest of the cameras, despite
it being positioned in the correct way in the experimental setup. This is because the hot
part resulting from the solder appears vertically between X pixels 15 and 20 (see
Figure 5.7c), rather than horizontally as it does for the rest of the cameras. Also, camera 3 might
be faulty since its recorded temperatures are not in the same range as the other cameras
(almost 50 – 100°C less).
In an ideal case where the cameras are aligned in time, we would align the data in space
so that the frames from each camera represent the same area. We would do this by shifting
their perspectives so they all point vertically downwards. Since the data is not aligned in
time nor in space, it would not make sense to do this so we instead compare the data from
cameras 1 and 2 from t = 0 s to t = 600 s, since these cameras are closest in space and have
the least time delay between them up to 600 seconds.
We compare the rectangular patch just after the solder end for the two cameras. For
camera 1, the patch has corners with pixel coordinates (1, 7), (29, 7), (1, 14), (29, 14). For
camera 2, the corner coordinates are (4, 9), (32, 9), (4, 16), (32, 16). Figure 5.3 shows a
comparison of the patch for the two cameras at t = 240.85 s. Figure 5.8 shows histograms
for the differences between the pixels within the rectangular patch at five instances in time.
The expectation was that the temperature differences would be near 0 for the early part of the
heating process, and then they would increase due to the gradual loss of time coherence. The
histogram plots show that this is not the case, as there are large temperature differences that
are consistent throughout the heating process.
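The patch comparison can be sketched as below; the slicing convention (corner coordinates interpreted as inclusive (x, y) pixel indices) and the placeholder frames are our assumptions.

```python
import numpy as np

def patch(frame, x0, x1, y0, y1):
    """Extract the rectangular patch with inclusive (x, y) corner coordinates."""
    return frame[y0:y1 + 1, x0:x1 + 1]

frame_cam1 = np.random.rand(24, 32) * 300.0   # placeholder 24x32 frames
frame_cam2 = np.random.rand(24, 32) * 300.0

p1 = patch(frame_cam1, 1, 29, 7, 14)
p2 = patch(frame_cam2, 4, 32, 9, 16)

# Crop both patches to a common shape before subtracting, then histogram
# the pixel-wise temperature differences.
h, w = min(p1.shape[0], p2.shape[0]), min(p1.shape[1], p2.shape[1])
diff = p1[:h, :w] - p2[:h, :w]
counts, edges = np.histogram(diff.ravel(), bins=20)
```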
Fig. 5.3 A comparison of the rectangular patch which we take to correspond to the same
area in the two cameras. The temperature ranges and the temperature grid are visually
similar.
Fig. 5.8 Histogram plots for the temperature difference between the pixels in the rectangular
patch which corresponds to the same area in two different cameras (1 and 2). We can see
that, even though a significant number of the differences are near 0, the majority lie
further to the right of the graph and correspond to large errors.
Chapter 6
Discussion and Future Work
6.1 Discussion
In Section 1.1, we introduced the central research questions that motivated our work in
this dissertation. The first question assessed the viability of physics-informed models as
predictive bases for experimental data captured from real-world systems. Chapters 3 and 4
addressed this, with a study of the performance of PINNs on two different physical systems:
a simple 1D nonlinear pendulum, and a more complicated 2D heat diffusion system.
For the pendulum, many of the training cases, for both the idealized system and the
experimental data, have shown that PINNs outperform standard NNs when it comes to
regularizing differential equation solutions for sparse, noisy, and low-density data regions.
This puts forward a strong case for encoding known information about system dynamics in
deep learning as opposed to treating NNs as black boxes, especially for experimental data.
The nature of experimental data is that it is dominated by aleatoric and epistemic uncertainty,
and using PINNs or other techniques that encode physics information could be a promising
strategy for taking these uncertainties into account.
Unfortunately, training for the heat diffusion system did not fare as well as it did for the
pendulum system. This was the case for both the NN and PINN cases. The reason for this
ties back to the explanation in Section 3.5.3 — the optimization problem is too difficult to
solve on the entire domain of the heating process. This is especially the case for the PINN,
since the loss landscape is more difficult to traverse with the second-order derivative terms.
Table 4.1 supports this point, since the RMSE values improve once the spatial domain is
reduced. A reduction in the temporal domain may also be necessary. Additionally, it may
be the case that the data collected from the experiment is not adherent to the physics to a
sufficient degree. This would explain the PINN frame predictions in Figure 4.13, where
the model predicts near constant temperatures throughout entire frames for specific time
instances. It may be valuable to study the thermal and sensor noise models better to find a
way to incorporate them into the training.
The second question focused on the feasibility of deployment of physics-informed models
in physical setups, and the issues that might be faced in attempting to do so. To this end
we used inexpensive sensors for our experiments and an FPGA as our embedded platform,
and presented a review of issues related to sensor time coherence and spatial alignment in
Chapter 5.
The hardware design shown in Figure 3.11 uses a single AXI IIC block with a 1000 kHz
I2C clock frequency. The experiment in Section 5.2 uses this design in the 5 FPGAs to sense
in parallel. While this may be a reasonable setup for independent parallel sensing, as shown
in Section 5.2.2 it fails to retain time coherence in practice. A better approach that retains
it is to use an FPGA design with 5 AXI IIC blocks on a single board with the same I2C
clock. The FPGA design should include a double buffer BRAM, where in one clock cycle
the first BRAM stores the sensor data and the second BRAM outputs the data through the
AXI interface. In the next cycle the BRAM roles are reversed — the AXI interface receives
the previously stored data from the first BRAM and the second BRAM stores the sensor data.
This approach ensures that time coherence would be retained.
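As a behavioural illustration of the hand-off (written in Python rather than HDL, with names of our choosing), the double-buffer scheme can be sketched as follows: in each cycle one buffer captures the sensor data while the other is drained over the AXI interface, and the roles swap on the next cycle.

```python
def double_buffer_cycles(samples):
    """Behavioural model of a two-buffer ping-pong scheme."""
    buffers = [[None], [None]]
    write_idx = 0
    outputs = []
    for sample in samples:
        buffers[write_idx][0] = sample             # this buffer stores the sensor data
        read_idx = 1 - write_idx
        if buffers[read_idx][0] is not None:
            outputs.append(buffers[read_idx][0])   # the other buffer feeds the AXI interface
        write_idx = 1 - write_idx                  # swap roles for the next cycle
    return outputs

print(double_buffer_cycles([1, 2, 3, 4]))          # -> [1, 2, 3]
```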
Spatial data alignment is a more complicated issue to tackle. The first step is to ensure
that all of the cameras are calibrated properly so that they measure temperatures accurately
and within the same ranges. After that, the cameras should be positioned correctly based on
proper alignment of their FOVs with the block’s surface. Once this is done, post-processing
frame transformations will be required to ensure that the camera views all point vertically
downwards onto the surface of the block. These steps will ensure that the pixels from each
camera are aligned with each other, after which their comparison will be valid.
6.2 Outlook
The motivation behind this work is to develop methods for encoding dynamics for the
deployment of robust physics-aware models within real physical systems, for real-time
inference and by extension model predictive control (MPC). Figure 6.1 shows an architectural
diagram for the system that our methods work towards. ML models are being deployed in
a wide variety of modern technologies such as autonomous vehicles [94], biosensing [73],
patient health monitoring [29], and smart manufacturing [5]. Most of the models used in these
applications are context-agnostic and are unaware of the dynamics of the environments that
they exist within. The development of the systems that follow the architecture in Figure 6.1
would introduce domain knowledge into models that make safety-critical decisions under
uncertain and changing environments. This would increase their robustness and adaptability
and would also give us confidence in their decisions, since we know that they are based
on widely understood physical principles. Extending domain-specific languages such as
Newton [64] with features for encoding dynamics, coupled with the reconfigurability of
FPGAs, would enable generalization of deployment across a wide variety of physical systems
with different dynamical behaviours and with adaptive computer architectures.
Fig. 6.1 Architectural diagram for our proposed system. A user encodes the differential
equation for a system using a description language such as Newton [64]. A back-end compiler
performs static analysis on the Newton description to generate a PINN architecture, which
can be trained offline using experimental measurements taken from the system. The trained
PINN can then be synthesized onto an FPGA using high-level synthesis (HLS) tools. Finally,
the user can then integrate the FPGA with the synthesized model into the system for real-time
inference, and by extension control.
Conclusion
This dissertation has investigated two central motivating questions. The first is whether or not
the encoding of differential equations in machine learning improves predictive performance
for data collected from real physical systems. The second relates to the viability of deploying
physics-informed models within physical systems for real-time inference. To answer the first
question, we studied the performance of physics-informed neural networks for two different
systems: a simple nonlinear pendulum, and 2D heat diffusion across the surface of a metal
block.
For the first system, we found that the inclusion of the physics loss term based on the
system’s governing equation helped in regularizing the solution according to the underlying
physics. This resulted in accurate predictions of the exact solution for both the ideal numerical
solution and for the experimental data, with very few training points. In the best case, the
PINN achieved an 18× accuracy improvement over an equivalent NN for 10 linearly-spaced
training points for the ideal data, and an over 6× improvement for 10 uniformly-distributed
random points. For the real data case, the PINN achieved accuracy improvements of 9.3×
and 9.1× for 67 linearly-spaced and uniformly-distributed random points respectively. This
proves the predictive performance benefits of encoding known physics into neural networks
for both ideal and real data cases for a simple pendulum system.
For the heat diffusion system, we addressed challenges related to denoising thermal
camera data and simplifying the optimization for a complex 2D system. We have shown that
data denoising, frame size reduction, and optimization using LBFGS are ways to improve
the accuracy of network predictions. The PINN and NN showed similar RMSE values, and
we were unable to obtain satisfactory accuracy for either despite the improvements. This was
because of the difficulty of the underlying optimization problem when posed over a large
domain.
References
[1] Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S.,
Irving, G., Isard, M., Kudlur, M., Levenberg, J., Monga, R., Moore, S., Murray, D. G.,
Steiner, B., Tucker, P., Vasudevan, V., Warden, P., Wicke, M., Yu, Y., and Zheng, X. (2016).
Tensorflow: A system for large-scale machine learning. In 12th USENIX Symposium on
Operating Systems Design and Implementation (OSDI 16), pages 265–283.
[2] Ablowitz, M. and Prinari, B. (2008). Nonlinear Schrodinger systems: continuous and
discrete. Scholarpedia, 3(8):5561. revision #137230.
[3] Abowd, G. D., Dey, A. K., Brown, P. J., Davies, N., Smith, M., and Steggles, P. (1999).
Towards a better understanding of context and context-awareness. In Gellersen, H.-W.,
editor, Handheld and Ubiquitous Computing, pages 304–307, Berlin, Heidelberg. Springer
Berlin Heidelberg.
[4] Abramowitz, M. and Stegun, I. A. (1948). Handbook of mathematical functions with
formulas, graphs, and mathematical tables, volume 55. US Government printing office.
[5] Ahmad, H. M. and Rahimi, A. (2022). Deep learning methods for object detection in
smart manufacturing: A survey. Journal of Manufacturing Systems, 64:181–196.
[6] Arnold, F. and King, R. (2021). State–space modeling for control based on physics-
informed neural networks. Engineering Applications of Artificial Intelligence, 101:104195.
[7] Arroyo Leon, M., Ruiz Castro, A., and Leal Ascencio, R. (1999). An artificial neural
network on a field programmable gate array as a virtual sensor. In Proceedings of the Third
International Workshop on Design of Mixed-Mode Integrated Circuits and Applications
(Cat. No.99EX303), pages 114–117.
[8] Arzani, A., Wang, J.-X., and D’Souza, R. M. (2021). Uncovering near-wall blood
flow from sparse data with physics-informed neural networks. Physics of Fluids, 33(7).
071905.
[9] Bade, S. and Hutchings, B. (1994). Fpga-based stochastic neural networks-
implementation. In Proceedings of IEEE Workshop on FPGA’s for Custom Computing
Machines, pages 189–198.
[10] Bai, J., Lu, F., Zhang, K., et al. (2019). Onnx: Open neural network exchange.
https://github.com/onnx/onnx.
[11] Baker, N., Alexander, F., Bremer, T., Hagberg, A., Kevrekidis, Y., Najm, H., Parashar,
M., Patra, A., Sethian, J., Wild, S., Willcox, K., and Lee, S. (2019). Workshop report
on basic research needs for scientific machine learning: Core technologies for artificial
intelligence. U.S. Department of Energy Office of Scientific and Technical Information.
[12] Beléndez, A., Pascual, C., Méndez, D., Beléndez, T., and Neipp, C. (2007). Exact
solution for the nonlinear pendulum. Revista brasileira de ensino de física, 29:645–648.
[13] Bhustali, P. (2021). Physics-informed-neural-networks. https://github.com/
omniscientoctopus/Physics-Informed-Neural-Networks.
[14] BIPM (2019). Le Système international d’unités / The International System of Units
(‘The SI Brochure’). Bureau international des poids et mesures, ninth edition.
[15] Bishop, C. M. (2006). Pattern Recognition and Machine Learning (Information Science
and Statistics). Springer-Verlag, Berlin, Heidelberg.
[16] Blott, M., Preußer, T. B., Fraser, N. J., Gambardella, G., O’brien, K., Umuroglu, Y.,
Leeser, M., and Vissers, K. (2018). Finn-r: An end-to-end deep-learning framework for
fast exploration of quantized neural networks. ACM Trans. Reconfigurable Technol. Syst.,
11(3).
[17] Boyce, W., DiPrima, R., and Meade, D. (2017). Elementary Differential Equations and
Boundary Value Problems. Wiley.
[18] Buckingham, E. (1914). On physically similar systems; illustrations of the use of
dimensional equations. Phys. Rev., 4:345–376.
[19] Burgers, J. (1948). A mathematical model illustrating the theory of turbulence. In Von
Mises, R. and Von Kármán, T., editors, Advances in Applied Mechanics, volume 1, pages
171–199. Elsevier.
[20] Cai, S., Mao, Z., Wang, Z., Yin, M., and Karniadakis, G. E. (2021a). Physics-
informed neural networks (pinns) for fluid mechanics: A review. Acta Mechanica Sinica,
37(12):1727–1738.
[21] Cai, S., Wang, Z., Wang, S., Perdikaris, P., and Karniadakis, G. E. (2021b). Physics-
Informed Neural Networks for Heat Transfer Problems. Journal of Heat Transfer,
143(6):060801.
[22] Chen, R. T. Q., Rubanova, Y., Bettencourt, J., and Duvenaud, D. K. (2018). Neural
ordinary differential equations. In Bengio, S., Wallach, H., Larochelle, H., Grauman, K.,
Cesa-Bianchi, N., and Garnett, R., editors, Advances in Neural Information Processing
Systems, volume 31. Curran Associates, Inc.
[23] Cloutier, J., Cosatto, E., Pigeon, S., Boyer, F., and Simard, P. (1996). Vip: an fpga-based
processor for image processing and neural networks. In Proceedings of Fifth International
Conference on Microelectronics for Neural Networks, pages 330–336.
[24] Cox, C. and Blanz, W. (1992). Ganglion-a fast field-programmable gate array implemen-
tation of a connectionist classifier. IEEE Journal of Solid-State Circuits, 27(3):288–299.
[25] Crank, J. (1975). The Mathematics of Diffusion. Oxford science publications. Clarendon
Press.
[26] Cromer, A. (1981). Stable solutions using the Euler approximation. American Journal
of Physics, 49(5):455–459.
[27] Cuomo, S., Di Cola, V. S., Giampaolo, F., Rozza, G., Raissi, M., and Piccialli, F. (2022).
Scientific machine learning through physics–informed neural networks: where we are and
what’s next. Journal of Scientific Computing, 92(3):88.
[28] Dahmen, S. R. (2015). On pendulums and air resistance: the mathematics and physics
of denis diderot. The European Physical Journal H, 40:337–373.
[29] Davoudi, A., Malhotra, K. R., Shickel, B., Siegel, S., Williams, S., Ruppert, M.,
Bihorac, E., Ozrazgat-Baslanti, T., Tighe, P. J., Bihorac, A., et al. (2019). Intelligent icu
for autonomous patient monitoring using pervasive sensing and deep learning. Scientific
reports, 9(1):8020.
[30] Denil, M., Shakibi, B., Dinh, L., Ranzato, M., and de Freitas, N. (2013). Predicting
parameters in deep learning. In Proceedings of the 26th International Conference on
Neural Information Processing Systems - Volume 2, NIPS’13, page 2148–2156, Red Hook,
NY, USA. Curran Associates Inc.
[31] Denton, E. L., Zaremba, W., Bruna, J., LeCun, Y., and Fergus, R. (2014). Exploiting
linear structure within convolutional networks for efficient evaluation. Advances in neural
information processing systems, 27.
[32] Ding, Y., Wu, J., Gao, Y., Wang, M., and So, H. K.-H. (2023). Model-platform opti-
mized deep neural network accelerator generation through mixed-integer geometric pro-
gramming. In 2023 IEEE 31st Annual International Symposium on Field-Programmable
Custom Computing Machines (FCCM), pages 83–93.
[33] Eeckhout, L. (2017). Is moore’s law slowing down? what’s next? IEEE Micro,
37(04):4–5.
[34] El-Maksoud, A. J. A., Ebbed, M., Khalil, A. H., and Mostafa, H. (2021). Power efficient
design of high-performance convolutional neural networks hardware accelerator on fpga:
A case study with googlenet. IEEE Access, 9:151897–151911.
[35] Evans, L. (2010). Partial Differential Equations. Graduate studies in mathematics.
American Mathematical Society.
[36] Farlow, S. (1993a). Partial Differential Equations for Scientists and Engineers. Dover
books on advanced mathematics. Dover Publications.
[37] Farlow, S. (1993b). Partial Differential Equations for Scientists and Engineers. Dover
books on advanced mathematics. Dover Publications.
[38] Ferrer, D., Gonzalez, R., Fleitas, R., Acle, J., and Canetti, R. (2004). Neurofpga-
implementing artificial neural networks on programmable logic devices. In Proceedings
Design, Automation and Test in Europe Conference and Exhibition, volume 3, pages
218–223 Vol.3.
[50] Karniadakis, G. E., Kevrekidis, I. G., Lu, L., Perdikaris, P., Wang, S., and Yang, L.
(2021). Physics-informed machine learning. Nature Reviews Physics, 3(6):422–440.
[51] Khan, A. and Lowther, D. A. (2022). Physics informed neural networks for electromag-
netic analysis. IEEE Transactions on Magnetics, 58(9):1–4.
[52] Kingma, D. and Ba, J. (2014). Adam: A method for stochastic optimization. Interna-
tional Conference on Learning Representations.
[53] Kingma, D. P. and Welling, M. (2022). Auto-encoding variational bayes.
[54] Korteweg, D. D. J. and de Vries, D. G. (1895). Xli. on the change of form of long
waves advancing in a rectangular canal, and on a new type of long stationary waves.
The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science,
39(240):422–443.
[55] Krishnapriyan, A., Gholami, A., Zhe, S., Kirby, R., and Mahoney, M. W. (2021). Char-
acterizing possible failure modes in physics-informed neural networks. In Beygelzimer,
A., Dauphin, Y., Liang, P., and Vaughan, J. W., editors, Advances in Neural Information
Processing Systems.
[56] Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). Imagenet classification with
deep convolutional neural networks. In Pereira, F., Burges, C., Bottou, L., and Weinberger,
K., editors, Advances in Neural Information Processing Systems, volume 25. Curran
Associates, Inc.
[57] Kuhn, K. J. (2009). Cmos scaling beyond 32nm: Challenges and opportunities. In
Proceedings of the 46th Annual Design Automation Conference, DAC ’09, page 310–313,
New York, NY, USA. Association for Computing Machinery.
[58] Lagaris, I., Likas, A., and Fotiadis, D. (1998). Artificial neural networks for solving
ordinary and partial differential equations. IEEE Transactions on Neural Networks,
9(5):987–1000.
[59] Leoni, P. C. D., Agarwal, K., Zaki, T. A., Meneveau, C., and Katz, J. (2023). Recon-
structing turbulent velocity and pressure fields from under-resolved noisy particle tracks
using physics-informed neural networks. Experiments in Fluids, 64(5).
[60] Levandosky, J. (2003). Math 220b lecture notes. https://web.stanford.edu/class/
math220b/handouts/HEATEQN.pdf. Accessed: 14/08/2023.
[61] Li, S., Wang, G., Di, Y., Wang, L., Wang, H., and Zhou, Q. (2023). A physics-informed
neural network framework to predict 3d temperature field without labeled data in process
of laser metal deposition. Engineering Applications of Artificial Intelligence, 120:105908.
[62] Liang, S., Yin, S., Liu, L., Luk, W., and Wei, S. (2018). Fp-bnn: Binarized neural
network on fpga. Neurocomputing, 275:1072–1086.
[63] Lienhard, IV, J. H. and Lienhard, V, J. H. (2019). A Heat Transfer Textbook. Dover
Publications, Mineola, NY, 5th edition.
[64] Lim, J. and Stanley-Marbell, P. (2018). Newton: A language for describing physics.
CoRR, abs/1811.04626.
[65] Liu, D. C. and Nocedal, J. (1989). On the limited memory bfgs method for large scale
optimization. Mathematical programming, 45(1-3):503–528.
[66] Liu, Z., Dou, Y., Jiang, J., and Xu, J. (2016). Automatic code generation of convolutional
neural networks in fpga implementation. In 2016 International Conference on Field-
Programmable Technology (FPT), pages 61–68.
[67] Lu, L., Jin, P., Pang, G., Zhang, Z., and Karniadakis, G. E. (2021). Learning nonlinear
operators via deeponet based on the universal approximation theorem of operators. Nature
machine intelligence, 3(3):218–229.
[68] Ma, Y., Cao, Y., Vrudhula, S., and Seo, J.-s. (2017). Optimizing loop operation and
dataflow in fpga acceleration of deep convolutional neural networks. In Proceedings of
the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays,
FPGA ’17, page 45–54, New York, NY, USA. Association for Computing Machinery.
[69] Meech, J. T. and Stanley-Marbell, P. (2022). An algorithm for sensor data uncertainty
quantification. IEEE Sensors Letters, 6(1):1–4.
[70] Melexis (2018). mlx90640-library. https://github.com/melexis/mlx90640-library.
[71] Melexis (2019). MLX90640 32x24 IR array.
[72] Misyris, G. S., Venzke, A., and Chatzivasileiadis, S. (2020). Physics-informed neural
networks for power systems. In 2020 IEEE Power & Energy Society General Meeting
(PESGM), pages 1–5.
[73] Moin, A., Zhou, A., Rahimi, A., Menon, A., Benatti, S., Alexandrov, G., Tamakloe,
S., Ting, J., Yamamoto, N., Khan, Y., et al. (2021). A wearable biosensing system with
in-sensor adaptive machine learning for hand gesture recognition. Nature Electronics,
4(1):54–63.
[74] Moseley, B. (2021). harmonic-oscillator-pinn. https://github.com/benmoseley/
harmonic-oscillator-pinn.
[75] Moseley, B. (2022). Physics-informed machine learning: from concepts to real-world
applications. PhD thesis, University of Oxford.
[76] Moseley, B., Markham, A., and Nissen-Meyer, T. (2021). Finite basis physics-informed
neural networks (fbpinns): a scalable domain decomposition approach for solving differ-
ential equations.
[77] Moss, D. J. M., Nurvitadhi, E., Sim, J., Mishra, A., Marr, D., Subhaschandra, S., and
Leong, P. H. W. (2017). High performance binary neural networks on the xeon+fpga™
platform. In 2017 27th International Conference on Field Programmable Logic and
Applications (FPL), pages 1–4.
[78] Nakahara, H., Fujii, T., and Sato, S. (2017). A fully connected layer elimination for a
binarized convolutional neural network on an fpga. In 2017 27th International Conference
on Field Programmable Logic and Applications (FPL), pages 1–4.
[79] NXP Semiconductors (2021). I2C-bus specification and user manual. Rev. 7.0.
[80] Panasonic (2017). Infrared Array Sensor Grid-EYE (AMG88).
[81] Pappalardo, A. (2023). Xilinx/brevitas.
[82] Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin,
Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison,
M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., and Chintala, S. (2019).
Pytorch: An imperative style, high-performance deep learning library. In Advances in
Neural Information Processing Systems 32, pages 8024–8035. Curran Associates, Inc.
[83] Patterson, D. (2018). 50 years of computer architecture: From the mainframe cpu to
the domain-specific tpu and the open risc-v instruction set. In 2018 IEEE International
Solid - State Circuits Conference - (ISSCC), pages 27–31.
[84] Plotly Technologies Inc. (2015). Collaborative data science. https://plot.ly.
[85] Qasaimeh, M., Denolf, K., Lo, J., Vissers, K., Zambreno, J., and Jones, P. H. (2019).
Comparing energy efficiency of cpu, gpu and fpga implementations for vision kernels. In
2019 IEEE International Conference on Embedded Software and Systems (ICESS), pages
1–8.
[86] Qiu, J., Wang, J., Yao, S., Guo, K., Li, B., Zhou, E., Yu, J., Tang, T., Xu, N., Song,
S., Wang, Y., and Yang, H. (2016). Going deeper with embedded fpga platform for
convolutional neural network. In Proceedings of the 2016 ACM/SIGDA International
Symposium on Field-Programmable Gate Arrays, FPGA ’16, page 26–35, New York, NY,
USA. Association for Computing Machinery.
[87] Raissi, M. and Karniadakis, G. E. (2018). Hidden physics models: Machine learning of
nonlinear partial differential equations. Journal of Computational Physics, 357:125–141.
[88] Raissi, M., Perdikaris, P., and Karniadakis, G. (2019). Physics-informed neural net-
works: A deep learning framework for solving forward and inverse problems involving
nonlinear partial differential equations. Journal of Computational Physics, 378:686–707.
[89] Rajkumar, R. R., Lee, I., Sha, L., and Stankovic, J. (2010). Cyber-physical systems:
The next computing revolution. In Proceedings of the 47th Design Automation Conference,
DAC ’10, page 731–736, New York, NY, USA. Association for Computing Machinery.
[90] Robert Bosch GmbH (2021). BNO055: Intelligent 9-axis absolute orientation sensor.
Rev. 1.8.
[91] Rohrhofer, F. M., Posch, S., and Geiger, B. C. (2021). On the pareto front of physics-
informed neural networks. CoRR, abs/2105.00862.
[92] Sahli Costabal, F., Yang, Y., Perdikaris, P., Hurtado, D. E., and Kuhl, E. (2020). Physics-
informed neural networks for cardiac activation mapping. Frontiers in Physics, 8:42.
[93] Savitzky, A. and Golay, M. J. E. (1964). Smoothing and differentiation of data by
simplified least squares procedures. Analytical Chemistry, 36:1627–1639.
[94] Schwarting, W., Alonso-Mora, J., and Rus, D. (2018). Planning and decision-making
for autonomous vehicles. Annual Review of Control, Robotics, and Autonomous Systems,
1(1):187–210.
[95] Shen, J., Huang, Y., Wang, Z., Qiao, Y., Wen, M., and Zhang, C. (2018). Towards
a uniform template-based architecture for accelerating 2d and 3d cnns on fpga. In
Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable
Gate Arrays, FPGA ’18, page 97–106, New York, NY, USA. Association for Computing
Machinery.
[96] Simonyan, K. and Zisserman, A. (2015). Very deep convolutional networks for large-
scale image recognition. In International Conference on Learning Representations.
[97] Stefan, J. (1891). Über die theorie der eisbildung, insbesondere über die eisbildung im
polarmeere. Annalen der Physik, 278(2):269–286.
[98] Strauss, W. (2008). Partial Differential Equations: An Introduction. Wiley.
[99] Suda, N., Chandra, V., Dasika, G., Mohanty, A., Ma, Y., Vrudhula, S., Seo, J.-s., and
Cao, Y. (2016). Throughput-optimized opencl-based fpga accelerator for large-scale
convolutional neural networks. In Proceedings of the 2016 ACM/SIGDA International
Symposium on Field-Programmable Gate Arrays, FPGA ’16, page 16–25, New York, NY,
USA. Association for Computing Machinery.
[100] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D.,
Vanhoucke, V., and Rabinovich, A. (2015). Going deeper with convolutions. In 2015
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1–9, Los
Alamitos, CA, USA. IEEE Computer Society.
[101] Tessier, R., Pocek, K., and DeHon, A. (2015). Reconfigurable computing architectures.
Proceedings of the IEEE, 103(3):332–354.
[102] Tsoutsouras, V., Kaparounakis, O., Samarakoon, C., Bilgin, B., Meech, J., Heck, J.,
and Stanley-Marbell, P. (2022). The laplace microarchitecture for tracking data uncertainty.
IEEE Micro, 42(4):78–86.
[103] van Daalen, M., Jeavons, P., and Shawe-Taylor, J. (1993). A stochastic neural ar-
chitecture that exploits dynamically reconfigurable fpgas. In [1993] Proceedings IEEE
Workshop on FPGAs for Custom Computing Machines, pages 202–211.
[104] Venieris, S. I. and Bouganis, C.-S. (2016). fpgaconvnet: A framework for map-
ping convolutional neural networks on fpgas. In 2016 IEEE 24th Annual International
Symposium on Field-Programmable Custom Computing Machines (FCCM), pages 40–47.
[105] Vipin, K. and Fahmy, S. A. (2018). Fpga dynamic and partial reconfiguration: A
survey of architectures, methods, and applications. ACM Comput. Surv., 51(4).
[106] Virtanen, P., Gommers, R., Oliphant, T. E., Haberland, M., Reddy, T., Cournapeau,
D., Burovski, E., Peterson, P., Weckesser, W., Bright, J., van der Walt, S. J., Brett, M.,
Wilson, J., Millman, K. J., Mayorov, N., Nelson, A. R. J., Jones, E., Kern, R., Larson,
E., Carey, C. J., Polat, İ., Feng, Y., Moore, E. W., VanderPlas, J., Laxalde, D., Perktold,
J., Cimrman, R., Henriksen, I., Quintero, E. A., Harris, C. R., Archibald, A. M., Ribeiro,
A. H., Pedregosa, F., van Mulbregt, P., and SciPy 1.0 Contributors (2020). SciPy 1.0:
Fundamental Algorithms for Scientific Computing in Python. Nature Methods, 17:261–
272.
[107] Wang, S., Wang, H., and Perdikaris, P. (2021). On the eigenvector bias of fourier
feature networks: From regression to solving multi-scale pdes with physics-informed
neural networks. Computer Methods in Applied Mechanics and Engineering, 384:113938.
[108] Wang, Y., Willis, S., Tsoutsouras, V., and Stanley-Marbell, P. (2019). Deriving
equations from sensor data using dimensional function synthesis. ACM Trans. Embed.
Comput. Syst., 18(5s).
[109] Willink, R. (2013). Measurement Uncertainty and Probability. Cambridge University
Press.
[110] Xiao, Q., Liang, Y., Lu, L., Yan, S., and Tai, Y.-W. (2017). Exploring heterogeneous
algorithms for accelerating deep convolutional neural networks on fpgas. In 2017 54th
ACM/EDAC/IEEE Design Automation Conference (DAC), pages 1–6.
[111] Xilinx (2021). AXI IIC Bus Interface. v2.1.
[112] Zhang, C., Fang, Z., Zhou, P., Pan, P., and Cong, J. (2016). Caffeine: Towards
uniformed representation and acceleration for deep convolutional neural networks. In
2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pages
1–8.
[113] Zhang, L., Yan, X., and Ma, D. (2022). A binarized neural network approach to
accelerate in-vehicle network intrusion detection. IEEE Access, 10:123505–123520.
[114] Zhou, A., Yao, A., Guo, Y., Xu, L., and Chen, Y. (2017). Incremental network
quantization: Towards lossless CNNs with low-precision weights. In International
Conference on Learning Representations.