Physics-informed Neural Networks for Encoding Dynamics in Real Physical Systems

arXiv:2401.03534v1 [cs.LG] 7 Jan 2024

Hamza Sharaf F Alsharif

Department of Engineering
University of Cambridge

This dissertation is submitted for the degree of
Master of Philosophy

Girton College
January 2024


Declaration

I hereby declare that except where specific reference is made to the work of others, the
contents of this dissertation are original and have not been submitted in whole or in part
for consideration for any other degree or qualification in this, or any other university. This
dissertation is my own work and contains nothing which is the outcome of work done in
collaboration with others, except as specified in the text and Acknowledgements. This
dissertation contains fewer than 15,000 words.

Hamza Sharaf F Alsharif


January 2024
Acknowledgements

I would like to thank my supervisor Professor Phillip Stanley-Marbell for having faith in me
and accepting me into his group, providing academic guidance, support, and encouragement
to pursue exciting work. Phillip has been an excellent supervisor and mentor who has given
me the freedom to explore and work on topics that interest me whilst also advising me on the
optimal paths to take to achieve positive outcomes. I would also like to thank Professor
Suhaib Fahmy for his eagerness in co-supervising my project, for the valuable discussions
we’ve had on FPGAs and hardware, and for providing me with collaboration opportunities
that I otherwise would not have had. I would like to thank James for his advice on setting up
the parallel heating experiment, Janith for his advice on machine learning, and Chatura for
his advice on getting started at the early stages of my project. I would like to
thank Vasilis and Orestis for being the first people I got in touch with in the group, and for
getting me excited to pursue work here. I would like to thank Hamid and Divya for being
the friendly faces from the group who I’d frequently see in the lab/department. The custom
copper nib used as the soldering iron heat source was designed by Ady Ginn. The ethernet
switch for the parallel heating experiment was provided by Barlow McLeod.
I would like to acknowledge the Saudi Arabian Cultural Bureau for funding my studies
here at Cambridge.
Special thanks to the department crew — Ismail, Alkausil, Ibtisam, Sohail, and Adil for
the chill conversations in the department and for the banter over Wednesday cakes. Thank
you to Yunwoo for always being around to talk and have dinner with during the late working
hours in the department. Special thanks to Faris for our daily lunches and for putting up
with me everyday. Big thank you to the Shbeeba/Shabashib especially Alwaleed, Hallamund,
Bunyamin, Marwan, Abdulkarim, Moez, Muheeb, Radwan, Ahmad, and Ahmad. You guys
made my experience here in Cambridge special so thank you bros. Big special thank you to
Omar, my brother away from home. Thank you for being there for the good and hard times,
God knows we’ve struggled through our degrees together. Thank you to Anas, my KFUPM
roommate and now brother-in-law for always asking about me, for being there to talk to, and
for helping me stay grounded. You’ve always been by my side as a dear friend, and now
you’re family. Thank you to my oldest friend Ahmad for always being there for me despite
the distance, and for our late night conversations. God knows you’ve always been there for
me from the very beginning brother.
Thank you to my sister Jumana and my brothers Mohammed and Faisal, for your
continuous support and encouragement through all of this and for always praying for my
success.
Most of all, thank you to my parents for giving me everything I have in my life, for
your consistent support, for the prayers and encouragement, and for your endless love. This
dissertation is dedicated to you and I hope I can always make you proud.
Most importantly and before all of this, praise be to God, the Most Gracious, Most
Merciful, and Most Beneficent. All my successes, blessings, and good fortunes are from Him,
and without His mercy, guidance, and support I am nothing.
Abstract

Predictive data-driven models are gaining widespread attention and are being deployed in
embedded systems within physical environments across a wide variety of modern technolo-
gies such as robotics, autonomous vehicles, smart manufacturing, and industrial controllers.
However, these models have no notion or awareness of the underlying physical principles
that govern the dynamics of the physical systems that they exist within. This dissertation
studies the encoding of governing differential equations that explain system dynamics, within
predictive models that are to be deployed within real physical systems. Based on this, we
investigate physics-informed neural networks (PINNs) as candidate models for encoding
governing equations, and assess their performance on experimental data from two different
systems. The first system is a simple nonlinear pendulum, and the second is 2D heat diffusion
across the surface of a metal block. We show that for the pendulum system the PINNs
outperformed equivalent uninformed neural networks (NNs) in the ideal data case, with
accuracy improvements of 18× and 6× for 10 linearly-spaced and 10 uniformly-distributed
random training points respectively. In similar test cases with real data collected from an
experiment, PINNs outperformed NNs with 9.3× and 9.1× accuracy improvements for
67 linearly-spaced and uniformly-distributed random points respectively. For the 2D heat
diffusion, we show that both PINNs and NNs do not fare very well in reconstructing the
heating regime due to difficulties in optimizing the network parameters over a large domain
in both time and space. We highlight that data denoising and smoothing, reducing the size
of the optimization problem, and using LBFGS as the optimizer are all ways to improve
the accuracy of the predicted solution for both PINNs and NNs. Additionally, we address
the viability of deploying physics-informed models within physical systems, and we choose
FPGAs as the compute substrate for deployment. In light of this, we perform our experiments
using a PYNQ-Z1 FPGA and identify issues related to time-coherent sensing and spatial data
alignment. We discuss the insights gained from this work and list future work items based on
the proposed architecture for the system that our methods work to develop.
Table of contents

List of figures xii

List of tables xvii

Nomenclature xviii

1 Incorporating Physics Knowledge in Computation 1


1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Dynamical Systems and Differential Equations . . . . . . . . . . . . . . . 3
1.2.1 Linear equations . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2.2 Nonlinear equations . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Lack of Physics Understanding in Computation . . . . . . . . . . . . . . . 5
1.3.1 Context of physical signals . . . . . . . . . . . . . . . . . . . . . . 5
1.3.2 Information on dimensions and units . . . . . . . . . . . . . . . . . 6
1.3.3 Consideration of noise . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3.4 Knowledge of physical laws and relationships . . . . . . . . . . . . 6
1.4 Proposed Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.5 Structural Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2 Background on PINNs and FPGAs 8


2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2 Physics-informed Neural Networks . . . . . . . . . . . . . . . . . . . . . . 8
2.2.1 Mathematical framework . . . . . . . . . . . . . . . . . . . . . . . 10
2.2.2 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.3 FPGAs and Accelerator Architectures . . . . . . . . . . . . . . . . . . . . 12
2.3.1 Neural networks on FPGAs . . . . . . . . . . . . . . . . . . . . . 13

3 Predicting the Oscillation Angle of a Swinging Pendulum 16


3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

3.2 Pendulum Dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16


3.3 Ideal Pendulum Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.3.1 Data generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.3.2 Training setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.3.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.4 Real Pendulum Experiment . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.5 Hardware Block Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.5.1 Experimental setup . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.5.2 Training setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.5.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.6 Closing Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

4 Predicting the Surface Temperatures Across a Metal Block During Heating 43


4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.2 Heat Diffusion Dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.3 Block Heating Experiment . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.3.1 Experimental setup . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.3.2 Data analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.3.3 Denoising strategy . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.3.4 Training setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.3.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.4 Closing Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

5 Parallel Hardware and Time-coherent Sensing 61


5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.2 Parallel Capture Heating Experiment . . . . . . . . . . . . . . . . . . . . . 61
5.2.1 Experimental setup . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.2.2 Data alignment . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.3 Closing Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

6 Discussion and Future Work 72


6.1 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
6.2 Outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.3 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
6.3.1 Resolve optimization difficulties . . . . . . . . . . . . . . . . . . . 74
6.3.2 Resolve coherent sensing issues . . . . . . . . . . . . . . . . . . . 74
6.3.3 Study alternative PIML approaches . . . . . . . . . . . . . . . . . 76

6.3.4 Investigate deployment . . . . . . . . . . . . . . . . . . . . . . . . 76


6.3.5 Extend Newton with dynamics constructs . . . . . . . . . . . . . . 76
6.3.6 Implement MPC . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

7 Conclusion 77

References 79
List of figures

2.1 Example of a PINN architecture based on the 2D heat equation using trainable
parameters θn . The left dashed box shows the neural network which predicts
the value of u given the training points to produce the data loss term. The
right dashed box shows the PDE residual corresponding to the heat equation,
composed from the differential terms. The differential terms are obtained
using automatic differentiation. The PDE residual forms the physics loss,
which is the distinguishing component of PINNs. . . . . . . . . . . . . . . 9

3.1 Illustrative diagram of a pendulum system. . . . . . . . . . . . . . . . . . . 17


3.2 Numerical solution of a pendulum system generated using Equations 3.4
and 3.5. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.3 Numerical solution of a pendulum system generated using Equations 3.7
and 3.5, taking air resistance into consideration. A more realistic solution
would consider a smaller amount of damping over a longer interval, but for
our purposes this solution is sufficient. . . . . . . . . . . . . . . . . . . . . 20
3.4 PINN predictions on the synthetic data pendulum given 150 training points.
PINN architecture: 3 FC hidden layers with 32 neurons each. RMSE =
0.0068 as the PINN has no trouble fitting the data given a perfect setup. . . 22
3.5 Test RMSE values against training iterations of a PINN and an equivalent
NN, given 150 linearly spaced training points. Both models converge to an
accurate solution after approximately 1250 iterations, although the PINN
solution faces a spike that it overcomes during the optimization near the 300
iteration mark. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.6 PINN and NN predictions on the data for the idealized pendulum using
5 linearly-spaced training points. The PINN is able to predict the correct
solution based on the physics loss, whereas the NN is only able to fit the
training data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

3.7 PINN and NN predictions on the synthetic data pendulum using 10 training
points for uniformly-distributed random data. The NN is trained for 150
iterations — its final state before predictions became unstable. The PINN is
trained for 2000 iterations. The PINN maintains a reasonable fit of the data
while the NN struggles due to the data’s irregularity. . . . . . . . . . . . . . 26
3.8 NN prediction when trained with 1000 adjacent points. The NN fails to
extrapolate the accurately predicted solution on the training points to the last
500 test points. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.9 PINN predicted solution based on the first 5 points of the numerical solution.
The PINN consists of two hidden layers with 12 units in the first and 9 in
the second — corresponding to the 9th entry in Table 3.4a. Remarkably, the
PINN is able to accurately predict the solution despite being trained with
only the first 5 points. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.10 PINN vs NN predictions on 100 linearly-spaced points with added Gaussian
noise with a mean of 0 and a standard deviation of 0.5. The PINN and NN
solutions are similar, although the PINN is slightly less impacted by the noise. 30
3.11 AXI IIC block design that we use for our experiments. The ZYNQ7 process-
ing system is the operating system side of the system-on-chip (SoC) FPGA,
on which the Python layer runs. The AXI IIC block is the direct interface
with the sensors through I2C. We configure the I2C clock frequency to be
1000 kHz. The AXI IIC block interfaces with the processing system through
the AXI interconnect block. Read and write commands are issued to the AXI
interconnect through the Python driver API. . . . . . . . . . . . . . . . . . 31
3.12 Pendulum oscillation data captured from the experimental setup shown in
Figure 3.13. The data has a much higher frequency than the numerically-
generated solution in Figure 3.3, although the sinusoidal nature is similar
enough for making comparisons. . . . . . . . . . . . . . . . . . . . . . . . 32
3.13 Experimental setup for the pendulum system. The BNO055 [90] is attached
to the pendulum mass. The PYNQ-Z1 board (on top of the desktop computer)
interfaces with the BNO055 through I2C [79]. . . . . . . . . . . . . . . . . 33
3.14 PINN predictions over the entire domain of the sampled training data. The
PINN fails to arrive at a valid solution due to the difficulty of optimizing
over a large domain. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.15 NN predictions over the entire domain of the sampled training data. Similarly
to the PINN the NN also fails to converge, although it is more flexible in its
predictive capability due to not being constrained by the physics loss term. . 35

3.16 NN predictions over the entire domain of the sampled training data, after
34380 iterations. The main difference from the predictions shown in Fig-
ure 3.15 is that we do not enforce any termination conditions, and instead
allow the training to run indefinitely. . . . . . . . . . . . . . . . . . . . . . 36
3.17 NN predictions based on training with 50 linearly-spaced points. The NN
solution misses the majority of the sinusoids as it is only able to fit data. . . 39
3.18 PINN predictions based on training with 50 linearly-spaced points. In con-
trast to the NN prediction in Figure 3.17, the PINN is able to capture the
sinusoids correctly due to the physics loss term. . . . . . . . . . . . . . . 40
3.19 NN predictions based on training with 50 uniformly-distributed points. The
NN fails to make reasonable predictions in areas with no training points. . . 40
3.20 PINN predictions based on training with 50 uniformly-distributed points.
The PINN maintains the trend of making valid predictions. . . . . . . . . . 41
3.21 NN and PINN RMSE values for linearly-spaced and uniformly-distributed
data, reported in Table 3.7. The PINN maintains a constant accuracy irre-
spective of the number of training points, while the NN fails as the number of
points decreases. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.22 NN predictions based on adjacent points that comprise 40% of the problem
domain. The predictions become unstable outside of the training data region. 42
3.23 PINN predictions based on adjacent points that comprise 40% of the problem
domain. The predictions maintain stability just outside of the training data,
but fail to extrapolate for the rest of the domain. . . . . . . . . . . . . . . . 42

4.1 MLX90640 pixel RAM chess reading pattern configuration, borrowed from
the datasheet [71]. The highlighted cells correspond to a subpage and we
read one with each I2C transaction. The subpages get updated with new data
after each read. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.2 Custom copper tip to improve surface contact for conduction. . . . . . . . . 46
4.3 Block heating experimental setup. The MLX90640 is held with an alligator
clip attached to a flexible helping hand. The custom solder tip is inserted
into the block from the side into a hole so that it fits in place during heating.
The sensor is directly connected to the FPGA. . . . . . . . . . . . . . . . . 47
4.4 Converted temperature measurements over time for 8 randomly-selected
pixels. The measurements are noisy and are also dominated by noise spikes. 48
4.5 3D plots of the heating profile at different instances in time. The plots show
valid temperature gradients over time. . . . . . . . . . . . . . . . . . . . . 49

4.6 Frame visualisation of one of the temperature spikes shown in Figure 4.4.
These correspond to instances when the camera fails to capture valid frames. 50
4.7 Spike-filtered temperature time-series for four randomly-chosen pixels. The
regions near 340 and 470 seconds displayed high concentrations of spikes
across the pixels so were filtered out completely. . . . . . . . . . . . . . . . 52
4.8 Data for 4 random pixels after applying the Savitzky-Golay filter [93]. The
data retains small amounts of noise, although most of it has been smoothed out. 53
4.9 RMSE graph for a 2-layer 32-unit PINN and NN trained with Adam [52].
The training data is shown based on the raw and denoised data, and is
sampled from the full 768-pixel frames. Denoising certainly helps achieve
better training results, although there is a negligible difference between PINN
and NN training performance. . . . . . . . . . . . . . . . . . . . . . . . . 54
4.10 Reduced size frames at t = 433.43. Smaller frames are easier to train with
than larger ones. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.11 Reduced frame vs full frame training comparison. The minimum RMSE
is 23.67. Both denoising and reducing the frame size improve the training
performance, although the training accuracy remains unsatisfactory. . . . 56
4.12 LBFGS training evaluation for a 3-layer network with 64-32-32 units. The
minimum RMSE is 9.42. Using LBFGS improves training performance
considerably. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.13 Comparison between the test data and the predictions of NNs and PINNs
after training with 832 points. LS denotes linearly-spaced data and UD
denotes uniformly-distributed random data. i refers to the time sample index.
The NNs capture slight heating gradients in space, whereas the PINNs predict
almost constant temperatures for specific frames. . . . . . . . . . . . . . . 58

5.1 Experimental setup for the parallel heating experiment. We use 5 magnetic
alligator clamps to hold 5 MLX90640 thermal cameras, which are connected
to 5 PYNQ-Z1 FPGA boards. The cameras are pointed at the block so that
the block surface takes up the most area in the camera FOVs. . . . . . . . . 63
5.2 Time sample difference over the experiment duration for different cameras.
The time coherence between each camera reduces over time, and in the worst
case the difference is 6 seconds (cameras 2 and 4). . . . . . . . . . . . . . . 64
5.3 A comparison of the rectangular patch which we attribute as focusing in on
the same area between the two cameras. The temperature ranges and the
temperature grid are visually similar. . . . . . . . . . . . . . . . . . . . . . 66
5.4 Frame visualisations at time sample 500. . . . . . . . . . . . . . . . . . . . 67

5.5 Frame visualisations at time sample 3250. . . . . . . . . . . . . . . . . . . 68


5.6 Frame visualisations at time sample 7500. . . . . . . . . . . . . . . . . . . 69
5.7 Frame visualisations at time sample 15000. . . . . . . . . . . . . . . . . . 70
5.8 Histogram plots for the temperature difference between the pixels in the rect-
angular patch which corresponds to the same area in two different cameras (1
and 2). We can see that, even though a significant number of the differences
are near 0, the majority of them are to the right of the graph and have large
errors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

6.1 Architectural diagram for our proposed system. A user encodes the differen-
tial equation for a system using a description language such as Newton [64].
A back-end compiler performs static analysis on the Newton description
to generate a PINN architecture, which can be trained offline using experi-
mental measurements taken from the system. The trained PINN can then be
synthesized onto an FPGA using high-level synthesis (HLS) tools. Finally,
the user can then integrate the FPGA with the synthesized model into the
system for real-time inference, and by extension control. . . . . . . . . . . 75
List of tables

3.1 PINN RMSE values for different variations of hidden layers and units in
each layer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.2 NN RMSE values for different variations of hidden layers and units in each
layer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.3 RMSE values for variations of numbers of variably-spaced training points. . 25
3.4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.5 RMSE values based on varying sizes of the domain. The time column is the
size of the domain in seconds. Nd and Nt are the number of train and test
points respectively. b is the learned value of the friction coefficient. For the
NN and PINN entries we report the RMSE on the first line and the iteration
number on the second line. We stop the training early if we observe the
RMSE value remaining constant for an extended number of iterations. . . . 35
3.6 PINN prediction RMSE values for the last three domain proportions shown
in Table 3.5, but with less data. Decreasing the amount of data enables more
accurate PINN models. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.7 RMSE values for variations of numbers of training points. The PINN predic-
tions for both cases stay relatively consistent, whereas the NN predictions
fail as the number of points decreases. . . . . . . . . . . . . . . . . . . . . 38
3.8 RMSE values for percentages of adjacent points starting from t = 0. Both
the PINN and NN fail at predicting accurate solutions, but the NN fails more severely. 38

4.1 RMSE values based on varying the frame size. The first line in the NN and
PINN entries corresponds to the RMSE, and the second line corresponds to
the iteration number. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.2 RMSE values based on a variation of the number of linearly-spaced frames.
Nfr is the number of frames and Nd is the corresponding number of points. . 59
4.3 RMSE values based on a variation of the number of uniformly-distributed
points Nd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Nomenclature

Roman Symbols

a Linear acceleration

a(x) Coefficient function with an arbitrary variable

b Damping coefficient

c Specific heat

D n-dimensional region in space

D Arbitrary differential operator

F Force summation or vector field

f Arbitrary function

g Gravitational acceleration

H Amount of heat in calories

i Index specifier or imaginary number

k Thermal conductivity

L Loss

L Length

m Mass

N Nonlinear differential operator

N Number of instances in a dataset



n Count specifier

n̂ Outward normal unit vector

R Set of real numbers

S Surface

s Arc length

T Time

t Temporal variable

u Differential equation solution function

v Burgers’ equation diffusivity coefficient

x Arbitrary or spatial variable in the horizontal axis

y Spatial variable in the vertical axis

Greek Symbols

α Heat equation diffusivity coefficient

β Heat equation diffusivity coefficient in y

λ Physics loss enforcement hyperparameter

ρ Material density

σ Standard deviation

σ (x) Source term

θ Neural network parameters

ϕ Angle of oscillation

Acronyms / Abbreviations

ADC Analogue-to-digital Converter

API Application Programming Interface

AXI Advanced Extensible Interface



BNN Binarized Neural Network

BU Blow-up

CLB Configurable Logic Block

CNN Convolutional Neural Network

CONV Convolutional (layer)

CPU Central Processing Unit

DFS Dimensional Function Synthesis

DNN Deep Neural Network

ET Early Termination

FBPINN Finite-basis Physics-informed Neural Network

FC Fully-connected (layer)

FOV Field-of-view

FPGA Field-programmable Gate Array

FPS Frames per second

GPU Graphical Processing Unit

HLS High-level Synthesis

I2C Inter-integrated Circuit (protocol)

INQ Incremental Network Quantization

IP Intellectual Property

IR Infra-red

KdV Korteweg-De Vries (equation)

LBFGS Limited-memory Broyden–Fletcher–Goldfarb–Shanno (algorithm)

LCA Logic Cell Array

LS Linearly-spaced

ML Machine Learning

MLP Multi-layer Perceptron

MPC Model Predictive Control

NN Neural Network

ODE Ordinary Differential Equation

OPS Operations per Second

PDE Partial Differential Equation

PIML Physics-informed Machine Learning

PINN Physics-informed Neural Network

PL Programmable Logic

PR Partial Reconfiguration

PROM Programmable Read-only Memory

RMSE Root-mean-square error

SciML Scientific Machine Learning

SoC System-on-chip

SVD Singular Value Decomposition

TOPS Tera Operations per Second

TPU Tensor Processing Unit

UD Uniformly-distributed

VAE Variational Autoencoder


Chapter 1

Incorporating Physics Knowledge in


Computation

1.1 Introduction
Physical computation refers to computation that affects and is affected by physical quantities
in our natural world, such as temperature, pressure, velocity, etc. This type of computation
is prevalent within embedded systems (also referred to as cyber-physical systems) — self-
contained digital devices that are interconnected on a smaller scale than large workstations
and servers, and typically process information from real environments. Embedded systems
have become ubiquitous in modern society with computers being integrated into objects that
we interact with on a daily basis, as well as in modern applications that are increasing in
adoption such as autonomous vehicles, or digital manufacturing. However, the computers in
these technologies lack a fundamental understanding of the physical nature of the systems and
signals that they interact with. In other words, they lack computational abstractions [89] that
can be used to describe the environments that they exist within. The lack of computational
abstractions for physical systems can be classified under four different categories of relevance
to this dissertation:

1. Contextual information about what physical signals represent [3].

2. Information on the physical dimensions and units associated with signals [108].

3. Existence and characterisation of noise [69].

4. Knowledge of the physical laws and relationships that govern physical quantities [64].

We discuss these classifications in greater detail in Section 1.3. The focus of this
MPhil dissertation is on the fourth item. Specifically, we are interested in investigating and
developing methods for incorporating differential equations that govern dynamical systems,
into predictive models for the purpose of deployment within real physical systems. There
are tangible benefits to incorporating physics knowledge into predictive models deployed
at the edge. The main benefit is the exploitation of the wealth of information that can be
extracted from physical signals captured from different sensors. However, there are numerous
challenges associated with developing physics-aware compute systems. This dissertation
focuses on two of them.
The first is that it is more difficult to work with real-world data, as opposed to simulated
or idealized data, due to the aleatoric uncertainty [109] arising from noise. In a real-world
setting, noise can come from rapid fluctuations in the measurand, from disturbances in the
measurement environment, or from the characteristics of the measurement instrument. The amount
of noise from each source varies depending on the system under investigation, and the effect
it has on model predictions must be considered.
The second challenge relates to the viability of deploying machine learning (ML) models
on the edge for embedded inference. ML models, and neural networks (NNs) in particular,
tend to have thousands to millions of parameters. AlexNet [56] for example has 60 million
parameters and 650,000 neurons. In most cases it is infeasible to store such large models in
embedded devices with limited amounts of memory. Additionally, the extensive amounts of
processing required for inference raise the issue of power consumption, an important factor
to consider for resource-constrained embedded systems running on batteries.
These two challenges motivate the central research questions of this dissertation, which
are as follows:

1. How well do physics-informed models perform on data captured from real physical
setups in terms of predictive accuracy?

2. How viable is it to deploy physics-informed models on edge-based computers within a


real physical setup for real-time prediction?

To address the first question, the candidate models that we assess for this dissertation are
physics-informed neural networks (PINNs) [88]. For the second question, we investigate the
use of field-programmable gate arrays (FPGAs) as compute substrates for the models that we
develop.

1.2 Dynamical Systems and Differential Equations


Dynamics is the study of changing and evolving states. Consequently, dynamical systems are
ones that are characterised by state evolution. The complex behaviours within these systems
are governed and described by differential equations, which are derived through analyses
of how these behaviours evolve over time and through space. Differential equations act as
physical models for many of the phenomena that we observe in nature as well as in many
engineering systems that humans develop.
Mathematically, a differential equation is a relationship between a function of interest
and its derivatives with respect to one or more independent variables. Let u be an arbitrary
function and x be an independent variable. We define the general form of a differential
equation as follows:

D (u) = f (x) (1.1)

where D is an arbitrary differential operator, and f is a given function of x. A differential


equation with the general form in Equation 1.1, where u depends only on x is called an
ordinary differential equation (ODE).
If u depends on more than one independent variable and the equation includes partial
derivatives with respect to each, then the equation is a partial differential equation (PDE).
Let {x_1, x_2, ..., x_n} be an arbitrary set of independent variables of length n. We define the
first order general form of a PDE as follows:

\[ \frac{\partial u}{\partial x_1} + \frac{\partial u}{\partial x_2} + \cdots + \frac{\partial u}{\partial x_n} = f(x_1, x_2, \ldots, x_n) \qquad (1.2) \]
Finding solutions to differential equations implies finding the unknown function u in
terms of its independent variables. In physical contexts it represents finding the expression
that describes the relationship between a set of physical quantities.
Differential equations can be classified as either linear or nonlinear. This classification
changes how the equation is solved. Let {a_1(x), a_2(x), ..., a_n(x)} be a set of coefficient
functions of x. We define the general form of a linear differential equation as follows [17]:

\[ a_1(x)\,u^{(n)} + a_2(x)\,u^{(n-1)} + \cdots + a_n(x)\,u = f(x) \qquad (1.3) \]

Specifically, it must be the case that u and all of its derivatives have a power of 1, and that
the coefficients a_1(x), a_2(x), ..., a_n(x), and the function f(x) depend only on the independent
variable x. On the other hand, if a differential equation does not satisfy these conditions it
is considered to be nonlinear. For given initial and boundary conditions, linear differential
equations have known methods for finding analytical solutions such as separation of variables,
integrating factors, or trial solutions. Nonlinear equations are much more difficult to deal
with and in most cases analytical solutions do not exist for them. In this case, one can only
resort to numerical methods for finding solutions.
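As a minimal illustration of this numerical route, the sketch below integrates one common form of a damped nonlinear pendulum equation, ϕ'' + bϕ' + (g/L) sin ϕ = 0, with an off-the-shelf solver. The coefficient values and initial conditions are placeholders chosen for illustration rather than values used later in this dissertation.

# Minimal sketch: numerically integrating a nonlinear ODE, here a damped pendulum
# phi'' + b*phi' + (g/L)*sin(phi) = 0. All constants are illustrative placeholders.
import numpy as np
from scipy.integrate import solve_ivp

g, L, b = 9.81, 1.0, 0.3   # gravity, pendulum length, damping coefficient (assumed)

def pendulum(t, state):
    phi, omega = state                      # angle and angular velocity
    return [omega, -b * omega - (g / L) * np.sin(phi)]

sol = solve_ivp(pendulum, t_span=(0.0, 10.0), y0=[1.0, 0.0],   # phi(0) = 1 rad, at rest
                t_eval=np.linspace(0.0, 10.0, 500))
print(sol.y[0][:5])   # first few samples of the numerical solution phi(t)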
The following sections list examples of prominent linear and nonlinear differential
equations in one dimension. They introduce the equations with a brief mention of their physical
contexts, and define the relevant variables. For all of the equations, x and t represent
position and time.

1.2.1 Linear equations


The wave equation [36] for wave-bearing systems:

\[ \frac{\partial^2 u}{\partial t^2} = c^2 \frac{\partial^2 u}{\partial x^2} \qquad (1.4) \]
u is the wave amplitude, and c is a constant representing the wave speed.
The diffusion equation [25] for diffusive processes such as heat conduction or Brownian
motion:

\[ \frac{\partial u}{\partial t} = \alpha \frac{\partial^2 u}{\partial x^2} \qquad (1.5) \]
u is the concentration of the diffusing quantity, and α is the diffusivity coefficient.
The Laplace equation [35] for equilibrium processes and potential field distributions:

\[ \nabla^2 u = 0 \qquad (1.6) \]

u is the physical quantity under investigation.


The Poisson equation [35], which is the Laplace equation with a source term:

\[ \nabla^2 u = \sigma(x) \qquad (1.7) \]

σ (x) is the function representing the source term.

1.2.2 Nonlinear equations


The nonlinear Schrödinger equation [2] for wave propagation in quantum mechanics systems,
nonlinear optics, Bose-Einstein condensates, and dispersive water waves:

\[ i\frac{\partial u}{\partial t} + \Delta u + |u|^2 u = 0 \qquad (1.8) \]

u is the physical quantity under investigation, and i is the imaginary unit where i^2 = −1.
The viscous Burgers’ equation [19], a simplified model for viscous fluid flow:

\[ \frac{\partial u}{\partial t} + u\,\frac{\partial u}{\partial x} = v\,\frac{\partial^2 u}{\partial x^2} \qquad (1.9) \]
u is the fluid velocity, and v is a constant representing the diffusivity coefficient.
The Korteweg–De Vries (KdV) equation [54] for shallow water waves and some other
dispersive wave systems:

\[ \frac{\partial u}{\partial t} + 6u\,\frac{\partial u}{\partial x} + \frac{\partial^3 u}{\partial x^3} = 0 \qquad (1.10) \]
u is the wave amplitude/displacement.
For predictive models that are deployed within dynamical systems, there is currently
no support for programming constructs and abstractions for injecting knowledge about
governing differential equations. Developing such methods would be a significant step
towards computers that are more aware of the environments that they exist within, making
them more robust, adaptable, and reliable.

1.3 Lack of Physics Understanding in Computation


As mentioned in Section 1.1, we identify four ways in which physics knowledge is absent
and can be incorporated within computation.

1.3.1 Context of physical signals


Physical computation deals primarily with data that is sampled from sensors and corresponds
to signals captured from real-world environments. The data structures that represent signals
are treated as raw collections of integer or floating-point numbers without any context
associated with them. As a result, there is no way to specify whether a given data collection in
memory corresponds to temperature measurements or accelerometer readings. We know that
temperature signals in Kelvin cannot fall below 0 and that accelerometer readings at rest on
Earth should be 9.8 m/s² downwards on average. This specification is known and accounted
for by the programmer in the algorithmic treatment of different measurements; however,
if signal context specification were available as a feature of the compute system, then the
algorithmic treatment would be implicit and automatic.
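A minimal sketch of what such a context tag could look like in software is given below. The class name and the plausibility rules are hypothetical, and serve only to illustrate the idea of attaching signal context to otherwise raw sample buffers.

# Hypothetical sketch: tagging a raw sample buffer with its physical context so that
# basic plausibility checks become automatic. Names and rules are illustrative only.
from dataclasses import dataclass
from typing import List

@dataclass
class ContextualSignal:
    kind: str              # e.g. "temperature_kelvin" or "acceleration_mps2"
    samples: List[float]

    def check_plausibility(self) -> bool:
        if self.kind == "temperature_kelvin":
            # Absolute temperatures can never be negative.
            return all(s >= 0.0 for s in self.samples)
        if self.kind == "acceleration_mps2":
            # Crude sanity bound for a consumer accelerometer (assumed +/- 16 g range).
            return all(abs(s) <= 16 * 9.81 for s in self.samples)
        return True   # unknown context: nothing to check

temps = ContextualSignal("temperature_kelvin", [293.1, 293.4, -2.0])
print(temps.check_plausibility())   # False: a negative Kelvin reading slipped in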

1.3.2 Information on dimensions and units


Any measured physical quantity has dimensions associated with it, if the measurement is
not a ratio or relative quantity. The unit of a measurement can be found using dimensional
analysis. Many quantities that we can measure are derivations of the seven SI base quantities
defined by the Système international [14]. Their units are therefore derivations of the SI
base units. However, data streams coming from sensors do not contain information about
units, and in fact conversion routines must be performed to obtain the unit-calibrated data
from raw sensor outputs. Including support for dimensional information within computing
would allow for sensory signals to have the proper unit association, regardless of whether the
signals are base or derived quantities.
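One lightweight illustration of carrying dimensional information alongside values is sketched below: each quantity stores an exponent vector over a few SI base dimensions, so that incompatible additions are rejected and derived units emerge from multiplication. This is an illustration of the idea only, and not the mechanism used by Newton or DFS (introduced in Section 1.4).

# Illustrative sketch: values that carry SI base-dimension exponents (m, kg, s),
# so unit errors surface at runtime. Only three base dimensions are shown for brevity.
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Quantity:
    value: float
    dims: Tuple[int, int, int]   # exponents of (metre, kilogram, second)

    def __add__(self, other: "Quantity") -> "Quantity":
        if self.dims != other.dims:
            raise ValueError(f"dimension mismatch: {self.dims} vs {other.dims}")
        return Quantity(self.value + other.value, self.dims)

    def __mul__(self, other: "Quantity") -> "Quantity":
        dims = tuple(a + b for a, b in zip(self.dims, other.dims))
        return Quantity(self.value * other.value, dims)

mass = Quantity(2.0, (0, 1, 0))          # 2 kg
accel = Quantity(9.81, (1, 0, -2))       # 9.81 m/s^2
force = mass * accel                     # exponents (1, 1, -2), i.e. newtons
print(force)                             # Quantity(value=19.62, dims=(1, 1, -2))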

1.3.3 Consideration of noise


Noise is an inherent characteristic in the measurement of physical signals. It could be a
dominant property in the measurand itself, arising from the measurement environment, or
a prevalent feature of the measurement instrument. Commonly, the existence of noise is
attributed to all three sources. Modern compute systems treat signals as collections of definite
point values without consideration of the uncertainty arising due to noise. Signal uncertainty
is an important factor to keep track of to ensure reliability in computation, especially in
real-time systems where uncertain signals result in actuations that affect the real world.
Laplace is a microarchitecture that provides bit-level representations for uncertain data types
as well as microarchitectural components for uncertainty propagation in arithmetic [102].
Such solutions are essential for the move towards trustworthy uncertainty-aware compute
systems.
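At the software level, the essence of this kind of uncertainty tracking can be sketched as first-order propagation of a mean and a variance through arithmetic under an independence assumption. The sketch below is a simplified illustration only, and is not how Laplace represents uncertainty internally.

# Simplified sketch of uncertainty propagation: each value carries a mean and a variance,
# and arithmetic combines them under a first-order, independent-noise assumption.
from dataclasses import dataclass

@dataclass
class Uncertain:
    mean: float
    var: float   # variance of the measurement noise

    def __add__(self, other: "Uncertain") -> "Uncertain":
        # Variances of independent quantities add under summation.
        return Uncertain(self.mean + other.mean, self.var + other.var)

    def __mul__(self, other: "Uncertain") -> "Uncertain":
        # First-order (delta-method) approximation for a product of independent values.
        var = (other.mean ** 2) * self.var + (self.mean ** 2) * other.var
        return Uncertain(self.mean * other.mean, var)

t1 = Uncertain(20.0, 0.25)   # e.g. a temperature reading with a standard deviation of 0.5
t2 = Uncertain(21.5, 0.25)
print(t1 + t2)               # Uncertain(mean=41.5, var=0.5)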

1.3.4 Knowledge of physical laws and relationships


Signals are interrelated with each other through conservation laws, or invariant relationships.
As mentioned in section 1.2, dynamical systems are governed by differential equations.
Currently however, there is no support within embedded computers for their specification.
For example, in the design of a digital controller that regulates the temperature across a flat
surface, there is no programmatic structure for specifying that the thermal conduction
regime is governed by the 2D heat equation [63]. The state equation structures pertaining
to the controller design must therefore be explicitly hard-coded based on the programmer’s
understanding of the nature of the heating process.
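The sketch below illustrates this status quo: the programmer writes out an explicit finite-difference update for a common form of the 2D heat equation, u_t = α(u_xx + u_yy), and all of the physical structure lives only in that hand-written code. The grid size, diffusivity, and step sizes are illustrative placeholders.

# Status-quo sketch: the programmer hard-codes an explicit finite-difference update
# for the 2D heat equation u_t = alpha * (u_xx + u_yy). All constants are placeholders.
import numpy as np

alpha, dx, dy, dt = 1e-4, 1e-2, 1e-2, 0.1   # diffusivity and grid/time steps (assumed)
u = np.zeros((32, 32))                      # surface temperature field
u[12:20, 12:20] = 80.0                      # an initial hot patch, for illustration

def step(u):
    """One explicit Euler update of the interior points; boundaries held fixed."""
    un = u.copy()
    un[1:-1, 1:-1] = u[1:-1, 1:-1] + alpha * dt * (
        (u[2:, 1:-1] - 2 * u[1:-1, 1:-1] + u[:-2, 1:-1]) / dx**2 +
        (u[1:-1, 2:] - 2 * u[1:-1, 1:-1] + u[1:-1, :-2]) / dy**2
    )
    return un

for _ in range(100):
    u = step(u)
print(u.max())   # peak temperature after 100 steps

None of this structure is visible to the compiler or runtime; the governing equation exists only implicitly, in the programmer's choice of update rule.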

1.4 Proposed Approach


Solutions addressing the aforementioned points have been proposed for encoding physical
knowledge. A prominent one is Newton, a specification language for physics [64]. It
features a type system for describing physical signals and their units of measure, as well as
the invariant relationships and laws between them.
Newton focuses on dimensional information and physical laws, outlined in Sections 1.3.2
and 1.3.4. It also serves as a front-end for a back-end process called dimensional function
synthesis (DFS) [108] — which performs dimensional analysis to find the Pi groups of the
Buckingham-Pi theorem [18]. However, for the current implementation of Newton there is
no functionality for incorporating differential equations.
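As a concrete illustration of the kind of result that dimensional analysis yields (using the pendulum of Chapter 3 as an example of our own choosing, not one drawn from the DFS literature), consider a frictionless pendulum whose period T may depend on the mass m, length L, and gravitational acceleration g. Only one dimensionless group can be formed, and the mass cannot appear in it:

\[ \Pi = T\sqrt{\frac{g}{L}} \qquad\Longrightarrow\qquad T = \Pi(\phi_0)\,\sqrt{\frac{L}{g}}, \]

so the period must scale as \(\sqrt{L/g}\), with the dimensionless prefactor depending only on the initial angle ϕ_0 (it equals 2π in the small-angle limit).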
One recent popular approach for incorporating differential equations for dynamical
systems is physics-informed neural networks (PINNs) [88]. PINNs have gained significant
attention in the recent scientific machine learning (SciML) literature. Section 2.2 outlines
them in detail. However, the current literature only explores PINNs in their applications to
simulations and multi-physics models — which tend to run in high-performance computers.
Instead, we focus on using PINNs as predictive model bases to be deployed within real
physical systems. To do this we investigate systems that do not involve complicated dynamics
and do not require convoluted experimental setups, yet still provide us data with physical
significance. Therefore, we chose two systems: a swinging pendulum, and the heating of a
metal block.
We also address the issue of choosing an appropriate compute substrate for the PINNs
that we aim to deploy. For this, we propose to use field-programmable gate arrays (FPGAs),
due to the numerous advantages that they provide which are discussed in Section 2.3.

1.5 Structural Overview


Chapter 2 provides background on PINNs and FPGA-based NN accelerators. Chapter 3
presents the first case study system — a swinging pendulum. We observe how well PINNs
perform for predicting the pendulum’s angle of oscillation based on a physical setup, and
compare the predictive performance against a basic simulation of an ideal pendulum. Chap-
ter 4 presents the second case study system — heat diffusion across a metal block. We assess
whether PINNs are able to predict the surface temperatures across the block as it is being
heated, and present a method for denoising the thermal data. Chapter 5 discusses issues
related to parallel sensing. Chapter 6 provides a discussion on the work presented and future
research directions. Chapter 7 summarises and concludes this dissertation.
Chapter 2

Background on PINNs and FPGAs

2.1 Introduction
The topic of this dissertation lies at the intersection of two emerging sub-domains: scientific
machine learning (SciML) [11] and reconfigurable computing [101]. For the first we focus
on Physics-informed Neural Networks (PINNs), and for the second on FPGAs as a compute
substrate for model deployment. The following sections provide outlines on the focus areas,
and shed light on some prominent related works.

2.2 Physics-informed Neural Networks


PINNs [88] are models that incorporate physical laws through the inclusion of terms that
correspond to a system’s governing differential equation into the loss function. Training a
PINN corresponds to finding an approximate solution to the differential equation with the
aid of data that is adherent to the equation’s solution. From a mathematical perspective, the
training process is posed as a constrained optimisation problem where the governing equation
loss terms act as a soft constraint that restricts the solution space to physically-plausible
ones. From an ML perspective, the physics loss acts as a regularisation term that allows the
model to generalise using known physics, often enabling it to predict in regions outside of
the training domain. Figure 2.1 shows an example architecture of a PINN based on the heat
equation.
PINNs were first introduced in 1998 by Lagaris et al. [58], although they resurfaced
after their applicability to real problems was demonstrated by Raissi et al. [88]. Their
recent widespread popularity in the literature can be attributed to the adoption of ML
frameworks such as Tensorflow [1] and Pytorch [82] which provide automatic differentiation
engines (GradientTape and Autograd), as well as the rapid improvement in modern compute
infrastructure for network training.

Fig. 2.1 Example of a PINN architecture based on the 2D heat equation using trainable
parameters θ_n. The left dashed box shows the neural network which predicts the value of
u given the training points to produce the data loss term. The right dashed box shows the
PDE residual corresponding to the heat equation, composed from the differential terms. The
differential terms are obtained using automatic differentiation. The PDE residual forms the
physics loss, which is the distinguishing component of PINNs.

Aside from the promise of generalisability [50] that they offer through the incorporation
of physical laws, there are additional reasons that motivate our decision to investigate PINNs
as a candidate architecture. These are as follows:

1. PINNs are resilient to noise [59]. This makes them a promising choice for deployment
within physical environments which are dominated by noise from different sources.

2. PINNs can be trained with less and more sparse data [8], and in some cases no data at
all, with the exception of initial and boundary points [48]. This is advantageous when
the sensing capabilities or the amount of data that can be gathered is limited.

3. PINNs are more computationally efficient than traditional numerical solvers, such as
finite differences or finite elements, due to not requiring a computational mesh [91].
They are also often more efficient than ordinary feed-forward neural networks since
they restrict the solution space to a subset of physically-plausible ones [50].

4. PINNs are convenient to implement and flexible, offering the capability of solving
forward and inverse problems based on the same problem formulation, and almost
the same code implementation [27].

2.2.1 Mathematical framework


Let x be a spatial input variable where x ∈ R^n and n is the number of dimensions, t be a
temporal input variable, and u be a function in terms of both inputs representing the solution
of a PDE. PINNs arrive at solutions to differential equations by posing the following approximation:

\[ u(x,t) \approx f(x,t;\theta) \qquad (2.1) \]

where f is a neural network approximation of u based on network parameters θ. Specifically,
this approximate form is a solution to partial differential equations (PDEs) of the
general form:

\[ u_t + \mathcal{N}(u) = 0 \qquad (2.2) \]

where u_t is a time derivative of u, and N is a nonlinear differential operator. Therefore,


the PINN finds a solution based on the following approximation:

\[ f_t + \mathcal{N}(f) = 0 \qquad (2.3) \]

The network loss L for a PINN can be found using the following equation:

\[ L(\theta) = L_d(\theta) + L_p(\theta) \qquad (2.4) \]

L_d is the data loss, which optimises to fit a set of data points that correspond to the true
solution, usually at the initial or boundary conditions. It is traditionally the singular loss
term that is used in neural networks. L_p is the physics loss which places a soft constraint
on the network optimisation to obey the governing equation, and is consequently comprised
of the equation's differential terms. Let {x_d, t_d} be a set of N_d data input points for a set of
known output values {u_d}. Additionally, let {x_p, t_p} be a set of collocation points within the
problem domain that are used to evaluate L_p. Therefore, L_d and L_p are mean squared error
losses denoted by Equations 2.5 and 2.6.

\[ L_d(\theta) = \frac{1}{N_d}\sum_{i=1}^{N_d} \left| f(x_d^i, t_d^i; \theta) - u_d^i \right|^2 \qquad (2.5) \]

\[ L_p(\theta) = \frac{1}{N_p}\sum_{i=1}^{N_p} \left| f_t(x_p^i, t_p^i; \theta) + \mathcal{N}\!\left(f(x_p^i, t_p^i; \theta)\right) \right|^2 \qquad (2.6) \]

The squared terms in Equation 2.6 correspond to the left-hand side of Equation 2.3. The
differential terms in Equation 2.6 are computed using automatic differentiation.
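A minimal sketch of how Equations 2.4 to 2.6 translate into code is given below, using PyTorch's Autograd to obtain the differential terms. It assumes the 1D diffusion equation of Equation 1.5 as the governing PDE, so that the residual of Equation 2.3 becomes f_t − α f_xx, and uses a small fully-connected network with placeholder sizes; it is an illustration rather than the exact implementation used in later chapters.

# Minimal PINN loss sketch in PyTorch for the 1D diffusion equation u_t = alpha * u_xx
# (Equation 1.5). Illustrative only; sizes, data, and alpha are placeholders.
import torch
import torch.nn as nn

alpha = 0.1   # assumed diffusivity

net = nn.Sequential(nn.Linear(2, 32), nn.Tanh(),
                    nn.Linear(32, 32), nn.Tanh(),
                    nn.Linear(32, 1))

def data_loss(x_d, t_d, u_d):
    # L_d: mean squared error against the known measurements (Equation 2.5).
    u_pred = net(torch.cat([x_d, t_d], dim=1))
    return torch.mean((u_pred - u_d) ** 2)

def physics_loss(x_p, t_p):
    # L_p: mean squared PDE residual at the collocation points (Equation 2.6).
    x_p.requires_grad_(True)
    t_p.requires_grad_(True)
    u = net(torch.cat([x_p, t_p], dim=1))
    u_t = torch.autograd.grad(u, t_p, torch.ones_like(u), create_graph=True)[0]
    u_x = torch.autograd.grad(u, x_p, torch.ones_like(u), create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x, x_p, torch.ones_like(u_x), create_graph=True)[0]
    residual = u_t - alpha * u_xx
    return torch.mean(residual ** 2)

# Total loss of Equation 2.4, here with equal weighting of the two terms.
x_d = torch.rand(10, 1); t_d = torch.zeros(10, 1); u_d = torch.sin(3.14 * x_d)
x_p = torch.rand(100, 1); t_p = torch.rand(100, 1)
loss = data_loss(x_d, t_d, u_d) + physics_loss(x_p, t_p)
loss.backward()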

2.2.2 Related work


The recent literature on PINNs, and physics-informed machine learning (PIML) in general,
is wide and comprehensive, as the methods have successfully been shown to perform well
in many different scientific domains. Some of the highlighted reviews on physics-informed
machine learning include Karniadakis et al. [50], Baker et al. [11], and Hao et al. [44].
Cuomo et al. [27] provide an exhaustive review that focuses specifically on PINNs.
Cai et al. [21] review the application of PINNs to a number of inverse problems for
heat transfer. They demonstrate that PINNs show promising predictive capabilities in a simulation
setting for forced and mixed heat convection past a cylinder, as well as for two-phase Stefan
problems [97].
A few other scientific domains where PINNs have been investigated include fluid mechan-
ics [20], power systems [72], cardiac electrophysiology [92], fiber optics [47], laser metal
deposition [61], and electromagnetics [51]. For all of these works, the authors have shown
that the embedding of domain-specific governing equations into the training process
improves generalised inference, with better predictions in regions with little or no
data.
In addition to surveying PIML methods and providing a categorisation of the different
ways in which physics is incorporated into the ML workflow (through the model architecture,
loss function, and through hybrid approaches), Ben Moseley’s thesis [75] tackles the chal-
lenge of using PIML techniques as tools to solve real-world large-scale scientific problems.
Moseley investigates the performance of PIML methods by posing the following tasks:

1. Using a variational autoencoder (VAE) [53] to find the physical factors that relate to
lunar thermodynamics from temperature measurements of the moon’s surface.

2. Filtering out noise from low-light images for visualisation of permanently-shadowed


regions on the lunar surface using a custom-built PIML algorithm.

3. Simulating complicated seismic wave phenomena using different physics-informed


deep learning models.

4. Investigating the scalability of PINNs to problems with large domains and high-
frequency components.

For these tasks, Moseley shows that the fine-tuned PIML techniques generally perform
well in terms of their ability to learn physical processes and solve complicated scientific
problems. There are still challenges, especially relating to scaling the methods to physical
systems with high frequencies. To alleviate this issue, Moseley et al. propose finite-basis
PINNs (FBPINN) [76], a domain decomposition approach for solving large-scale differential
equation problems.

2.3 FPGAs and Accelerator Architectures


Field-programmable gate arrays (FPGAs) are reconfigurable computer architectures made
up of large collections of digital logic gates, where the connections between the gates can
be customised for specific applications. FPGA reconfigurability implies that their hardware
designs (and thus the applications that they run) can be configured as many times as is required
by the user, and often during runtime through partial reconfiguration (PR) [105]. FPGAs have
been gaining increased popularity in their deployment as accelerator architectures over the
recent years, due to the advantages they provide in running hardware-optimised algorithms
for domain-specific computing. These advantages include:

1. Design of high throughput architectures, due to the inherent parallelism from synthe-
sizing processing elements across the space of the FPGA’s hardware resources.

2. Reconfigurability which allows for rapid design prototyping, enables future hardware
design updates, and provides the feature of switching between hardware applications
(through partial reconfiguration [105]).

3. Low processing latency from running directly on hardware, rather than passing through
software abstraction levels such as an operating system.

4. Energy-efficient architectures, due to the flexibility in synthesizing hardware to perform


only the necessary operations with as few overheads as possible. Qasaimeh et al. show
that FPGAs outperform CPUs and GPUs for more complicated vision tasks with a
1.2-22.3x energy reduction per frame [85].

In recent years there has been a shift away from general-purpose computing towards
domain-specific architectures [83], with highlight examples such as Google’s Tensor Pro-
cessing Unit (TPU) for accelerating deep neural networks (DNNs) [49]. This is mainly
attributed to the struggle of modern transistors to keep up with Moore’s law [33] and the
end of conventional Dennard Scaling [57]. Therefore, the new approach is to design acceler-
ator architectures for performing specialised tasks rather than relying on general-purpose
CPUs. FPGAs thrive in this new specialisation-focused compute paradigm, and are therefore
becoming more mainstream both in research and in commercial settings.

2.3.1 Neural networks on FPGAs


This section highlights some of the relevant research involving NN acceleration on FPGAs.
Implementing neural networks on FPGAs is not a novel idea, with one of the earliest im-
plementations, GANGLION, dating back to 1992 [24]. GANGLION is a fully-connected
network architecture with 12 input, 14 hidden, and 4 output layer neurons, implemented on
40 Xilinx XC3000 series logic cell array (LCA) chips (36 XC3090s and 4 XC3042s), and 18
2KB PROMs as lookup tables. This is in stark contrast to modern FPGA NN designs, which
are implemented on a single chip. The GANGLION architecture used different techniques to
efficiently conserve hardware, as well as to increase data throughput. These include splitting
8-bit multiplications into summations of two 8-bit by 4-bit partial products, using carry-save
and three-to-two reduction adders to avoid long carry propagation datapaths, and scaling
down 20-bit accumulation results to 11-bits for the activation function inputs. For image
segmentation tasks it achieves a data processing rate at the inputs of 240 MB/s corresponding
to 20 million pixels per second.
Early FPGA NN implementations [7, 9, 23, 38, 103] faced similar problems: limited
hardware resources due to the low number of configurable logic blocks (CLB) of early FPGAs,
complicated and large multiplication circuitry, and in some cases insufficient routing control
within high-level design software. Since then FPGA designs have advanced tremendously,
with modern implementations accommodating for state-of-the-art NN architectures with
millions of parameters (AlexNet [56], VGG-16 [96], GoogLeNet [100], etc.), and can
perform hundreds of billions of operations per second (OPS) [42]. Some designs reach peaks
of around 40 teraOPS (TOPS) [77].
FPGA NN inference accelerators focus on optimising for speed and energy efficiency.
Optimizing for speed enables accelerators to perform many inferences with real-
time performance. Energy efficiency is critical given that accelerators often run either within
small-scale embedded systems where the energy cost is constrained, or within large scale
data centers serving thousands of clients, where the energy cost is multiplicative. For a given
FPGA, an accelerator design is constrained by the amount of available hardware resources,
so area-efficient designs are favourable.

The following sections highlight two techniques for optimizing NN architectures for
FPGAs: quantization and weight pruning. The final section presents FINN [16], an end-to-
end framework for NN deployment on FPGAs.

Quantization

Quantization aims at reducing the size of the computation units, as well as the memory
and bandwidth requirements by narrowing the bit-width of the data, i.e. the weights and
activations. The trade-off here is between the accuracy degradation due to the loss of
precision, and the performance gain due to the quantization scheme.
Quantization is used in conjunction with fixed-point rather than floating-point data repre-
sentations. Quantized NN architectures often use fully 16-bit layers [40, 95, 104, 110, 112],
fully 8-bit layers [32, 41, 46], or a mix of 8-bit and 16-bit layers [66, 68, 99]. Additionally,
many accelerators [62, 77, 78, 113] implement Binarized NN (BNN) architectures with 1-bit
representations for all of the layers [45].
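As a rough illustration of this trade-off, the sketch below (ours, not taken from any of the cited accelerators) quantizes an array of weights to a signed fixed-point format with a configurable fractional bit-length; the function name and example values are purely illustrative.

import numpy as np

def quantize_fixed_point(x, total_bits=8, frac_bits=4):
    # Scale by 2^frac_bits, round to the nearest representable step, then
    # saturate to the signed range expressible with total_bits bits.
    scale = 2 ** frac_bits
    qmin = -(2 ** (total_bits - 1))
    qmax = 2 ** (total_bits - 1) - 1
    q = np.clip(np.round(x * scale), qmin, qmax)
    # Return the value the fixed-point hardware would actually represent.
    return q / scale

weights = np.array([0.8031, -1.27, 0.0493])
print(quantize_fixed_point(weights))            # [ 0.8125 -1.25    0.0625]
print(quantize_fixed_point(weights, 16, 12))    # much closer to the original values

Narrower formats lose precision (the first call), while wider formats recover it at the cost of larger compute units and memory traffic (the second call).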
Qiu et al. introduce a dynamic approach to quantization where different layers and
feature map sets can have different fractional bit-lengths, based on an optimal quantization
configuration strategy [86]. However, for a given configuration the bit-widths are fixed, and
in their experiments they use 16-bit, 8-bit, and a mix of 8 and 4 bits for the weights in the
convolutional (CONV) and fully-connected (FC) layers respectively. They show that using
dynamic precision with 8-bits can restore the top-1 and top-5 accuracies to values marginally
less than the single-precision floating-point benchmark (1.52% loss for top-1, and 0.62% for
top-5 accuracies), as opposed to static precision 8-bits which suffered from high accuracy
degradation.
Abd El-Maksoud et al. use a 4-bit weight quantization in their GoogLeNet FPGA
accelerator [34]. They use incremental network quantization (INQ) [114], a post-training
quantization method which partitions the weights into two groups, one to be quantized and the other to be retrained. The two groups are switched afterwards. This is done iteratively until the accuracy requirement is met. Abd El-Maksoud et al. show that using INQ in addition to weight pruning allows them to reduce the CNN model by 57.6x, with their accelerator
achieving a classification rate of 25.1 FPS with 3.92 W of power [34].

Weight pruning

Denil et al. have shown that NN models are over-parameterized, and that in many cases only
a few of the weights are required to predict all the rest [30]. For NN models that are to be
deployed in embedded systems, this redundancy results in a waste of storage and computation
requirements. Weight pruning, or weight reduction, seeks to resolve the over-parameterization
of NN models by removing zero or small absolute value weights. This results in compact NN
models that require less storage and utilize less hardware resources, resulting in lower power
consumption. It also frees up space for additional processing for optimizing other areas of
NN accelerators. Han et al. present a three-step process for pruning NNs by first training
the network to identify the important connections, removing the unimportant connections,
and then retraining the network after the parameter reduction [43]. They show that their
method achieved a 9× and 13× reduction in parameters for AlexNet [56] and VGG-16 [96]
respectively, without loss of accuracy. Denton et al. apply compression techniques that
exploit the linear redundancy within CNNs to reduce the number of weights [31]. They use
matrix singular value decomposition (SVD) to approximate high order tensors into tensors
with lower dimensions. They achieved a 2-3× memory reduction for the first two layers of a CNN, and 5-13× for the fully connected layers.
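The magnitude-based pruning step at the core of this pipeline can be sketched in a few lines of PyTorch. This is an illustration of the general idea rather than the exact procedure of [43] or [31]; the layer size and sparsity level are arbitrary.

import torch
import torch.nn as nn

def magnitude_prune(layer: nn.Linear, sparsity: float = 0.9) -> torch.Tensor:
    # Find the magnitude below which the smallest `sparsity` fraction of weights lies.
    w = layer.weight.data
    k = max(1, int(sparsity * w.numel()))
    threshold = w.abs().flatten().kthvalue(k).values
    # Build a binary mask and zero out the unimportant connections.
    mask = (w.abs() > threshold).float()
    layer.weight.data.mul_(mask)
    # Re-apply the mask after each retraining step so pruned weights stay at zero.
    return mask

layer = nn.Linear(256, 128)
mask = magnitude_prune(layer, sparsity=0.9)
print(1.0 - mask.mean().item())   # fraction of weights removed, approximately 0.9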

FINN framework

FINN¹ is a full end-to-end framework for NN deployment on FPGAs [16]. It provides features for a design-space exploration of mixed bit precisions for network weights, biases,
and activations. The front-end for FINN is a Pytorch library named Brevitas [81], that
supports post-training quantization and quantization-aware training. A user can set different
bit-widths for different layers and activations by defining a quantized version of a given
network. The user trains the quantized network, and benchmarks its accuracy against the
floating-point version, tweaking the bit-widths until the network accuracy is sufficient. The
quantized network can then be exported as an ONNX object [10], which can then be converted
into a dataflow accelerator hardware design using the FINN compiler. Blott et al. have shown
that NNs compiled by FINN have achieved 5 TOPS on an embedded platform, and 50 TOPS
for a datacenter implementation [16].

1 Named after the cat of one of the authors.
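As a rough sketch of the Brevitas front-end workflow described above, a quantized network is defined by swapping standard PyTorch layers for their quantized counterparts. The bit-widths and layer sizes below are placeholders, and the ONNX export step is only indicated in a comment because its exact function name differs between Brevitas versions.

import torch.nn as nn
from brevitas.nn import QuantLinear, QuantReLU

class QuantMLP(nn.Module):
    # A small fully-connected network with 4-bit weights and 4-bit activations.
    def __init__(self):
        super().__init__()
        self.fc1 = QuantLinear(64, 32, bias=True, weight_bit_width=4)
        self.act1 = QuantReLU(bit_width=4)
        self.fc2 = QuantLinear(32, 10, bias=True, weight_bit_width=4)

    def forward(self, x):
        return self.fc2(self.act1(self.fc1(x)))

# After quantization-aware training and benchmarking against the floating-point
# model, the network is exported to ONNX with Brevitas's export utilities and
# handed to the FINN compiler, which generates the dataflow accelerator design.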


Chapter 3

Predicting the Oscillation Angle of a Swinging Pendulum

3.1 Introduction
This chapter investigates the predictive capability of PINNs for a simple pendulum system.
First it introduces the dynamics of the nonlinear pendulum. Then it covers tests on the
predictive performance on an idealized version of the system using numerically-generated
data. This idealized case acts as a reference for the best-case accuracy. Then it outlines a real
experimental setup of the system, and discusses results obtained from it. For both of these
cases a PINN is benchmarked against a standard uninformed NN. Finally it closes with a
discussion on the results and insights gained from them.

3.2 Pendulum Dynamics


In the study of differential equations, the pendulum is often one of the first examples students
are introduced to of a dynamical system governed by a nonlinear differential equation [17].
Figure 3.1 shows an illustration of a pendulum system. m is the mass of the pendulum bob,
L is the length of the rod, f is the force from the rod, ϕ is the angle of oscillation, s is the
oscillation arc length, and g is gravitational acceleration. We can derive the differential
equation for the motion of the pendulum by applying Newton’s second law along the tangent
of the oscillation arc. In the following F is the summation of forces along the tangential axis,
and a is the linear acceleration:

Fig. 3.1 Illustrative diagram of a pendulum system.

\[ F = ma \]
\[ -mg\,\sin\phi = ma \]
\[ a = -g\,\sin\phi \tag{3.1} \]

The negative sign in Equation 3.1 indicates that the pendulum is decelerating as it moves
towards the top of the arc. By using the equation for the arc length s we get:

\[ s = L\phi \]

\[ a = \frac{d^2 s}{dt^2} \]

\[ a = L\,\frac{d^2\phi}{dt^2} \tag{3.2} \]
Substituting Equation 3.2 into Equation 3.1 gives us the differential equation for the
simple nonlinear pendulum, Equation 3.3:
\[ L\,\frac{d^2\phi}{dt^2} = -g\,\sin\phi \]

\[ \frac{d^2\phi}{dt^2} + \frac{g}{L}\sin\phi = 0 \tag{3.3} \]

3.3 Ideal Pendulum Simulation


This section covers a formulation of an idealized setup based on a numerical solution
of Equation 3.3. The equation is difficult to solve due to the nonlinearity introduced by
sin ϕ [12], and its exact solution is expressed in terms of elliptic integrals [4]. The equation
becomes even more complicated when one takes air resistance into consideration [28], which
is a critical factor to consider for real systems. Therefore, we generate a simplified solution
using the Euler-Cromer method [26].

3.3.1 Data generation


We begin by defining a discretization scheme. Let {t1 ,t2 , ...,tn } be a discrete set of time
points where t ∈ R+ , and {ϕ1 , ϕ2 , ..., ϕn } be the corresponding set of angular displacements
where ϕ ∈ [−2π, 2π]. Additionally, we define the index set i = {1, 2, ..., n}. A time frame
∆t is the difference between ti+1 and ti . By defining a discretized expression for the angular
acceleration we obtain the following:

\[ \frac{\dot{\phi}_{i+1} - \dot{\phi}_i}{\Delta t} = -\frac{g}{L}\sin\phi_i \]

\[ \dot{\phi}_{i+1} = \dot{\phi}_i - \frac{g}{L}\sin\phi_i\,\Delta t \tag{3.4} \]
Equation 3.4 is the approximate discretized solution for the angular velocity. A more
exact expression would necessitate that we take the sine of ϕi+1 , but at this stage we would
not have computed it yet. Fortunately for a small enough ∆t, ϕi+1 ≈ ϕi .
To get the angular displacement, we follow a similar approach based on discretization of
the angular velocity:
\[ \dot{\phi}_{i+1} = \frac{\phi_{i+1} - \phi_i}{\Delta t} \]

\[ \phi_{i+1} = \phi_i + \dot{\phi}_{i+1}\,\Delta t \tag{3.5} \]

Equation 3.5 is the approximate discretized equation for the angular displacement. We
apply the Euler-Cromer method [26] through the usage of ϕ̇i+1 in Equation 3.5 instead of ϕ̇i ,
which allows the solution to maintain energy conservation.
To generate a solution that is similar to a real experiment, we generate 1500 linearly
spaced time points in the interval [0, 6] (∆t = 0.004 s), set the initial conditions to be
ϕ1 = −π/2 rad and ϕ̇1 = 0 rad/s, and set g = 9.8 m/s² and L = 0.325 m — the length of the
rod that we use for our experiment in Section 3.4. Figure 3.2 shows the numerical solution of
the angular displacement based on this setup.

Fig. 3.2 Numerical solution of a pendulum system generated using Equations 3.4 and 3.5.

The next step is to account for air resistance. The exact amount of air resistance acting on the pendulum mass and the rod is dependent on many factors such as speed, surface roughness, air density, and the object's geometry. In our case, we use a simple air resistance model where we assume that the drag force is linearly proportional to the object's speed with a constant of proportionality b. Therefore the model becomes:

\[ \frac{d^2\phi}{dt^2} + b\,\frac{d\phi}{dt} + \frac{g}{L}\sin\phi = 0 \tag{3.6} \]

Additionally, Equation 3.4 becomes:

\[ \dot{\phi}_{i+1} = \dot{\phi}_i - \left( b\,\dot{\phi}_i + \frac{g}{L}\sin\phi_i \right)\Delta t \tag{3.7} \]
We arbitrarily take b to be 0.001 since we do not have an exact value to rely on. This
gives us the data that we will use for training, shown in Figure 3.3.

Fig. 3.3 Numerical solution of a pendulum system generated using Equations 3.7 and 3.5,
taking air resistance into consideration. A more realistic solution would consider a smaller
amount of damping over a longer interval, but for our purposes this solution is sufficient.
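A minimal sketch of this data-generation step, implementing Equations 3.5 and 3.7 with the parameters stated above, is shown below; the variable names are ours and not taken from the implementation used for the experiments.

import numpy as np

g, L, b = 9.8, 0.325, 0.001          # gravity, rod length, damping coefficient
n = 1500
t = np.linspace(0.0, 6.0, n)         # 1500 linearly spaced points in [0, 6] s
dt = t[1] - t[0]

phi = np.zeros(n)                    # angular displacement
phi_dot = np.zeros(n)                # angular velocity
phi[0] = -np.pi / 2                  # initial angle (rad)
phi_dot[0] = 0.0                     # initial angular velocity (rad/s)

for i in range(n - 1):
    # Euler-Cromer: update the velocity first (Equation 3.7) and use the
    # updated velocity when stepping the displacement (Equation 3.5).
    phi_dot[i + 1] = phi_dot[i] - (b * phi_dot[i] + (g / L) * np.sin(phi[i])) * dt
    phi[i + 1] = phi[i] + phi_dot[i + 1] * dt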

3.3.2 Training setup


The essential element of PINNs is the inclusion of a physics-informed component in the loss
function based on the differential equation residual. For the pendulum data that we generated,
this corresponds to Equation 3.6. Thus, based on Equation 2.4 we get Equation 3.8 as the
loss function. λp is a configurable hyperparameter that we introduce to enforce the strength of the physics loss constraint. We fix its value to be 0.001.

\[ \mathcal{L} = \frac{1}{N_d}\sum_{i=1}^{N_d} \left| \phi_f^i - \phi_d^i \right|^2 + \frac{\lambda_p}{N_p}\sum_{i=1}^{N_p} \left| \ddot{\phi}_f^i + b\,\dot{\phi}_f^i + \frac{g}{L}\sin\phi_f^i \right|^2 \tag{3.8} \]
We compare PINNs against ordinary NNs that do not include the physics loss component.
For the NN, we eliminate the second term in Equation 3.8. The code implementation
we develop for our evaluations is based partly on open-source repositories developed by
Moseley [74] and Bhustali [13], although we adapt the implementation to suit our particular
problem. For all our training cases we use a multi-layer perceptron (MLP) architecture [15]
with different network parameters, and we assess the predictive performance based on a
variation of these parameters. We run the training on a workstation running an Intel i7-7820X
16-core CPU, and an NVIDIA Quadro P1000 4 GB GPU.
Based on tests with different activation functions, we found that the sine function per-
formed the best for the pendulum system and so we use it for all of our training cases. Addi-
tionally, we found that the limited-memory Broyden–Fletcher–Goldfarb–Shanno (LBFGS)
algorithm [65] performed best for PINNs, and so we use it as our default optimizer. This
is corroborated by the usage of LBFGS in the PINN paper by Raissi et al. [88]. We use
the default tolerance values in Pytorch for the termination, so LBFGS stops when there is
nothing left to learn based on these tolerances. We fix the learning rate at 0.01.
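The loss in Equation 3.8 can be assembled with PyTorch's automatic differentiation roughly as follows. This is a simplified sketch with our own function and argument names rather than a verbatim excerpt of the implementation used for the evaluations; for the uninformed NN the second term is simply omitted.

import torch

def pinn_loss(model, t_data, phi_data, t_colloc,
              b=0.001, g=9.8, L=0.325, lambda_p=0.001):
    # Data loss: mean squared error on the training points.
    data_loss = torch.mean((model(t_data) - phi_data) ** 2)

    # Physics loss: residual of Equation 3.6 evaluated at the collocation points.
    t = t_colloc.clone().requires_grad_(True)
    phi = model(t)
    dphi = torch.autograd.grad(phi, t, torch.ones_like(phi), create_graph=True)[0]
    d2phi = torch.autograd.grad(dphi, t, torch.ones_like(dphi), create_graph=True)[0]
    residual = d2phi + b * dphi + (g / L) * torch.sin(phi)
    physics_loss = torch.mean(residual ** 2)

    return data_loss + lambda_p * physics_loss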
For the training data, we test four different training variations: linearly-spaced, uniformly-
distributed random, adjacent, and noisy data. For the first three cases we vary the number of
training points, and in the fourth we vary the amount of noise.
Figure 3.4 shows the predictions of a 3-layer PINN with 32 neurons in each layer,
given 150 linearly spaced training points (Nd = 150) and 100 linearly spaced collocation
points (N p = 100). In the best possible case, one where we have enough data available
and a sufficiently expressive network architecture, the PINN is able to predict the entire
pendulum solution with high accuracy (RMSE = 6.8 × 10⁻³). Figure 3.5 shows the test
RMSE plotted against the training iterations for the PINN and the NN. We can see that the
NN also has no trouble fitting the data due to its abundance, with an RMSE of 4.0 × 10⁻³. In
the coming sections we gradually make the training setup more difficult and compare the
PINN’s performance against that of an equivalent NN.

3.3.3 Results
Fig. 3.4 PINN predictions on the synthetic data pendulum given 150 training points. PINN architecture: 3 FC hidden layers with 32 neurons each. RMSE = 0.0068 as the PINN has no trouble fitting the data given a perfect setup.

The first thing to determine for both a PINN and an NN is the smallest possible network architecture size, so that we can fix it for training. Tables 3.1 and 3.2 show the test RMSE values for the PINN and NN respectively based on a variation of the number of hidden layers
and number of units in each layer, after 2000 iterations. The blow-up (BU) entries correspond
to instances where the training failed due to exploding gradients, and the early termination
(ET) entries correspond to instances where the training ends due to the termination condition
of LBFGS. The first number in those entries is the iteration at which training stopped (the last valid iteration before blow-up, or the early-termination iteration), and the second number is the final reported RMSE value. For both the PINN and the NN, the
network maintains a high level of accuracy even with small architectures. Therefore we fix
the architecture to be 3 layers with 5 units each, to keep the network small whilst maintaining
expressiveness for more difficult training cases.

Linearly-spaced data

Table 3.3a shows the RMSE values for different numbers of linearly spaced data points based
on the training configuration. The NN maintains good accuracy down to 25 points. Lower
than that, the RMSE values begin to suffer considerably. In contrast, the PINN maintains
good accuracy even with only 5 linearly-spaced training points. Figure 3.6 shows that the
physics loss term allows the network to accurately generalise across the entire domain, whilst
the uninformed NN is only able to fit the data.

Fig. 3.5 Test RMSE values against training iterations of a PINN and an equivalent NN, given
150 linearly spaced training points. Both models converge to an accurate solution after
approximately 1250 iterations, although the PINN solution faces a spike that it overcomes
during the optimization near the 300 iteration mark.

Uniformly-distributed random data

Here we randomly sample training points from a uniform distribution. We evaluate training
in a similar fashion to the linearly-spaced data case. Table 3.3b shows the results. The NN
accuracy drops significantly below the 50 point mark, and for 10 points and less the optimizer
stops early as the network is unable to learn anything. The equivalent PINN maintains good
accuracy for 2000 iterations down to 15 points. We repeated the PINN training runs for 10
and 5 points but this time allowing them to stop based on the LBFGS termination condition. We
found that for 10 points the training stops after 5161 iterations with an RMSE of 0.1056, and
for 5 points it stops at 4083 iterations with an RMSE of 0.4432. By comparing Figure 3.6
with Figure 3.7, we can see that the data irregularity degrades the accuracy of both models,
although the PINN is still able to maintain a reasonable prediction of the true solution.

Adjacent points

For adjacent training points, we take a different approach to PINN evaluation. We do not
compare against an uninformed NN, as we have already shown that they fail in data-absent
Units \ Layers    1        2        3        4                     5
32                0.2914   0.0126   0.0066   0.0067                0.0076
16                0.6960   0.0134   0.0067   0.0067                0.0067
8                 0.6987   0.0233   0.0133   0.0072                0.0070
5                 0.6536   0.2190   0.0184   BU: 435, L: 0.7278    BU: 300, L: 0.7732
4                 0.6957   0.0361   0.0086   BU: 784, L: 0.6862    0.2391
3                 0.7448   0.0256   0.0286   BU: 835, L: 0.6757    0.0333
Table 3.1 PINN RMSE values for different variations of hidden layers and units in each layer.

Units \ Layers    1                      2        3        4                      5
32                BU: 636, L: 0.7376     0.0059   0.0037   ET: 1914, L: 0.0066    ET: 1423, L: 0.0037
16                0.6615                 0.0087   0.0033   0.0071                 0.0054
8                 0.6610                 0.0101   0.0173   0.0067                 0.0056
5                 ET: 1681, L: 0.6626    0.0227   0.0084   0.0044                 ET: 1763, L: 0.0031
4                 ET: 1560, L: 0.7240    0.0168   0.0125   0.00158                0.0092
3                 ET: 1915, L: 0.7692    0.0473   0.0141   BU: 273, L: 0.7277     BU: 266, L: 0.7786
Table 3.2 NN RMSE values for different variations of hidden layers and units in each layer.

regions. Figure 3.8 shows a further example for this. Instead, we focus on the PINN for
the test cases. Due to the difficulty of predicting outside of the training distribution even
with the aid of physics knowledge, we deviate away from the default training configuration.
Instead we configure the network parameters to show that given the right conditions it is
possible for a PINN to predict outside of the training distribution. Additionally, for all of
the configurations we either allow the network to run until the termination condition, or
stop the training early once a satisfactory test accuracy is achieved. Table 3.4a reports the
configurations, losses, and iteration numbers. Based on the data, we observe that given the
right architecture it is possible for PINNs to make predictions outside of the training set. The
accuracy of the prediction shown in Figure 3.9 further emphasizes this point. Additionally,
all of the architectures that have been proven to be successful have a low memory footprint,
consisting of 2-3 hidden layers with less than 10 units each for most cases.

(a) Linearly-spaced points.
Nd    NN                     PINN
100   0.0051                 0.0177
50    0.0044                 0.0246
25    0.0136                 0.0084
15    ET: 1131, L: 0.2904    0.0109
10    ET: 1021, L: 1.3906    0.0756
5     ET: 612, L: 1.1184     0.0470

(b) Uniformly-distributed points.
Nd    NN                     PINN
100   0.0093                 0.0118
50    0.0181                 0.0084
25    0.5434                 0.0424
15    ET: 1468, L: 0.5097    0.0292
10    ET: 152, L: 1.0219     0.1488
5     ET: 956, L: 0.9285     0.4922

Table 3.3 RMSE values for variations of numbers of variably-spaced training points.

Noisy data

This training case involves taking 100 linearly-spaced points and adding Gaussian noise to
them. The Gaussian has a mean of 0 and variable standard deviation. We fix the architecture
at 3 hidden layers with 5 units each, as per the default configuration. Table 3.4b reports
the RMSEs and iteration numbers. The PINN outperforms the NN in all cases, but only
marginally so in cases with lower noise levels. In general, we can see that both the PINN
and NN tend to be impacted with large amounts of noise, although the PINN is better at
adapting to it. The predictions in Figure 3.10 show this to be the case, where the shape of the
solution for the NN deviates away from a sinusoid, whereas the PINN solution maintains
its sinusoidal behaviour. The difficulty of making accurate predictions in the presence of
noise puts forward a strong case for finding efficient ways to denoise data or to model the
uncertainty arising from noise.

Fig. 3.6 PINN and NN predictions on the data for the idealized pendulum using 5 linearly-
spaced training points. The PINN is able to predict the correct solution based on the physics
loss, whereas the NN is only able to fit the training data.

Fig. 3.7 PINN and NN predictions on the synthetic data pendulum using 10 training points
for uniformly-distributed random data. The NN is trained for 150 iterations — its final state
before predictions became unstable. The PINN is trained for 2000 iterations. The PINN
maintains a reasonable fit of the data while the NN struggles due to the data’s irregularity.

Fig. 3.8 NN prediction when trained with 1000 adjacent points. The NN fails to extrapolate
the accurately predicted solution on the training points to the last 500 test points.

(a) PINN evaluation runs for a decreasing number of adjacent points. The numbers in the architecture column refer to the units in each layer, so 6-6-4 means 6 units in the first and second layer and 4 in the third.

Points   Loss     Arch    Iters
750      0.0927   5-5     2852
500      0.0672   5-5     5558
300      0.0659   5-5     3205
200      0.0910   4-4     2417
100      0.0364   6-6-4   1432
50       0.0471   5-6-5   1768
25       0.0338   10-6    3912
10       0.0582   12-9    5572
5        0.0523   12-9    5134
3        0.0714   12-9    6664
1        0.0458   5-9-3   1480

(b) Loss values for the NN and PINN in the case of noisy data. The additive noise values are sampled from a Gaussian distribution with zero mean and variable standard deviations shown in the first column. For each entry, the first number indicates the optimal iteration and the second number indicates the corresponding RMSE.

Std. Dev.   NN              PINN
1.5         576 / 0.6188    978 / 0.4698
1.0         615 / 0.4024    742 / 0.3112
0.7         716 / 0.2598    976 / 0.2126
0.5         1696 / 0.1791   4022 / 0.1583
0.3         1342 / 0.1269   3070 / 0.0940
0.2         997 / 0.0791    2192 / 0.0624
0.1         2104 / 0.0432   2692 / 0.0336

Table 3.4

Fig. 3.9 PINN predicted solution based on the first 5 points of the numerical solution.
The PINN consists of two hidden layers with 12 units in the first and 9 in the second —
corresponding to the 9th entry in Table 3.4a. Remarkably, the PINN is able to accurately
predict the solution despite being trained with only the first 5 points.

Fig. 3.10 PINN vs NN predictions on 100 linearly-spaced points with added Gaussian noise
with a mean of 0 and a standard deviation of 0.5. The PINN and NN solutions are similar,
although the PINN is slightly less impacted by the noise.

3.4 Real Pendulum Experiment


This section presents an experiment evaluating PINNs on real-world data for the pendulum
system.

3.5 Hardware Block Design


We chose to use PYNQ as the framework of choice for the pendulum experiment outlined
in this section as well as the heat diffusion experiment in Section 4.3. Based on this choice,
we used the PYNQ-Z1 board as the hardware platform. The advantage of this is that we
were able to conveniently interface with the hardware block design from the high-level
Python-based API, without the need to dive deep into the low-level hardware. Figure 3.11
shows the block design.

Fig. 3.11 AXI IIC block design that we use for our experiments. The ZYNQ7 processing
system is the operating system side of the system-on-chip (SoC) FPGA, on which the Python
layer runs. The AXI IIC block is the direct interface with the sensors through I2C. We
configure the I2C clock frequency to be 1000 kHz. The AXI IIC block interfaces with the
processing system through the AXI interconnect block. Read and write commands are issued
to the AXI interconnect through the Python driver API.

3.5.1 Experimental setup


We perform the experiment using a variable g pendulum with the oscillation plane adjusted
at 0°, i.e. the plane of oscillation is perpendicular to the ground. We use the PYNQ-
Z1 FPGA as the sensing platform, and measure angular displacement using the BNO055
absolute orientation sensor [90]. The BNO055 contains an accelerometer, a gyroscope, and a magnetometer. It combines data from all three to calculate the absolute Euler angle
orientation through a proprietary sensor fusion algorithm. It uses the Inter-Integrated Circuit
(I2C) protocol [79] for communication. The PYNQ-Z1 processing system communicates
with the sensor by accessing data that is captured by an AXI IIC IP block [111] on the
FPGA programmable logic (PL) fabric. Figure 3.11 shows the hardware block design. The
PYNQ subsystem displays the sensor values through the Python API. Figure 3.13 shows the
experimental setup.
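A minimal sketch of this read path is shown below. It assumes the AXI IIC controller is exposed through PYNQ's AxiIIC driver, that the overlay and IP instance names match the block design, and that the BNO055 sits at its default I2C address; the register address and the 1/16-degree scaling follow the BNO055 datasheet, and the bitstream name is a placeholder.

from pynq import Overlay

overlay = Overlay("pendulum.bit")      # placeholder bitstream name
iic = overlay.axi_iic_0                # AxiIIC driver bound to the AXI IIC block

BNO055_ADDR = 0x28                     # default BNO055 I2C address
EUL_HEADING_LSB = 0x1A                 # first of the six Euler angle data registers

def read_euler_angles():
    # Point the BNO055 register pointer at the Euler angle block, then read six
    # bytes: heading, roll, and pitch as signed 16-bit values in 1/16 degree.
    iic.send(BNO055_ADDR, bytes([EUL_HEADING_LSB]), 1)
    buf = bytearray(6)
    iic.receive(BNO055_ADDR, buf, 6)
    raw = [int.from_bytes(buf[i:i + 2], "little", signed=True) for i in (0, 2, 4)]
    heading, roll, pitch = [r / 16.0 for r in raw]
    return heading, roll, pitch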
Before capturing data, we ensure that the BNO055 is calibrated to filter out offsets with
the three internal sensors. We do this by moving the BNO055 in random directions, and in a
figure-eight motion. After this, we begin the experiment by first moving the pendulum mass
towards the left until it is parallel to the ground. We start recording the data from the sensor
and release the mass, allowing it to oscillate freely. We record until the oscillation nearly stops.
This was after 241 seconds. The data obtained still contains an offset, so we remove it by
subtracting the midpoint value from the signal. Figure 3.12 shows the sensor data. The data
shows a slight downwards trend which we attribute to the residual gyroscope drift which was
not filtered by the fusion algorithm.

Fig. 3.12 Pendulum oscillation data captured from the experimental setup shown in Fig-
ure 3.13. The data has a much higher frequency than the numerically-generated solution in
Figure 3.3, although the sinusoidal nature is similar enough for making comparisons.

3.5.2 Training setup


The training setup is similar to the one in Section 3.3.2. In this scenario we also solve an
inverse problem. Rather than keeping the friction coefficient parameter b constant, we assess
whether the PINN can discover its value by setting it as a trainable parameter. We report on
its value during PINN training runs. Our default configuration is an MLP architecture with 3
hidden layers and 32 neurons in each. We use Equation 3.8 as our PINN loss component,
with 8000 collocation points. We choose LBFGS as the optimizer, but this time we run it for
as many iterations as is necessary and report this value. This is to overcome the spectral bias issue that we discuss in Section 3.5.3. Once again, we use sine as the activation function. We use a λp value of 0.1 and a learning rate of 0.05.

Fig. 3.13 Experimental setup for the pendulum system. The BNO055 [90] is attached to the pendulum mass. The PYNQ-Z1 board (on top of the desktop computer) interfaces with the BNO055 through I2C [79].

3.5.3 Results
We evaluate training cases similar to the ones in Section 3.3.3. First we outline a problem
encountered when training over large domains, and find an appropriate domain size for
the training cases. Then we evaluate network performances after 50000 iterations using
linearly-spaced, random uniformly-distributed, and adjacent data.

Training over a large domain

In the initial attempt to train the data, the networks failed to converge to an accurate solution
when the data was sampled over the entire time domain of oscillation. Figures 3.14 and 3.15
show this problem for the PINN and NN respectively. Both networks meet their termination
conditions at just over 2000 iterations, as the optimization algorithm is not able to learn
anything further. This relates to a common problem that NNs, and particularly PINNs, face.
They are difficult to train for high-frequency features or solutions [107]. This is due to the
spectral bias of NNs, where the rate of convergence for low-frequency loss components
is much faster than that of high-frequency ones. Moseley et al. proposed FBPINNs as a
solution to this problem [76], but for our purposes we choose a simpler approach. Increasing
the size of the domain is equivalent to increasing the solution’s frequency. Therefore we
gradually decrease the domain size to find an appropriately-sized problem that we can fix for
the training cases.

Fig. 3.14 PINN predictions over the entire domain of the sampled training data. The PINN
fails to arrive at a valid solution due to the difficulty of optimizing over a large domain.

We keep the network structures fixed and vary the domain proportion of the data that we
use for training. For all of the data proportions, we use linearly-spaced data, taking every 7th sample for the training set and every 23rd sample for the test set. In contrast to the train and test points,
we do not constrain the collocation points to be within the domain proportion that we evaluate.
This yielded the best results, and is a reasonable choice given that collocation points can be evaluated outside of the problem scope if the governing equation is known.

Fig. 3.15 NN predictions over the entire domain of the sampled training data. Similarly to the PINN the NN also fails to converge, although it is more flexible in its predictive capability due to not being constrained by the physics loss term.

Additionally we
set both the gradient and function value/parameter tolerance parameters (tolerance_grad
and tolerance_change) to 0, thereby eliminating the termination condition and allowing
LBFGS to train indefinitely until the maximum number of iterations. This allowed the
optimizer to overcome a training rut where it mistakenly assumes that there is nothing left
to learn. We run the optimizer for as many iterations as required and report the findings in
Table 3.5.

Domain Proportion   Time (s)   Nd      Nt     NN               PINN             b
0.1                 24.33      1429    435    0.0227 / 30000   0.1095 / 30000   0.0587
0.2                 48.62      2858    870    0.0259 / 29895   0.1482 / 55000   0.0324
0.4                 97.37      5715    1740   0.0324 / 27930   0.2811 / 67005   0.0212
0.6                 145.90     8572    2609   0.0414 / 35805   0.7250 / 240     0.0002
0.8                 194.33     11429   3479   0.0500 / 39615   0.6528 / 135     0.0000
Table 3.5 RMSE values based on varying sizes of the domain. The time column is the size of the domain in seconds. Nd and Nt are the number of train and test points respectively. b is the learned value of the friction coefficient. For the NN and PINN entries we report the RMSE followed by the iteration number. We stop the training early if we observe the RMSE value remaining constant for an extended number of iterations.

Fig. 3.16 NN predictions over the entire domain of the sampled training data, after 34380
iterations. The main difference from the predictions shown in Figure 3.15 is that we do not
enforce any termination conditions, and instead allow the training to run indefinitely.

The results show that the NN is eventually able to overcome the spectral bias issue for all
domain proportions if it is allowed to run indefinitely. We re-ran the training case with the
entire domain for the NN, but with indefinite training, and found that it is able to converge to
an accurate solution after 34380 iterations with an RMSE of 0.0538. Figure 3.16 shows this
solution. The PINN on the other hand does not perform as well, and is only able to converge
to a reasonable solution for 20% of the domain size.
After a careful investigation of this problem, we found that the cause of it is the large Nd
values for each training case. PINNs can often make more accurate predictions if they are
provided with less training data. This is because of the data loss term Ld dominating over the
physics loss term L p , causing it to become less flexible at adapting to the underlying physics
of the problem. Therefore we re-ran the last three PINN evaluations in Table 3.5 with less
data. Table 3.6 shows the results based on a variation of the number of training points.
By comparing the RMSE values in Table 3.5 with the ones in Table 3.6, we observe that using
less training points allows the PINN to make more accurate predictions after training. More
expressive architectures and an extensive hyperparameter grid search would be required to
find models that show significant increases in accuracy. For our purposes, we fix the domain
size at 20% to allow for training flexibility for the test cases.

(a) Domain proportion = 0.4
Nd    RMSE     Iters   b
223   0.1923   75000   0.0180
334   0.1967   74910   0.0214
500   0.2801   74775   0.0185

(b) Domain proportion = 0.6
Nd     RMSE     Iters   b
250    0.5598   70575   0.0308
500    0.3257   75000   0.0307
1000   0.3340   75000   0.0238

(c) Domain proportion = 0.8
Nd    RMSE     Iters   b
334   0.5111   65069   0.0212
667   0.3672   74804   0.0302
889   0.3999   75000   0.0471

Table 3.6 PINN prediction RMSE values for the last three domain proportions shown in Table 3.5, but with less data. Decreasing the amount of data enables more accurate PINN models.

Linearly-spaced data

Table 3.7a shows the training results based on a variation of the number of linearly-spaced training points. The PINN maintains its accuracy even with the reduction in Nd, while the NN suffers considerably below 167 points. Figure 3.21 shows this trend visualized, where we see that the NN predictions fail with fewer data points while the PINN predictions stay relatively consistent. A closer inspection of the predicted solutions in Figures 3.17 and 3.18 shows that, similar to the ideal-case predictions in Section 3.3.3, the PINN is able to regularise the solution according to physics whereas the NN is only able to fit the data points.

Uniformly-distributed random data

Table 3.7b shows the training results based on a variation of the number of uniformly-
distributed training points. Similar to the case with linearly-spaced points, the PINN maintains
its accuracy regardless of Nd whereas the NN does not. However the NN suffers more in the
case of uniformly-distributed points than it does for linearly-spaced points, as we show in
Figure 3.21. By comparing Figures 3.19 and 3.20, we observe the continuing trend of NNs
fitting data points in contrast with PINNs that are able to regularise based on the governing
equation.

Adjacent points

Table 3.8 shows the results for training based on adjacent points, taken as percentages of
the problem domain. In contrast to the previous evaluations, here both the PINN and the
NN fail at predicting the solution outside of the training data, even though the NN suffers
more significantly. Figure 3.23 shows that the PINN predictions maintain a semblance of the
physical behaviour just outside of the training points, but then fall to 0 after that. The NN
predictions on the other hand, shown in Figure 3.22, maintain no semblance of the governing
physics and are in ranges that are entirely outside of physical plausibility.

(a) Linearly-spaced points.
Nd     NN       PINN     b
1000   0.0245   0.1357   0.0346
500    0.0231   0.1427   0.0284
334    0.0253   0.1636   0.0042
250    0.2142   0.1344   0.0288
200    0.1110   0.1381   0.0336
167    0.0408   0.1772   0.0374
143    0.7083   0.3675   0.0468
125    0.8337   0.1718   0.0362
100    1.3689   0.1661   0.0334
84     1.3930   0.1838   0.0342
67     2.1644   0.2325   0.0350
50     1.3206   0.1828   0.0350

(b) Uniformly-distributed points.
Nd     NN       PINN     b
1000   0.0319   0.1394   0.0292
500    0.0512   0.1428   0.0273
334    0.1596   0.1491   0.0293
250    0.1012   0.1514   0.0237
200    0.6762   0.1724   0.0204
167    1.0134   0.2131   0.0198
143    0.7937   0.1942   0.0182
125    1.3762   0.2154   0.0157
100    2.3499   0.2020   0.0165
84     2.0139   0.2046   0.0200
67     3.3333   0.3644   0.0142
50     2.9213   0.2221   0.0170
Table 3.7 RMSE values for variations of numbers of training points. The PINN predictions
for both cases stay relatively consistent, whereas the NN predictions fail as the number of
points decreases.

Train Percent NN PINN b


80 2.3288 0.3474 0.0381
60 4.6108 0.5218 0.0514
40 6.3657 0.6498 0.0653
20 6.2091 0.8083 0.1065
Table 3.8 RMSE values for percentages of adjacent points starting from t = 0. Both the PINN and the NN fail to predict accurate solutions, but the NN fails more severely.

3.6 Closing Remarks


In this chapter we have analysed the predictive performance and trainability of PINNs against
standard NNs for the prediction of a pendulum’s oscillation angle, in an ideal simulated
scenario and using real sensor data from an experiment. We have shown that in most cases,
for both the simulated and real system, the PINN is able to regularise the solution
according to physical principles resulting in accurate predictions in low-data scenarios. This
includes extrapolating the solution outside of the training data in the simulated case, since the
numerical solution directly adheres to the governing equation. We hypothesize that, if given
a more accurate description of the governing equation for the real system, PINNs should also
be able to extrapolate outside of the training domain for the real system as well. In contrast,
standard NNs fail to predict an accurate solution in the absence of sufficient data, which
highlights a flaw in uninformed deep learning approaches where the NN is treated as a black
box. The pendulum system has served as a simple but effective illustration of the benefit of incorporating physics knowledge into deep learning for physical systems.

Fig. 3.17 NN predictions based on training with 50 linearly-spaced points. The NN solution misses the majority of the sinusoids as it is only able to fit data.

Fig. 3.18 PINN predictions based on training with 50 linearly-spaced points. In contrast to
the NN prediction in Figure 3.17, the PINN is able to capture the sinusoids correctly due to
the physics loss term.

Fig. 3.19 NN predictions based on training with 50 uniformly-distributed points. The NN fails to make reasonable predictions in areas with no training points.

Fig. 3.20 PINN predictions based on training with 50 uniformly-distributed points. The
PINN maintains the trend of making valid predictions.

Fig. 3.21 NN and PINN RMSE values for linearly-spaced and uniformly-distributed data,
reported in Table 3.7. The PINN maintains a constant accuracy irrespective of the number of
training points, while the NN fails as the number of points decreases.

Fig. 3.22 NN predictions based on adjacent points that comprise 40% of the problem domain.
The predictions become unstable outside of the training data region.

Fig. 3.23 PINN predictions based on adjacent points that comprise 40% of the problem
domain. The predictions maintain stability just outside of the training data, but fail to
extrapolate for the rest of the domain.
Chapter 4

Predicting the Surface Temperatures Across a Metal Block During Heating

4.1 Introduction
In this chapter, we study the performance of PINNs for a slightly more complicated dynamical
system — heat diffusion across the 2D surface of a metal block. The setup that we use
for our experiment is irregular and does not necessarily adhere to a perfect physical model
or governing equation. Additionally, we use a non-uniform scrap metal block for which we do not have ground-truth reference values of its physical parameters (thermal conductivity, density, emissivity, etc.), so we use best-guess values for the coefficients. The sensor that we use is inexpensive and highly susceptible to noise. Therefore, instead of running an idealized simulation as a benchmark, we immediately start by analysing the experimental data. As a result, we are evaluating PINNs on real data in a situation that is dominated by significant amounts of noise, and where the exact physics is unknown. First we go over the
dynamics of heat diffusion by showing a derivation of the heat equation. Then we outline
our experimental procedure for data collection, denoising, and training. Finally we end the
chapter with some closing insights based on the results.

4.2 Heat Diffusion Dynamics


The heat equation [63], also referred to as the heat diffusion or heat conduction equation, is
commonly one of the first PDEs that students are introduced to [37]. It provides a physical
interpretation for the dynamics of heat transfer across space through the process of conduction.

We outline its derivation which we borrow jointly from Strauss [98] and a PDEs course
handout from Stanford University [60].
First we consider a region D ⊂ R^n where n is the number of dimensions. Let x = [x_1, ..., x_n]^T be a spatial vector in R^n, and let u(x, t) be the temperature at point x and time
t. Additionally, let c be the specific heat of the material of region D and ρ its density. We
express H(t), the total amount of heat in calories contained in D as follows:
\[ H(t) = \int_D c\,\rho\,u(\mathbf{x}, t)\,d\mathbf{x} \]
By considering the change in heat we get the following (note the time derivative of u):

\[ \frac{dH}{dt} = \int_D c\,\rho\,u_t(\mathbf{x}, t)\,d\mathbf{x} \tag{4.1} \]
Fourier’s law states that the rate of heat transfer is proportional to the negative temperature
gradient, meaning that heat can only flow from hot to cold regions at a rate proportional to
the thermal conductivity k. Mathematically, this is expressed as follows:

\[ \frac{dH}{dt} = \int_{\partial D} k\,\nabla u \cdot \hat{n}\,dS \tag{4.2} \]
∂ D is the boundary of D, n̂ is the outward normal unit vector to ∂ D, and dS is the surface
measure over ∂ D. By equating Equations 4.1 and 4.2 we obtain the following:
\[ \int_D c\,\rho\,u_t(\mathbf{x}, t)\,d\mathbf{x} = \int_{\partial D} k\,\nabla u \cdot \hat{n}\,dS \tag{4.3} \]
The Divergence theorem states that the volume integral of the divergence of a vector field over an enclosed volume is equal to the surface integral of that field over the boundary of the volume. For a vector field F this is represented as follows:
\[ \int_{\partial D} \mathbf{F} \cdot \hat{n}\,dS = \int_D \nabla \cdot \mathbf{F}\,d\mathbf{x} \]
Therefore, we simplify Equation 4.3 to get:
\[ \int_D c\,\rho\,u_t(\mathbf{x}, t)\,d\mathbf{x} = \int_D \nabla \cdot (k\,\nabla u)\,d\mathbf{x} \]
By further simplifying, we obtain the following PDE:

\[ c\,\rho\,u_t = \nabla \cdot (k\,\nabla u) \]

Since c, ρ, and k are constants, by simplifying once more (dividing through by cρ and writing α = k/(cρ) for the thermal diffusivity, with ∆ the Laplacian) we get the heat equation:

\[ u_t = \alpha\,\Delta u \tag{4.4} \]

4.3 Block Heating Experiment


4.3.1 Experimental setup

Fig. 4.1 MLX90640 pixel RAM chess reading pattern configuration, borrowed from the
datasheet [71]. The highlighted cells correspond to a subpage and we read one with each I2C
transaction. The subpages get updated with new data after each read.

In devising a setup for the heat diffusion experiment, we needed to find a way to con-
veniently collect data without requiring complicated equipment. Therefore, we use a scrap aluminium alloy block which we heat using a soldering iron. We chose an aluminium block because we needed a metal with a medium level of thermal conductivity so that the observed temperature gradients would be apparent over time. Using a metal that is too thermally conductive, such as copper, would result in temperature gradients that are not very pronounced.
We use the PYNQ-Z1 FPGA as the sensing platform, and record surface temperatures
across the block using the MLX90640 infra-red (IR) thermal camera [71]. The MLX90640
has a resolution of 32x24 pixels, a field-of-view (FOV) of 55°x35°, and a temperature range
of -40°C – 300°C. The pixels are split into two subpages within the RAM of the sensor, for
the odd and even pixels. These pixels are arranged in a chess-like pattern as Figure 4.1 shows,
and we read the RAM twice and compile the subpages together to get a valid frame. The
FPGA issues bulk I2C commands to read the RAM all at once rather than individual I2C
reads for each pixel, as we have found that this method is faster. We use the same AXI IIC IP
block design shown in Figure 3.11 to communicate with the sensor. We implement a custom
driver for interfacing with the MLX90640 through the Python PYNQ API, based on the manufacturer's device driver library [70]. The main difference in our implementation is that
we obtain the raw frames during the heating and only perform the conversion routines after
the experiment is over and we have collected all of the data. The conversion routines require
the specification of an emissivity parameter, and based on our aluminium block we use an
estimated value of 0.05 for this. For the heat source we use a WES51 soldering iron, but we
replace the conical nib with a custom cylindrical copper tip for better surface contact for the
conduction. Figure 4.2 shows the custom tip and Figure 4.3 shows the entire experimental
setup.
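The subpage compilation step can be sketched as follows. We assume, purely for illustration, that subpage 0 holds the pixels for which the row and column indices sum to an even number; the exact assignment should be taken from the reading-pattern figure in the MLX90640 datasheet, and the conversion from raw values to temperatures is a separate step handled by the manufacturer's routines.

import numpy as np

ROWS, COLS = 24, 32   # MLX90640 resolution

def assemble_frame(subpage0, subpage1):
    # Interleave the two 384-pixel subpages into one 24x32 frame following a
    # chess (checkerboard) pattern.
    frame = np.empty((ROWS, COLS), dtype=np.float32)
    rows, cols = np.indices((ROWS, COLS))
    even = (rows + cols) % 2 == 0
    frame[even] = subpage0    # assumed mapping; verify against the datasheet
    frame[~even] = subpage1
    return frame

frame = assemble_frame(np.zeros(384, dtype=np.float32), np.ones(384, dtype=np.float32))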

Fig. 4.2 Custom copper tip to improve surface contact for conduction.

Before we start the experiment, we configure the sensor by setting the analogue-to-digital
converter (ADC) resolution to 19 bits, and the frame refresh rate to 64 Hz. Next we ensure
that the camera is positioned correctly so that it covers the entire area of the block. To do
this we implement real-time thermal imaging so that we can observe what the camera is
capturing. We heat the block slightly so that it is visually apparent on the thermal imager.
Using this setup, we move the camera and the block accordingly until both are positioned
in place with the entire block surface appearing in frame. Then we leave the block to cool
down and after that we turn the temperature up to approximately 298°C. We insert the iron
into the hole once the temperature has stabilized and begin recording.
We record the raw sensor readings for 15 minutes. After that we convert the readings
into temperature measurement values using conversion routines specified by the sensor
manufacturer. We save the readings and then move them onto the workstation to process the data.

Fig. 4.3 Block heating experimental setup. The MLX90640 is held with an alligator clip attached to a flexible helping hand. The custom solder tip is inserted into the block from the side into a hole so that it fits in place during heating. The sensor is directly connected to the FPGA.

4.3.2 Data analysis


Over the duration of the heating, the data exhibits temperature gradients that can be used
for training. Figure 4.5 shows a 3D plot of the heating landscape for raw data at different
instances in time, plotted using the Plotly graphing library [84]. Figure 4.4 shows the
temperature measurements after the sensor conversion routines for 8 randomly-selected
pixels. The temperature measurements are initially very noisy and dominated by high-
amplitude spikes. These spikes usually correspond to instances when the sensor data fails to
update with new data in time, resulting in the sharp dotted pattern that Figure 4.6 shows.

Fig. 4.4 Converted temperature measurements over time for 8 randomly-selected pixels. The
measurements are noisy and are also dominated by noise spikes.

The existence of high amplitude spikes, as well as the high fluctuations in the temper-
ature signal will cause training difficulties in the optimization problem that we formulate.
Therefore, it was necessary for us to implement a denoising strategy to eliminate the spikes
and smooth out the data over time. Later on, we investigate the training results with and
without denoising.

4.3.3 Denoising strategy


Spike filtering

The first step was to filter out the spikes in the data. The main issue that we faced here
was that different pixels exhibit spikes at different instances in time. Therefore, we adopt a
strategy where we remove the entire frame from the data if any given pixel exhibits spiky
behaviour. A better alternative to this would be to replace spike pixels with the average of
their neighbouring pixels, however this introduces an additional level of complexity given that
neighbouring pixels in both space and time may also be spiky. For our purposes, our filtering
method has acceptable performance. Algorithm 1 outlines the spike filtering algorithm. We
iterate over all of the pixels and their respective measurements over time, compare each measurement against the last valid one, and add the indices of spike values to the spike_indices array.

(a) 0.05 s. (b) 239.57 s.

(c) 478.85 s. (d) 897.75 s.

Fig. 4.5 3D plots of the heating profile at different instances in time. The plots show valid
temperature gradients over time.

We delete the spiky indices from the data to obtain the spike-filtered data. We use a threshold
value of 100, and based on this we find that the spikes comprise 8.53% of the data. Figure 4.7
shows a plot of the spike-filtered data for 4 randomly chosen pixels. We notice that two
regions of the data have been filtered out completely as they correspond to durations with
high spike concentrations across many pixels. Additionally, we notice that a few downwards
spikes are still present. These will be smoothed out with our next denoising step.

Data smoothing

After filtering out spikes, the data still exhibits fluctuations in temperatures that cannot
be physically possible. Therefore we perform an additional step to smooth out the data
fluctuations and capture the true behaviour of the temperature signals. For this we apply a
Savitzky-Golay filter [93] on the time-series for each pixel, using the SciPy library [106]. The
Savitzky-Golay filter is a data-smoothing algorithm based on fitting low-order polynomials
using the linear least squares method. It is a simple and effective method that is suitable
for our purposes since the temperature measurements maintain a relatively linear upwards trend. We use a polynomial order of 3, and a window size of 400. Figure 4.8 shows the measurements after smoothing with the filter.

Fig. 4.6 Frame visualisation of one of the temperature spikes shown in Figure 4.4. These correspond to instances when the camera fails to capture valid frames.
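The two denoising steps can be sketched together as follows, assuming the measurements are held in a NumPy array of shape (number of frames, number of pixels). The threshold, window size, and polynomial order match the values stated above, the per-pixel spike tracking mirrors Algorithm 1, and the function name is ours.

import numpy as np
from scipy.signal import savgol_filter

def denoise(frames, threshold=100.0, window=400, polyorder=3):
    # Step 1: spike filtering. Track each pixel's last valid value and drop
    # any frame in which at least one pixel jumps by more than the threshold.
    keep = np.ones(len(frames), dtype=bool)
    last_valid = frames[0].copy()
    for i, frame in enumerate(frames):
        spiky = np.abs(frame - last_valid) > threshold
        if spiky.any():
            keep[i] = False
        last_valid[~spiky] = frame[~spiky]   # update only the non-spiky pixels
    filtered = frames[keep]

    # Step 2: smooth each pixel's time series with a Savitzky-Golay filter
    # (window of 400 as in the text; some SciPy versions require an odd length).
    return savgol_filter(filtered, window_length=window, polyorder=polyorder, axis=0)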

4.3.4 Training setup


In this section we keep the following parameters constant unless otherwise specified:

• Architecture: 2-layer MLP with 64 and 32 units.

• Training set: Denoised data.

• Optimizer: LBFGS.

• Activation function: Hyperbolic tangent (Tanh).


• Learning rate: 0.01.

• λp: 0.5.

• Np: 32000 points corresponding to 500 frames.

Algorithm 1: Spike filtering method

get_spike_indices(data, threshold):
    spike_indices ← [ ]
    foreach pixel_measurements ∈ data do
        current_valid_measurement ← pixel_measurements[0]
        foreach index, measurement ∈ pixel_measurements do
            if |current_valid_measurement − measurement| > threshold then
                spike_indices.append(index)
            else
                current_valid_measurement ← measurement
            end
        end
    end
    return sort(unique_values(spike_indices))

PINN loss function

The loss function we use is based on a substitution of the 2D version of Equation 4.4 in
Equation 2.4. Additionally we solve an inverse problem to discover the α parameter, although
we assume that the value might differ slightly for the x and y dimensions so we use the
coefficients α and β for x and y respectively. Thus, the result is Equation 4.5.

\[ \mathcal{L} = \frac{1}{N_d}\sum_{i=1}^{N_d} \left| u_f^i - u_d^i \right|^2 + \frac{\lambda_p}{N_p}\sum_{i=1}^{N_p} \left| u_t^i - \alpha\,u_{xx}^i - \beta\,u_{yy}^i \right|^2 \tag{4.5} \]

We use an initial value of 10.0 for α and β .
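The physics term of Equation 4.5, with α and β as trainable parameters for the inverse problem, can be computed with automatic differentiation roughly as follows; the function and variable names are ours, and the network is assumed to map (x, y, t) coordinates to a temperature.

import torch

alpha = torch.nn.Parameter(torch.tensor(10.0))   # initial value of 10.0
beta = torch.nn.Parameter(torch.tensor(10.0))    # passed to the optimizer with the network weights

def heat_residual(model, xyt):
    # xyt has shape (N, 3) with columns x, y, t; returns u_t - alpha*u_xx - beta*u_yy.
    xyt = xyt.clone().requires_grad_(True)
    u = model(xyt)
    grads = torch.autograd.grad(u, xyt, torch.ones_like(u), create_graph=True)[0]
    u_x, u_y, u_t = grads[:, 0:1], grads[:, 1:2], grads[:, 2:3]
    u_xx = torch.autograd.grad(u_x, xyt, torch.ones_like(u_x), create_graph=True)[0][:, 0:1]
    u_yy = torch.autograd.grad(u_y, xyt, torch.ones_like(u_y), create_graph=True)[0][:, 1:2]
    return u_t - alpha * u_xx - beta * u_yy

# The physics loss is lambda_p times the mean squared residual over the N_p
# collocation points, added to the data loss as in Equation 4.5.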

Initial training difficulties

Our initial attempt at training involved using LBFGS with full batch training, similar to
the approach we took in Chapter 3, but with 3 dimensions for the x, y, and t coordinates.
The current LBFGS Pytorch implementation only supports full-batch training. The full
dataset collected from our experiment consists of 18,798 frames where each frame consists of 768 2-byte pixels, and the denoised data similarly consists of 17,195 frames. Each pixel
is associated with a 3-dimensional spatio-temporal coordinate point. This gives us a total of 18,798 × 768 × 2 = 28,873,728 bytes (≈ 28.9 MB) for the label set and 18,798 × 768 × 3 × 4 = 173,242,368 bytes (≈ 173.2 MB) for the feature set. Given that we need to train a large model for this problem, in addition to needing to compute large Jacobian and Hessian matrices, we began to face memory issues with full-batch training on our 4 GB GPU.

Fig. 4.7 Spike-filtered temperature time-series for four randomly-chosen pixels. The regions near 340 and 470 seconds displayed high concentrations of spikes across the pixels so were filtered out completely.
The bigger issue however, is the difficulty posed on the optimization problem since our
problem involves a 3-dimensional cuboid with millions of points. Therefore we attempted to
use Adam [52] instead of LBFGS since it supports mini-batch training, and we down-sample
our data in time. Unfortunately, it was still the case that the training failed to converge to
a satisfactory solution even with many variations on the down-sampled training data, the
network architectures, and the training hyperparameters. Figure 4.9 shows example training
evaluations with a 2-layer 32-unit fully-connected architecture with 57,775 points randomly chosen throughout the domain, and with a batch size of 4096. We find that in most cases
using a simpler architecture performs better than a complex one. The predictions are better
after denoising, although in all cases we could not get an RMSE lower than 26.
Therefore, we reduce the size of our problem by reducing the size of the training frame.

Fig. 4.8 Data for 4 random pixels after applying the Savitzky-Golay filter [93]. The data
retains small amounts of noise, although most of it has been smoothed out.

Frame size reduction

We reduce our problem size so that we focus on a central square of the frame. Figure 4.10
shows contour plots for different reduced-size frames after 433.43 seconds of heating. After
frame reduction the memory taken up by our training data is reduced, and we are now
also able to use LBFGS with full-batch training. Figure 4.11 shows the 8x8 reduced frame
training cases compared against the full-frame ones using Adam, and Figure 4.12 shows a
similar evaluation but only for the 8x8 reduced frame using LBFGS. For LBFGS we used
a 3-layer network with 64 units in the first layer as the 2-layer architecture caused training
instabilities. Additionally we trained the LBFGS case for 500 iterations as the network was
still able to learn more, as opposed to the Adam case where the RMSE plateaued after 20
iterations. The best case RMSE for Figure 4.11 was 23.67 while the best case RMSE for
Figure 4.12 was 9.42. We can see that LBFGS performs much better than Adam so we fix it
for the upcoming evaluations. Additionally we will only be training with the denoised data.

4.3.5 Results
Frame size variation

Here we investigate the training performance based on varying the frame size. We train with 819
linearly-spaced frames and test with 441 linearly-spaced frames. We train until convergence
and report on the number of iterations needed. Table 4.1 shows the training results for different frame dimensions. We notice a predictable pattern in the PINN RMSE column in that the values decrease with a decreasing frame size. The NN RMSEs do not adhere to this pattern as strictly, as we can see with the RMSE for the 6x6 pixel frame, which is greater than that of the 10x10 frame.

Fig. 4.9 RMSE graph for a 2-layer 32-unit PINN and NN trained with Adam [52]. Training is shown for both the raw and denoised data, sampled from the full 768-pixel frames. Denoising certainly helps achieve better training results, although there is a negligible difference between PINN and NN training performance.

Frame Dimensions   NN              PINN            α         β
16x16              9.894 / 5000    24.894 / 1000   9.9626    9.9845
10x10              8.524 / 5000    10.433 / 4999   9.9445    9.9942
8x8                9.822 / 976     9.420 / 4971    10.0266   9.9754
6x6                10.804 / 1998   8.084 / 4352    9.9166    9.9319
4x4                4.293 / 4938    4.291 / 5000    9.9486    9.9153
Table 4.1 RMSE values based on varying the frame size. Each NN and PINN entry reports the RMSE followed by the iteration number at which it was obtained.

(a) 16x16. (b) 8x8.

(c) 6x6. (d) 4x4.

Fig. 4.10 Reduced size frames at t = 433.43. Smaller frames are easier to train with than
larger ones.

Linearly-spaced frames

We fix the frame size to be 8x8 pixels and investigate predictive performance for a variation
of the number of linearly-spaced frames. Table 4.2 shows the results, and the second and third
rows of Figure 4.13 show visualisations of the predictions. Contrary to our expectation, the
PINN results do not appear to be better than the NN results as the amount of data decreases,
except for the 36-frame and 6-frame cases. Additionally, we generally see from Figure 4.13
that the PINNs failed to capture spatial temperature gradients compared to NNs, as they
made near constant temperature predictions for their entire frames. This may be because the
α and β parameters are not optimized, or because of the fact that the data does not strongly
adhere to the physics. This is most likely due to the large amounts of noise in the system
56 Predicting the Surface Temperatures Across a Metal Block During Heating

Fig. 4.11 Reduced frame vs full frame training comparison. The minimum RMSE is 23.67.
Both denoising and reducing the frame size improve the training performance, although the
training accuracy remains unsatisfactory.

It may be possible to obtain PINN results that are better than the NN results by tweaking the
hyperparameters, since the optimization process is stochastic in nature.

Uniformly-distributed points

Similarly to the case for linearly-spaced frames, we vary the number of training points
within the domain but this time based on a random choice of uniformly-distributed points.
Table 4.3 shows the training performance for these evaluations, and the fourth and fifth rows
of Figure 4.13 show visualisations of their predictions. With the exception of a few outlier
training cases, such as Nd = 73408 and Nd = 384, the PINN and NN have similar RMSEs.
In general, the concentration of points in different spatiotemporal regions would affect how
well the networks predict in those regions. However, based on the near-constant RMSE trend
at values close to 9 irrespective of Nd, it may be that for this problem the networks converge
to similar solutions regardless of the training data. The similarity between the predicted
solutions in the second and fourth rows of Figure 4.13 further supports this idea.

4.4 Closing Remarks


In this chapter we have investigated and compared the predictive performance and trainability
of PINNs against standard NNs for the reconstruction of the thermal diffusion regime of a
block as it is being heated, based on an experiment that we set up.

Fig. 4.12 LBFGS training evaluation for a 3-layer network with 64-32-32 units. The minimum
RMSE is 9.42. Using LBFGS improves training performance considerably.

We also looked into
issues related to sensor denoising and its effect on training. Unfortunately, we have not
been able to reconstruct a sufficiently accurate solution with either PINNs or NNs, although
we have found ways of achieving better accuracy. These include denoising the data, using
a smaller frame size to reduce the difficulty of the optimization problem, and using the
LBFGS optimizer instead of Adam. We hypothesize that, similarly to what we discussed in
Section 3.5.3, the training domain in time is too large, resulting in stiffness in the optimization
due to the difficulty of traversing the loss landscape. This is especially the case when we
are trying to reconstruct 2D temperature grids rather than single angle measurements as in
Section 3.5.2. In Section 3.5.3 we solved this issue by reducing the domain of our problem
to make the optimization easier. This was applicable for the pendulum since it
had a predictable sinusoidal pattern in a single dimension. The complexity of the 2D heat
diffusion problem, on the other hand, is something that we want to investigate for the
entire duration of our experiment, so we are interested in solving it over the entire domain.
One possible approach is sequence-to-sequence training, as proposed by
Krishnapriyan et al. [55]. Sequence-to-sequence training involves splitting the time domain
into time-steps and training on the consecutive time-steps, one after the other. Krishnapriyan
et al. show that it enables them to achieve losses that are lower by 1–2 orders of magnitude
for a simulation of a 1D reaction-diffusion problem. We believe that applying it to our
problem could allow us to arrive at accurate solutions for both NNs and PINNs. Additionally,
by observing the coefficient values in Tables 4.1, 4.2, and 4.3, we notice that the α and β
coefficients do not change much from their initial values.

(a) Test data, i = 35. (b) Test data, i = 200. (c) Test data, i = 400.

(d) LS, NN, i = 35. (e) LS, NN, i = 200. (f) LS, NN, i = 400.

(g) LS, PINN, i = 35. (h) LS, PINN, i = 200. (i) LS, PINN, i = 400.

(j) UD, NN, i = 35. (k) UD, NN, i = 200. (l) UD, NN, i = 400.

(m) UD, PINN, i = 35. (n) UD, PINN, i = 200. (o) UD, PINN, i = 400.

Fig. 4.13 Comparison between the test data and the predictions of NNs and PINNs after
training with 832 points. LS denotes linearly-spaced data and UD denotes uniformly-
distributed random data. i refers to the time sample index. The NNs capture slight heating
gradients in space, whereas the PINNs predict almost constant temperatures for specific
frames.

Nfr  | Nd     | NN RMSE (iterations) | PINN RMSE (iterations) |    α    |    β
3439 | 220096 | 8.344  (5000)        | 9.537  (5000)          | 10.0382 | 9.8948
1147 | 73408  | 8.701  (5000)        | 17.526 (1901)          | 9.9853  | 9.9753
574  | 36736  | 9.805  (5000)        | 9.426  (4987)          | 10.0397 | 10.0631
287  | 18368  | 9.272  (4999)        | 9.462  (4994)          | 9.9517  | 9.9112
144  | 9216   | 8.952  (5000)        | 9.421  (5000)          | 9.9574  | 9.8677
72   | 4608   | 8.828  (2898)        | 9.443  (5000)          | 10.0699 | 9.7978
36   | 2304   | 11.381 (4388)        | 9.436  (5000)          | 9.7256  | 9.8428
28   | 1792   | 9.579  (1943)        | 23.606 (461)           | 10.0837 | 9.9646
18   | 1152   | 9.508  (1859)        | 15.905 (819)           | 9.9908  | 9.8933
13   | 832    | 7.915  (5000)        | 9.578  (5000)          | 9.7833  | 9.8016
11   | 704    | 9.683  (1633)        | 14.848 (4820)          | 9.8494  | 9.9277
6    | 384    | 13.394 (823)         | 11.281 (3547)          | 9.9824  | 9.7317

Table 4.2 RMSE values based on a variation of the number of linearly-spaced frames. Nfr is
the number of frames and Nd is the corresponding number of points. Each NN and PINN
entry gives the RMSE followed by the iteration number in parentheses.

It is generally the case that inversion problems suffer from ill-posedness of the optimization,
require large numbers of network forward passes, and are highly susceptible to noise [75].
Instead of posing their optimization as an inverse problem, Krishnapriyan et al. introduce a
curriculum learning approach in which the PINN is initially trained with small coefficient
values and then gradually retrained with larger ones [55]. This may be a promising approach
for our problem: the abundance of noise in our setup may have affected the optimal values of
the coefficients, and a curriculum over coefficient values would provide a structured way of
searching their solution space.
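As an illustration of how the sequence-to-sequence idea discussed above might be applied here, the sketch below splits the time domain into consecutive windows and trains on them in order, warm-starting each window from the previous one. It is an adaptation of the idea in Krishnapriyan et al. [55] rather than their exact procedure; only the data loss is shown (the physics loss would be restricted to collocation points inside each window), and the window count and iteration budget are illustrative.

import torch
import torch.nn as nn

def seq2seq_train(model, inputs, targets, num_windows=10, iters_per_window=500):
    # inputs: (N, 3) tensor of (x, y, t) points; targets: (N, 1) temperatures.
    # The time domain is split into consecutive windows and the model is trained
    # on each window in turn, keeping the weights learned on the previous window.
    t = inputs[:, 2]
    edges = torch.linspace(float(t.min()), float(t.max()), num_windows + 1)
    loss_fn = nn.MSELoss()
    for k in range(num_windows):
        mask = (t >= edges[k]) & (t <= edges[k + 1])
        optimizer = torch.optim.LBFGS(model.parameters(), max_iter=iters_per_window,
                                      line_search_fn="strong_wolfe")
        def closure():
            optimizer.zero_grad()
            loss = loss_fn(model(inputs[mask]), targets[mask])
            loss.backward()
            return loss
        optimizer.step(closure)
    return model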

Nd     | NN RMSE (iterations) | PINN RMSE (iterations) |    α    |    β
220096 | 9.475  (3557)        | 9.429  (2542)          | 9.9781  | 9.9761
73408  | 7.016  (3500)        | 15.436 (4562)          | -4e5    | -8e5
36736  | 9.2324 (5000)        | 9.5814 (5000)          | 9.9888  | 9.7793
18368  | 8.792  (4341)        | 10.384 (5000)          | 10.0844 | 9.8509
9216   | 8.453  (2605)        | 9.848  (5000)          | 9.8036  | 9.8602
4608   | 14.030 (1163)        | 14.599 (4506)          | 9.9838  | 9.9897
2304   | 9.482  (4468)        | 10.517 (1311)          | 9.9933  | 10.0076
1792   | 20.813 (155)         | 23.072 (5000)          | -1e13   | -9e12
1152   | 11.198 (2666)        | 9.756  (5000)          | 9.8414  | 9.8285
832    | 7.819  (5000)        | 9.467  (3139)          | 10.0405 | 9.932
704    | 9.241  (5000)        | 9.470  (2458)          | 10.1434 | 10.0899
384    | 9.637  (3564)        | 33.971 (225)           | 10.0501 | 10.0402

Table 4.3 RMSE values based on a variation of the number of uniformly-distributed points
Nd. Each NN and PINN entry gives the RMSE followed by the iteration number in
parentheses.
Chapter 5

Parallel Hardware and Time-coherent Sensing

5.1 Introduction
This chapter studies sensing issues related to deployment within hardware for real-time
inference, and focuses particularly on time coherence. Time coherence is the degree to which
two or more n-dimensional data inputs that occur at the same time instance are captured
with minimal latency between them, for an arbitrary value of n. The inputs to deployed
predictive models will commonly arrive from digital interfaces of sensors, and in many cases
they could be multi-dimensional inputs arriving from many different sensors. Common
embedded microcontrollers face difficulties in maintaining time coherence for many sensors
due to the sequential nature of their programs. FPGAs on the other hand are inherently
parallel, and so are an appropriate choice for this type of sensing architecture as they are able
to sample independently from parallel interfaces. Based on this, we present an experiment to
shed light on the issues of time and space coherence for digital sensing applications.

5.2 Parallel Capture Heating Experiment


One of the issues that came up in the heat diffusion investigation in Chapter 4 was the
noisiness of the data. We therefore perform a similar block heating experiment, but this time
with multiple thermal cameras. Specifically, we are interested in understanding the aleatoric
uncertainty by comparing the data from the different cameras, as well as looking into the
viability of time-coherent sensing.

5.2.1 Experimental setup


The setup for this experiment is similar to the one outlined in Section 4.3.1, but with 5
thermal cameras instead of one. Figure 5.1 shows the experimental setup.
We use 5 PYNQ-Z1 boards to interface with the 5 cameras. An ideal setup would involve
5 different AXI IIC blocks on one FPGA where the data from each block is offloaded to a
BRAM. The BRAM would be read from in a single time instance to ensure time-coherence.
Unfortunately, due to project time constraints, we settled on using 5 FPGAs instead.
The PYNQ boards require an ethernet connection to access the Jupyter server that runs the
Python subsystem. For the single camera setup we would connect the PYNQ board directly
to the computer’s ethernet port. However, since we have 5 boards for this experiment we
need 5 ethernet ports. Therefore we use a network switch to connect the boards together.
The cameras are held in place with magnetic alligator clamp holders, and each camera is
angled so that it covers as much of the block’s surface as possible. We also apply thermal
compound to increase the amount of conduction through surface contact.
We start the thermal data capture programs on each FPGA with as little human delay
as possible, insert the soldering iron into the block once it is at just under 298°C, and begin
the experiment.

5.2.2 Data alignment


After performing the experiment, we found that one of the cameras had unfortunately failed to
capture any data. This was the third camera, which is positioned vertically above the block
in Figure 5.1. Therefore we make do with data captured from 4 cameras. The following
sections discuss issues with data alignment in both time and space.

Time

During the experiment, we captured timestamps for each measurement for all of the cameras
using the Python time.time() function which returns the time elapsed in seconds since the
1st of January 1970. Therefore, all of the measurements have an absolute reference of time
that we can compare against. First, we subtract the first timestamp for each of the cameras
from their respective timestamp arrays so that time starts from 0 for each of them. To compare
the time-coherence of the data between different cameras, we subtract their timestamps from
each other and observe the time difference. Figure 5.2 shows a plot of the differences. We
can see that over time, the ∆T values between the cameras increase, indicating that the data
misalignment is growing. In the worst case, ∆T between cameras 2 and 4 drifts to a 6-second
difference by the end of the experiment duration, which is significant.
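For clarity, the ∆T curves in Figure 5.2 are computed essentially as follows; this is a minimal sketch in which the per-camera timestamp arrays are assumed to already be loaded as NumPy arrays of time.time() values.

import numpy as np

def relative_times(timestamps):
    # timestamps: 1D array of time.time() values for one camera.
    # Shift so that each camera's clock starts from 0 at its first sample.
    return timestamps - timestamps[0]

def sample_drift(t_a, t_b):
    # Compare two cameras sample-by-sample over the samples they have in common.
    n = min(len(t_a), len(t_b))
    return relative_times(t_a)[:n] - relative_times(t_b)[:n]

# Example usage (camera_timestamps is a placeholder for the loaded arrays):
# drift_24 = sample_drift(camera_timestamps[2], camera_timestamps[4])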

Fig. 5.1 Experimental setup for the parallel heating experiment. We use 5 magnetic alligator
clamps to hold 5 MLX90640 thermal cameras, which are connected to 5 PYNQ-Z1 FPGA
boards. The cameras are pointed at the block so that the block surface takes up the most area
in the camera FOVs.

To ensure data validity, the frames from all of the cameras would have to be aligned in time,
but this unfortunately is not the case with the data. Two issues are apparent
here. The first is that the cameras have different levels of delay between one another, so a
proper reference point would have to be found, which is not a trivial matter. The second is
that between any two given cameras the amount of delay is not constant, nor is the sampling
interval constant across the measurements of any single camera. Therefore, shifting the
measurements for the cameras by fixed amounts would still not lead to correct alignment.
The proper course of action in this case is to sample from a single device through 5 independent
interfaces using dedicated AXI IIC blocks.

Fig. 5.2 Time sample difference over the experiment duration for different cameras. The time
coherence between the cameras degrades over time, and in the worst case the difference is 6
seconds (cameras 2 and 4).

Space

From right to left in Figure 5.1, the cameras are at angles 49.0°, 71.5°, 90.0°, 114.0°, and
144.0°. Since the vertical camera (third camera) did not work, we will call the next two
cameras, from right to left, cameras 3 and 4. Figures 5.4, 5.5, 5.6, and 5.7 show the raw
frames obtained from the four cameras at different time samples. These samples are delayed
in time for the different cameras based on the sample differences that Figure 5.2 shows. One
of the first things we notice from the camera data is the spatial misalignment of the frame
views between the different cameras. Camera 1's view is positioned slightly to the right of
camera 2's, as we can see from the solder end which appears in frame. Cameras 3 and 4,
on the other hand, do not have the solder end in the frame. We also have reason to believe
that camera 3's lens is not at the same orientation as that of the rest of the cameras, despite
it being positioned in the correct way in the experimental setup. This is because the hot
region resulting from the solder appears vertically between X pixels 15 and 20 (see
Figure 5.7c), rather than horizontally as in the rest of the cameras. Camera 3 might also
be faulty, since its recorded temperatures are not in the same range as those of the other
cameras (almost 50–100°C lower).

In an ideal case where the cameras are aligned in time, we would align the data in space
so that the frames from each camera represent the same area. We would do this by shifting
their perspectives so that they all point vertically downwards. Since the data is aligned neither
in time nor in space, it would not make sense to do this, so we instead compare the data from
cameras 1 and 2 from t = 0 s to t = 600 s, since these cameras are closest in space and have
the least time delay between them up to 600 seconds.
We compare the rectangular patch just after the solder end for the two cameras. For
camera 1, the patch has corners with pixel coordinates (1, 7), (29, 7), (1, 14), (29, 14). For
camera 2, the corner coordinates are (4, 9), (32, 9), (4, 16), (32, 16). Figure 5.3 shows a
comparison of the patch for the two cameras at t = 240.85 s. Figure 5.8 shows histograms
for the differences between the pixels within the rectangular patch at five instances in time.
The expectation was that the temperature differences would be near 0 for the early part of the
heating process, and then they would increase due to the gradual loss of time coherence. The
histogram plots show that this is not the case, as there are large temperature differences that
are consistent throughout the heating process.
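The per-pixel differences behind the histograms in Figure 5.8 can be computed along the following lines. This is a minimal sketch: the frame arrays are assumed to be 24x32 NumPy arrays, and whether the patch coordinates above are 0- or 1-indexed depends on the capture code, so an offset may be needed.

import numpy as np
import matplotlib.pyplot as plt

def patch(frame, x0, y0, x1, y1):
    # Extract the rectangular patch spanning columns x0..x1 and rows y0..y1
    # (inclusive) from a frame indexed as frame[row (Y), column (X)].
    return frame[y0:y1 + 1, x0:x1 + 1]

def patch_differences(frame_a, patch_a, frame_b, patch_b):
    # patch_a and patch_b are (x0, y0, x1, y1) tuples describing the same physical
    # area as seen by each camera; the two patches must have the same shape.
    return (patch(frame_a, *patch_a) - patch(frame_b, *patch_b)).ravel()

# Example usage with the coordinates given above (placeholder frame variables):
# diffs = patch_differences(frame_cam1, (1, 7, 29, 14), frame_cam2, (4, 9, 32, 16))
# plt.hist(diffs, bins=30)
# plt.xlabel("Temperature difference between cameras 1 and 2 (°C)")
# plt.show()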

5.3 Closing Remarks


This chapter studied issues related to coherent sensing in both time and space. To do this we
designed an experiment whose aim was to sample from 5 different thermal cameras in
parallel and compare their data. The data turned out to be incoherent in both time and space,
and we show that this incoherence is not easily remedied.
For the time incoherence, different cameras have different levels of delay between them.
Additionally, the sampling rates turned out to be inconsistent across the heating duration for
any given camera. In repeating this experiment, a fixed delay would need to be introduced
between consecutive samples, and the samples would need to be aligned based on the
absolute time reference provided by the time.time() function.
For the space incoherence, the cameras were aligned at different angles with slight
deviations in their horizontal and vertical offsets. This affected which areas of the block
they were able to capture, and to some extent the range of the temperatures that were captured.
A structured setup based on FOV calculations would help in reducing errors caused by spatial
misalignment. Additionally, the large deviations in the temperature ranges captured by
the cameras could be resolved by calibrating each of them.

Fig. 5.3 A comparison of the rectangular patch that we attribute to the same area in the
views of the two cameras. The temperature ranges and the temperature grids are visually
similar.

(a) Camera 1. (b) Camera 2.

(c) Camera 3. (d) Camera 4.

Fig. 5.4 Frame visualisations at time sample 500.



(a) Camera 1. (b) Camera 2.

(c) Camera 3. (d) Camera 4.

Fig. 5.5 Frame visualisations at time sample 3250.



(a) Camera 1. (b) Camera 2.

(c) Camera 3. (d) Camera 4.

Fig. 5.6 Frame visualisations at time sample 7500.



(a) Camera 1. (b) Camera 2.

(c) Camera 3. (d) Camera 4.

Fig. 5.7 Frame visualisations at time sample 15000.



(a) Time = 11.27 s. (b) Time = 120.53 s.

(c) Time = 288.93 s. (d) Time = 480.71 s.

(e) Time = 580.82 s.

Fig. 5.8 Histogram plots for the temperature difference between the pixels in the rectangular
patch which corresponds to the same area in two different cameras (1 and 2). We can see
that, even though a significant number of the differences are near 0, the majority of them are
to the right of the graph and have large errors.
Chapter 6

Discussion and Future Work

6.1 Discussion
In Section 1.1, we introduced the central research questions that motivated our work in
this dissertation. The first question assessed the viability of physics-informed models as
predictive bases for experimental data captured from real-world systems. Chapters 3 and 4
addressed this, with a study of the performance of PINNs on two different physical systems:
a simple 1D nonlinear pendulum, and a more complicated 2D heat diffusion system.
For the pendulum, many of the training cases, for both the idealized system and the
experimental data, have shown that PINNs outperform standard NNs when it comes to
regularizing differential equation solutions in sparse, noisy, and low-density data regions.
This puts forward a strong case for encoding known information about system dynamics in
deep learning as opposed to treating NNs as black boxes, especially for experimental data.
The nature of experimental data is that it is dominated by aleatoric and epistemic uncertainty,
and using PINNs or other techniques that encode physics information could be a promising
strategy for taking these uncertainties into account.
Unfortunately, training for the heat diffusion system did not fare as well as it did for the
pendulum system. This was true for both the NN and the PINN. The reason ties back to
the explanation in Section 3.5.3: the optimization problem is too difficult to
solve on the entire domain of the heating process. This is especially the case for the PINN,
since the loss landscape is more difficult to traverse with the second-order derivative terms.
Table 4.1 supports this point, since the RMSE values improve once the spatial domain is
reduced. A reduction in the temporal domain may also be necessary. Additionally, it may
be the case that the data collected from the experiment does not adhere to the physics to a
sufficient degree. This would explain the PINN frame predictions in Figure 4.13, where
the model predicts near-constant temperatures throughout entire frames for specific time

instances. It may be valuable to study the thermal and sensor noise models more closely to
find a way to incorporate them into the training.
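For reference, the second-order derivative terms mentioned above enter the PINN through a residual of the governing equation evaluated with automatic differentiation. The sketch below uses a diffusion-type residual purely for illustration; the exact form of the equation, and hence the precise roles of α and β, are as defined in Chapter 4, and this stand-in is only meant to show how the derivatives are obtained.

import torch

def physics_residual(model, x, y, t, alpha, beta):
    # x, y, t: 1D tensors of collocation coordinates; model maps (x, y, t) -> T.
    x = x.clone().requires_grad_(True)
    y = y.clone().requires_grad_(True)
    t = t.clone().requires_grad_(True)
    T = model(torch.stack([x, y, t], dim=1))

    def d(out, var):
        # Derivative of out with respect to var, keeping the graph so that
        # second derivatives can be taken.
        return torch.autograd.grad(out, var, grad_outputs=torch.ones_like(out),
                                   create_graph=True)[0]

    T_t = d(T, t)
    T_x, T_y = d(T, x), d(T, y)
    T_xx, T_yy = d(T_x, x), d(T_y, y)

    # Illustrative residual: T_t - alpha*(T_xx + T_yy) - beta*T, where the beta
    # term stands in for whatever role beta plays in the actual governing equation.
    return T_t - alpha * (T_xx + T_yy) - beta * T.squeeze(1)

# The physics loss is the mean squared residual over collocation points, added to
# the data loss; alpha and beta can be trainable parameters for the inverse problem.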
The second question focused on the feasibility of deployment of physics-informed models
in physical setups, and the issues that might be faced in attempting to do so. To this end
we used inexpensive sensors for our experiments and an FPGA as our embedded platform,
and presented a review of issues related to sensor time coherence and spatial alignment in
Chapter 5.
The hardware design shown in Figure 3.11 uses a single AXI IIC block with a 1000 kHz
I2C clock frequency. The experiment in Section 5.2 uses this design on the 5 FPGAs to sense
in parallel. While this may be a reasonable setup for independent parallel sensing, as shown
in Section 5.2.2 it fails to retain time coherence in practice. A better approach that does retain
it is to use an FPGA design with 5 AXI IIC blocks on a single board sharing the same I2C
clock. The FPGA design should include a double-buffered BRAM arrangement, where in one
cycle the first BRAM stores the sensor data and the second BRAM outputs its data through the
AXI interface. In the next cycle the BRAM roles are reversed: the AXI interface reads
the previously stored data from the first BRAM while the second BRAM stores the sensor data.
This approach ensures that time coherence is retained.
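On the host side, reading from such a design could look roughly like the sketch below. The base addresses, the status register, and the frame size are hypothetical placeholders that would come from the address map of the actual Vivado design; only the double-buffer read pattern is the point of the example.

from pynq import MMIO

# Hypothetical address map: two BRAM buffers and a status register whose lowest
# bit indicates which buffer currently holds a completed set of samples.
BUFFER_0_ADDR = 0x4000_0000
BUFFER_1_ADDR = 0x4001_0000
STATUS_ADDR   = 0x4002_0000
FRAME_BYTES   = 5 * 768 * 4   # 5 cameras x 768 pixels x 4 bytes per word

buffers = [MMIO(BUFFER_0_ADDR, FRAME_BYTES), MMIO(BUFFER_1_ADDR, FRAME_BYTES)]
status = MMIO(STATUS_ADDR, 4)

def read_latest_frames():
    # Read the buffer that the hardware is not currently writing to, so that the
    # 5 camera frames in it were all captured in the same acquisition window.
    ready = status.read(0) & 0x1
    mmio = buffers[ready]
    return [mmio.read(offset) for offset in range(0, FRAME_BYTES, 4)]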
Spatial data alignment is a more complicated issue to tackle. The first step is to ensure
that all of the cameras are calibrated properly so that they measure temperatures accurately
and within the same ranges. After that, the cameras should be positioned correctly based on
proper alignment of their FOVs with the block's surface. Once this is done, post-processing
frame transformations will be required to ensure that the camera views all point vertically
downwards onto the surface of the block. These steps will ensure that the pixels from the
different cameras are aligned with each other, after which their comparison will be valid.
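One possible form for such a post-processing transformation is a planar homography that warps each camera's frame to a common top-down view of the block surface. The sketch below uses OpenCV; the corner coordinates are placeholders that would come from the FOV calculations or from manual annotation of each camera's view.

import numpy as np
import cv2

def rectify_frame(frame, block_corners_px, out_size=(32, 24)):
    # frame: 24x32 temperature array from one camera.
    # block_corners_px: pixel coordinates of the block's four corners in this
    # camera's view, ordered (top-left, top-right, bottom-right, bottom-left).
    # Returns the frame re-projected to a top-down view of the block surface.
    w, h = out_size
    src = np.float32(block_corners_px)
    dst = np.float32([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]])
    H = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(frame.astype(np.float32), H, (w, h))

# Example with placeholder corner coordinates for one camera:
# top_down = rectify_frame(frame_cam1, [(3, 5), (29, 4), (30, 20), (2, 21)])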

6.2 Outlook
The motivation behind this work is to develop methods for encoding dynamics for the
deployment of robust physics-aware models within real physical systems, for real-time
inference and by extension model predictive control (MPC). Figure 6.1 shows an architectural
diagram for the system that our methods work towards. ML models are being deployed in
a wide variety of modern technologies such as autonomous vehicles [94], biosensing [73],
patient health monitoring [29], and smart manufacturing [5]. Most of the models used in these
applications are context-agnostic and are unaware of the dynamics of the environments that
they exist within. The development of the systems that follow the architecture in Figure 6.1
would introduce domain knowledge into models that make safety-critical decisions under

uncertain and changing environments. This would increase their robustness and adaptability
and would also give us confidence in their decisions, since we know that they are based
on widely understood physical principles. Extending domain-specific languages such as
Newton [64] with features for encoding dynamics, coupled with the reconfigurability of
FPGAs, would enable generalization of deployment across a wide variety of physical systems
with different dynamical behaviours and with adaptive computer architectures.

6.3 Future Work


The work presented in this dissertation is the initial step towards the development of the
proposed system discussed in Section 6.2. In this section we list the future work to be
performed based on the insights gained and issues faced in this dissertation.

6.3.1 Resolve optimization difficulties


The trained networks for the heat diffusion system did not perform to a satisfactory degree.
As mentioned previously, this is due to the difficulties faced with predicting 2D temperature
frames across a large time domain. One method that shows promise is to use FBPINNs pro-
posed by Moseley et al. [76], or similar domain decomposition machine learning techniques.
This would simplify the loss landscape by constraining it to small sub-domains, which are
much easier to traverse and find minima within.
Additionally it would be valuable to run a numerical simulation for the heating regime,
similar to Section 3.3, and to train using it. This would provide us with a benchmark so that
we can see how much the network is being impacted by the amount of noise in the system.

6.3.2 Resolve coherent sensing issues


Section 5.2.2 showed the difficulties faced in attempting to sample sensor data coherently
in time, and aligning that data in space. In repeating the parallel heating experiment, extra
care will be taken to ensure that the sensors are working, calibrated, and operating within
the same ranges. The sensing regime will make better use of the absolute time reference by
beginning the experiment and heating after a specific time instance, and by sampling after a
fixed time delay that is consistent between all of the sensors.
It may also be worthwhile to repeat the experiment with different thermal cameras —
ones that are less susceptible to noise. Possible candidates for this include the AMG8833 [80]
and the FLIR Lepton [39].

Fig. 6.1 Architectural diagram for our proposed system. A user encodes the differential
equation for a system using a description language such as Newton [64]. A back-end compiler
performs static analysis on the Newton description to generate a PINN architecture, which
can be trained offline using experimental measurements taken from the system. The trained
PINN can then be synthesized onto an FPGA using high-level synthesis (HLS) tools. Finally,
the user can then integrate the FPGA with the synthesized model into the system for real-time
inference, and by extension control.

6.3.3 Study alternative PIML approaches


This dissertation focused on PINNs as the candidate architecture for encoding differential
equations, however other models might also be promising. These include neural ODEs [22],
deep operator networks [67], or physics-constrained Gaussian processes [87]. A thorough
investigation into the application of different models would result in a deeper understanding
of the interplay between physics and machine learning for real physical systems.

6.3.4 Investigate deployment


An idea that is central to the themes of this dissertation is the deployment of physics-informed
models in real systems. As discussed in Section 2.3.1, the FINN [16] framework provides a
workflow for quantized NNs by exploring the design space of network parameter precision,
and deploying accelerator designs using the FINN compiler. One of the major advantages of
FINN is that it supports NN implementations on the PYNQ-Z1 board. A prototype version of
one of the PINNs that we trained in Chapters 3 and 4 can be deployed as a proof-of-concept
for PINN implementations in hardware. For the pendulum system, the issue is that currently
there is no support for a quantized sine activation function. For the heat diffusion system, a
sufficiently accurate model must be trained first.
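As a starting point for such a proof-of-concept, the sketch below defines a small quantized MLP with Brevitas [81], the quantization-aware training front-end used with FINN. The layer sizes, the 4-bit precision, and the use of ReLU in place of the unsupported quantized sine activation are illustrative assumptions rather than a validated deployment recipe.

import torch.nn as nn
from brevitas.nn import QuantLinear, QuantReLU

# Small quantized MLP as a candidate for FINN deployment; 4-bit weights and
# activations are an illustrative choice, not a validated configuration.
quant_model = nn.Sequential(
    QuantLinear(3, 64, bias=True, weight_bit_width=4),
    QuantReLU(bit_width=4),
    QuantLinear(64, 32, bias=True, weight_bit_width=4),
    QuantReLU(bit_width=4),
    QuantLinear(32, 1, bias=True, weight_bit_width=4),
)

# After quantization-aware training, the model would be exported to the ONNX
# dialect that the FINN compiler consumes and synthesized for the PYNQ-Z1.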
One other issue with deploying PINNs is that a specific PINN is trained only on a
specified set of initial and boundary conditions. A new instance of a PINN would have to be
trained for a different set of conditions, so a solution must be found that accounts for a wide
variety of initial and boundary conditions in a real-world setting.

6.3.5 Extend Newton with dynamics constructs


Newton [64] currently supports constructs for defining signals, units, and invariant relation-
ships between signals. Extending it with constructs for defining governing equations for
dynamics would be a step forward towards the generalized deployment of physics-aware
models within hardware.

6.3.6 Implement MPC


The advent of physics-informed MPC is an attractive prospect, especially in real-time, safety-
critical or high-uncertainty systems. Arnold and King have shown the possibility of using
PINNs for MPC for a simulated application based on the Burgers equation [6], however
further work is required to verify the applicability of these methods for actuation within a
real system.
Chapter 7

Conclusion

This dissertation has investigated two central motivating questions. The first is whether or not
the encoding of differential equations in machine learning improves predictive performance
for data collected from real physical systems. The second relates to the viability of deploying
physics-informed models within physical systems for real-time inference. To answer the first
question, we studied the performance of physics-informed neural networks for two different
systems: a simple nonlinear pendulum, and 2D heat diffusion across the surface of a metal
block.
For the first system, we found that the inclusion of a physics loss term based on the
system's governing equation helped in regularizing the solution according to the underlying
physics. This resulted in accurate predictions of the exact solution for both the ideal numerical
solution and the experimental data, even with very few training points. In the best case, the
PINN achieved an 18× accuracy improvement over an equivalent NN for 10 linearly-spaced
training points on the ideal data, and an over 6× improvement for 10 uniformly-distributed
random points. For the real data, the PINN achieved accuracy improvements of 9.3×
and 9.1× for 67 linearly-spaced and uniformly-distributed random points respectively. This
demonstrates the predictive performance benefits of encoding known physics into neural
networks, for both ideal and real data, for a simple pendulum system.
For the heat diffusion system, we addressed challenges related to denoising thermal
camera data and simplifying the optimization for a complex 2D system. We have shown that
data denoising, frame size reduction, and optimization using LBFGS are ways to improve
the accuracy of network predictions. The PINN and NN showed similar RMSE values, and
we were unable to obtain satisfactory accuracy with either despite the improvements. This was
because of the difficulty of the underlying optimization problem when posed over a large
domain.

To answer the second question, we designed an experiment involving 5 thermal cameras to
investigate issues related to parallel sensing. We identified two important issues in the context
of using the thermal cameras: time coherence and spatial data alignment.
We end the dissertation with a discussion of the results, an outlook based on the motiva-
tion, and future work.
References

[1] Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S.,
Irving, G., Isard, M., Kudlur, M., Levenberg, J., Monga, R., Moore, S., Murray, D. G.,
Steiner, B., Tucker, P., Vasudevan, V., Warden, P., Wicke, M., Yu, Y., and Zheng, X. (2016).
Tensorflow: A system for large-scale machine learning. In 12th USENIX Symposium on
Operating Systems Design and Implementation (OSDI 16), pages 265–283.
[2] Ablowitz, M. and Prinari, B. (2008). Nonlinear Schrodinger systems: continuous and
discrete. Scholarpedia, 3(8):5561. revision #137230.
[3] Abowd, G. D., Dey, A. K., Brown, P. J., Davies, N., Smith, M., and Steggles, P. (1999).
Towards a better understanding of context and context-awareness. In Gellersen, H.-W.,
editor, Handheld and Ubiquitous Computing, pages 304–307, Berlin, Heidelberg. Springer
Berlin Heidelberg.
[4] Abramowitz, M. and Stegun, I. A. (1948). Handbook of mathematical functions with
formulas, graphs, and mathematical tables, volume 55. US Government printing office.
[5] Ahmad, H. M. and Rahimi, A. (2022). Deep learning methods for object detection in
smart manufacturing: A survey. Journal of Manufacturing Systems, 64:181–196.
[6] Arnold, F. and King, R. (2021). State–space modeling for control based on physics-
informed neural networks. Engineering Applications of Artificial Intelligence, 101:104195.
[7] Arroyo Leon, M., Ruiz Castro, A., and Leal Ascencio, R. (1999). An artificial neural
network on a field programmable gate array as a virtual sensor. In Proceedings of the Third
International Workshop on Design of Mixed-Mode Integrated Circuits and Applications
(Cat. No.99EX303), pages 114–117.
[8] Arzani, A., Wang, J.-X., and D’Souza, R. M. (2021). Uncovering near-wall blood
flow from sparse data with physics-informed neural networks. Physics of Fluids, 33(7).
071905.
[9] Bade, S. and Hutchings, B. (1994). Fpga-based stochastic neural networks-
implementation. In Proceedings of IEEE Workshop on FPGA’s for Custom Computing
Machines, pages 189–198.
[10] Bai, J., Lu, F., Zhang, K., et al. (2019). Onnx: Open neural network exchange.
https://github.com/onnx/onnx.

[11] Baker, N., Alexander, F., Bremer, T., Hagberg, A., Kevrekidis, Y., Najm, H., Parashar,
M., Patra, A., Sethian, J., Wild, S., Willcox, K., and Lee, S. (2019). Workshop report
on basic research needs for scientific machine learning: Core technologies for artificial
intelligence. U.S. Department of Energy Office of Scientific and Technical Information.
[12] Beléndez, A., Pascual, C., Méndez, D., Beléndez, T., and Neipp, C. (2007). Exact
solution for the nonlinear pendulum. Revista brasileira de ensino de física, 29:645–648.
[13] Bhustali, P. (2021). Physics-informed-neural-networks. https://github.com/
omniscientoctopus/Physics-Informed-Neural-Networks.
[14] BIPM (2019). Le Système international d’unités / The International System of Units
(‘The SI Brochure’). Bureau international des poids et mesures, ninth edition.
[15] Bishop, C. M. (2006). Pattern Recognition and Machine Learning (Information Science
and Statistics). Springer-Verlag, Berlin, Heidelberg.
[16] Blott, M., Preußer, T. B., Fraser, N. J., Gambardella, G., O’brien, K., Umuroglu, Y.,
Leeser, M., and Vissers, K. (2018). Finn-r: An end-to-end deep-learning framework for
fast exploration of quantized neural networks. ACM Trans. Reconfigurable Technol. Syst.,
11(3).
[17] Boyce, W., DiPrima, R., and Meade, D. (2017). Elementary Differential Equations and
Boundary Value Problems. Wiley.
[18] Buckingham, E. (1914). On physically similar systems; illustrations of the use of
dimensional equations. Phys. Rev., 4:345–376.
[19] Burgers, J. (1948). A mathematical model illustrating the theory of turbulence. In Von
Mises, R. and Von Kármán, T., editors, Advances in Applied Mechanics, volume 1, pages
171–199. Elsevier.
[20] Cai, S., Mao, Z., Wang, Z., Yin, M., and Karniadakis, G. E. (2021a). Physics-
informed neural networks (pinns) for fluid mechanics: A review. Acta Mechanica Sinica,
37(12):1727–1738.
[21] Cai, S., Wang, Z., Wang, S., Perdikaris, P., and Karniadakis, G. E. (2021b). Physics-
Informed Neural Networks for Heat Transfer Problems. Journal of Heat Transfer,
143(6):060801.
[22] Chen, R. T. Q., Rubanova, Y., Bettencourt, J., and Duvenaud, D. K. (2018). Neural
ordinary differential equations. In Bengio, S., Wallach, H., Larochelle, H., Grauman, K.,
Cesa-Bianchi, N., and Garnett, R., editors, Advances in Neural Information Processing
Systems, volume 31. Curran Associates, Inc.
[23] Cloutier, J., Cosatto, E., Pigeon, S., Boyer, F., and Simard, P. (1996). Vip: an fpga-based
processor for image processing and neural networks. In Proceedings of Fifth International
Conference on Microelectronics for Neural Networks, pages 330–336.
[24] Cox, C. and Blanz, W. (1992). Ganglion-a fast field-programmable gate array implemen-
tation of a connectionist classifier. IEEE Journal of Solid-State Circuits, 27(3):288–299.

[25] Crank, J. (1975). The Mathematics of Diffusion. Oxford science publications. Clarendon
Press.
[26] Cromer, A. (1981). Stable solutions using the Euler approximation. American Journal
of Physics, 49(5):455–459.
[27] Cuomo, S., Di Cola, V. S., Giampaolo, F., Rozza, G., Raissi, M., and Piccialli, F. (2022).
Scientific machine learning through physics–informed neural networks: where we are and
what’s next. Journal of Scientific Computing, 92(3):88.
[28] Dahmen, S. R. (2015). On pendulums and air resistance: the mathematics and physics
of denis diderot. The European Physical Journal H, 40:337–373.
[29] Davoudi, A., Malhotra, K. R., Shickel, B., Siegel, S., Williams, S., Ruppert, M.,
Bihorac, E., Ozrazgat-Baslanti, T., Tighe, P. J., Bihorac, A., et al. (2019). Intelligent icu
for autonomous patient monitoring using pervasive sensing and deep learning. Scientific
reports, 9(1):8020.
[30] Denil, M., Shakibi, B., Dinh, L., Ranzato, M., and de Freitas, N. (2013). Predicting
parameters in deep learning. In Proceedings of the 26th International Conference on
Neural Information Processing Systems - Volume 2, NIPS’13, page 2148–2156, Red Hook,
NY, USA. Curran Associates Inc.
[31] Denton, E. L., Zaremba, W., Bruna, J., LeCun, Y., and Fergus, R. (2014). Exploiting
linear structure within convolutional networks for efficient evaluation. Advances in neural
information processing systems, 27.
[32] Ding, Y., Wu, J., Gao, Y., Wang, M., and So, H. K.-H. (2023). Model-platform opti-
mized deep neural network accelerator generation through mixed-integer geometric pro-
gramming. In 2023 IEEE 31st Annual International Symposium on Field-Programmable
Custom Computing Machines (FCCM), pages 83–93.
[33] Eeckhout, L. (2017). Is moore’s law slowing down? what’s next? IEEE Micro,
37(04):4–5.
[34] El-Maksoud, A. J. A., Ebbed, M., Khalil, A. H., and Mostafa, H. (2021). Power efficient
design of high-performance convolutional neural networks hardware accelerator on fpga:
A case study with googlenet. IEEE Access, 9:151897–151911.
[35] Evans, L. (2010). Partial Differential Equations. Graduate studies in mathematics.
American Mathematical Society.
[36] Farlow, S. (1993a). Partial Differential Equations for Scientists and Engineers. Dover
books on advanced mathematics. Dover Publications.
[37] Farlow, S. (1993b). Partial Differential Equations for Scientists and Engineers. Dover
books on advanced mathematics. Dover Publications.
[38] Ferrer, D., Gonzalez, R., Fleitas, R., Acle, J., and Canetti, R. (2004). Neurofpga-
implementing artificial neural networks on programmable logic devices. In Proceedings
Design, Automation and Test in Europe Conference and Exhibition, volume 3, pages
218–223 Vol.3.

[39] FLIR (2018). FLIR LEPTON® Engineering Datasheet.


[40] Guan, Y., Liang, H., Xu, N., Wang, W., Shi, S., Chen, X., Sun, G., Zhang, W., and Cong,
J. (2017). Fp-dnn: An automated framework for mapping deep neural networks onto
fpgas with rtl-hls hybrid templates. In 2017 IEEE 25th Annual International Symposium
on Field-Programmable Custom Computing Machines (FCCM), pages 152–159.
[41] Guo, K., Sui, L., Qiu, J., Yu, J., Wang, J., Yao, S., Han, S., Wang, Y., and Yang, H.
(2018). Angel-eye: A complete design flow for mapping cnn onto embedded fpga. IEEE
Transactions on Computer-Aided Design of Integrated Circuits and Systems, 37(1):35–47.
[42] Guo, K., Zeng, S., Yu, J., Wang, Y., and Yang, H. (2019). [dl] a survey of fpga-based
neural network inference accelerators. ACM Trans. Reconfigurable Technol. Syst., 12(1).
[43] Han, S., Pool, J., Tran, J., and Dally, W. (2015). Learning both weights and connections
for efficient neural network. In Cortes, C., Lawrence, N., Lee, D., Sugiyama, M., and
Garnett, R., editors, Advances in Neural Information Processing Systems, volume 28.
Curran Associates, Inc.
[44] Hao, Z., Liu, S., Zhang, Y., Ying, C., Feng, Y., Su, H., and Zhu, J. (2023). Physics-
informed machine learning: A survey on problems, methods and applications.
[45] Hubara, I., Courbariaux, M., Soudry, D., El-Yaniv, R., and Bengio, Y. (2016). Binarized
neural networks. In Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., and Garnett, R., edi-
tors, Advances in Neural Information Processing Systems, volume 29. Curran Associates,
Inc.
[46] Jia, X., Zhang, Y., Liu, G., Yang, X., Zhang, T., Zheng, J., Xu, D., Wang, H., Zheng, R.,
Pareek, S., Tian, L., Xie, D., Luo, H., and Shan, Y. (2022). Xvdpu: A high performance cnn
accelerator on the versal platform powered by the ai engine. In 2022 32nd International
Conference on Field-Programmable Logic and Applications (FPL), pages 01–09.
[47] Jiang, X., Wang, D., Fan, Q., Zhang, M., Lu, C., and Lau, A. P. T. (2022). Physics-
informed neural network for nonlinear dynamics in fiber optics. Laser & Photonics
Reviews, 16(9):2100483.
[48] Jin, X., Cai, S., Li, H., and Karniadakis, G. E. (2021). Nsfnets (navier-stokes flow
nets): Physics-informed neural networks for the incompressible navier-stokes equations.
Journal of Computational Physics, 426:109951.
[49] Jouppi, N. P., Young, C., Patil, N., Patterson, D., Agrawal, G., Bajwa, R., Bates, S.,
Bhatia, S., Boden, N., Borchers, A., Boyle, R., Cantin, P.-l., Chao, C., Clark, C., Coriell,
J., Daley, M., Dau, M., Dean, J., Gelb, B., Ghaemmaghami, T. V., Gottipati, R., Gulland,
W., Hagmann, R., Ho, C. R., Hogberg, D., Hu, J., Hundt, R., Hurt, D., Ibarz, J., Jaffey,
A., Jaworski, A., Kaplan, A., Khaitan, H., Killebrew, D., Koch, A., Kumar, N., Lacy,
S., Laudon, J., Law, J., Le, D., Leary, C., Liu, Z., Lucke, K., Lundin, A., MacKean, G.,
Maggiore, A., Mahony, M., Miller, K., Nagarajan, R., Narayanaswami, R., Ni, R., Nix,
K., Norrie, T., Omernick, M., Penukonda, N., Phelps, A., Ross, J., Ross, M., Salek, A.,
Samadiani, E., Severn, C., Sizikov, G., Snelham, M., Souter, J., Steinberg, D., Swing, A.,
Tan, M., Thorson, G., Tian, B., Toma, H., Tuttle, E., Vasudevan, V., Walter, R., Wang,
W., Wilcox, E., and Yoon, D. H. (2017). In-datacenter performance analysis of a tensor
processing unit. SIGARCH Comput. Archit. News, 45(2):1–12.

[50] Karniadakis, G. E., Kevrekidis, I. G., Lu, L., Perdikaris, P., Wang, S., and Yang, L.
(2021). Physics-informed machine learning. Nature Reviews Physics, 3(6):422–440.
[51] Khan, A. and Lowther, D. A. (2022). Physics informed neural networks for electromag-
netic analysis. IEEE Transactions on Magnetics, 58(9):1–4.
[52] Kingma, D. and Ba, J. (2014). Adam: A method for stochastic optimization. Interna-
tional Conference on Learning Representations.
[53] Kingma, D. P. and Welling, M. (2022). Auto-encoding variational bayes.
[54] Korteweg, D. D. J. and de Vries, D. G. (1895). Xli. on the change of form of long
waves advancing in a rectangular canal, and on a new type of long stationary waves.
The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science,
39(240):422–443.
[55] Krishnapriyan, A., Gholami, A., Zhe, S., Kirby, R., and Mahoney, M. W. (2021). Char-
acterizing possible failure modes in physics-informed neural networks. In Beygelzimer,
A., Dauphin, Y., Liang, P., and Vaughan, J. W., editors, Advances in Neural Information
Processing Systems.
[56] Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). Imagenet classification with
deep convolutional neural networks. In Pereira, F., Burges, C., Bottou, L., and Weinberger,
K., editors, Advances in Neural Information Processing Systems, volume 25. Curran
Associates, Inc.
[57] Kuhn, K. J. (2009). Cmos scaling beyond 32nm: Challenges and opportunities. In
Proceedings of the 46th Annual Design Automation Conference, DAC ’09, page 310–313,
New York, NY, USA. Association for Computing Machinery.
[58] Lagaris, I., Likas, A., and Fotiadis, D. (1998). Artificial neural networks for solving
ordinary and partial differential equations. IEEE Transactions on Neural Networks,
9(5):987–1000.
[59] Leoni, P. C. D., Agarwal, K., Zaki, T. A., Meneveau, C., and Katz, J. (2023). Recon-
structing turbulent velocity and pressure fields from under-resolved noisy particle tracks
using physics-informed neural networks. Experiments in Fluids, 64(5).
[60] Levandosky, J. (2003). Math 220b lecture notes. https://web.stanford.edu/class/
math220b/handouts/HEATEQN.pdf. Accessed: 14/08/2023.
[61] Li, S., Wang, G., Di, Y., Wang, L., Wang, H., and Zhou, Q. (2023). A physics-informed
neural network framework to predict 3d temperature field without labeled data in process
of laser metal deposition. Engineering Applications of Artificial Intelligence, 120:105908.
[62] Liang, S., Yin, S., Liu, L., Luk, W., and Wei, S. (2018). Fp-bnn: Binarized neural
network on fpga. Neurocomputing, 275:1072–1086.
[63] Lienhard, IV, J. H. and Lienhard, V, J. H. (2019). A Heat Transfer Textbook. Dover
Publications, Mineola, NY, 5th edition.

[64] Lim, J. and Stanley-Marbell, P. (2018). Newton: A language for describing physics.
CoRR, abs/1811.04626.
[65] Liu, D. C. and Nocedal, J. (1989). On the limited memory bfgs method for large scale
optimization. Mathematical programming, 45(1-3):503–528.
[66] Liu, Z., Dou, Y., Jiang, J., and Xu, J. (2016). Automatic code generation of convolutional
neural networks in fpga implementation. In 2016 International Conference on Field-
Programmable Technology (FPT), pages 61–68.
[67] Lu, L., Jin, P., Pang, G., Zhang, Z., and Karniadakis, G. E. (2021). Learning nonlinear
operators via deeponet based on the universal approximation theorem of operators. Nature
machine intelligence, 3(3):218–229.
[68] Ma, Y., Cao, Y., Vrudhula, S., and Seo, J.-s. (2017). Optimizing loop operation and
dataflow in fpga acceleration of deep convolutional neural networks. In Proceedings of
the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays,
FPGA ’17, page 45–54, New York, NY, USA. Association for Computing Machinery.
[69] Meech, J. T. and Stanley-Marbell, P. (2022). An algorithm for sensor data uncertainty
quantification. IEEE Sensors Letters, 6(1):1–4.
[70] Melexis (2018). mlx90640-library. https://github.com/melexis/mlx90640-library.
[71] Melexis (2019). MLX90640 32x24 IR array.
[72] Misyris, G. S., Venzke, A., and Chatzivasileiadis, S. (2020). Physics-informed neural
networks for power systems. In 2020 IEEE Power & Energy Society General Meeting
(PESGM), pages 1–5.
[73] Moin, A., Zhou, A., Rahimi, A., Menon, A., Benatti, S., Alexandrov, G., Tamakloe,
S., Ting, J., Yamamoto, N., Khan, Y., et al. (2021). A wearable biosensing system with
in-sensor adaptive machine learning for hand gesture recognition. Nature Electronics,
4(1):54–63.
[74] Moseley, B. (2021). harmonic-oscillator-pinn. https://github.com/benmoseley/
harmonic-oscillator-pinn.
[75] Moseley, B. (2022). Physics-informed machine learning: from concepts to real-world
applications. PhD thesis, University of Oxford.
[76] Moseley, B., Markham, A., and Nissen-Meyer, T. (2021). Finite basis physics-informed
neural networks (fbpinns): a scalable domain decomposition approach for solving differ-
ential equations.
[77] Moss, D. J. M., Nurvitadhi, E., Sim, J., Mishra, A., Marr, D., Subhaschandra, S., and
Leong, P. H. W. (2017). High performance binary neural networks on the xeon+fpga™
platform. In 2017 27th International Conference on Field Programmable Logic and
Applications (FPL), pages 1–4.
[78] Nakahara, H., Fujii, T., and Sato, S. (2017). A fully connected layer elimination for a
binarized convolutional neural network on an fpga. In 2017 27th International Conference
on Field Programmable Logic and Applications (FPL), pages 1–4.

[79] NXP Semiconductors (2021). I2C-bus specification and user manual. Rev. 7.0.
[80] Panasonic (2017). Infrared Array Sensor Grid-EYE (AMG88).
[81] Pappalardo, A. (2023). Xilinx/brevitas.
[82] Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin,
Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison,
M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., and Chintala, S. (2019).
Pytorch: An imperative style, high-performance deep learning library. In Advances in
Neural Information Processing Systems 32, pages 8024–8035. Curran Associates, Inc.
[83] Patterson, D. (2018). 50 years of computer architecture: From the mainframe cpu to
the domain-specific tpu and the open risc-v instruction set. In 2018 IEEE International
Solid - State Circuits Conference - (ISSCC), pages 27–31.
[84] Plotly Technologies Inc. (2015). Collaborative data science. https://plot.ly.
[85] Qasaimeh, M., Denolf, K., Lo, J., Vissers, K., Zambreno, J., and Jones, P. H. (2019).
Comparing energy efficiency of cpu, gpu and fpga implementations for vision kernels. In
2019 IEEE International Conference on Embedded Software and Systems (ICESS), pages
1–8.
[86] Qiu, J., Wang, J., Yao, S., Guo, K., Li, B., Zhou, E., Yu, J., Tang, T., Xu, N., Song,
S., Wang, Y., and Yang, H. (2016). Going deeper with embedded fpga platform for
convolutional neural network. In Proceedings of the 2016 ACM/SIGDA International
Symposium on Field-Programmable Gate Arrays, FPGA ’16, page 26–35, New York, NY,
USA. Association for Computing Machinery.
[87] Raissi, M. and Karniadakis, G. E. (2018). Hidden physics models: Machine learning of
nonlinear partial differential equations. Journal of Computational Physics, 357:125–141.
[88] Raissi, M., Perdikaris, P., and Karniadakis, G. (2019). Physics-informed neural net-
works: A deep learning framework for solving forward and inverse problems involving
nonlinear partial differential equations. Journal of Computational Physics, 378:686–707.
[89] Rajkumar, R. R., Lee, I., Sha, L., and Stankovic, J. (2010). Cyber-physical systems:
The next computing revolution. In Proceedings of the 47th Design Automation Conference,
DAC ’10, page 731–736, New York, NY, USA. Association for Computing Machinery.
[90] Robert Bosch GmbH (2021). BNO055: Intelligent 9-axis absolute orientation sensor.
Rev. 1.8.
[91] Rohrhofer, F. M., Posch, S., and Geiger, B. C. (2021). On the pareto front of physics-
informed neural networks. CoRR, abs/2105.00862.
[92] Sahli Costabal, F., Yang, Y., Perdikaris, P., Hurtado, D. E., and Kuhl, E. (2020). Physics-
informed neural networks for cardiac activation mapping. Frontiers in Physics, 8:42.
[93] Savitzky, A. and Golay, M. J. E. (1964). Smoothing and differentiation of data by
simplified least squares procedures. Analytical Chemistry, 36:1627–1639.

[94] Schwarting, W., Alonso-Mora, J., and Rus, D. (2018). Planning and decision-making
for autonomous vehicles. Annual Review of Control, Robotics, and Autonomous Systems,
1(1):187–210.
[95] Shen, J., Huang, Y., Wang, Z., Qiao, Y., Wen, M., and Zhang, C. (2018). Towards
a uniform template-based architecture for accelerating 2d and 3d cnns on fpga. In
Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable
Gate Arrays, FPGA ’18, page 97–106, New York, NY, USA. Association for Computing
Machinery.
[96] Simonyan, K. and Zisserman, A. (2015). Very deep convolutional networks for large-
scale image recognition. In International Conference on Learning Representations.
[97] Stefan, J. (1891). Über die theorie der eisbildung, insbesondere über die eisbildung im
polarmeere. Annalen der Physik, 278(2):269–286.
[98] Strauss, W. (2008). Partial Differential Equations: An Introduction. Wiley.
[99] Suda, N., Chandra, V., Dasika, G., Mohanty, A., Ma, Y., Vrudhula, S., Seo, J.-s., and
Cao, Y. (2016). Throughput-optimized opencl-based fpga accelerator for large-scale
convolutional neural networks. In Proceedings of the 2016 ACM/SIGDA International
Symposium on Field-Programmable Gate Arrays, FPGA ’16, page 16–25, New York, NY,
USA. Association for Computing Machinery.
[100] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D.,
Vanhoucke, V., and Rabinovich, A. (2015). Going deeper with convolutions. In 2015
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1–9, Los
Alamitos, CA, USA. IEEE Computer Society.
[101] Tessier, R., Pocek, K., and DeHon, A. (2015). Reconfigurable computing architectures.
Proceedings of the IEEE, 103(3):332–354.
[102] Tsoutsouras, V., Kaparounakis, O., Samarakoon, C., Bilgin, B., Meech, J., Heck, J.,
and Stanley-Marbell, P. (2022). The laplace microarchitecture for tracking data uncertainty.
IEEE Micro, 42(4):78–86.
[103] van Daalen, M., Jeavons, P., and Shawe-Taylor, J. (1993). A stochastic neural ar-
chitecture that exploits dynamically reconfigurable fpgas. In [1993] Proceedings IEEE
Workshop on FPGAs for Custom Computing Machines, pages 202–211.
[104] Venieris, S. I. and Bouganis, C.-S. (2016). fpgaconvnet: A framework for map-
ping convolutional neural networks on fpgas. In 2016 IEEE 24th Annual International
Symposium on Field-Programmable Custom Computing Machines (FCCM), pages 40–47.
[105] Vipin, K. and Fahmy, S. A. (2018). Fpga dynamic and partial reconfiguration: A
survey of architectures, methods, and applications. ACM Comput. Surv., 51(4).
[106] Virtanen, P., Gommers, R., Oliphant, T. E., Haberland, M., Reddy, T., Cournapeau,
D., Burovski, E., Peterson, P., Weckesser, W., Bright, J., van der Walt, S. J., Brett, M.,
Wilson, J., Millman, K. J., Mayorov, N., Nelson, A. R. J., Jones, E., Kern, R., Larson,
E., Carey, C. J., Polat, İ., Feng, Y., Moore, E. W., VanderPlas, J., Laxalde, D., Perktold,

J., Cimrman, R., Henriksen, I., Quintero, E. A., Harris, C. R., Archibald, A. M., Ribeiro,
A. H., Pedregosa, F., van Mulbregt, P., and SciPy 1.0 Contributors (2020). SciPy 1.0:
Fundamental Algorithms for Scientific Computing in Python. Nature Methods, 17:261–
272.
[107] Wang, S., Wang, H., and Perdikaris, P. (2021). On the eigenvector bias of fourier
feature networks: From regression to solving multi-scale pdes with physics-informed
neural networks. Computer Methods in Applied Mechanics and Engineering, 384:113938.
[108] Wang, Y., Willis, S., Tsoutsouras, V., and Stanley-Marbell, P. (2019). Deriving
equations from sensor data using dimensional function synthesis. ACM Trans. Embed.
Comput. Syst., 18(5s).
[109] Willink, R. (2013). Measurement Uncertainty and Probability. Cambridge University
Press.
[110] Xiao, Q., Liang, Y., Lu, L., Yan, S., and Tai, Y.-W. (2017). Exploring heterogeneous
algorithms for accelerating deep convolutional neural networks on fpgas. In 2017 54th
ACM/EDAC/IEEE Design Automation Conference (DAC), pages 1–6.
[111] Xilinx (2021). AXI IIC Bus Interface. v2.1.
[112] Zhang, C., Fang, Z., Zhou, P., Pan, P., and Cong, J. (2016). Caffeine: Towards
uniformed representation and acceleration for deep convolutional neural networks. In
2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pages
1–8.
[113] Zhang, L., Yan, X., and Ma, D. (2022). A binarized neural network approach to
accelerate in-vehicle network intrusion detection. IEEE Access, 10:123505–123520.
[114] Zhou, A., Yao, A., Guo, Y., Xu, L., and Chen, Y. (2017). Incremental network
quantization: Towards lossless CNNs with low-precision weights. In International
Conference on Learning Representations.
