solution operator in the sense that the predicted output functions are not guaranteed to satisfy the underlying PDE. Recent efforts (16, 50–53) attempt to address some of these challenges by designing appropriate architectures and loss functions for learning discretized operators (i.e., maps between high-dimensional Euclidean spaces). Although these approaches can relax the requirement for paired input-output training data, they are limited by the resolution of their underlying mesh discretization and, consequently, need modifications to their architecture for different resolutions/discretizations to achieve consistent convergence [if at all possible, as demonstrated in (32)].

In this work, we aim to address the aforementioned challenges by exploring a simple yet remarkably effective extension of the DeepONet framework (35). Drawing motivation from physics-informed neural networks (14), we recognize that the outputs of a DeepONet model are differentiable with respect to their input coordinates, therefore allowing us to use automatic differentiation (54, 55) to formulate an appropriate regularization mechanism for biasing the target output functions to satisfy the underlying PDE constraints. This yields a simple procedure for training physics-informed DeepONet models, which can then serve as fast surrogates for nonlinear, nonequilibrium processes across diverse applications, including engineering design and control, Earth System science, and computational biology.

RESULTS
The proposed physics-informed DeepONet architecture is summarized in Fig. 1. Motivated by the universal approximation theorem for operators (35, 36), the architecture features two neural networks coined as the “branch” and “trunk” networks, respectively, whose automatic differentiation enables us to learn the solution operator of arbitrary PDEs. The associated loss functions, performance metrics, computational cost, hyperparameters, and training details are discussed in the Supplementary Materials. In the following, we demonstrate the effectiveness of physics-informed DeepONets across a series of comprehensive numerical studies for solving various types of parametric PDEs. A summary of the different benchmarks considered is presented in Table 1. It is worth emphasizing that, in all cases, the proposed deep learning models are trained without any paired input-output data, assuming only knowledge of the governing PDE and its corresponding initial and boundary conditions.
Fig. 1. Making DeepONets physics informed. The DeepONet architecture (35) consists of two subnetworks, the branch net for extracting latent representations of input functions and the trunk net for extracting latent representations of input coordinates at which the output functions are evaluated. A continuous and differentiable representation of the output functions is then obtained by merging the latent representations extracted by each subnetwork via a dot product. Automatic differentiation can then be used to formulate appropriate regularization mechanisms for biasing the DeepONet outputs to satisfy a given system of PDEs. BC, boundary conditions; IC, initial conditions.
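To make the architecture in Fig. 1 concrete, the following is a minimal JAX sketch of an unstacked DeepONet forward pass: a branch network embeds the discretized input function u = [u(x_1), …, u(x_m)], a trunk network embeds a query coordinate y, and the two embeddings are merged with a dot product. The layer sizes, initialization, and activation choices below are illustrative placeholders, not the exact settings used in this work (those are listed in the Supplementary Materials).

```python
import jax
import jax.numpy as jnp

def init_mlp(key, sizes):
    """Initialize a simple fully connected network (illustrative sizes)."""
    params = []
    for d_in, d_out in zip(sizes[:-1], sizes[1:]):
        key, sub = jax.random.split(key)
        w = jax.random.normal(sub, (d_in, d_out)) * jnp.sqrt(2.0 / d_in)
        params.append((w, jnp.zeros(d_out)))
    return params

def mlp(params, x):
    """Forward pass with tanh activations (one common choice; exact architectures differ)."""
    for w, b in params[:-1]:
        x = jnp.tanh(x @ w + b)
    w, b = params[-1]
    return x @ w + b

def deeponet(params, u, y):
    """G_theta(u)(y): dot product of branch and trunk embeddings."""
    branch_params, trunk_params = params
    b = mlp(branch_params, u)   # embedding of the sampled input function, shape (q,)
    t = mlp(trunk_params, y)    # embedding of the query coordinate y, shape (q,)
    return jnp.sum(b * t)

# Example: m = 100 sensors for u, 2D query coordinate y = (x, t), q = 50.
key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
params = (init_mlp(k1, [100, 50, 50]), init_mlp(k2, [2, 50, 50]))
u = jax.random.normal(k3, (100,))   # a discretized input function
y = jnp.array([0.5, 0.25])          # query point (x, t)

s = deeponet(params, u, y)          # predicted solution value
# Because s is differentiable in y, PDE residuals can be built with autodiff,
# e.g. the gradient (ds/dx, ds/dt) at the query point:
ds_dy = jax.grad(lambda y: deeponet(params, u, y))(y)
```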
Table 1. Summary of the parametric PDE benchmarks considered in this work. For each benchmark, we list the governing PDE, the varying input, and the relative L2 prediction error (mean ± SD) of the trained physics-informed DeepONet over the test dataset.
Diffusion-reaction: ∂s/∂t = D ∂²s/∂x² + k s² + u(x); varying input: source terms u(x); relative L2 error: 0.45 ± 0.16%
Burgers': ∂s/∂t + s ∂s/∂x − ν ∂²s/∂x² = 0; varying input: initial conditions; relative L2 error: 1.38 ± 1.64%
Advection: ∂s/∂t + u(x) ∂s/∂x = 0; varying input: variable coefficients; relative L2 error: 2.24 ± 0.68%
Eikonal: ∥∇s∥₂ = 1; varying input: domain geometries; relative L2 error: 0.42 ± 0.11%

For the diffusion-reaction system, the trained model is evaluated on unseen realizations that are not used during model training, and its predictions are compared against the corresponding reference solutions (see the top panel of Fig. 3). Furthermore, we provide a comparison against the conventional DeepONet formulation recently put forth by Lu et al. (35). This case necessitates paired input-output observations [u(x), s(x, t)] to be provided as training data, as no physical constraints are leveraged during model training. The mean and SD of the relative L2 errors of the conventional DeepONet and the physics-informed DeepONet over the test dataset are visualized in the bottom panel of Fig. 3. The average relative L2 errors of the DeepONet and the physics-informed DeepONet are ∼1.92 and ∼0.45%, respectively. In contrast to the conventional DeepONet, which is trained on paired input-output measurements, the proposed physics-informed DeepONet requires no paired input-output data for training.
Fig. 3. Solving a parametric diffusion-reaction system. (Top) Exact solution versus the prediction of a trained physics-informed DeepONet for a representative example in the test dataset. (Bottom) Mean and SD of the relative L2 prediction error of a trained DeepONet (with paired input-output training data) and a physics-informed DeepONet (without paired input-output training data) over 1000 examples in the test dataset. The mean and SD of the relative L2 prediction error are ∼1.92 ± 1.12% (DeepONet) and ∼0.45 ± 0.16% (physics-informed DeepONet), respectively. The physics-informed DeepONet yields ∼80% improvement in prediction accuracy with a 100% reduction in the dataset size required for training. Tanh, hyperbolic tangent; ReLU, rectified linear unit.
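For reference, the relative L2 error statistics reported above (and in Fig. 3) can be computed as sketched below; the array shapes are illustrative assumptions, with each test example's prediction and reference solution flattened over the same space-time grid.

```python
import numpy as np

def relative_l2_error(pred, ref):
    """Relative L2 error ||pred - ref||_2 / ||ref||_2 for one test example."""
    return np.linalg.norm(pred - ref) / np.linalg.norm(ref)

def error_statistics(preds, refs):
    """Mean and SD of the relative L2 error over a test set of shape (n_test, n_points)."""
    errors = np.array([relative_l2_error(p, r) for p, r in zip(preds, refs)])
    return errors.mean(), errors.std()
```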
The average relative L2 error of the best trained model is ∼1.38% (see figs. S17 to S19). The physics-informed DeepONet achieves accuracy comparable to the Fourier neural operator method (34), albeit the latter has only been tested for a simpler case corresponding to ν = 0.1 and requires training on a large corpus of paired input-output data. Furthermore, visualizations corresponding to the worst example in the test dataset are shown in the top panels of Fig. 4. One can see that the predicted solution achieves a good agreement against the reference solution, with a relative L2 error of 3.30%. Here, we must also emphasize that a trained physics-informed DeepONet model can rapidly predict the entire spatiotemporal solution of the Burgers equation in ∼10 ms. Inference with physics-informed DeepONets is trivially parallelizable, allowing for the solution of 𝒪(10³) PDEs in a fraction of a second and yielding up to three orders of magnitude of speedup compared to a conventional spectral solver (58); see the bottom panel of Fig. 4.

Despite the promising results presented here, we must note the need for further methodological advances toward enhancing the accuracy and robustness of physics-informed DeepONets in tackling PDE systems with stiff, turbulent, or chaotic dynamics. For example, we have observed that the predictive accuracy of physics-informed DeepONets degrades in regions where the PDE solution exhibits steep gradients, a behavior that is pronounced as the viscosity parameter in the Burgers equation is further decreased (see fig. S20 and table S11 for more details and quantitative results). We conjecture that these issues can be tackled in the future by designing more specialized architectures that are tailored to the dynamic behavior of a given PDE, as well as by developing more effective optimization algorithms for training.

For the parametric advection equation (see Table 1), the solution is subject to the IBCs

s(x, 0) = f(x)   (9)

s(0, t) = g(t)   (10)

where f(x) = sin(x) and g(t) = sin(πt/2). To make the input function u(x) strictly positive, we let u(x) = v(x) − min_x v(x) + 1, where v(x) is sampled from a GRF with a length scale l = 0.2. The goal is to learn the solution operator G mapping variable coefficients u(x) to the associated solutions s(x, t) (see the Supplementary Materials for more details). As shown in Fig. 5, the trained physics-informed DeepONet is able to achieve an overall good agreement with the reference PDE solution, although some inaccuracies can be observed in regions where the solution exhibits steep gradients (similarly to the Burgers' example discussed above; see additional visualizations presented in fig. S21). The resulting relative L2 prediction error averaged over all examples in the test dataset is 2.24%, leading to the conclusion that physics-informed DeepONets can be effective surrogates for advection-dominated PDEs.
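The input functions used for this benchmark are built from GRF samples; a minimal sketch of this sampling step is shown below, assuming a squared exponential (RBF) covariance kernel evaluated on a uniform grid. The kernel family, grid size, and jitter value are illustrative assumptions; the shift u(x) = v(x) − min_x v(x) + 1 follows the construction described above.

```python
import numpy as np

def sample_grf(n_points=128, length_scale=0.2, seed=0):
    """Sample v(x) on [0, 1] from a zero-mean GRF with an RBF covariance kernel."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, n_points)
    # Squared exponential covariance with the prescribed length scale.
    K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / length_scale ** 2)
    L = np.linalg.cholesky(K + 1e-8 * np.eye(n_points))  # jitter for numerical stability
    v = L @ rng.standard_normal(n_points)
    return x, v

x, v = sample_grf()
u = v - v.min() + 1.0   # strictly positive variable coefficient u(x)
```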
Finally, we consider a parametric eikonal equation, ∥∇s(x)∥₂ = 1, where x = (x, y) ∈ ℝ² denotes 2D spatial coordinates and Ω is an open domain with a piecewise smooth boundary ∂Ω. A solution to the above equation is a signed distance function measuring the distance of a point in Ω to the closest point on the boundary ∂Ω, i.e.,

s(x) = d(x, ∂Ω) if x ∈ Ω, and s(x) = −d(x, ∂Ω) if x ∈ Ωᶜ

For each geometry in the test dataset, we extract the zero-level set of the learned signed distance function and compare it with the exact airfoil geometry. As shown in Fig. 6, the zero-level sets achieve a good agreement with the exact airfoil geometries. One may conclude that the proposed framework is capable of achieving an accurate approximation of the exact signed distance function. Additional systematic studies and quantitative comparisons are provided in the Supplementary Materials (see figs. S23 to S25).
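Because the predicted signed distance function is differentiable with respect to x = (x, y), the eikonal constraint ∥∇s∥₂ = 1 can be enforced directly through automatic differentiation. The sketch below illustrates this; `predict_sdf` is a hypothetical placeholder standing in for a trained physics-informed DeepONet evaluated at a fixed input geometry (here, the exact signed distance function of a unit circle), not the implementation used in this work.

```python
import jax
import jax.numpy as jnp

def predict_sdf(x):
    """Placeholder for G_theta(u)(x): the exact SDF of a unit circle."""
    return jnp.linalg.norm(x) - 1.0

def eikonal_residual(x):
    """|| grad s(x) ||_2 - 1, which a signed distance function should drive to zero."""
    grad_s = jax.grad(predict_sdf)(x)
    return jnp.linalg.norm(grad_s) - 1.0

points = jnp.array([[0.3, 0.4], [1.5, -0.2], [0.0, 2.0]])
residuals = jax.vmap(eikonal_residual)(points)   # ~0 for an exact SDF
```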
Fig. 5. Solving a parametric advection equation. Exact solution versus the prediction of a trained physics-informed DeepONet for a representative example in the test dataset.

Fig. 6. Solving a parametric eikonal equation (airfoils). (Top) Exact airfoil geometry versus the zero-level set obtained from the predicted signed distance function for three different input examples in the test dataset. (Bottom) Predicted signed distance function of a trained physics-informed DeepONet for three different airfoil geometries in the test dataset.
The proposed physics-informed DeepONet architecture can be broadly applied in science and engineering since PDEs are prevalent across diverse problem domains including fluid mechanics, electromagnetics, quantum mechanics, and elasticity. However, despite the early promise demonstrated here, numerous technical questions remain open and require further investigation. Motivated by the successful application of Fourier feature networks (56), it is natural to ask the following: For a given parametric governing law, what is the optimal feature embedding or network architecture of a physics-informed DeepONet? Recently, Wang et al. (61) proposed a multiscale Fourier feature network to tackle PDEs with multiscale behavior. Such an architecture may potentially be used as the backbone of physics-informed DeepONets to learn multiscale operators and solve multiscale parametric PDEs. Another question arises from the possibility of achieving improved performance by assigning weights to the different terms in the physics-informed DeepONet loss function. It has been shown that these weights play an important role in enhancing the trainability of constrained neural networks (62–64). Therefore, it is natural to ask the following: What are the appropriate weights to use for training physics-informed DeepONets? How can one design effective algorithms for accelerating their training?
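As a concrete illustration of the kind of feature embedding discussed above, below is a minimal sketch of a random Fourier feature mapping that could be prepended to the trunk-network inputs. The embedding width and frequency scale are illustrative assumptions, and this sketch is not the specific multiscale architecture proposed by Wang et al. (61).

```python
import jax
import jax.numpy as jnp

def fourier_features(y, B):
    """Map coordinates y to [cos(2*pi*B y), sin(2*pi*B y)] before the trunk network."""
    proj = 2.0 * jnp.pi * (B @ y)
    return jnp.concatenate([jnp.cos(proj), jnp.sin(proj)])

# B is a random frequency matrix; its scale (here 1.0) controls the frequency
# content that the trunk network can represent easily.
key = jax.random.PRNGKey(0)
B = 1.0 * jax.random.normal(key, (64, 2))                  # 2D input coordinates y = (x, t)
embedding = fourier_features(jnp.array([0.5, 0.25]), B)    # shape (128,)
```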
METHODS
DeepONets (35) present a specialized deep learning architecture that encapsulates the universal approximation theorem for operators (36). Here, we illustrate how DeepONets can be effectively applied to learning the solution operator of parametric PDEs. The terminology “parametric PDEs” refers to the fact that some parameters of a given PDE system are allowed to vary over a certain range. These input parameters may include, but are not limited to, the shape of the physical domain, the initial or boundary conditions, constant or variable coefficients (e.g., diffusion or reaction rates), source terms, etc. To describe such problems in their full generality, let (𝒰, 𝒱, 𝒮) be a triplet of Banach spaces and 𝒩 : 𝒰 × 𝒮 → 𝒱 be a linear or nonlinear differential operator. We consider general parametric PDEs taking the form

𝒩(u, s) = 0   (13)

where u ∈ 𝒰 denotes the parameters (i.e., input functions) and s ∈ 𝒮 denotes the corresponding unknown solutions of the PDE system. Specifically, we assume that, for any u ∈ 𝒰, there exists a unique solution s = s(u) ∈ 𝒮 to Eq. 13 (subject to appropriate IBCs). Then, we can define the solution operator G : 𝒰 → 𝒮 as

G(u) = s(u)   (14)

Following the original formulation of Lu et al. (35), we represent the solution map G by an unstacked DeepONet G_θ, where θ denotes all trainable parameters of the DeepONet network. As illustrated in Fig. 1, the unstacked DeepONet is composed of two separate neural networks referred to as the branch and trunk networks, respectively. The branch network takes u as input and returns a features embedding [b_1, b_2, …, b_q]^T ∈ ℝ^q as output, where u = [u(x_1), u(x_2), …, u(x_m)] represents a function u ∈ 𝒰 evaluated at a collection of fixed locations {x_i}_{i=1}^m. The trunk network takes the continuous coordinates y as inputs and outputs a features embedding [t_1, t_2, …, t_q]^T ∈ ℝ^q. To obtain the final output of the DeepONet, the outputs of the branch and trunk networks are merged together via a dot product. More specifically, a DeepONet prediction for a function u evaluated at y can be expressed by

G_θ(u)(y) = Σ_{k=1}^{q} b_k(u(x_1), u(x_2), …, u(x_m)) t_k(y)   (15)

where the b_k are produced by the branch network, the t_k are produced by the trunk network, and θ denotes the collection of all trainable weight and bias parameters in the branch and trunk networks.

Notice that the outputs of a DeepONet model are continuously differentiable with respect to their input coordinates. Therefore, one may use automatic differentiation (54, 55) to formulate an appropriate regularization mechanism for biasing the target output functions to satisfy any given differential constraints. Consequently, we may then construct a “physics-informed” DeepONet by augmenting a data-fitting loss ℒ_operator(θ) with a PDE residual loss of the form

ℒ_physics(θ) = (1 / (NQm)) Σ_{i=1}^{N} Σ_{j=1}^{Q} Σ_{k=1}^{m} |𝒩(u^(i)(x_k), G_θ(u^(i))(y^(i)_{r,j}))|²   (18)

Here, {u^(i)}_{i=1}^{N} denotes N separate input functions sampled from 𝒰. For each u^(i), {y^(i)_{u,j}}_{j=1}^{P} are P locations that are determined by the data observations, initial or boundary conditions, etc. Besides, {y^(i)_{r,j}}_{j=1}^{Q} is a set of collocation points that can be randomly sampled in the domain of G(u^(i)). As a consequence, ℒ_operator(θ) fits the available solution measurements while ℒ_physics(θ) enforces the underlying PDE constraints. Contrary to the fixed sensor locations {x_i}_{i=1}^m, we remark that the locations of {y^(i)_{u,j}}_{j=1}^{P} and {y^(i)_{r,j}}_{j=1}^{Q} may vary for different i, thus allowing us to construct a continuous representation of the output functions s ∈ 𝒮. More details on how this general framework can be adapted to the different PDE systems presented in Results—including the choice of neural network architectures, formulation of loss functions, and training details—are provided in the Supplementary Materials.
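To illustrate how a PDE residual loss of the form in Eq. 18 can be assembled in practice, the following JAX sketch evaluates a residual operator at randomly sampled collocation points for a batch of input functions, differentiating through the DeepONet prediction with automatic differentiation. It reuses the `deeponet` forward pass from the earlier sketch, takes the advection operator ∂s/∂t + u(x) ∂s/∂x as one example residual, and obtains u at a collocation point by interpolating the sensor values; the operator, sampling strategy, and batch sizes would change for the other benchmarks, and the normalization here is a simple mean rather than the exact factor in Eq. 18.

```python
import jax
import jax.numpy as jnp

# Assumes `deeponet(params, u, y)` from the earlier sketch, with y = (x, t).

def advection_residual(params, u, sensors, y):
    """PDE residual ds/dt + u(x) ds/dx at a collocation point y = (x, t)."""
    ds = jax.grad(deeponet, argnums=2)(params, u, y)   # (ds/dx, ds/dt)
    u_at_x = jnp.interp(y[0], sensors, u)              # interpolate u at the x coordinate
    return ds[1] + u_at_x * ds[0]

def physics_loss(params, u_batch, sensors, y_batch):
    """Mean squared PDE residual over N input functions and Q collocation points."""
    res = jax.vmap(                                    # over the N input functions
        lambda u, ys: jax.vmap(                        # over the Q collocation points
            lambda y: advection_residual(params, u, sensors, y)
        )(ys)
    )(u_batch, y_batch)
    return jnp.mean(res ** 2)

# Shapes (illustrative): u_batch (N, m), y_batch (N, Q, 2), sensors (m,).
```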
SUPPLEMENTARY MATERIALS
Supplementary material for this article is available at https://science.org/doi/10.1126/sciadv.abi8605

REFERENCES AND NOTES
1. R. Courant, D. Hilbert, Methods of Mathematical Physics: Partial Differential Equations (John Wiley & Sons, 2008).
2. T. J. Hughes, The Finite Element Method: Linear Static and Dynamic Finite Element Analysis (Courier Corporation, 2012).
3. D. J. Lucia, P. S. Beran, W. A. Silva, Reduced-order modeling: New approaches for computational physics. Prog. Aerosp. Sci. 40, 51–117 (2004).
4. J. N. Kutz, S. L. Brunton, B. W. Brunton, J. L. Proctor, Dynamic Mode Decomposition: Data-Driven Modeling of Complex Systems (SIAM, 2016).
5. P. Benner, M. Ohlberger, A. Patera, G. Rozza, K. Urban, Model Reduction of Parametrized Systems (Springer, 2017).
6. W. H. Schilders, H. A. Van der Vorst, J. Rommes, in Model Order Reduction: Theory, Research Aspects and Applications (Springer, 2008), vol. 13.
7. A. Quarteroni, G. Rozza, in Reduced Order Methods for Modeling and Computational Reduction (Springer, 2014), vol. 9.
8. I. Mezić, Spectral properties of dynamical systems, model reduction and decompositions. Nonlinear Dyn. 41, 309–325 (2005).
9. B. Peherstorfer, K. Willcox, Data-driven operator inference for nonintrusive projection-based model reduction. Comput. Methods Appl. Mech. Eng. 306, 196–215 (2016).
10. A. J. Majda, D. Qi, Strategies for reduced-order models for predicting the statistical responses and uncertainty quantification in complex turbulent dynamical systems. SIAM Rev. 60, 491–549 (2018).
11. T. Lassila, A. Manzoni, A. Quarteroni, G. Rozza, Model order reduction in fluid dynamics: Challenges and perspectives, in Reduced Order Methods for Modeling and Computational Reduction (Springer, 2014), pp. 235–273.
12. D. C. Psichogios, L. H. Ungar, A hybrid neural network-first principles approach to process modeling. AIChE J. 38, 1499–1511 (1992).
13. I. E. Lagaris, A. Likas, D. I. Fotiadis, Artificial neural networks for solving ordinary and partial differential equations. IEEE Trans. Neural Netw. 9, 987–1000 (1998).
32. Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, A. Anandkumar, Neural operator: Graph kernel network for partial differential equations. arXiv:2003.03485 (2020).
33. Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, A. Anandkumar, Multipole graph neural operator for parametric partial differential equations. arXiv:2006.09535 (2020).
34. Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, A. Anandkumar, Fourier neural operator for parametric partial differential equations. arXiv:2010.08895 (2020).
35. L. Lu, P. Jin, G. Pang, Z. Zhang, G. E. Karniadakis, Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nat. Mach. Intell. 3, 218–229 (2021).
36. T. Chen, H. Chen, Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to dynamical systems. IEEE Trans. Neural Netw. 6, 911–917 (1995).
37. A. D. Back, T. Chen, Universal approximation of multiple nonlinear operators by neural networks. Neural Comput. 14, 2561–2566 (2002).
38. H. Kadri, E. Duflos, P. Preux, S. Canu, A. Rakotomamonjy, J. Audiffren, Operator-valued kernels for learning from functional response data. J. Mach. Learn. Res. 17, 1–54 (2016).
39. M. Griebel, C. Rieger, Reproducing kernel Hilbert spaces for parametric partial differential equations. SIAM-ASA J. Uncertain. Quantif. 5, 111–137 (2017).
40. H. Owhadi, Do ideas have shape? Plato's theory of forms as the continuous limit of artificial neural networks. arXiv:2008.03920 (2020).
41. N. H. Nelsen, A. M. Stuart, The random feature model for input-output maps between
63. S. Wang, X. Yu, P. Perdikaris, When and why PINNs fail to train: A neural tangent kernel perspective. arXiv:2007.14527 (2020).
64. L. McClenny, U. Braga-Neto, Self-adaptive physics-informed neural networks using a soft attention mechanism. arXiv:2009.04544 (2020).
65. J. Bradbury, R. Frostig, P. Hawkins, M. J. Johnson, C. Leary, D. Maclaurin, G. Necula, A. Paszke, J. VanderPlas, S. Wanderman-Milne, Q. Zhang, JAX: Composable transformations of Python+NumPy programs (2018).
66. J. D. Hunter, Matplotlib: A 2D graphics environment. IEEE Ann. Hist. Comput. 9, 90–95 (2007).
67. C. R. Harris, K. J. Millman, S. J. van der Walt, R. Gommers, P. Virtanen, D. Cournapeau, E. Wieser, J. Taylor, S. Berg, N. J. Smith, R. Kern, M. Picus, S. Hoyer, M. H. van Kerkwijk, M. Brett, A. Haldane, J. F. del Río, M. Wiebe, P. Peterson, P. Gérard-Marchant, K. Sheppard, T. Reddy, W. Weckesser, H. Abbasi, C. Gohlke, T. E. Oliphant, Array programming with NumPy. Nature 585, 357–362 (2020).
68. C. Rasmussen, C. Williams, Gaussian Processes for Machine Learning, Adaptive Computation and Machine Learning (MIT Press, 2006).
69. D. P. Kingma, J. Ba, Adam: A method for stochastic optimization. arXiv:1412.6980 (2014).
70. C. Finn, P. Abbeel, S. Levine, International Conference on Machine Learning (PMLR, 2017), pp. 1126–1135.
71. A. Iserles, in A First Course in the Numerical Analysis of Differential Equations (Cambridge Univ. Press, 2009), no. 44.
72. E. Haghighat, M. Raissi, A. Moure, H. Gomez, R. Juanes, A physics-informed deep learning framework for inversion and surrogate modeling in solid mechanics. Comput. Methods Appl. Mech. Eng. 379, 113741 (2021).
73. S. Wang, P. Perdikaris, Long-time integration of parametric evolution equations with physics-informed DeepONets. arXiv:2106.05384 (2021).
74. S. M. Cox, P. C. Matthews, Exponential time differencing for stiff systems. J. Comput. Phys. 176, 430–455 (2002).

Acknowledgments: We thank the developers of the software that enabled our research, including JAX (65), Matplotlib (66), and NumPy (67). Funding: This work received support from DOE grant DE-SC0019116, AFOSR grant FA9550-20-1-0060, and DOE-ARPA grant DE-AR0001201. Author contributions: S.W. and P.P. conceptualized the research and designed the numerical studies. S.W. and H.W. implemented the methods and conducted the numerical experiments. P.P. provided funding and supervised all aspects of this work. All authors contributed in writing the manuscript. Competing interests: The authors declare that they have no competing interests. Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. All code and data accompanying this manuscript are publicly available at https://doi.org/10.5281/zenodo.5206676 and https://github.com/PredictiveIntelligenceLab/Physics-informed-DeepONets.

Submitted 5 April 2021
Accepted 3 August 2021
Published 29 September 2021
10.1126/sciadv.abi8605

Citation: S. Wang, H. Wang, P. Perdikaris, Learning the solution operator of parametric partial differential equations with physics-informed DeepONets. Sci. Adv. 7, eabi8605 (2021).