Cognitive Technologies
Daniel Schulz
Christian Bauckhage Editors
Informed Machine Learning
Titles in this series now included in the Thomson Reuters Book Citation Index and
Scopus!
The Cognitive Technologies (CT) series is committed to the timely publishing
of high-quality manuscripts that promote the development of cognitive technolo-
gies and systems on the basis of artificial intelligence, image processing and
understanding, natural language processing, machine learning and human-computer
interaction.
It brings together the latest developments in all areas of this multidisciplinary
topic, ranging from theories and algorithms to various important applications. The
intended readership includes research students and researchers in computer science,
computer engineering, cognitive science, electrical engineering, data science and
related fields seeking a convenient way to track the latest findings on the founda-
tions, methodologies and key applications of cognitive technologies.
The series provides a publishing and communication platform for all cognitive
technologies topics, including but not limited to these most recent examples:
• Interactive machine learning, interactive deep learning, machine teaching
• Explainability (XAI), transparency, robustness of AI and trustworthy AI
• Knowledge representation, automated reasoning, multiagent systems
• Common sense modelling, context-based interpretation, hybrid cognitive tech-
nologies
• Human-centered design, socio-technical systems, human-robot interaction, cog-
nitive robotics
• Learning with small datasets, never-ending learning, metacognition and intro-
spection
• Intelligent decision support systems, prediction systems and warning systems
• Special transfer topics such as CT for computational sustainability, CT in
business applications and CT in mobile robotic systems
The series includes monographs, introductory and advanced textbooks, state-
of-the-art collections, and handbooks. In addition, it supports publishing in Open
Access mode.
Daniel Schulz • Christian Bauckhage
Editors
Informed Machine Learning
Editors

Daniel Schulz
Research Center Machine Learning
Fraunhofer Institute for Intelligent Analysis and Information Systems IAIS
Sankt Augustin, Nordrhein-Westfalen, Germany

Christian Bauckhage
Fraunhofer Institute for Intelligent Analysis and Information Systems
Sankt Augustin, Nordrhein-Westfalen, Germany
ISSN 1611-2482 ISSN 2197-6635 (electronic)
Cognitive Technologies
ISBN 978-3-031-83096-9 ISBN 978-3-031-83097-6 (eBook)
https://doi.org/10.1007/978-3-031-83097-6
This work was supported by Fraunhofer “Center for Machine Learning” within the Fraunhofer “Cluster
of Excellence Cognitive Internet Technologies”.
© The Editor(s) (if applicable) and The Author(s) 2025. This book is an open access publication.
Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 Inter-
national License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation,
distribution and reproduction in any medium or format, as long as you give appropriate credit to the
original author(s) and the source, provide a link to the Creative Commons license and indicate if changes
were made.
The images or other third party material in this book are included in the book’s Creative Commons
license, unless indicated otherwise in a credit line to the material. If material is not included in the book’s
Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the
permitted use, you will need to obtain permission directly from the copyright holder.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
If disposing of this product, please recycle the paper.
Preface
The past decade has seen substantial progress in the field of Artificial Intelligence
(AI). This has primarily been due to the increasingly rapid developments in the
field of machine learning (ML) which, in turn, benefited from the confluence of
four technological trends: (1) availability of ever-increasing training data sets, (2)
comparatively cheap high-performance computing hardware, (3) open source code
sharing and access to software for model training or to pre-trained models, and (4)
theoretical and practical progress in deep learning and artificial neural networks. As
a consequence, there have been significant advancements, say, in natural language
processing, image/speech recognition, or autonomous systems. As a result of these
developments, AI has now made its way out of academic research into companies
and our daily lives.
The resulting economic impact is already enormous. Practitioners in every sector,
from finance or medicine to logistics or administration, have begun using AI or
are planning for its introduction. Seemingly not a day goes by without the media
reporting on new AI applications and how these will transform economies and
societies on a level comparable to the industrial revolution.
However, there still are considerable challenges when it comes to harnessing the
full potential of AI in areas or domains outside of fully digitized industries.
A key feature of today’s cutting-edge AI technologies is their hunger for
resources. This is because modern ML models (deep neural networks) have become
incredibly large and complex and involve millions if not billions of adjustable
parameters. Their training therefore requires enormous amounts of data, considerable
computing infrastructure, and hence energy. Alas, in many industries
and application domains, data is still scarce or incomplete, and there often is limited
access to large-scale high-performance computing facilities.
But even if data availability, compute resources, and energy costs are not an
issue, model complexity may still pose challenges with respect to explainability,
accountability, or trustworthiness of AI solutions which can be dire in settings where
regulatory guidelines have to be met or safety guarantees must be ensured.
This is where the paradigm of Informed Machine Learning (Informed ML) comes
into play.
In a nutshell, the idea of Informed ML is to systematically leverage additional
prior knowledge for the design and training of data-driven AI models. The overall
goal is to use reliable background knowledge in order to, on the one hand, reduce
model complexity and the need for extensive training data and, on the other hand,
increase interpretability and explainability of the decisions made by trained models.
There are, of course, various possibilities for what kind of additional
knowledge to inject into data-driven learning and how. It can consist of human expertise, scientific
insights, or simple common-sense facts, all of which may be represented in different
forms, and these representations may enter the ML pipeline at various stages.
The contributions gathered in this volume illustrate the broad range of pos-
sibilities when working with different knowledge sources, representations, and
integration strategies. They largely assume an application-oriented perspective and
discuss working solutions for a wide range of industrial AI applications. We hope
readers will find them interesting, get an appreciation for the many practical benefits
of Informed ML, and find inspiration for their own work.
Sankt Augustin, Germany
August 2024

Daniel Schulz
Christian Bauckhage
Contents
1 Introduction and Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Christian Bauckhage, Daniel Schulz, and Laura von Rueden
1.1 Introduction to Informed Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Historical Context and Motivation . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.2 Concept and Taxonomy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.3 Benefits. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Part I Digital Twins
2 Optimizing Cooling System Operations with Informed ML
and a Digital Twin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Steffen Wallner, Thomas Bernard, and Christian Kühnert
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.1.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.1.2 Informed Machine Learning for Cooling System
Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.1.3 Structure. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.2 Cooling System Description and Plant Operation . . . . . . . . . . . . . . . . . . 20
2.2.1 Components of the Cooling System . . . . . . . . . . . . . . . . . . . . . . . 20
2.2.2 Sensors of the Cooling System . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.2.3 Analysis of the Operation Strategy . . . . . . . . . . . . . . . . . . . . . . . . 23
2.2.4 Cooling Reserve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.3 Modeling of the Plant Using Machine Learning . . . . . . . . . . . . . . . . . . . . 26
2.3.1 Submodels of the Cooling System . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.3.2 Data Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.3.3 Training and Plausibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.3.4 Recalculation of the Entire Cooling System . . . . . . . . . . . . . . 32
2.4 Optimization Concept . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.4.1 Variable Switchpoint Temperature . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.4.2 Forecast Horizon. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.4.3 Software Implementation as Assistance System . . . . . . . . . . 36
2.5 Conclusion and Outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3 AITwin: A Uniform Digital Twin Interface for Artificial
Intelligence Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
Alexander Diedrich, Christian Kühnert, Georg Maier, Joshua
Schraven, and Oliver Niggemann
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.1.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.2 ML/AI and the Digital Twin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.3 AI Reference Model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.3.1 Synchronized Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.3.2 Prediction-Enabled Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.3.3 Causalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.3.4 The AITwin Reference Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.4 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.4.1 Applying AITwin to a Four Tank Model . . . . . . . . . . . . . . . . . . 53
3.4.2 Applying AITwin to Tennessee Eastman Process . . . . . . . . . 53
3.4.3 Applying AITwin to a Quality Assurance Example . . . . . . 55
3.4.4 Applying AITwin to a Sensor-Based Sorting System. . . . . 55
3.5 Discussion and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Part II Optimization
4 A Regression-Based Predictive Model Hierarchy for
Nonwoven Tensile Strength Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Dario Antweiler, Jan Pablo Burgard, Marc Harmening, Nicole
Marheineke, Andre Schmeißer, Raimund Wegener, and Pascal Welke
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.1.1 Literature Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.1.2 New Regression-Based Predictive Model Hierarchy . . . . . 66
4.1.3 Structure. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.2 First Principle Oriented Model Chain for Dataset Generation . . . . . 68
4.2.1 Fiber Graph Generation and Tensile Strength
Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.2.2 Production Process Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.2.3 Stress-Strain Curve Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.2.4 Fiber Graph Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.2.5 Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.3 Linear Regression-Based Predictive Models . . . . . . . . . . . . . . . . . . . . . . . . 76
4.3.1 Linear Regression and Monte Carlo Simulations . . . . . . . . . 77
4.3.2 Numerical Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.4 Sequential Predictive Regression Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.4.1 Coupled Polynomial Regression and
Errors-In-Variabels Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.4.2 Numerical Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
4.5 Conclusion and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
5 Machine Learning for Optimizing the Homogeneity of
Spunbond Nonwovens . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
Viny Saajan Victor, Andre Schmeißer, Heike Leitte, and Simone
Gramsch
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
5.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.3 Machine Learning-Based Optimization Workflow Using
Simulation Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.3.1 Parameter Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
5.3.2 Data Collection with Knowledge Integration . . . . . . . . . . . . . 98
5.3.3 Model Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.3.4 Training and Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
5.3.5 Homogeneity Optimization with Human Validation . . . . . . 107
5.4 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.4.1 Models Evaluation Based on the Accuracy . . . . . . . . . . . . . . . 108
5.4.2 Models Evaluation Based on Computational
Performance. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
6 Bayesian Inference for Fatigue Strength Estimation . . . . . . . . . . . . . . . . . . . 113
Dorina Weichert, Elena Haedecke, Gunar Ernis, Sebastian Houben,
Alexander Kister, and Stefan Wrobel
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
6.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
6.2.1 Fatigue Testing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
6.2.2 Experimental Procedure and Analysis of the
Staircase Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
6.2.3 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
6.3 Informed Fatigue Strength Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
6.3.1 Overview of Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
6.3.2 Machine Learning Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
6.3.3 Bayesian Inference on the Distribution Parameters . . . . . . . 126
6.3.4 Details on the Overall Experimental Procedure . . . . . . . . . . . 129
6.4 Validation of Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
6.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
7 Incorporating Shape Knowledge into Regression Models . . . . . . . . . . . . . 135
Miltiadis Poursanidis, Patrick Link, Jochen Schmid, and Uwe Teicher
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
7.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
7.3 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
7.3.1 SIASCOR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
7.3.2 ISI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
7.4 Application Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
7.4.1 Press Hardening . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
7.4.2 Brushing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
7.4.3 Milling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
7.5 Synthetic Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
7.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
Part III Neural Networks
8 Predicting Properties of Oxide Glasses Using Informed
Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
Gregor Maier, Jan Hamaekers, Dominik-Sergio Martilotti, and
Benedikt Ziebarth
8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
8.1.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
8.1.2 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
8.2 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
8.2.1 Data Collection and Preparation . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
8.2.2 Model Setups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
8.2.3 Model Training and Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
8.3 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
8.4 Conclusion and Outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
9 Graph Neural Networks for Predicting Side Effects and New
Indications of Drugs Using Electronic Health Records . . . . . . . . . . . . . . . . 187
Jayant Sharma, Manuel Lentzen, Sophia Krix, Thomas Linden,
Sumit Madan, Van Dinh Tran, and Holger Fröhlich
9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
9.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
9.2.1 Overview About Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
9.2.2 Code Normalization and Mapping . . . . . . . . . . . . . . . . . . . . . . . . . 190
9.2.3 Initial Knowledge Graph Construction . . . . . . . . . . . . . . . . . . . . 191
9.2.4 Extended Knowledge Graph Construction . . . . . . . . . . . . . . . . 191
9.2.5 Relation Aware Graph Attention Networks . . . . . . . . . . . . . . . 193
9.2.6 Evaluation against Alternative Methods . . . . . . . . . . . . . . . . . . . 195
9.2.7 Performance Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
9.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
9.3.1 Performance Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
9.3.2 Use Case: Trazodone in the Treatment of Bipolar
Disorder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
9.3.3 Predicted Side Effects of Marketed Drugs . . . . . . . . . . . . . . . . 200
9.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
9.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
10 On the Interplay of Subset Selection and Informed Graph
Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
Niklas Breustedt, Paolo Climaco, Jochen Garcke, Jan Hamaekers,
Gitta Kutyniok, Dirk A. Lorenz, Rick Oerder, and Chirag Varun
Shukla
10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
10.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
10.3 Methods and Sampling Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
10.3.1 SchNet. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
10.3.2 Kernel Ridge Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
10.3.3 Spatial 3-Hop Convolution Network . . . . . . . . . . . . . . . . . . . . . . 214
10.3.4 Graph Rate-Distortion Explanations . . . . . . . . . . . . . . . . . . . . . . . 216
10.3.5 Sampling Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
10.4 Numerical Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
10.4.1 QM9 Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
10.4.2 SchNet. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
10.4.3 Kernel Ridge Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
10.4.4 Spatial 3-Hop Convolution Network . . . . . . . . . . . . . . . . . . . . . . 226
10.4.5 Explanation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
10.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
11 Informed Machine Learning Aspects for the Multi-Agent
Neural Rewriter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
Nathalie Paul, Tim Wirtz, Stefan Wrobel, and Alexander Kister
11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
11.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
11.2.1 Informed Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239
11.3 Multi-Agent Neural Rewriter (MANR) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
11.3.1 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
11.3.2 Game Design. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
11.3.3 Game Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
11.3.4 Game Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244
11.4 Empirical Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
11.4.1 Data Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
11.4.2 Experiment Results for the MANR . . . . . . . . . . . . . . . . . . . . . . . . 251
11.4.3 Transfer Learning Investigations. . . . . . . . . . . . . . . . . . . . . . . . . . . 254
11.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
Part IV Hybrid Methods
12 Training Support Vector Machines by Solving Differential
Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
Christian Bauckhage and Rafet Sifa
12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
12.1.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
12.1.2 Mathematical Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
12.2 Setting the Stage. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
12.2.1 L2 Support Vector Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268
12.2.2 Invoking the Kernel Trick . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
12.2.3 A Baseline Training Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
12.3 Gradient Flows for L2 SVM Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
12.4 Practical Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
12.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
13 Informed Machine Learning to Maximize Robustness and
Computational Performance of Linear Solvers . . . . . . . . . . . . . . . . . . . . . . . . . 285
Sebastian Gries
13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
13.2 Short Overview on Linear Solvers in Numerical Simulations . . . . . 289
13.3 Genetic Optimization of Parameters with Tree Hierarchy . . . . . . . . . 291
13.4 Pre-evolution via Surrogate Learning Model . . . . . . . . . . . . . . . . . . . . . . . 293
13.5 Online vs. Offline Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
13.6 Reproducibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
13.7 Controlling Solver Setup Reusage. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298
13.8 Results: Informed Machine Learning for Linear Solver
Parameters in Various Practical Applications . . . . . . . . . . . . . . . . . . . . . . . 299
13.8.1 Mere Parameter Optimization: Single Reservoir
Simulation Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
13.8.2 Parameter Optimization: Linear Elasticity Problem . . . . . . 301
13.8.3 Setup Reusage: Sequence of Reservoir
Simulation Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
13.8.4 Full Simulation Result: Reservoir Application
(SPE10) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304
13.8.5 Full Simulation Result: Groundwater Application . . . . . . . . 305
13.8.6 Full Simulation Result: Computational Fluid
Dynamics Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
13.8.7 Full Simulation Result: Battery Aging Simulation . . . . . . . 308
13.9 Conclusions and Future Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309
14 Anomaly Detection in Multivariate Time Series Using
Uncertainty Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
Moritz Müller, Gunar Ernis, and Michael Mock
14.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
14.2 Background and Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
14.2.1 Problem Formulation and Anomaly Categorization . . . . . . 315
14.2.2 Unsupervised Anomaly Detection . . . . . . . . . . . . . . . . . . . . . . . . . 317
14.2.3 Bayesian Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
14.2.4 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319
14.3 Detecting Anomalies in Time Series Using Uncertainty
Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320
14.3.1 Window Processing and Forecast Modelling . . . . . . . . . . . . . . 320
14.3.2 Formalization of Multivariate Anomaly Detection . . . . . . . 322
14.3.3 Anomaly Scoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322
14.3.4 Anomaly Threshold Fitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
14.4 Experimental Setup and Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
14.4.1 Skoltech Anomaly Benchmark Data Set . . . . . . . . . . . . . . . . . . 326
14.4.2 Experimental Hyperparameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
14.4.3 Evaluation Metrics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
14.4.4 Discussion of Utilized Anomaly Detection Metrics . . . . . . 329
14.4.5 Experimental Results and Analysis . . . . . . . . . . . . . . . . . . . . . . . . 330
14.5 Discussion of Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333
14.5.1 Quantile Based Threshold Versus Tabulated
Control Limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334
14.5.2 Competitiveness to Recent Work . . . . . . . . . . . . . . . . . . . . . . . . . . 337
14.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
Chapter 1
Introduction and Overview
Christian Bauckhage, Daniel Schulz, and Laura von Rueden
Abstract Informed Machine Learning (Informed ML) refers to the idea of injecting
additional prior knowledge into data-driven learning systems. Such knowledge can
be given in various forms such as scientific equations or logic rules which provide
relevant information about a problem domain or task at hand. Integrating prior
knowledge at various stages of the machine learning pipeline can help to improve
generalization and trustworthiness. Specifically, Informed ML can help to train
models when training data is scarce or to ensure conformity with regulations or
safety demands.
In this introductory chapter, we briefly explain the concept of Informed ML,
provide an overview of the chapters in this book, and categorize the contributed
research and results with respect to a taxonomy of Informed ML.
1.1 Introduction to Informed Machine Learning
Over the past couple of years, Artificial Intelligence (AI) has finally found its way
into the consciousness of a wider public and into the reporting of the mainstream
media. On the one hand, this is not surprising as the capabilities of modern
(generative) AI tools are astounding and will likely disrupt societies and economies.
On the other hand, we said “finally” because the scientific discipline of AI has a
long and venerable history which largely went unnoticed except by its practitioners,
science fiction authors, and filmmakers. Yet, a brief look at this history can provide
context and motivation for what this book on Informed Machine Learning is all
about.
C. Bauckhage · D. Schulz (✉) · L. von Rueden
Fraunhofer IAIS, Sankt Augustin, Germany
e-mail: [email protected]; [email protected]; [email protected]

© The Author(s) 2025
D. Schulz, C. Bauckhage (eds.), Informed Machine Learning,
Cognitive Technologies, https://doi.org/10.1007/978-3-031-83097-6_1
1.1.1 Historical Context and Motivation
Ideas for computational intelligence date back to the 1940s when the first electronic
computers became available, and people began researching how to equip these
“electronic brains” with thinking capabilities. The roots of Machine Learning
(ML) date back to this time, too: McCulloch and Pitts devised a mathematical
model of neural computation in 1943, Turing discussed learning machines in
his 1948 report “Intelligent Machinery”, Hebb thought of associative learning
in 1949, and Rosenblatt introduced perceptron learning in 1957. Despite this
immediate appearance of the idea of (neural) learning systems, early AI research
was dominated by methods based on symbolic logic and logical inference. In the
1950s, pioneers like Shannon, McCarthy, or Minsky thought about computer chess,
logical programming languages, and automated theorem proving. The 1960s saw the
emergence of knowledge-based systems, the development of a rule-based chatbot
(Weizenbaum’s ELIZA) and, importantly, the publication of a book by Minsky
and Papert which was largely read as a discouragement of further neural network
research. Indeed, AI research in the 1970s was dominated by work on rule- or
knowledge-based expert systems and neurocomputing resurfaced only in the 1980s
when the back-propagation algorithm was independently discovered several times
and finally allowed for consistent, data-driven training of neural network models.
In the 1990s, there were thus two major paradigms: knowledge-based deduction
which largely relied on hand-crafted rules for planning and decision making and
example- or data-driven learning which often involved features engineered by
experts and was mainly used for pattern recognition. Both approaches worked
reasonably well at the time (albeit not well enough for the public to take notice)
but seemed to be irreconcilable. Indeed, there were numerous issues pertaining
to the problem of the semantic gap between observations (data) and symbolic
representations (abstract concepts and their relations) and the question of whether
learning-based systems can perform symbolic inference.
These issues remained unresolved until the late 2000s when the availability of
massive amounts of data (on the Web), affordable high-performance computing
(GPUs), and open-source libraries for neural network training kickstarted what
has become known as the deep learning revolution. Ever since, the remarkable
capabilities of large-scale end-to-end machine learning models across a wide range
of domains, such as computer vision, speech recognition, text understanding, or
gaming, have become common lore [4, 14, 19, 31, 34] and deep neural networks
have begun to revolutionize engineering and the sciences [6, 7, 20].
These achievements are rooted in systematic big data analysis which allows
learning algorithms to draw insights from- or identify patterns in billions of
(input/output) examples. However, these achievements also come at a cost.
First, modern (foundation) models require massive amounts of data and compute
resources for their training. These are not always available or at least not to
everybody. Moreover, insufficient data can hinder the training of well-performing
and generalizing models, and purely data-driven training can miss out on constraints or easily explained facts such
as those imposed by natural laws or regulatory guidelines, which are essential for
ensuring trustworthy Artificial Intelligence (AI) [5].
Second, as machine learning models are becoming more and more complex,
demands for explainability and trustworthiness are growing [29]. This poses a
challenge for deep learning solutions as massive neural networks are essentially
black boxes whose internal decision making processes involving several billions of
adjustable parameters are largely intractable even to experts in the field.
This has spurred research into enhancing machine learning models by
means of hybrid approaches which integrate reliable prior knowledge into the
data-driven learning process. While one could argue that such an integration of
knowledge into learning is common through techniques such as example selection,
data labelling, or feature engineering, hybrid learning is supposed to go beyond such
measures and to incorporate more profound knowledge and formal representations.
For instance, researchers have explored the inclusion of logic rules [10, 40] and
algebraic equations [18, 32] as a means of constraining loss functions. Another
example is knowledge graphs, which have been utilized to equip neural networks
with information about relationships between instances, particularly relevant in
image classification [17, 23]. Last but not least, physical simulations are now playing
an increasingly important role in enriching training data [8, 21, 27].
To refer to these methods under a single umbrella term, the designation
“Informed Machine Learning” (Informed ML) has been proposed [36]. This concept
describes the systematic fusion of data-driven and knowledge-driven approaches
and is gaining momentum as an avenue for further advancements in Artificial
Intelligence.
1.1.2 Concept and Taxonomy
From a very abstract point of view, the main idea of Informed ML is to inject
additional prior knowledge into data-driven learning as illustrated in Fig. 1.1.
Such prior knowledge is usually specific to the application context and task at
hand. For example, fundamental, well established scientific- or medical knowledge
can inform the modeling process for applications in the domains of material science
or healthcare (e.g., Chaps. 8 and 9 of this volume). Basic knowledge like this
often exists independently and in parallel to the practically gathered data samples a
machine learning system uses for training and thus constitutes a valuable additional
source of information.
Knowledge about an application context or domain is often available as formal
representations like (logic) rule bases, equations describing insights in the natural
sciences, or knowledge graphs. For example, in Chap. 8, scientific equations from
material sciences are used and, in Chap. 9, a knowledge graph is used to improve
healthcare analytics.
In Informed ML approaches, such formalized representations of prior knowledge
are injected into the ML pipeline. In general, this can happen at various stages
Fig. 1.1 Schematic illustration of where data independent, prior knowledge can be integrated into
the machine learning pipeline. Diagram adapted from [36]
of this pipeline. Knowledge can, for instance, inform the selection of training
data, the design of the model architecture, the choice of learning algorithm, or the
final model. Further dimensions for categorizing Informed ML approaches are the
sources for- and the representations of additional knowledge. The former can be
scientific facts and known laws of nature, general world knowledge about history,
politics, economy, society and the like, or specific expert knowledge about, say,
organizations, products, or markets. The latter typically comprise representations
in form of algebraic equations, differential equations, logic rules, invariances,
probabilistic relations, knowledge graphs, or simulations of real world phenomena.
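For illustration, the following schematic sketch (Python/NumPy) shows, with entirely hypothetical placeholder functions rather than any fixed API, how knowledge might enter each of the four pipeline stages named above: the training data, the hypothesis set, the learning algorithm, and the final hypothesis.

```python
# Schematic sketch of the four integration stages; every function is an
# illustrative, hypothetical placeholder.
import numpy as np

def augment_training_data(x, y, simulate):
    """Training data: enrich scarce measurements with simulation results."""
    x_sim = np.linspace(-1.0, 1.0, 50)
    return np.concatenate([x, x_sim]), np.concatenate([y, simulate(x_sim)])

def invariant_features(x):
    """Hypothesis set: encode the assumed invariance f(x) = f(-x)."""
    return np.abs(x)

def fit_with_penalty(features, y, lam=0.1):
    """Learning algorithm: ridge penalty restricts the parameter search."""
    A = np.vander(features, 3)  # quadratic polynomial features
    return np.linalg.solve(A.T @ A + lam * np.eye(3), A.T @ y)

def check_final_hypothesis(coeffs, bound=10.0):
    """Final hypothesis: verify conformity with known parameter limits."""
    return bool(np.all(np.abs(coeffs) < bound))

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 20)
y = x**2 + 0.05 * rng.standard_normal(20)
x_aug, y_aug = augment_training_data(x, y, simulate=lambda s: s**2)
w = fit_with_penalty(invariant_features(x_aug), y_aug)
assert check_final_hypothesis(w)
```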
Being conceptualized this broadly, Informed ML therefore includes more specific
paradigms such as Neuro-Symbolic ML [11, 15, 16] or Neuro-Mechanistic ML
[12, 25] which—as their names suggest—focus on hybrid modeling centered
around neural networks. Indeed, the above three dimensions of knowledge source,
knowledge representation, and knowledge integration are deliberately general and
have been used to devise a taxonomy of the field of Informed ML [36]. It resulted
from an extensive literature survey of more than 150 scientific reports on hybrid
learning and allows for a more fine-grained categorization of how different solutions
or frameworks integrate knowledge into various data-driven learning approaches.
For instance, Table 1.1 shows how the different chapters of this book can be
categorized with respect to this taxonomy.
Looking at this table, it becomes apparent that there exists a wide spectrum
of combinations of knowledge sources and representations and stages where
knowledge is integrated into the machine learning pipeline. This naturally goes
hand in hand with a variety of system designs and processing modules which
seems to be in stark contrast to modern end-to-end learning systems which may
have different (neural) architectures but are fairly standardized when it comes to
information processing and information flow. The obvious question, then, is: what are
the particular benefits that make it worthwhile to design and apply Informed ML
systems?
1.1.3 Benefits
One of the main benefits of Informed ML is that the use of additional knowledge
about what is to be learned can allow for reducing the number of adjustable parameters
(degrees of freedom) of a machine learning model as well as for restricting the
ranges of parameter values; in short, it can help to reduce model sizes and restrict
search spaces.
This is of particular interest in practical settings where training data is scarce
as the generalization capabilities of very large models typically correlate with the
amount of data they have processed during training. At first sight, it seems peculiar
to point to situations where data is scarce; after all, we are living in the age of (very)
big data and modern foundation models are now being trained on data sets in the
petabyte range. However, not all industries and organizations that want to enhance
production and business with Artificial Intelligence have such massive data at their
disposal. On the contrary, hardly any player outside of the IT sector has access to
such vast amounts of data and not everybody can fine-tune available foundation
models to their needs. Put differently, lack of data can prevent modern general
purpose architectures from generalizing well and performing reliably. Problem
specific informed architectures, on the other hand, may achieve these goals from
training with substantially less data.
Further areas where Informed ML may lead to improvements are of an
economic and environmental nature. While modern foundation models whose
hundreds of billions of parameters are trained on vast amounts of (multi-modal) data
are capable of remarkable feats, there are growing concerns as to the sustainability
of this current paradigm. On the one hand, the energy demands for transformer-
model training have reached levels which are difficult to justify in times of global
warming [39]. On the other hand, there now are signs of diminishing returns of
training ever more complex models with ever-growing data sets [33].
Again, an appropriate use of additional knowledge for tailoring learning systems
to specific contexts or resources may lead to smaller models with reduced training
efforts and thus reduced energy consumption. Moreover, it may even lead to novel
training procedures or algorithms which could run on resource efficient hardware
such as, say, FPGAs or reemerging analog computers. An example for the latter
is found in Chap. 12 which proposes to train simple classifiers by solving
differential equations in a manner that could be implemented using energy-efficient
analog circuits.
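As a rough illustration of this idea (not the exact kernelized formulation of Chap. 12), the following sketch follows the gradient flow dw/dt = -∇E(w) of a squared-hinge (L2 SVM) training objective with an off-the-shelf ODE solver; the data and hyperparameters are made up for illustration.

```python
# Minimal sketch: training a linear L2 SVM by integrating its gradient
# flow with an ODE solver instead of running a discrete optimizer.
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 0.5, (20, 2)), rng.normal(+1, 0.5, (20, 2))])
y = np.repeat([-1.0, 1.0], 20)

def flow(t, w, C=1.0):
    # dw/dt = -grad E(w) for E(w) = 0.5*|w|^2 + C*sum_i max(0, 1 - y_i w.x_i)^2
    margins = 1.0 - y * (X @ w)
    active = margins > 0
    return -(w - 2 * C * (X[active].T @ (y[active] * margins[active])))

# Integrating the flow toward a stationary point trains the classifier.
sol = solve_ivp(flow, t_span=(0.0, 50.0), y0=np.zeros(2))
w_star = sol.y[:, -1]
print("training accuracy:", np.mean(np.sign(X @ w_star) == y))
```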
Finally, there are the aspects of explainability, accountability, and trustworthiness
of AI models and their alignment with human intention. We all have heard anecdotes
of hallucinating large language models or of vision systems which recognize, say,
trains because they learned that trains and railroad tracks go together and implicitly
infer trains from the presence of tracks. Then there are reinforcement learning
systems which were supposed to determine train schedules with minimal risk of
accidents and concluded that the best way of avoiding train collisions is not to
run any trains at all. While these examples are silly, they illustrate the potential
(or, more daringly, the “importance”) of informed learning: fact checking against
knowledge bases, carefully curated training data, or expertly formulated learning
goals, i.e. the integration of knowledge at different stages of the ML pipeline, can
circumvent issues like these.
It is obvious that AI solutions for real world applications in most industrial
sectors must be reliable and their decisions must be in line with regulatory guidelines
and the kind of reasoning that is explainable to- or interpretable by human experts.
Modern end-to-end deep learning poses challenges in these regards. Decisions made
by purely data-driven models with (hundreds of) billions of parameters are typically
opaque and hardly ever tractable and can lead to unintended results in down-stream
processing. This, in turn, may cause accidents or costly mistakes or may even
prevent the use of learning-based AI in scenarios where there are legal requirements
with respect to the transparency of decision making processes. Informed ML with
knowledge-driven models (as in Chap. 4) or knowledge-based data augmentation
(as in Chap. 10) can circumvent such shortcomings.
1.2 Overview
Above, we emphasized that Informed Machine Learning approaches are typically
tailored to specific contexts or problem domains so that there exists a plethora of
knowledge-integration methods. The Informed Machine Learning taxonomy in [36]
systematically structures the vast landscape of hybrid techniques which integrate
data- and knowledge-driven models using the broad categories of knowledge source,
knowledge representation, and knowledge integration. These, in turn, are further
refined into fifteen subcategories (see Table 1.1) so that the taxonomy covers a wide
spectrum of combinations of knowledge sources, representations, and integration
strategies. The contributions gathered in this volume emphatically illustrate the
variety of possibilities. They report applied- and basic research on Informed
Machine Learning and account for various methodologies and the kinds of results
they allow for. In the following, we provide a short overview over the chapters of this
book and classify their contributions according to the Informed Machine Learning
taxonomy as summarized in Table 1.1. Furthermore, we have arranged the
chapters according to the methods they rely on and their areas of application. This
results in four parts, namely “Digital Twins”, “Optimization”, “Neural Networks”,
and “Hybrid Methods”.
Part I: Digital Twins
In Chap. 2, Wallner et al. [37] are concerned with energy optimal climate control
(cooling) for data centers, industrial plants, or office buildings. They describe how
to generate data-driven digital twins for cooling systems which can predict
the effects of adjusting control parameters and, when combined with monitoring
capabilities, allow operators to make informed decisions for adjustments. Their
Table 1.1 Overview of book parts and chapters. Each chapter employs a different Informed ML strategy. We categorize them with respect to the taxonomy in [36], which
considers knowledge sources, knowledge representation, and stages where knowledge is integrated into the ML pipeline
Informed ML approach
Source Representation Integration
Scientific World Expert Algebraic Differential Simulation Spatial Logic Knowledge Probabilistic Human Training Hypothesis Learning Final
Part Chapter knowledge knowledge knowledge equations equations results invariances rules graphs relations feedback data set algorithm hypothesis
Digital Twins 2 Optimizing Cooling System ✓ ✓ ✓ ✓
Operations with Informed
ML and a Digital Twin
3 AITwin - A Uniform Digital ✓ ✓ ✓ ✓ ✓
Twin Interface for Artificial
Intelligence Applications
Optimization 4 A Regression-Based Predictive ✓ ✓ ✓ ✓ ✓ ✓
Model Hierarchy for
Nonwoven Tensile Strength Inference
5 Machine Learning for Optimizing ✓ ✓ ✓ ✓ ✓
the Homogeneity of
Spunbond Nonwovens
6 Bayesian Inference for Fatigue ✓ ✓ ✓
Strength Estimation
7 Incorporating Shape Knowledge ✓ ✓ ✓ ✓
into Regression Models
Neural 8 Predicting Properties of Oxide ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
N