Graph Neural Networks

https://doi.org/10.1038/s43586-024-00294-7

Abstract
Graphs are flexible mathematical objects that can represent many entities and knowledge from different domains, including in the life sciences. Graph neural networks (GNNs) are mathematical models that can learn functions over graphs and are a leading approach for building predictive models on graph-structured data. This combination has enabled GNNs to advance the state of the art in many disciplines, from discovering new antibiotics and identifying drug-repurposing candidates to modelling physical systems and generating new molecules. This Primer provides a practical and accessible introduction

Sections
Introduction
Experimentation
Results
Applications
Reproducibility and data deposition
Limitations and optimizations
Outlook

1CSAIL, MIT, Cambridge, MA, US. 2School of CIT, TU Munich, Munich, Germany. 3These authors contributed equally: Gabriele Corso, Hannes Stärk. e-mail: gcorso@[Link]; hstark@[Link]; regina@[Link]
Fig. 1 | Molecular property prediction example: given a molecule, a GNN predicts its ability to inhibit HIV replication. a, A molecule, for example specified by its simplified molecular-input line-entry system string, is converted into a graph, and the node representations are initialized as vectors describing the atom. b, In the first message-passing layer, an example node receives messages from its neighbours and updates its representation based on them. Its representation now contains information from a one-hop neighbourhood. c, In the second message-passing step, the example node receives a message from one of its neighbours that received messages from more distant nodes in the previous layers. The representation now contains information from a two-hop neighbourhood. d, The representations of all nodes are aggregated into a single vector by summing them. e, A feedforward neural network takes the produced vector representation and outputs a single logit to classify whether the molecule inhibits HIV replication. GNN, graph neural network.
learnable linear transformation W^(l) and ReLU non-linearity. Therefore, the mathematical operation performed to update the representation h_i^(l) of node i at layer l can be written as:

$$h_i^{(l+1)} = \mathrm{ReLU}\Big(W^{(l+1)}\Big(h_i^{(l)} + \sum_{j \in \mathcal{N}(i)} h_j^{(l)}\Big)\Big)$$

in which N(i) denotes the set of neighbours of node i.
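To make this update rule concrete, the following is a minimal sketch of such a message-passing layer in plain PyTorch; it assumes a dense adjacency matrix so that the sum over neighbours becomes a matrix product, and the class and variable names are illustrative rather than taken from the accompanying notebook.

```python
import torch
import torch.nn as nn

class SimpleMessagePassingLayer(nn.Module):
    """One layer of h_i^(l+1) = ReLU(W(h_i^(l) + sum over neighbours j of h_j^(l)))."""

    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, dim)  # learnable transformation W

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h: node representations of shape (num_nodes, dim)
        # adj: dense adjacency matrix of shape (num_nodes, num_nodes)
        neighbour_sum = adj @ h            # sum of neighbour representations
        return torch.relu(self.linear(h + neighbour_sum))

# Toy example: a triangle graph with three nodes and 4-dimensional features.
h = torch.randn(3, 4)
adj = torch.tensor([[0., 1., 1.],
                    [1., 0., 1.],
                    [1., 1., 0.]])
layer = SimpleMessagePassingLayer(4)
print(layer(h, adj).shape)  # torch.Size([3, 4])
```

Stacking several such layers grows each node's receptive field one hop at a time, matching the behaviour illustrated in Fig. 1.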
Final prediction
After the final message-passing layer, the representations of individual nodes are aggregated and transformed to make task-specific predictions, as different problems may require outputs at different scales.

For example, the molecular property prediction task is a graph-level problem in which, to make a single prediction for the graph, the representations of all nodes — whose count may vary across molecules — must be aggregated into a fixed-size vector that represents the whole molecule. In the provided implementation, after four message-passing layers, the final prediction (likelihood of inhibition) is reached by summing the nodes' features in the last layer and passing their sum through a linear layer with output dimension 1.

By contrast, in node-level tasks, such as the functional characterization of proteins in a protein interaction network, the node representations after the message-passing layers can be directly used as outputs for prediction. Finally, the most common class of edge-level tasks is link prediction, in which the model is trained to predict missing edges in the graph, for example knowledge graph completion or recommendation systems. For this, a classifier is typically trained by aggregating the final representations of the two nodes or surrounding subgraphs connected by the edge in question.
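As a sketch of these prediction heads (names are illustrative, not taken from the provided notebook), the snippet below shows a sum-pooling readout that maps a variable number of node embeddings to a single graph-level logit, and an edge score built from the embeddings of two nodes as in link prediction.

```python
import torch
import torch.nn as nn

class GraphReadout(nn.Module):
    """Sum-pool node embeddings and map the fixed-size result to one logit."""

    def __init__(self, dim: int):
        super().__init__()
        self.head = nn.Linear(dim, 1)  # output dimension 1, e.g. an inhibition logit

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        graph_vector = h.sum(dim=0)    # fixed-size vector regardless of node count
        return self.head(graph_vector)

def edge_score(h: torch.Tensor, i: int, j: int, classifier: nn.Module) -> torch.Tensor:
    """Score a candidate edge (i, j) from the concatenated node embeddings."""
    return classifier(torch.cat([h[i], h[j]], dim=-1))

h = torch.randn(5, 8)                  # embeddings of five nodes after message passing
readout = GraphReadout(8)
link_classifier = nn.Linear(16, 1)
print(readout(h).shape, edge_score(h, 0, 3, link_classifier).shape)
```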
Efficient implementation
In many applications, GNNs are run on graphs with thousands or millions of nodes. In such cases, efficient sparse implementations of message-passing are necessary to run training and inference in a reasonable time. The computational complexity of a message-passing layer is O(|E| + |V|) = O(|E|), in which O indicates big O notation, that is, linear in the number of edges, as messages have to be computed for every edge and in connected graphs |E| ≥ |V| − 1. To run these operations efficiently on graphics processing unit or tensor processing unit hardware, it is critical to parallelize message computation and aggregation.

One approach to parallelize these operations is to store edge information and messages as pairwise matrices, which can be efficiently transformed via matrix and dot products. However, this method would incur a runtime and memory complexity of O(|V|²), which is quadratic in the number of nodes, and for large sparse graphs it can be substantially larger than O(|E|).

Instead, if the graph is represented in a sparse matrix data structure or adjacency list, computations can be parallelized while maintaining the O(|E|) complexity. For large structures, such as the atomic resolution graph of a protein (typically 1,000–10,000 atoms) or a large knowledge graph (>100,000 nodes7), sparse implementations enable the data to fit in memory, meaning the computation can be completed orders of magnitude faster. As these sparse computations require careful implementation, specialized libraries for GNNs have been developed. The most widely used include PyTorch Geometric8 and Deep Graph Library9 for general graphs, Chemprop10 for molecular graphs, and e3nn11 for 3D geometric graphs. These libraries provide instantiations of existing models, simplify the implementation of novel architectures (see the Jupyter notebook in the GitHub repository provided) and give access to datasets and auxiliary tools, such as featurization.
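To illustrate the sparse pattern these libraries implement, here is a minimal sketch assuming the edges are stored as a (2, |E|) index tensor; the aggregation touches each edge exactly once, so runtime and memory stay proportional to |E| rather than |V|².

```python
import torch

def sparse_neighbour_sum(h: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
    """Sum incoming neighbour features per node from an edge list of shape (2, |E|)."""
    src, dst = edge_index                   # messages flow from src to dst
    messages = h[src]                       # one message per edge: O(|E|) work
    out = torch.zeros_like(h)
    return out.index_add(0, dst, messages)  # scatter-add aggregation per node

# The equivalent dense computation, adj @ h, needs an O(|V|^2) adjacency matrix.
h = torch.randn(5, 8)
edge_index = torch.tensor([[0, 1, 2, 3],    # source nodes
                           [1, 2, 3, 4]])   # destination nodes
print(sparse_neighbour_sum(h, edge_index).shape)  # torch.Size([5, 8])
```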
Data format and splitting
When using deep neural networks like GNNs, a key question is whether the features learned from the training data will generalize to real-world scenarios. To tackle this question without collecting additional data, splitting data between training and testing is critical. For graphs, data-splitting approaches differ between inductive and transductive settings.

Inductive tasks. Inductive tasks closely resemble the common paradigm of machine learning problems in which the training, validation and testing datasets involve separate objects. Each set contains different graphs over which the GNNs are trained and evaluated. Molecular property prediction is a common example, as models are trained and tested on different sets of molecules. Deciding how to split the graphs between the different sets often requires domain expertise. In drug discovery, although labelled data are sourced from commonly observed parts of molecular space, to find novel drugs, unexplored parts of the chemical space are searched. To simulate this distribution shift, the community commonly uses scaffold splits (Fig. 2) or time splits, in which molecules in the training, validation and test sets have different molecular scaffolds or are sourced from experiments conducted over different time periods.
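A scaffold split of this kind can be sketched as follows, assuming RDKit is available; grouping by Bemis–Murcko scaffold is the standard recipe, whereas the fill-in-order assignment of scaffold groups to the splits is only illustrative.

```python
from collections import defaultdict
from rdkit.Chem.Scaffolds import MurckoScaffold

def scaffold_split(smiles_list, frac_train=0.8, frac_valid=0.1):
    """Group molecules by Bemis-Murcko scaffold so no scaffold spans two sets."""
    groups = defaultdict(list)
    for idx, smi in enumerate(smiles_list):
        groups[MurckoScaffold.MurckoScaffoldSmiles(smiles=smi)].append(idx)

    train, valid, test = [], [], []
    n = len(smiles_list)
    # Assign whole scaffold groups (largest first) until each split is full.
    for group in sorted(groups.values(), key=len, reverse=True):
        if len(train) + len(group) <= frac_train * n:
            train.extend(group)
        elif len(valid) + len(group) <= frac_valid * n:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test

print(scaffold_split(["CCO", "c1ccccc1O", "c1ccccc1N", "CCN"]))
```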
Transductive tasks. Transductive tasks (semi-supervised learning) train and test on the same graph, which is typically large and incomplete. For example, the goal of knowledge graph completion is to detect missing edges based on existing ones. In biological settings, it may be desirable to repurpose existing drugs for new diseases. In this case, drugs and diseases are nodes, whereas efficacy relationships are edges. Care must be taken when dividing the known edges of this single graph into training and testing splits. Randomly masking edges between drugs and diseases may lead the model to just learn that similar drugs are likely to work against similar diseases. Although valid, this conclusion would not allow the discovery of drugs for diseases that lack known treatments.
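To make this caveat concrete, the toy split below holds out every edge of a random subset of diseases instead of masking edges uniformly at random, so the evaluation probes diseases without known treatments; the entity names and split logic are purely illustrative.

```python
import random

def split_drug_disease_edges(edges, held_out_fraction=0.2, seed=0):
    """Hold out all edges of a subset of diseases rather than random individual edges."""
    random.seed(seed)
    diseases = sorted({disease for _, disease in edges})
    held_out = set(random.sample(diseases, max(1, int(held_out_fraction * len(diseases)))))
    train = [edge for edge in edges if edge[1] not in held_out]
    test = [edge for edge in edges if edge[1] in held_out]
    return train, test

edges = [("drug_a", "disease_1"), ("drug_b", "disease_1"),
         ("drug_a", "disease_2"), ("drug_c", "disease_3")]
print(split_drug_disease_edges(edges))
```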
Training and evaluation
The HIV inhibition prediction example is an inductive task, which uses data provided by the Drug Therapeutics Program of the NIH's National Cancer Institute, accessible from the Open Graph Benchmark (OGB)12. The dataset contains 40,000 small molecules, together with a binary label indicating their ability to inhibit HIV growth. The standardized data splits from the OGB use scaffold splitting, with 80% of the molecules for training, 10% for validation and 10% for testing.

The example GNN is constructed based on the previously described architecture, with an embedding layer, four layers of message passing and a final add pooling and feedforward network. For this classification task, cross-entropy is used as the loss function. To evaluate model performance, the area under the receiver operating characteristic curve (ROC-AUC) is measured.

After training for 100 epochs, the ROC-AUC is 82.5% on the training set compared with 73.0% on the validation set. The higher ROC-AUC for training is expected, as the model sees those scaffolds during its training process. However, such a substantial difference might indicate overfitting. Overfitting means that a model has so many parameters that it is able to memorize the training data and labels instead of learning to recognize patterns that generalize to unseen data points. This is a common problem for machine learning algorithms in data-scarce settings, which is often the case in the life sciences. To ensure that a method is useful for new data, it is crucial to check if overfitting occurred and to evaluate generalization capabilities, for instance, via scaffold splits (Fig. 2).

In the worked example, overfitting can be avoided by stopping the training early. As the losses are tracked across training, the training process can be stopped at the point of highest validation performance (77.9% ROC-AUC) before the model starts overfitting, which translates to a 74.5% ROC-AUC on the test set. This is considerably better than the performance (70.5% test set ROC-AUC) obtained with a shallow FF-NN on Morgan fingerprints.
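A training loop with this kind of early stopping can be sketched as follows; the model, data loaders and device handling are placeholders standing in for the ones in the accompanying notebook rather than its actual implementation.

```python
import copy
import torch
import torch.nn.functional as F
from sklearn.metrics import roc_auc_score

def train_with_early_stopping(model, train_loader, valid_loader, epochs=100, lr=1e-3):
    """Track validation ROC-AUC every epoch and keep the best-performing weights."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    best_auc, best_state = 0.0, None
    for _ in range(epochs):
        model.train()
        for batch, labels in train_loader:
            optimizer.zero_grad()
            logits = model(batch).view(-1)
            loss = F.binary_cross_entropy_with_logits(logits, labels.float())
            loss.backward()
            optimizer.step()

        model.eval()
        scores, targets = [], []
        with torch.no_grad():
            for batch, labels in valid_loader:
                scores.append(torch.sigmoid(model(batch)).view(-1))
                targets.append(labels)
        auc = roc_auc_score(torch.cat(targets).numpy(), torch.cat(scores).numpy())
        if auc > best_auc:  # remember the epoch with the highest validation ROC-AUC
            best_auc, best_state = auc, copy.deepcopy(model.state_dict())
    model.load_state_dict(best_state)
    return model, best_auc
```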
Fig. 2 | Molecule similarity and overfitting. a, Examples of molecules that have the same or a different molecular scaffold (indicated by purple and blue colour), which is a core substructure. b, A clustered 2D embedding of molecules. Each point corresponds to a molecule, and similar ones are clustered together. Points' colouring corresponds to different data sources. The larger yellow points and grey points correspond to true positive and false positive antibiotic activity predictions of a graph neural network. A scaffold split ensures that no molecules in the training data (purple curve) and test data (blue curve) have the same scaffold. The purpose is to evaluate the model's capability to generalize to a test distribution that is substantially different from the training data, which is expected in real-world applications. Part b adapted with permission from ref. 67, Elsevier.

Results
Properties of GNNs
Although deep learning models offer a way to learn complex patterns directly from raw data, this usually comes at the cost of data efficiency. Given the large number of parameters to optimize, if the number of labelled examples is not large enough, the deep learning models are likely to learn spurious correlations and miss patterns that would enable generalization to unseen data points. A key to the success of GNNs is that, compared with standard FF-NNs, they improve data efficiency and accuracy on graph-structured data due to two fundamental properties: locality bias and permutation equivariance.

Locality bias. Whenever data are represented as graphs, edges are drawn to connect objects with some relation to one another. It is thus natural to think that a better representation of a node can be built by looking at its neighbours, to provide more information than looking at another node at random. This locality inductive bias is the basis of the message-passing concept and induces the model to learn more generalizable functions4.
Fig. 3 | Important data symmetries for GNNs. a, Examples of properties that are permutation and rotation invariant or equivariant. The energy of a molecule does not depend on its frame of reference, and so it is translation and rotation invariant. By contrast, the forces are translation and rotation equivariant vectors, as they rotate with the molecule in the new frame of reference. The energy is invariant with respect to the reordering of the atoms, whereas the vector of atom types or charges is rearranged with the same ordering. b, The challenge of graph isomorphism: the Weisfeiler–Leman test as a standard graph neural network (GNN) will never be able to distinguish the third graph from the first two, as nodes of the same colour will always have the same representation.
of message-passing operators in which the messages are constructed and passed based on the relative position of the two nodes.

When working with graphs embedded in three dimensions, such as the 3D structure of a molecule, it is important to consider the symmetry of the task with respect to translations and rotations of the frame of reference (Fig. 3a). This translates into SE(3) invariance or equivariance, in which SE(3) is the special Euclidean group in three dimensions, that is, the group of rotations and translations in three dimensions.

To design SE(3)-invariant architectures, the coordinates of the nodes cannot be taken as normal input features, because they would cause the model's output to change when the frame of reference is translated. Similarly, taking the relative vectors as edge features is problematic, as they change when the system is rotated. The easiest way to achieve rotation invariance is to extract only the relative distances between pairs of nodes and use these as edge features in message-passing44,45.
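A minimal sketch of this idea, with placeholder names, computes only the inter-node distance for every edge and uses it as the edge feature, so the features are unchanged by any rotation or translation of the coordinate frame.

```python
import torch

def distance_edge_features(positions: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
    """Per-edge Euclidean distances: invariant to rotations and translations."""
    src, dst = edge_index
    return (positions[src] - positions[dst]).norm(dim=-1, keepdim=True)

positions = torch.randn(4, 3)                      # four atoms in 3D
edge_index = torch.tensor([[0, 1, 2], [1, 2, 3]])
feats = distance_edge_features(positions, edge_index)

# A random rotation plus translation leaves the edge features unchanged.
rotation, _ = torch.linalg.qr(torch.randn(3, 3))
moved = positions @ rotation.T + torch.randn(3)
print(torch.allclose(feats, distance_edge_features(moved, edge_index), atol=1e-5))  # True
```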
However, only using distances in message-passing does not yield very expressive architectures. Similarly to arbitrary graphs, more powerful models are obtained using either higher-order representations or multi-hop interactions. Unlike general graphs, for which universality is unattainable due to the intrinsic challenge of graph isomorphism, the grounding of nodes in 3D space makes isomorphism easier on geometric graphs. Both strategies can yield architectures that are theoretically maximally expressive and are able to approximate any continuous equivariant function on a set of points in 3D space46.

In the higher-order strategy11,47,48, the hidden representations of nodes contain normal SE(3)-invariant scalar features, SE(3)-equivariant vectors and higher-order representations. These more complex features can represent physical properties, such as forces and polarizability; however, they have to be handled with specific equivariant operations, such as tensor products. In the multi-hop interaction strategy49,50, in addition to distances between pairs of nodes, the angles between pairs of connected edges and dihedral angles between three consecutive edges are used. These additional features enable complex relationships to be distinguished that cannot be easily captured when relying on simple distances.
Interpretability and uncertainty
Interpretability. Moving from a simple model based on hand-crafted features or rules to a deep learning solution comes at the cost of the degree of interpretability of the predictions. Instead of using human-interpretable rules, predictions are based on layers of transformations that produce representations without human-understandable meanings. This is also the case for GNNs; however, among architectures, they are inherently more interpretable and explainable, because they learn about relations between human-understandable entities from the nodes used to define the graph. For example, an inference based on a GNN's link prediction in a knowledge graph is easier to explain and interpret than the same inference made by an unstructured model.

The additional structure of the graph-based problem formulation can be used for interpretability by inferring which node or subgraph of the input explains a prediction the most. To do so, researchers have developed several techniques similar to the general approaches for neural network interpretability but that also take into account the discreteness and symmetries of the graph structure.

Two of the most common strategies are gradient-based51 or perturbation-based methods52, both of which try to pinpoint the components of the input that most affect the output. An example of the latter strategy is GNNExplainer53, which applies various modifications to the input data to determine which subgraph and features were the most important for the prediction. Another strategy is to build surrogate models54: simpler, more interpretable architectures trained to reproduce the inputs and outputs of the base model. Finally, graph generation methods can build simple example structures to maximize the likelihood of a class under the model55. A more detailed taxonomy and description of the different GNN interpretability approaches are provided in refs. 56,57.

Uncertainty estimation. Uncertainty estimation in machine learning determines how much a prediction can be trusted. Like interpretability, this is more difficult in a deep learning setting than in classical approaches. For GNNs, uncertainty quantification comes with unique challenges. For example, data uncertainty or epistemic uncertainty can arise from multiple sources with different impact magnitudes for node features or missing or incorrect edges. Similarly, how uncertainty propagates through layers and passed messages to produce a final prediction in GNNs is different from simpler architectures that are comparatively better studied for uncertainty quantification. These challenges mean that traditional deep learning uncertainty estimation methods fail when applied to GNNs in an inductive setting58. In the transductive setting, a major difficulty is the missing assumption of independent identically distributed samples. Without this, many of the general uncertainty estimation approaches do not apply. In practice, GNNs are underconfident59 in the transductive setting. To address these issues, GNNs have been developed with tailored techniques, such as custom Bayesian node updates, to disentangle epistemic and aleatoric uncertainty60, and topology-dependent correction steps of the confidence61,62.

Applications
With the abundance of graph-structured data in science and society, GNNs have found wide applicability, with meaningful impact in many fields. However, due to the range of tasks, it is crucial to consider application-specific information when selecting a model, as there is no one-size-fits-all GNN. An architecture should be chosen that best fits the application along multiple axes, such as scalability, expressivity and data efficiency. For instance, one axis is the trade-off between expressivity and memory usage, a core consideration for large protein–protein interaction graphs. On another axis, chemical priors — for instance, the importance of rings — are crucial to small-molecule property prediction26. Finally, in machine-learned interatomic potentials for molecular dynamics, inference speed is one of the main challenges63.

Although standard GNNs can address many tasks adequately, there are cases in which simple solutions fail or cannot be used. These cases require additional insights to be built into the architecture. This section demonstrates this with literature examples highlighting some important GNN applications in the life and physical sciences.

Knowledge graphs
Knowledge graphs model relational data via nodes that represent different entities and directed edges that symbolize various relationships. For instance, in a biomedical knowledge graph, nodes might be diseases, drug molecules, proteins or viruses (Fig. 4a). The edges could encode relations about whether a drug cures a disease, a drug binds to a protein, a protein is relevant for a disease or similar.

To process knowledge graphs, specialized GNNs have been proposed64,65 to handle the heterogeneous types of edges and nodes.
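A heterogeneous message-passing layer can be sketched as below, in the spirit of the relational architectures cited above but not reproducing any particular one: each relation type gets its own weight matrix, and messages are aggregated per destination node.

```python
import torch
import torch.nn as nn

class RelationalMessagePassing(nn.Module):
    """Sketch of message passing with one weight matrix per relation type."""

    def __init__(self, dim: int, num_relations: int):
        super().__init__()
        self.rel_weights = nn.ModuleList(
            [nn.Linear(dim, dim, bias=False) for _ in range(num_relations)])
        self.self_weight = nn.Linear(dim, dim)

    def forward(self, h, edge_index, edge_type):
        # h: (num_nodes, dim); edge_index: (2, num_edges); edge_type: (num_edges,)
        src, dst = edge_index
        agg = torch.zeros_like(h)
        for r, weight in enumerate(self.rel_weights):
            mask = edge_type == r
            agg = agg.index_add(0, dst[mask], weight(h[src[mask]]))
        return torch.relu(self.self_weight(h) + agg)

# Toy biomedical graph: four entities, relation 0 = 'binds', relation 1 = 'treats'.
h = torch.randn(4, 16)
edge_index = torch.tensor([[0, 1, 2], [1, 2, 3]])
edge_type = torch.tensor([0, 1, 0])
layer = RelationalMessagePassing(16, num_relations=2)
print(layer(h, edge_index, edge_type).shape)  # torch.Size([4, 16])
```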
Fig. 4 | GNNs for knowledge graphs and molecular property prediction. a, An example of a biomedical knowledge graph with different types of interactions between entities (nodes) that are either proteins, drugs, viruses or diseases. b, Quantum property prediction with a graph neural network (GNN) as a representative task for molecular property prediction. Although accurate quantum simulations to estimate properties can take hours, GNNs have been successful at predicting quantum properties in fractions of seconds. E, potential energy; ω0, vibrational mode frequency; SARS-CoV, severe acute respiratory syndrome coronavirus; SARS-CoV-2, severe acute respiratory syndrome coronavirus 2.
Node embeddings from these architectures can predict the probability of unknown relations. In a biomedical context, for example, the unknown relation could be whether an existing drug can be repurposed to treat additional diseases. In this drug discovery context, knowledge graphs offer the opportunity to integrate additional data from many modalities like drugs, phenotypes, diseases, disease exposure, genes or pathways, each with their own types of relations7. Outside the biomedical context, GNNs for knowledge graphs have heavily impacted recommender systems used in retail, advertisement and social media66.

feature or using simultaneous message-passing layers for the molecular graph34. Large transformer architectures are particularly well-suited to utilize the increasing amounts of data generated with quantum simulations, from GEOM-QM9 and GEOM-DRUGS74 to PCQM4Mv2 (ref. 12). For quantum properties, GNNs are also able to obtain electronic structures via variational quantum Monte Carlo, increasing speeds and bringing a new level of generalizability to the field75,76.
In rational protein design, message-passing-based tools are critical to tackling inverse folding85, in which the aim is to reconstruct the amino acid sequence from a 3D point cloud representing a backbone structure. Similar architectures have also been applied to predict the strength of the interaction between molecules. For instance, PIGNet86 predicts the affinity between a molecule and the protein it is bound to.

For multiple additional drug design-related approaches, GNNs have been used as the base architecture for generative models over molecular structures. Notable examples include generating the most likely 3D structures of small molecules87,88 (conformer generation; Fig. 5b); the distribution of protein structures89,90 (protein folding); structures used by small molecules to bind to proteins91 (molecular docking; Fig. 5b); or structures of novel proteins92,93 (rational protein design).

Another common approach to determining the flexibility of biophysical structures is to learn their dynamics and increase their simulation speed. In this setting, GNNs are used as molecular potentials that are trained to predict the energy44,49,50,63 of a given atomic structure. Afterwards, the gradient of the predicted energy is used as the predicted force to update the atom positions in the simulation. Other methods directly predict future atom positions94 or speed up molecular dynamics simulations by generating abstracted, lower-dimensional, coarse-grained molecular representations95. GNNs can also undo coarse-grainings in a generative fashion96.
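A common pattern for such machine-learned potentials is sketched below: predict a scalar energy and obtain forces as the negative gradient with respect to atom positions via automatic differentiation. The toy energy model here is a placeholder standing in for a trained GNN potential.

```python
import torch

def energy_and_forces(model, positions: torch.Tensor, atom_types: torch.Tensor):
    """Predict a scalar energy and return forces as its negative position gradient."""
    positions = positions.clone().requires_grad_(True)
    energy = model(positions, atom_types)            # scalar potential energy
    forces = -torch.autograd.grad(energy, positions)[0]
    return energy, forces

class ToyPotential(torch.nn.Module):
    """Stand-in for a GNN potential: sum of squared pairwise distances."""
    def forward(self, positions, atom_types):
        diff = positions.unsqueeze(0) - positions.unsqueeze(1)   # (N, N, 3)
        return diff.pow(2).sum() / 2

positions = torch.randn(6, 3)            # six atoms in 3D
atom_types = torch.zeros(6, dtype=torch.long)
energy, forces = energy_and_forces(ToyPotential(), positions, atom_types)
print(energy.item(), forces.shape)       # scalar energy and forces of shape (6, 3)
```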
Reproducibility and data deposition
Data releases and good reproducibility practices have helped develop GNNs. These have been partially driven by standardized benchmarks, such as the OGB12 and Therapeutics Data Commons97, which require code to reproduce results to be published. Despite this progress, lack of data is an issue for many life science applications, because data acquisition is more expensive and diverse than in computer vision or natural language processing, in which scraping the internet often suffices for data collection. These challenges highlight the value of collating and open-sourcing more data, alongside developing methods for the low-data regime.

Data sources and benchmarks
Benchmark suites — such as OGB, Therapeutics Data Commons or the Open Catalyst Project — provide collections of datasets with standardized train/validation/test sets and evaluation metrics. They come with PyTorch Geometric and Deep Graph Library interfaces for data loaders and evaluation metrics to set up experiments in a comparable and reproducible manner, with online leaderboards to compare state-of-the-art methods. In drug discovery, the Therapeutics Data Commons data collection is notable, with a wide range of tasks, from protein–ligand affinity to retrosynthesis and toxicity prediction. Another large-scale data collection effort is the Protein Data Bank, which contains over 200,000 protein 3D atomic structures and has enabled many developments in machine learning for structural biology. Multiple protein structures occur as complexes with small molecules, and PDBBind is an effort to extract and curate structures from the Protein Data Bank with publicly available binding affinity values. A large source of bioactivity data is ChEMBL, which has activity measurements for 2.4 million compounds. Drawing from these sources is the precision medicine knowledge graph7, which has relationships between 129,000 nodes, with types ranging from diseases, drugs and genes to anatomical regions and disease exposures. Finally, there are multiple sources of protein–protein interaction graphs, and more information can be found in ref. 98, which surveys and compares 16 databases.

Limitations and optimizations
Evaluation
The variety of tasks that can be addressed with GNNs means there is ambiguity in evaluation criteria and a danger of using irrelevant metrics. This is particularly relevant for generative tasks, for which the goodness of an output is difficult to quantify. When generating new drug-like molecules, simple metrics may include chemical validity, synthesizability, diversity and distance from the training data. However, to evaluate more complex biological phenomena, such as biological activity or toxicity, computational estimators can be inaccurate and misleading.

Data dependence
Although GNNs are state of the art for many tasks on graph-structured data, they are not the universal best option due to several technical and data limitations. For instance, for some molecular property prediction tasks, molecular fingerprints offer better performance10,99,
Fig. 5 | Examples of GNNs for generative modelling. a, Example of fragment-based molecular generation process similar to ref. 80. b, Representation of the conformer generation and docking tasks. For docking tasks, the target protein to which the ligand is docked is visualized with both the amino acid sequence, which is how many graph neural network (GNN)-based methods represent it, and the surface. Protein structure from Protein Data Bank 6G29.
it is possible to build fully expressive architectures115. Considerations about particular domains will need to be integrated into efficient architectures that do not suffer from bottlenecks.

Methods to interpret GNNs are currently limited to identifying nodes or substructures that most influence a decision. Usually, this is not enough to truly understand the model's reasoning or build surrogate, less-expressive models. Instead, using domain knowledge and multi-modal integrations, interpretability can be directly built into the task the model optimizes for. For example, to predict if a molecule is toxic, instead of framing the task as a simple binary classification, the model could be trained to predict which human proteins the ligand binds to and whether that interaction causes adverse side effects. This prediction is substantially more interpretable and experimentally verifiable than a binary toxicity classification.

Finally, an underexplored GNN application in the life sciences is modelling dynamic graphs. Many biological phenomena with a graph structure change over time. For instance, brain activity profiles can be modelled as brain networks with signals for nodes that evolve over time, or disease spread can be modelled as a dynamic graph in which better forecasts can have large positive impacts. Temporal graph networks are well researched for applications outside of the life sciences116. A promising direction could be applying them to life science problems.

Despite these limitations, GNNs have the capacity to strongly impact many applications in the life sciences and beyond. With new state-of-the-art approaches in fields from drug and antibiotic discovery and traffic prediction to structural biology and recommendation systems, it is expected that the application of GNNs, in their current and future forms, will enable discoveries and the development of a wide variety of new products.

Code availability
Example code can be found at [Link]GNN-primer/blob/main/GNN-primer_HIV_classification.ipynb.

Published online: xx xx xxxx

References
1. Gori, M., Monfardini, G. & Scarselli, F. A new model for learning in graph domains. In Proceedings 2005 IEEE International Joint Conference on Neural Networks 729–734 (IEEE, 2005).
2. Merkwirth, C. & Lengauer, T. Automatic generation of complementary descriptors with molecular graph networks. J. Chem. Inf. Model. 45, 1159–1168 (2005).
3. Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M. & Monfardini, G. The graph neural network model. IEEE Trans. Neural Netw. 20, 61–80 (2008).
   Although the genealogy of the development is multifaceted, this is often considered the first instance of GNNs.
4. Bronstein, M. M., Bruna, J., Cohen, T. & Veličković, P. Geometric deep learning: grids, groups, graphs, geodesics, and gauges. Preprint at [Link] arXiv.2104.13478 (2021).
   Book with a very comprehensive introduction to the theoretical aspects behind GNNs and other geometric deep learning architectures.
5. Jegelka, S. Theory of graph neural networks: representation and learning. Preprint at [Link] (2022).
6. Morgan, H. L. The generation of a unique machine description for chemical structures - a technique developed at chemical abstracts service. J. Chem. Doc. 5, 107–113 (1965).
7. Chandak, P., Huang, K. & Zitnik, M. Building a knowledge graph to enable precision medicine. Sci. Data 10, 67 (2023).
8. Fey, M. & Lenssen, J. E. Fast graph representation learning with PyTorch Geometric. Preprint at [Link] (2019).
   PyTorch Geometric is the most widely used library to develop GNNs.
9. Wang, M. et al. Deep Graph Library: a graph-centric, highly-performant package for graph neural networks. Preprint at [Link] (2019).
10. Yang, K. et al. Analyzing learned molecular representations for property prediction. J. Chem. Inf. Model. 59, 3370–3388 (2019).
11. Geiger, M. & Smidt, T. e3nn: Euclidean neural networks. Preprint at [Link] 10.48550/arXiv.2207.09453 (2022).
12. Hu, W. et al. Open Graph Benchmark: datasets for machine learning on graphs. Adv. Neural Inf. Process. Syst. 22118–22133 (NeurIPS Proceedings, 2020).
   OGB is the most widely used benchmark for GNNs with a wide variety of datasets, each with its own leaderboard.
13. Dummit, D. S. & Foote, R. M. Abstract Algebra 7th edn (Wiley, 2004).
14. Xu, K., Hu, W., Leskovec, J. & Jegelka, S. How powerful are graph neural networks? In International Conference on Learning Representations (ICLR, 2019).
   To our knowledge, this work, concurrently with ref. 15, was the first to propose and use the analogy of GNNs to the WL isomorphism test to study their expressivity.
15. Morris, C. et al. Weisfeiler and Leman go neural: higher-order graph neural networks. Proc. AAAI Conf. Artif. Intell. 33, 4602–4609 (2019).
16. Vignac, C., Loukas, A. & Frossard, P. Building powerful and equivariant graph neural networks with structural message-passing. Adv. Neural Inf. Process. Syst. 33, 14143–14155 (2020).
17. Abboud, R., Ceylan, I. I., Grohe, M. & Lukasiewicz, T. The surprising power of graph neural networks with random node initialization. In 30th International Joint Conference on Artificial Intelligence 2112–2118 (International Joint Conferences on Artificial Intelligence Organization, 2021).
18. Sato, R., Yamada, M. & Kashima, H. Random features strengthen graph neural networks. In Proceedings of the 2021 SIAM International Conference on Data Mining 333–341 (Society for Industrial and Applied Mathematics, 2021).
19. Dwivedi, V. P. et al. Benchmarking graph neural networks. J. Mach. Learn. Res. 24, 1–48 (2023).
20. Beaini, D. et al. Directional graph networks. In Proceedings of the 38th International Conference on Machine Learning 748–758 (PMLR, 2021).
21. Lim, D. et al. Sign and basis invariant networks for spectral graph representation learning. In International Conference on Learning Representations (ICLR, 2023).
22. Keriven, N. & Vaiter, S. What functions can graph neural networks compute on random graphs? The role of positional encoding. Preprint at [Link] arXiv.2305.14814 (2023).
23. Zhang, B., Luo, S., Wang, L. & He, D. Rethinking the expressive power of GNNs via graph biconnectivity. In International Conference on Learning Representations (ICLR, 2023).
24. Di Giovanni, F. et al. How does over-squashing affect the power of GNNs? Preprint at [Link] (2023).
25. Razin, N., Verbin, T. & Cohen, N. On the ability of graph neural networks to model interactions between vertices. In 37th Conference on Neural Information Processing Systems (NeurIPS, 2023).
26. Bouritsas, G., Frasca, F., Zafeiriou, S. & Bronstein, M. M. Improving graph neural network expressivity via subgraph isomorphism counting. IEEE Trans. Pattern Anal. Mach. Intell. 45, 657–668 (2023).
27. Sun, Z., Deng, Z.-H., Nie, J.-Y. & Tang, J. RotatE: knowledge graph embedding by relational rotation in complex space. Preprint at [Link] (2019).
28. Abboud, R., Ceylan, I., Lukasiewicz, T. & Salvatori, T. BoxE: a box embedding model for knowledge base completion. Adv. Neural Inf. Process. Syst. 33, 9649–9661 (2020).
29. Pavlović, A. & Sallinger, E. ExpressivE: a spatio-functional embedding for knowledge graph completion. In International Conference on Learning Representations (ICLR, 2023).
30. Veličković, P. et al. Graph attention networks. In International Conference on Learning Representations (ICLR, 2017).
   Graph attention networks are the first application of the idea of attention to graphs, and they are one of the most widely used architectures to date.
31. Corso, G., Cavalleri, L., Beaini, D., Liò, P. & Veličković, P. Principal neighbourhood aggregation for graph nets. Adv. Neural Inf. Process. Syst. 33, 13260–13271 (2020).
32. Gasteiger, J., Weißenberger, S. & Günnemann, S. Diffusion improves graph learning. Adv. Neural Inf. Process. Syst. 32, 13366–13378 (2019).
33. Gutteridge, B., Dong, X., Bronstein, M. & Di Giovanni, F. DRew: dynamically rewired message passing with delay. In International Conference on Machine Learning (eds Krause, A. et al.) 12252–12267 (ICML, 2023).
34. Rampášek, L. et al. Recipe for a general, powerful, scalable graph transformer. Adv. Neural Inf. Process. Syst. 35, 14501–14515 (2022).
35. Dwivedi, V. P. et al. Long range graph benchmark. Adv. Neural Inf. Process. Syst. 35, 22326–22340 (2022).
36. Dwivedi, V. P. & Bresson, X. A generalization of transformer networks to graphs. Preprint at [Link] (2020).
37. Kreuzer, D., Beaini, D., Hamilton, W., Létourneau, V. & Tossou, P. Rethinking graph transformers with spectral attention. Adv. Neural Inf. Process. Syst. 34, 21618–21629 (2021).
38. Bodnar, C. et al. Weisfeiler and Lehman go topological: message passing simplicial networks. In Proceedings of the 38th International Conference on Machine Learning (eds Meila, M. & Zhang, T.) 1026–1037 (PMLR, 2021).
39. Bodnar, C. et al. Weisfeiler and Lehman go cellular: CW networks. Adv. Neural Inf. Process. Syst. 34, 2625–2640 (2021).
40. Chamberlain, B. et al. GRAND: graph neural diffusion. In Proceedings of the 38th International Conference on Machine Learning (eds Meila, M. & Zhang, T.) 1407–1418 (PMLR, 2021).
41. Chamberlain, B. et al. Beltrami flow and neural diffusion on graphs. Adv. Neural Inf. Process. Syst. 34, 1594–1609 (2021).
42. Di Giovanni, F., Rowbottom, J., Chamberlain, B. P., Markovich, T. & Bronstein, M. M. Graph neural networks as gradient flows. Preprint at [Link] (2022).
43. Rusch, T. K., Chamberlain, B., Rowbottom, J., Mishra, S. & Bronstein, M. Graph-coupled oscillator networks. In Proceedings of the 39th International Conference on Machine Learning (eds Chaudhuri, K. et al.) 18888–18909 (PMLR, 2022).
44. Schütt, K. et al. SchNet: a continuous-filter convolutional neural network for modeling quantum interactions. In NIPS'17: Proceedings of the 31st International Conference on Neural Information Processing Systems (eds von Luxburg, U. et al.) 992–1002 (Curran Associates Inc., 2017).
   SchNet is one of the earliest and most prominent examples of SE(3)-invariant GNNs.
45. Satorras, V. G., Hoogeboom, E. & Welling, M. E(n) equivariant graph neural networks. In Proceedings of the 38th International Conference on Machine Learning (eds Meila, M. & Zhang, T.) 9323–9332 (PMLR, 2021).
46. Dym, N. & Maron, H. On the universality of rotation equivariant point cloud networks. In International Conference on Learning Representations (ICLR, 2021).
47. Thomas, N. et al. Tensor field networks: rotation- and translation-equivariant neural networks for 3D point clouds. Preprint at [Link] (2018).
48. Jing, B., Eismann, S., Suriana, P., Townshend, R. J. & Dror, R. Learning from protein structure with geometric vector perceptrons. In International Conference on Learning Representations (ICLR, 2021).
49. Gasteiger, J., Groß, J. & Günnemann, S. Directional message passing for molecular graphs. In Adv. Neural Inf. Process. Syst. (NeurIPS, 2020).
50. Gasteiger, J., Becker, F. & Günnemann, S. GemNet: universal directional graph neural networks for molecules. Adv. Neural Inf. Process. Syst. 34, 6790–6802 (2021).
51. Baldassarre, F. & Azizpour, H. Explainability techniques for graph convolutional networks. Preprint at [Link] (2019).
52. Schlichtkrull, M. S., De Cao, N. & Titov, I. Interpreting graph neural networks for NLP with differentiable edge masking. In International Conference on Learning Representations (ICLR, 2021).
53. Ying, Z., Bourgeois, D., You, J., Zitnik, M. & Leskovec, J. GNNExplainer: generating explanations for graph neural networks. Adv. Neural Inf. Process. Syst. 32, 9240–9251 (2019).
54. Huang, Q., Yamada, M., Tian, Y., Singh, D. & Chang, Y. GraphLIME: local interpretable model explanations for graph neural networks. IEEE Trans. Knowl. Data Eng. 35, 6968–6962 (2023).
55. Yuan, H., Tang, J., Hu, X. & Ji, S. XGNN: towards model-level explanations of graph neural networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining 430–438 (2020).
56. Yuan, H., Yu, H., Gui, S. & Ji, S. Explainability in graph neural networks: a taxonomic survey. IEEE Trans. Pattern Anal. Mach. Intell. 45, 5782–5799 (2022).
57. Kakkad, J., Jannu, J., Sharma, K., Aggarwal, C. & Medya, S. A survey on explainability of graph neural networks. Preprint at [Link] (2023).
58. Hirschfeld, L., Swanson, K., Yang, K., Barzilay, R. & Coley, C. W. Uncertainty quantification using neural networks for molecular property prediction. J. Chem. Inf. Model. 60, 3770–3780 (2020).
59. Hsu, H. H.-H., Shen, Y., Tomani, C. & Cremers, D. What makes graph neural networks miscalibrated? In Adv. Neural Inf. Process. Syst. (NeurIPS, 2022).
60. Stadler, M., Charpentier, B., Geisler, S., Zügner, D. & Günnemann, S. Graph posterior network: Bayesian predictive uncertainty for node classification. Adv. Neural Inf. Process. Syst. 34, 18033–18048 (2021).
61. Wang, X., Liu, H., Shi, C. & Yang, C. Be confident! Towards trustworthy graph neural networks via confidence calibration. Adv. Neural Inf. Process. Syst. 34, 23768–23779 (2021).
62. Huang, K., Jin, Y., Candes, E. & Leskovec, J. Uncertainty quantification over graph with conformalized graph neural networks. Preprint at [Link] arXiv.2305.14535 (2023).
63. Batzner, S. et al. E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials. Nat. Commun. 13, 2453 (2022).
64. Schlichtkrull, M. S. et al. Modeling relational data with graph convolutional networks. In The Semantic Web. ESWC 2018. Lecture Notes in Computer Science (eds Gangemi, A. et al.) 593–607 (Springer, Cham, 2018).
65. Sun, Q. et al. SUGAR: subgraph neural network with reinforcement pooling and self-supervised mutual information mechanism. In WWW '21: Proceedings of the Web Conference 2021 (eds Leskovec, J. et al.) 2081–2091 (Association for Computing Machinery, 2021).
66. Sharma, K. et al. A survey of graph neural networks for social recommender systems. Preprint at [Link] (2022).
67. Stokes, J. M. et al. A deep learning approach to antibiotic discovery. Cell 180, 688–702.e13 (2020).
   Discovery of a novel antibiotic, halicin, via GNNs, one of the most prominent examples of the application of GNNs to scientific discovery.
68. Feinberg, E. N., Joshi, E., Pande, V. S. & Cheng, A. C. Improvement in ADMET prediction with multitask deep featurization. J. Med. Chem. 63, 8835–8848 (2020).
69. Peng, Y. et al. Enhanced graph isomorphism network for molecular ADMET properties prediction. IEEE Access 8, 168344–168360 (2020).
70. Murphy, M. et al. Efficiently predicting high resolution mass spectra with graph neural networks. In Proceedings of the 40th International Conference on Machine Learning (eds Krause, A. et al.) 25549–25562 (PMLR, 2023).
71. Bevilacqua, B. et al. Equivariant subgraph aggregation networks. In International Conference on Learning Representations (ICLR, 2022).
72. Guo, M. et al. Hierarchical grammar-induced geometry for data-efficient molecular property prediction. In Proceedings of the 40th International Conference on Machine Learning (eds Krause, A. et al.) 12055–12076 (PMLR, 2023).
73. Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O. & Dahl, G. E. Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning (eds Precup, D. & Teh, Y. W.) 1263–1272 (PMLR, 2017).
   To our knowledge, this paper is the first to formalize the idea of message passing as presented in this Primer and proposes applications of GNNs to quantum chemistry, which remains one of the scientific fields in which GNNs have seen most applications.
74. Axelrod, S. & Gómez-Bombarelli, R. GEOM, energy-annotated molecular conformations for property prediction and molecular generation. Sci. Data 9, 185 (2022).
75. Hermann, J., Schätzle, Z. & Noé, F. Deep-neural-network solution of the electronic Schrödinger equation. Nat. Chem. 12, 891–897 (2020).
76. Gao, N. & Günnemann, S. Generalizing neural wave functions. In International Conference on Machine Learning 10708–10726 (ICML, 2023).
77. Kingma, D. P. & Welling, M. Auto-encoding variational Bayes. In International Conference on Learning Representations (ICLR, 2014).
78. Goodfellow, I. et al. Generative adversarial networks. Commun. ACM 63, 139–144 (2020).
79. Mitton, J., Senn, H. M., Wynne, K. & Murray-Smith, R. A graph VAE and graph transformer approach to generating molecular graphs. Preprint at [Link] arXiv.2104.04345 (2021).
80. Jin, W., Barzilay, R. & Jaakkola, T. Junction tree variational autoencoder for molecular graph generation. In Proceedings of the 35th International Conference on Machine Learning (eds Dy, J. & Krause, A.) 2323–2332 (PMLR, 2018).
81. Jin, W., Barzilay, R. & Jaakkola, T. Hierarchical generation of molecular graphs using structural motifs. In Proceedings of the 37th International Conference on Machine Learning (eds Daumé, H. & Singh, A.) 4839–4848 (PMLR, 2020).
82. Vignac, C. & Frossard, P. Top-N: equivariant set and graph generation without exchangeability. In International Conference on Learning Representations (ICLR, 2022).
83. Jo, J., Lee, S. & Hwang, S. J. Score-based generative modeling of graphs via the system of stochastic differential equations. In Proceedings of the 39th International Conference on Machine Learning (eds Chaudhuri, K. et al.) 10362–10383 (PMLR, 2022).
84. Vignac, C. et al. DiGress: discrete denoising diffusion for graph generation. In International Conference on Learning Representations (ICLR, 2023).
85. Dauparas, J. et al. Robust deep learning–based protein sequence design using ProteinMPNN. Science 378, 49–56 (2022).
86. Moon, S., Zhung, W., Yang, S., Lim, J. & Kim, W. Y. PIGNet: a physics-informed deep learning model toward generalized drug–target interaction predictions. Chem. Sci. 13, 3661–3673 (2022).
87. Xu, M. et al. GeoDiff: a geometric diffusion model for molecular conformation generation. In International Conference on Learning Representations (ICLR, 2022).
88. Jing, B., Corso, G., Chang, J., Barzilay, R. & Jaakkola, T. S. Torsional diffusion for molecular conformer generation. In Adv. Neural Inf. Process. Syst. (eds Sanmi, K. et al.) (NeurIPS, 2022).
89. Ingraham, J., Riesselman, A., Sander, C. & Marks, D. Learning protein structure with a differentiable simulator. In International Conference on Learning Representations (ICLR, 2019).
90. Jing, B. et al. EigenFold: generative protein structure prediction with diffusion models. Preprint at [Link] (2023).
91. Corso, G., Stärk, H., Jing, B., Barzilay, R. & Jaakkola, T. S. DiffDock: diffusion steps, twists, and turns for molecular docking. In International Conference on Learning Representations (ICLR, 2023).
92. Ingraham, J. et al. Illuminating protein space with a programmable generative model. Nature 623, 1070–1078 (2023).
93. Watson, J. L. et al. De novo design of protein structure and function with RFdiffusion. Nature 620, 1089–1100 (2023).
94. Fu, X., Xie, T., Rebello, N. J., Olsen, B. D. & Jaakkola, T. Simulate time-integrated coarse-grained molecular dynamics with geometric machine learning. Preprint at [Link] (2022).
95. Wang, W. et al. Generative coarse-graining of molecular conformations. In International Conference on Machine Learning 23213–23236 (ICML, 2022).
96. Yang, S. & Gomez-Bombarelli, R. Chemically transferable generative backmapping of coarse-grained proteins. In Proceedings of the 40th International Conference on Machine Learning (eds Krause, A. et al.) 39277–39298 (PMLR, 2023).
97. Huang, K. et al. Therapeutics Data Commons: machine learning datasets and tasks for drug discovery and development. In Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, NeurIPS Datasets and Benchmarks 2021 (NeurIPS, 2021).
98. Bajpai, A. K. et al. Systematic comparison of the protein–protein interaction databases from a user's perspective. J. Biomed. Inform. 103, 103380 (2020).
99. Tripp, A., Bacallado, S., Singh, S. & Hernández-Lobato, J. M. Tanimoto random features for scalable molecular machine learning. In Adv. Neural Inf. Process. Syst. (NeurIPS, 2023).
100. Stärk, H. et al. 3D Infomax improves GNNs for molecular property prediction. In Proceedings of the 39th International Conference on Machine Learning (eds Chaudhuri, K. et al.) 20479–20502 (PMLR, 2022).
101. Thakoor, S. et al. Large-scale representation learning on graphs via bootstrapping. In International Conference on Learning Representations (ICLR, 2022).
102. Devlin, J., Chang, M., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (eds Burstein, J. et al.) 4171–4186 (Association for Computational Linguistics, 2019).
103. Brown, T. et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33, 1877–1901 (2020).
104. Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).
105. Dosovitskiy, A. et al. An image is worth 16x16 words: transformers for image recognition at scale. In International Conference on Learning Representations (ICLR, 2021).
106. Misra, I. & van der Maaten, L. Self-supervised learning of pretext-invariant representations. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 6707–6717 (IEEE, 2020).
107. He, K., Fan, H., Wu, Y., Xie, S. & Girshick, R. Momentum contrast for unsupervised visual representation learning. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 9726–9735 (IEEE, 2020).
108. Liu, Y. et al. Graph self-supervised learning: a survey. IEEE Trans. Knowl. Data Eng. 35, 5879–5900 (2023).
109. Rusch, T. K., Bronstein, M. M. & Mishra, S. A survey on oversmoothing in graph neural networks. Preprint at [Link] (2023).
110. Xu, K. et al. Representation learning on graphs with jumping knowledge networks. In Proceedings of the 35th International Conference on Machine Learning (eds Dy, J. & Krause, A.) 5453–5462 (PMLR, 2018).
111. Di Giovanni, F., Rowbottom, J., Chamberlain, B. P., Markovich, T. & Bronstein, M. M. Understanding convolution on graphs via energies. Transact. Mach. Learn. Res. 2835–8856 (2023).
112. Rusch, T. K., Chamberlain, B. P., Mahoney, M. W., Bronstein, M. M. & Mishra, S. Gradient gating for deep multi-rate learning on graphs. In International Conference on Learning Representations (ICLR, 2023).
113. Alon, U. & Yahav, E. On the bottleneck of graph neural networks and its practical implications. In International Conference on Learning Representations (ICLR, 2021).
114. Topping, J., Di Giovanni, F., Chamberlain, B. P., Dong, X. & Bronstein, M. M. Understanding over-squashing and bottlenecks on graphs via curvature. In International Conference on Learning Representations (ICLR, 2022).
115. Dimitrov, R., Zhao, Z., Abboud, R. & Ceylan, I. I. PlanE: representation learning over planar graphs. Preprint at [Link] (2023).
116. Hosseinzadeh, M. M., Cannataro, M., Guzzi, P. H. & Dondi, R. Temporal networks in biology and medicine: a survey on models, algorithms, and tools. Netw. Model. Anal. Health Inform. Bioinform. 12, 10 (2023).
117. Kipf, T. N. & Welling, M. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR, 2017).
   Graph convolutional network was the architecture that set off the recent years of development of GNNs.

Acknowledgements
The authors thank R. Wu, S. Yang, D. Lim, A. Corso and M.-M. Troadec for their help in reviewing the manuscript before submission. The authors also thank B. Jing, F. Di Giovanni, J. Yim, C. Vignac and F. Faltings for useful discussions. This work was supported by the NSF Expeditions grant (award 1918839), the Machine Learning for Pharmaceutical Discovery and Synthesis (MLPDS) consortium, the DTRA Discovery of Medical Countermeasures Against New and Emerging (DOMANE) threats program, the DARPA Accelerated Molecular Discovery program, the NSF AI Institute CCF-2112665 and the NSF Award 2134795.

Author contributions
Introduction (R.B., G.C., H.S., S.J. and T.J.); Experimentation (R.B., G.C., H.S., S.J. and T.J.); Results (R.B., G.C., H.S., S.J. and T.J.); Applications (R.B., G.C., H.S., S.J. and T.J.); Reproducibility and data deposition (R.B., G.C., H.S. and S.J.); Limitations and optimizations (R.B., G.C., H.S., S.J. and T.J.); Outlook (R.B., G.C., H.S., S.J. and T.J.); overview of the Primer (all authors).

Competing interests
The authors declare no competing interests.

Additional information
Peer review information Nature Reviews Methods Primers thanks Jiliang Tang; Siddhartha Mishra, who co-reviewed with Konstantin Rusch; and Rex Ying, who co-reviewed with Tinglin Huang, for their contribution to the peer review of this work.

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Related links
ChEMBL: [Link]
Chemprop: [Link]
Deep Graph Library: [Link]
e3nn: [Link]
PDBBind: [Link]
Protein Data Bank: [Link]
PyTorch Geometric: [Link]

© Springer Nature Limited 2024