Tutorial on Graph ML: Part I
Soumyodeep Dey
May 19, 2025
1 Introduction to Graphs and Graph-Structured Data
Graphs are mathematical structures used to model relationships between entities. They are
ubiquitous in domains such as social networks, biological systems, and molecular chemistry,
where entities (e.g., users, proteins, atoms) are nodes, and relationships (e.g., friendships,
interactions, bonds) are edges. Understanding the properties and types of graphs is essential
for analyzing graph-structured data and developing machine learning methods tailored to their
unique characteristics.
Graphs provide a powerful abstraction because they focus on the relationships between entities,
rather than the attributes of the entities themselves. This makes them suitable for representing
a wide variety of complex systems.
1.1 Definition and Properties of Graphs
A graph G = (V, E) consists of:
• A set of nodes V = {v1 , v2 , . . . , vn }, representing entities.
• A set of edges E ⊆ V × V, representing relationships between nodes.
The cardinality of V, denoted as |V| = n, is the number of nodes. The cardinality of E, denoted
as |E| = m, is the number of edges.
The adjacency matrix A ∈ Rn×n (where n = |V|) encodes the graph structure:
• For a simple graph (undirected, unweighted, no self-loops), A[i, j] = 1 if (vi , vj ) ∈ E, and
A[i, j] = 0 otherwise.
• For undirected graphs, A is symmetric (A[i, j] = A[j, i]).
• For directed graphs, A is not necessarily symmetric. If (vi , vj ) ∈ E, it indicates a relationship
from vi to vj .
• For weighted graphs, A[i, j] represents the weight of the edge (vi , vj ), indicating the strength
or intensity of the relationship.
A self-loop is an edge that connects a node to itself, i.e., (vi , vi ) ∈ E. In simple graphs, self-loops
are disallowed.
The neighborhood of a node vi , denoted N (vi ) = {vj | (vi , vj ) ∈ E}, consists of all nodes
connected to vi . The degree of a node vi , denoted as d(vi ), is the number of its neighbors, i.e.,
d(vi ) = |N (vi )|.
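These definitions translate directly into code. The sketch below builds the adjacency matrix of a hypothetical 4-node simple graph and recovers neighborhoods and degrees from it (the edge list is invented for illustration):

```python
import numpy as np

# Hypothetical simple graph (undirected, unweighted, no self-loops) with n = 4 nodes.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4

# Build the adjacency matrix A; setting both A[i, j] and A[j, i]
# makes it symmetric, reflecting undirectedness.
A = np.zeros((n, n), dtype=int)
for i, j in edges:
    A[i, j] = 1
    A[j, i] = 1

# Neighborhood N(v_i): all indices j with A[i, j] = 1.
def neighborhood(A, i):
    return set(np.nonzero(A[i])[0].tolist())

# Degree d(v_i) = |N(v_i)| = row sum of A for a simple graph.
degrees = A.sum(axis=1)

print(neighborhood(A, 2))  # {0, 1, 3}
print(degrees.tolist())    # [2, 2, 3, 1]
```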
1.2 Types of Graphs
Graphs vary in structure and complexity, affecting the methods used to analyze them:
• Simple Graphs: Undirected, unweighted, no self-loops or multiple edges between nodes.
Each edge represents a basic, symmetric connection. Example: A friendship network where
edges indicate mutual connections.
• Directed Graphs: Edges have direction, so (vi , vj ) ̸= (vj , vi ), and A is not necessarily
symmetric. These graphs are used to model asymmetric relationships. Example: A citation
network where paper i cites paper j.
• Weighted Graphs: Edges have weights, so A[i, j] ∈ R represents the strength of the
relationship. The weights can be used to represent costs, distances, or affinities. Example: A
protein interaction network where weights indicate binding affinity.
• Multi-Relational Graphs: Edges have types, denoted as triples (vi , τ, vj ) ∈ E, where τ is a
relation type drawn from a set of relation types R. The graph is represented by an adjacency
tensor A ∈ Rn×|R|×n , where A[i, k, j] = 1 if (vi , τk , vj ) ∈ E. This type of graph captures
diverse relationships between entities. Example: A knowledge graph with relations like “treats”
or “causes” between drugs and diseases.
• Heterogeneous Graphs: Nodes have types (e.g., drugs, diseases), and edges connect specific
types. They are represented with node type sets V1 , V2 , . . . and edge type constraints. This
allows for modeling complex systems with different kinds of entities. Example: A biomedical
graph with drug-disease and disease-protein edges.
• Multiplex Graphs: Graphs with multiple layers, each representing a different relation type,
with inter-layer edges connecting the same node across layers. This structure captures different
aspects of the relationships between the same set of entities. Example: A transportation
network with layers for air, train, and bus connections between cities.
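As a small illustration of the multi-relational case, the sketch below builds the adjacency tensor A ∈ Rn×|R|×n for a hypothetical graph with two relation types; the relation names and triples are invented for the example:

```python
import numpy as np

# Hypothetical multi-relational graph: 3 nodes, 2 relation types,
# with edges stored as triples (v_i, tau, v_j).
relations = ["treats", "causes"]
triples = [(0, "treats", 1), (2, "causes", 1)]

n, R = 3, len(relations)
rel_index = {tau: k for k, tau in enumerate(relations)}

# Adjacency tensor A with A[i, k, j] = 1 iff (v_i, tau_k, v_j) is an edge.
A = np.zeros((n, R, n), dtype=int)
for i, tau, j in triples:
    A[i, rel_index[tau], j] = 1

print(A[0, rel_index["treats"], 1])  # 1
```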
1.3 Feature Information
Graphs often include attribute or feature information:
• Node Features: A feature matrix X ∈ Rn×m where each row xi is an m-dimensional feature
vector for node vi . These features can be categorical, numerical, or textual. Example: User
profiles in a social network (age, interests).
• Edge Features: Attributes associated with edges, such as weights, categories, or time stamps.
These features provide additional information about the relationships.
• Graph-Level Features: Properties of the entire graph, such as average degree, diameter,
number of connected components, or global clustering coefficient. These features describe the
overall structure of the graph.
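Graph-level features such as average degree and the global clustering coefficient can be computed directly from the adjacency matrix. The sketch below does so for a small toy graph (a triangle plus a pendant node), using the fact that trace(A³)/6 counts triangles in a simple graph:

```python
import numpy as np

# A small undirected toy graph: triangle 0-1-2 plus pendant node 3.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])

d = A.sum(axis=1)          # node degrees
avg_degree = d.mean()      # a simple graph-level feature

# Global clustering coefficient: 3 * (#triangles) / (#connected triples).
# trace(A^3) counts closed walks of length 3, i.e. 6 per triangle.
triangles = np.trace(np.linalg.matrix_power(A, 3)) / 6
connected_triples = (d * (d - 1) / 2).sum()
global_clustering = 3 * triangles / connected_triples

print(avg_degree)          # 2.0
print(global_clustering)   # 0.6
```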
1.4 Importance of Graph-Structured Data
Graphs model complex relationships in diverse domains:
• Social Networks: Nodes are users, edges are friendships, interactions, or communication
patterns.
• Biological Networks: Nodes are proteins, genes, or metabolites, edges are interactions,
regulatory relationships, or metabolic pathways.
• Molecular Graphs: Nodes are atoms, edges are chemical bonds, representing molecular
structure and properties.
• Citation Networks: Nodes are documents, edges are citations, showing the flow of information
and influence between publications.
• Transportation Networks: Nodes are locations, edges are roads, routes, or connections,
representing infrastructure and flow.
• Knowledge Graphs: Nodes are entities (e.g., people, places, concepts), edges are relationships
(e.g., “born in,” “is a”).
2 Machine Learning Tasks on Graphs
Machine learning on graphs involves tasks that leverage both the graph’s structure (edges) and
features (node or edge attributes) to make predictions or uncover patterns. Unlike traditional
machine learning, graph data is not i.i.d. (independent and identically distributed): edges introduce
dependencies between nodes, requiring specialized methods that exploit the relational structure. These tasks are
crucial for extracting insights from networked data and have broad applications across domains.
2.1 Node Classification
Goal: Predict labels for nodes given a partially labeled graph. Formally, given a graph G = (V, E),
a feature matrix X ∈ Rn×m , and labels ytrain ∈ R|Vtrain |×c for a subset of nodes Vtrain ⊂ V, predict
labels ytest ∈ R|Vtest |×c for Vtest = V \ Vtrain , where c is the number of classes.
Applications:
• Classifying users as bots or humans in social networks.
• Predicting protein functions in interactomes (protein-protein interaction networks).
• Categorizing documents in citation networks.
• Identifying fraudulent accounts in financial transaction networks.
Assumptions:
• Homophily: Connected nodes tend to have similar labels (e.g., friends share political views,
papers in the same field cite each other).
• Structural Equivalence: Nodes with similar neighborhood structures have similar labels
(e.g., nodes with similar connectivity patterns in a network, even if not directly connected).
• Heterophily: Connected nodes have different labels (less common, e.g., predator-prey relationships
in ecological networks).
Setup: Often semi-supervised, as the structure and features of unlabeled nodes (Vtest ) are
available during training. This allows models to leverage the full graph structure.
Learn a classifier f : V → Rc that maps nodes to a probability distribution over c classes,
leveraging A, X, and ytrain . The objective is to minimize a loss, e.g., cross-entropy:
L = − ∑_{vi ∈ Vtrain} ∑_{c ∈ Y} yi,c log ŷi,c ,    (1)
where ŷi,c = f (vi )c is the predicted probability for class c, and yi,c is the true label (1 if vi belongs
to class c, 0 otherwise).
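As a concrete check of the cross-entropy objective above, the snippet below evaluates it for three hypothetical training nodes and c = 2 classes; the one-hot labels y and predicted probabilities ŷ are invented for the example:

```python
import numpy as np

# One-hot true labels y for 3 training nodes, 2 classes.
y = np.array([[1, 0],
              [0, 1],
              [1, 0]])

# Invented predicted class probabilities y_hat (each row sums to 1).
y_hat = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.6, 0.4]])

# Cross-entropy of Eq. (1): only the true-class log-probabilities
# contribute, since y zeroes out the rest.
loss = -np.sum(y * np.log(y_hat))
print(loss)  # ≈ 0.8393
```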
2.2 Relation Prediction (Link Prediction, Graph Completion)
Goal: Predict missing edges given a partial graph. Formally, given G = (V, Etrain ), predict edges
in E \ Etrain . This task aims to infer relationships that are not explicitly observed.
Applications:
• Recommending friendships or connections in social networks.
• Predicting drug-target interactions or drug side effects in biomedical knowledge graphs.
• Completing missing links in citation networks or co-authorship networks.
• Inferring new facts or relationships in knowledge bases.
Challenges: The difficulty varies by graph type.
• Simple graphs: Heuristics based on common neighbors may suffice.
• Multi-relational graphs: Require modeling complex, relation-specific patterns.
Setup: Learn a scoring function s : V × V → [0, 1] that predicts the likelihood of an edge (vi , vj )
(a real-valued score can be mapped into [0, 1] with a sigmoid). The function can be based on node
embeddings, graph statistics, or learned neural networks.
The objective is to maximize the likelihood of observed edges while minimizing the likelihood of
unobserved ones:
L = ∑_{(vi ,vj ) ∈ Etrain} log s(vi , vj ) + ∑_{(vi ,vj ) ∉ Etrain} log(1 − s(vi , vj )).    (2)
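A minimal scoring function of the kind described above, suitable for simple graphs, is the common-neighbors heuristic; the sketch below counts shared neighbors and squashes the count through a sigmoid so the score lies in (0, 1). This is an illustrative baseline on an invented graph, not a learned model:

```python
import numpy as np

# Invented simple graph on 4 nodes; nodes 0 and 3 are not connected
# but share node 1 as a common neighbor.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]])

def score(A, i, j):
    # |N(v_i) ∩ N(v_j)|: elementwise AND of the two adjacency rows.
    common = int((A[i] & A[j]).sum())
    # Sigmoid maps the integer count into (0, 1) so it can play
    # the role of an edge likelihood in Eq. (2).
    return 1 / (1 + np.exp(-common))

print(score(A, 0, 3) > 0.5)  # True: one shared neighbor
```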
2.3 Clustering and Community Detection
Goal: Identify groups of densely connected nodes (communities) without labels. Formally,
partition V into subsets C1 , C2 , . . . , Ck such that intra-community edges are maximized and
inter-community edges are minimized. The number of communities k may be pre-defined or
inferred.
Applications:
• Detecting research groups in collaboration networks.
• Identifying functional modules in genetic interaction networks.
• Uncovering fraud rings in financial transaction networks.
• Analyzing social structures and polarization in online communities.
Communities reflect latent structures where nodes share similar roles, attributes, or functions.
Nodes within the same community are more likely to interact with each other.
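The objective can be made concrete on a toy graph: two dense 3-node blocks joined by a single bridge edge. The sketch below counts intra- and inter-community edges for the planted partition (graph and partition are invented for illustration):

```python
import numpy as np

# Invented graph: two triangles {0,1,2} and {3,4,5} joined by bridge (2,3).
A = np.zeros((6, 6), dtype=int)
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1

# Planted partition into two communities.
communities = [{0, 1, 2}, {3, 4, 5}]
labels = {v: k for k, C in enumerate(communities) for v in C}

# A good partition maximizes intra-community and minimizes
# inter-community edges; count both over each unordered pair.
intra = sum(A[i, j] for i in range(6) for j in range(i + 1, 6)
            if labels[i] == labels[j])
inter = sum(A[i, j] for i in range(6) for j in range(i + 1, 6)
            if labels[i] != labels[j])

print(intra, inter)  # 6 1
```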
2.4 Graph Classification, Regression, and Clustering
Goal: Predict labels or properties for entire graphs, or cluster similar graphs. Formally, given
a dataset of graphs {G1 , G2 , . . . , GN } with labels {y1 , y2 , . . . , yN } (for classification/regression),
learn a model f : G → Y or cluster graphs into groups.
Applications:
• Classifying molecules as toxic or non-toxic based on their molecular structure.
• Predicting properties of network traffic graphs (e.g., congestion levels).
• Clustering graphs by structural similarity (e.g., grouping similar program control flow graphs).
• Identifying different types of social networks or biological networks.
Setup: Each graph is treated as an i.i.d. datapoint, unlike node-level tasks where nodes within
a graph are interdependent. The model learns a mapping from entire graphs to labels or values.
• For classification, minimize a loss over graphs:
L = − ∑_{i=1}^{N} ∑_{c ∈ Y} yi,c log f (Gi )c .    (3)
• For regression, minimize a loss function such as Mean Squared Error:
L = (1/N) ∑_{i=1}^{N} (yi − f (Gi ))^2 .    (4)
Challenge: Define graph-level features that capture the relational structure of each graph, such
as graphlets, graph kernels, or summary statistics.
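One simple instance of such graph-level features is a fixed-length vector of summary statistics. The sketch below maps an adjacency matrix to (number of nodes, number of edges, average degree, density), a representation a standard classifier or regressor can consume; the feature choice is illustrative, not canonical:

```python
import numpy as np

def graph_features(A):
    """Map an undirected graph's adjacency matrix to a summary-statistic
    feature vector: [n, m, average degree, density]."""
    n = A.shape[0]
    d = A.sum(axis=1)
    m = int(d.sum() // 2)  # each undirected edge is counted twice in d
    avg_deg = d.mean()
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    return np.array([n, m, avg_deg, density])

# A triangle graph: 3 nodes, 3 edges, every pair connected.
triangle = np.array([[0, 1, 1],
                     [1, 0, 1],
                     [1, 1, 0]])

print(graph_features(triangle))  # [3. 3. 2. 1.]
```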
2.5 Challenges with Non-I.I.D. Data
• Non-I.I.D. Nature: Nodes are interdependent due to edges, violating the i.i.d. assumption
of traditional machine learning. The value or label of one node is often correlated with the
values or labels of its neighbors, introducing complex dependencies. For example, a node’s
label may depend on its neighbors’ labels (homophily).
• Relational Inductive Biases: Models must exploit graph structure, such as:
– Homophily: Connected nodes tend to have similar attributes or labels.
– Structural Equivalence: Nodes with similar local network structures have similar attributes
or labels.
– Community Structure: Nodes cluster into groups with dense intra-connections and sparse
inter-connections.
– Hierarchical Organization: Graphs may exhibit multi-scale organization, with nodes
organized into clusters, which are further organized into larger groups.
These biases are crucial for generalization and require models to consider relationships, rather
than treating nodes as isolated data points.
• Scalability Issues: Real-world graphs (e.g., social networks, the web) can have millions or
billions of nodes and edges. Traditional methods often struggle with memory and computational
costs due to the sparse, high-dimensional adjacency matrices. Efficient algorithms and data
structures are needed.
• Dynamic Graphs: Graphs can evolve over time, with nodes and edges added or removed.
Models must adapt to these changes, which is challenging for static methods.