Tutorial on Graph ML: Part I
Soumyodeep Dey
May 19, 2025
1 Introduction to Graphs and Graph-Structured Data
Graphs are mathematical structures used to model relationships between entities. They are
ubiquitous in domains such as social networks, biological systems, and molecular chemistry,
where entities (e.g., users, proteins, atoms) are nodes, and relationships (e.g., friendships,
interactions, bonds) are edges. Understanding the properties and types of graphs is essential
for analyzing graph-structured data and developing machine learning methods tailored to their
unique characteristics.
Graphs provide a powerful abstraction because they focus on the relationships between entities,
rather than the attributes of the entities themselves. This makes them suitable for representing
a wide variety of complex systems.
1.1 Definition and Properties of Graphs
A graph G = (V, E) consists of:
• A set of nodes V = {v1 , v2 , . . . , vn }, representing entities.
• A set of edges E ⊆ V × V, representing relationships between nodes.
The cardinality of V, denoted as |V| = n, is the number of nodes. The cardinality of E, denoted
as |E| = m, is the number of edges.
The adjacency matrix A ∈ Rn×n (where n = |V|) encodes the graph structure:
• For a simple graph (undirected, unweighted, no self-loops), A[i, j] = 1 if (vi , vj ) ∈ E, and
A[i, j] = 0 otherwise.
• For undirected graphs, A is symmetric (A[i, j] = A[j, i]).
• For directed graphs, A is not necessarily symmetric. If (vi , vj ) ∈ E, it indicates a relationship
from vi to vj .
• For weighted graphs, A[i, j] represents the weight of the edge (vi , vj ), indicating the strength
or intensity of the relationship.
A self-loop is an edge that connects a node to itself, i.e., (vi , vi ) ∈ E. In simple graphs, self-loops
are disallowed.
The neighborhood of a node vi , denoted N (vi ) = {vj | (vi , vj ) ∈ E}, consists of all nodes
connected to vi . The degree of a node vi , denoted as d(vi ), is the number of its neighbors, i.e.,
d(vi ) = |N (vi )|.
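These definitions translate directly into code. The sketch below builds the adjacency matrix of a hypothetical 4-node simple graph and recovers neighborhoods and degrees from it (the edge list is invented for illustration):

```python
import numpy as np

# Hypothetical simple graph (undirected, unweighted, no self-loops) with n = 4 nodes.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4

# Build the adjacency matrix A; setting both A[i, j] and A[j, i]
# makes it symmetric, reflecting undirectedness.
A = np.zeros((n, n), dtype=int)
for i, j in edges:
    A[i, j] = 1
    A[j, i] = 1

# Neighborhood N(v_i): all indices j with A[i, j] = 1.
def neighborhood(A, i):
    return set(np.nonzero(A[i])[0].tolist())

# Degree d(v_i) = |N(v_i)| = row sum of A for a simple graph.
degrees = A.sum(axis=1)

print(neighborhood(A, 2))  # {0, 1, 3}
print(degrees.tolist())    # [2, 2, 3, 1]
```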
1.2 Types of Graphs
Graphs vary in structure and complexity, affecting the methods used to analyze them:
• Simple Graphs: Undirected, unweighted, no self-loops or multiple edges between nodes.
Each edge represents a basic, symmetric connection. Example: A friendship network where
edges indicate mutual connections.
• Directed Graphs: Edges have direction, so (vi , vj ) ̸= (vj , vi ), and A is not necessarily
symmetric. These graphs are used to model asymmetric relationships. Example: A citation
network where paper i cites paper j.
• Weighted Graphs: Edges have weights, so A[i, j] ∈ R represents the strength of the
relationship. The weights can be used to represent costs, distances, or affinities. Example: A
protein interaction network where weights indicate binding affinity.
• Multi-Relational Graphs: Edges have types, denoted as triples (vi , τ, vj ) ∈ E, where τ is a
relation type drawn from a set of relation types R. The graph is represented by an adjacency
tensor A ∈ Rn×|R|×n , where A[i, k, j] = 1 if (vi , τk , vj ) ∈ E. This type of graph captures
diverse relationships between entities. Example: A knowledge graph with relations like “treats”
or “causes” between drugs and diseases.
• Heterogeneous Graphs: Nodes have types (e.g., drugs, diseases), and edges connect specific
types. They are represented with node type sets V1 , V2 , . . . and edge type constraints. This
allows for modeling complex systems with different kinds of entities. Example: A biomedical
graph with drug-disease and disease-protein edges.
• Multiplex Graphs: Graphs with multiple layers, each representing a different relation type,
with inter-layer edges connecting the same node across layers. This structure captures different
aspects of the relationships between the same set of entities. Example: A transportation
network with layers for air, train, and bus connections between cities.
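As a small illustration of the multi-relational case, the sketch below builds the adjacency tensor A ∈ Rn×|R|×n for a hypothetical graph with two relation types; the relation names and triples are invented for the example:

```python
import numpy as np

# Hypothetical multi-relational graph: 3 nodes, 2 relation types,
# with edges stored as triples (v_i, tau, v_j).
relations = ["treats", "causes"]
triples = [(0, "treats", 1), (2, "causes", 1)]

n, R = 3, len(relations)
rel_index = {tau: k for k, tau in enumerate(relations)}

# Adjacency tensor A with A[i, k, j] = 1 iff (v_i, tau_k, v_j) is an edge.
A = np.zeros((n, R, n), dtype=int)
for i, tau, j in triples:
    A[i, rel_index[tau], j] = 1

print(A[0, rel_index["treats"], 1])  # 1
```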
1.3 Feature Information
Graphs often include attribute or feature information:
• Node Features: A feature matrix X ∈ Rn×m where each row xi is an m-dimensional feature
vector for node vi . These features can be categorical, numerical, or textual. Example: User
profiles in a social network (age, interests).
• Edge Features: Attributes associated with edges, such as weights, categories, or time stamps.
These features provide additional information about the relationships.
• Graph-Level Features: Properties of the entire graph, such as average degree, diameter,
number of connected components, or global clustering coefficient. These features describe the
overall structure of the graph.
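Graph-level features such as average degree and the global clustering coefficient can be computed directly from the adjacency matrix. The sketch below does so for a small toy graph (a triangle plus a pendant node), using the fact that trace(A³)/6 counts triangles in a simple graph:

```python
import numpy as np

# A small undirected toy graph: triangle 0-1-2 plus pendant node 3.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])

d = A.sum(axis=1)          # node degrees
avg_degree = d.mean()      # a simple graph-level feature

# Global clustering coefficient: 3 * (#triangles) / (#connected triples).
# trace(A^3) counts closed walks of length 3, i.e. 6 per triangle.
triangles = np.trace(np.linalg.matrix_power(A, 3)) / 6
connected_triples = (d * (d - 1) / 2).sum()
global_clustering = 3 * triangles / connected_triples

print(avg_degree)          # 2.0
print(global_clustering)   # 0.6
```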
1.4 Importance of Graph-Structured Data
Graphs model complex relationships in diverse domains:
• Social Networks: Nodes are users, edges are friendships, interactions, or communication
patterns.
• Biological Networks: Nodes are proteins, genes, or metabolites, edges are interactions,
regulatory relationships, or metabolic pathways.
• Molecular Graphs: Nodes are atoms, edges are chemical bonds, representing molecular
structure and properties.
• Citation Networks: Nodes are documents, edges are citations, showing the flow of information
and influence between publications.
• Transportation Networks: Nodes are locations, edges are roads, routes, or connections,
representing infrastructure and flow.
• Knowledge Graphs: Nodes are entities (e.g., people, places, concepts), edges are relationships
(e.g., “born in,” “is a”).
2 Machine Learning Tasks on Graphs
Machine learning on graphs involves tasks that leverage both the graph’s structure (edges) and
features (node or edge attributes) to make predictions or uncover patterns. Unlike traditional
machine learning, graph data is not i.i.d. (independent and identically distributed): edges introduce
dependencies between nodes, requiring specialized methods that exploit the relational structure. These tasks are
crucial for extracting insights from networked data and have broad applications across domains.
2.1 Node Classification
Goal: Predict labels for nodes given a partially labeled graph. Formally, given a graph G = (V, E),
a feature matrix X ∈ Rn×m , and labels ytrain ∈ R|Vtrain |×c for a subset of nodes Vtrain ⊂ V, predict
labels ytest ∈ R|Vtest |×c for Vtest = V \ Vtrain , where c is the number of classes.
Applications:
• Classifying users as bots or humans in social networks.
• Predicting protein functions in interactomes (protein-protein interaction networks).
• Categorizing documents in citation networks.
• Identifying fraudulent accounts in financial transaction networks.
Assumptions:
• Homophily: Connected nodes tend to have similar labels (e.g., friends share political views,
papers in the same field cite each other).
• Structural Equivalence: Nodes with similar neighborhood structures have similar labels
(e.g., nodes with similar connectivity patterns in a network, even if not directly connected).
• Heterophily: Connected nodes have different labels (less common, e.g., predator-prey relationships
in ecological networks).
Setup: Often semi-supervised, as the structure and features of unlabeled nodes (Vtest ) are
available during training. This allows models to leverage the full graph structure.
Learn a classifier f : V → Rc that maps nodes to a probability distribution over c classes,
leveraging A, X, and ytrain . The objective is to minimize a loss, e.g., cross-entropy:
L = − ∑_{vi ∈ Vtrain} ∑_{c ∈ Y} yi,c log ŷi,c ,    (1)
where ŷi,c = f (vi )c is the predicted probability for class c, and yi,c is the true label (1 if vi belongs
to class c, 0 otherwise).
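As a concrete check of the cross-entropy objective above, the snippet below evaluates it for three hypothetical training nodes and c = 2 classes; the one-hot labels y and predicted probabilities ŷ are invented for the example:

```python
import numpy as np

# One-hot true labels y for 3 training nodes, 2 classes.
y = np.array([[1, 0],
              [0, 1],
              [1, 0]])

# Invented predicted class probabilities y_hat (each row sums to 1).
y_hat = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.6, 0.4]])

# Cross-entropy of Eq. (1): only the true-class log-probabilities
# contribute, since y zeroes out the rest.
loss = -np.sum(y * np.log(y_hat))
print(loss)  # ≈ 0.8393
```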
2.2 Relation Prediction (Link Prediction, Graph Completion)
Goal: Predict missing edges given a partial graph. Formally, given G = (V, Etrain ), predict edges
in E \ Etrain . This task aims to infer relationships that are not explicitly observed.
Applications:
• Recommending friendships or connections in social networks.
• Predicting drug-target interactions or drug side effects in biomedical knowledge graphs.
• Completing missing links in citation networks or co-authorship networks.
• Inferring new facts or relationships in knowledge bases.
Challenges: The difficulty varies by graph type.
• Simple graphs: Heuristics based on common neighbors may suffice.
• Multi-relational graphs: Require modeling complex, relation-specific patterns.
Setup: Learn a scoring function s : V × V → [0, 1] that predicts the likelihood of an edge (vi , vj )
(a real-valued score can be mapped into [0, 1] with a sigmoid). The function can be based on node
embeddings, graph statistics, or learned neural networks.
The objective is to maximize the likelihood of observed edges while minimizing the likelihood of
unobserved ones:
L = ∑_{(vi ,vj ) ∈ Etrain} log s(vi , vj ) + ∑_{(vi ,vj ) ∉ Etrain} log(1 − s(vi , vj )).    (2)
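A minimal scoring function of the kind described above, suitable for simple graphs, is the common-neighbors heuristic; the sketch below counts shared neighbors and squashes the count through a sigmoid so the score lies in (0, 1). This is an illustrative baseline on an invented graph, not a learned model:

```python
import numpy as np

# Invented simple graph on 4 nodes; nodes 0 and 3 are not connected
# but share node 1 as a common neighbor.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]])

def score(A, i, j):
    # |N(v_i) ∩ N(v_j)|: elementwise AND of the two adjacency rows.
    common = int((A[i] & A[j]).sum())
    # Sigmoid maps the integer count into (0, 1) so it can play
    # the role of an edge likelihood in Eq. (2).
    return 1 / (1 + np.exp(-common))

print(score(A, 0, 3) > 0.5)  # True: one shared neighbor
```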
2.3 Clustering and Community Detection
Goal: Identify groups of densely connected nodes (communities) without labels. Formally,
partition V into subsets C1 , C2 , . . . , Ck such that intra-community edges are maximized and
inter-community edges are minimized. The number of communities k may be pre-defined or
inferred.
Applications:
• Detecting research groups in collaboration networks.
• Identifying functional modules in genetic interaction networks.
• Uncovering fraud rings in financial transaction networks.
• Analyzing social structures and polarization in online communities.
Communities reflect latent structures where nodes share similar roles, attributes, or functions.
Nodes within the same community are more likely to interact with each other.
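The objective can be made concrete on a toy graph: two dense 3-node blocks joined by a single bridge edge. The sketch below counts intra- and inter-community edges for the planted partition (graph and partition are invented for illustration):

```python
import numpy as np

# Invented graph: two triangles {0,1,2} and {3,4,5} joined by bridge (2,3).
A = np.zeros((6, 6), dtype=int)
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1

# Planted partition into two communities.
communities = [{0, 1, 2}, {3, 4, 5}]
labels = {v: k for k, C in enumerate(communities) for v in C}

# A good partition maximizes intra-community and minimizes
# inter-community edges; count both over each unordered pair.
intra = sum(A[i, j] for i in range(6) for j in range(i + 1, 6)
            if labels[i] == labels[j])
inter = sum(A[i, j] for i in range(6) for j in range(i + 1, 6)
            if labels[i] != labels[j])

print(intra, inter)  # 6 1
```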
2.4 Graph Classification, Regression, and Clustering
Goal: Predict labels or properties for entire graphs, or cluster similar graphs. Formally, given
a dataset of graphs {G1 , G2 , . . . , GN } with labels {y1 , y2 , . . . , yN } (for classification/regression),
learn a model f : G → Y or cluster graphs into groups.
Applications:
• Classifying molecules as toxic or non-toxic based on their molecular structure.
• Predicting properties of network traffic graphs (e.g., congestion levels).
• Clustering graphs by structural similarity (e.g., grouping similar program control flow graphs).
• Identifying different types of social networks or biological networks.
Setup: Each graph is treated as an i.i.d. datapoint, unlike node-level tasks where nodes within
a graph are interdependent. The model learns a mapping from entire graphs to labels or values.
• For classification, minimize a loss over graphs:
L = − ∑_{i=1}^{N} ∑_{c ∈ Y} yi,c log f (Gi )c .    (3)
• For regression, minimize a loss function such as Mean Squared Error:
L = (1/N) ∑_{i=1}^{N} (yi − f (Gi ))^2 .    (4)
Challenge: Define graph-level features that capture the relational structure of each graph, such
as graphlets, graph kernels, or summary statistics.
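One simple instance of such graph-level features is a fixed-length vector of summary statistics. The sketch below maps an adjacency matrix to (number of nodes, number of edges, average degree, density), a representation a standard classifier or regressor can consume; the feature choice is illustrative, not canonical:

```python
import numpy as np

def graph_features(A):
    """Map an undirected graph's adjacency matrix to a summary-statistic
    feature vector: [n, m, average degree, density]."""
    n = A.shape[0]
    d = A.sum(axis=1)
    m = int(d.sum() // 2)  # each undirected edge is counted twice in d
    avg_deg = d.mean()
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    return np.array([n, m, avg_deg, density])

# A triangle graph: 3 nodes, 3 edges, every pair connected.
triangle = np.array([[0, 1, 1],
                     [1, 0, 1],
                     [1, 1, 0]])

print(graph_features(triangle))  # [3. 3. 2. 1.]
```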
2.5 Challenges with Non-I.I.D. Data
• Non-I.I.D. Nature: Nodes are interdependent due to edges, violating the i.i.d. assumption
of traditional machine learning. The value or label of one node is often correlated with the
values or labels of its neighbors, introducing complex dependencies. For example, a node’s
label may depend on its neighbors’ labels (homophily).
• Relational Inductive Biases: Models must exploit graph structure, such as:
– Homophily: Connected nodes tend to have similar attributes or labels.
– Structural Equivalence: Nodes with similar local network structures have similar attributes
or labels.
– Community Structure: Nodes cluster into groups with dense intra-connections and sparse
inter-connections.
– Hierarchical Organization: Graphs may exhibit multi-scale organization, with nodes
organized into clusters, which are further organized into larger groups.
These biases are crucial for generalization and require models to consider relationships, rather
than treating nodes as isolated data points.
• Scalability Issues: Real-world graphs (e.g., social networks, the web) can have millions or
billions of nodes and edges. Traditional methods often struggle with memory and computational
costs due to the sparse, high-dimensional adjacency matrices. Efficient algorithms and data
structures are needed.
• Dynamic Graphs: Graphs can evolve over time, with nodes and edges added or removed.
Models must adapt to these changes, which is challenging for static methods.