Graph mining
◼ Graph Mining (GM) is essentially the
problem of discovering recurring
subgraphs in the input graphs
◼ Motivation
◼ Finding subgraphs capable of
compressing the data by abstracting
instances of the substructures
◼ Identifying conceptually interesting
patterns
Graph mining
Data may consist of
1. multiple small graphs → today
e.g., chemical compounds, biological pathways,
program control flows, consumer behaviour, ...,
even (HTML) documents can be
represented as graphs!
2. one large graph → later
e.g., internet, social network
Information to mine: interesting substructures, similarities,
communities, clusters
What are graphs good for?
• Most existing data mining algorithms are based on a
transaction representation, i.e., sets of items.
• Datasets with structures, layers, hierarchy and/or
geometry often do not fit well in this transaction
setting. For example:
• Numerical simulations
• 3D protein structures
• Chemical Compounds
• Generic XML files.
Graph, Graph, Everywhere
Relationships, interactions, connections
[Figures: aspirin molecule, the Internet, a co-author network]
Social Network Analysis
• Network (Graph)
• Nodes: things (people, places, etc.)
• Edges: relationships (friends, visited, etc.)
Graphs
• A graph G = (V, E) consists of a set of vertices V
and a (possibly empty) set E of edges,
where each edge e1 = (v1, v2) ∈ E
joins two vertices v1, v2 ∈ V.
• Edges may contain weights or labels and have
direction
• Nodes may contain additional labels
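The definition above can be made concrete with a minimal sketch of a labeled, optionally weighted graph in Python (the class and attribute names are illustrative, not from any particular library):

```python
# Minimal sketch of G = (V, E): adjacency-list graph with optional
# node labels and edge weights/labels (illustrative names only).
class Graph:
    def __init__(self, directed=False):
        self.directed = directed
        self.node_label = {}   # vertex -> label
        self.adj = {}          # vertex -> {neighbour: edge attributes}

    def add_node(self, v, label=None):
        self.node_label.setdefault(v, label)
        self.adj.setdefault(v, {})

    def add_edge(self, u, v, weight=None, label=None):
        self.add_node(u)
        self.add_node(v)
        self.adj[u][v] = {"weight": weight, "label": label}
        if not self.directed:          # undirected: store both directions
            self.adj[v][u] = {"weight": weight, "label": label}

g = Graph()
g.add_edge("a", "b", weight=-5)
g.add_edge("b", "c", weight=10, label="p")
print(sorted(g.adj["b"]))  # neighbours of b -> ['a', 'c']
```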
Graphs
[Figure: example graph annotated with a vertex (node), an edge, a cycle,
a directed edge, and weighted edges with weights -5, 10, and 7]
Molecular interaction networks are mapped as graphs
The protein-protein interaction network
Modeling Data With Graphs…
Going Beyond Transactions
Graphs are suitable for capturing arbitrary
relations between the various elements.

Data Instance                        Graph Instance
Element                              Vertex
Element's Attributes                 Vertex Label
Relation Between Two Elements        Edge
Type Of Relation                     Edge Label
Relation Between a Set of Elements   Hyper Edge
Graphs provide enormous flexibility for modeling the underlying data,
as they allow the modeler to decide what the elements should be
and which types of relations to model
Graph Definitions
[Figure: (a) a labeled graph with vertex labels a, b, c and edge labels
p, q, r, s, t; (b) a subgraph of it; (c) an induced subgraph]
Graph Pattern Mining
Discover structural patterns in the underlying graph
Standard approach: Iterative expansion
[Figure: iterative expansion of a pattern over a 4-vertex graph]
Challenging to mine patterns in large graphs
Frequent Subgraphs
• Given a graph dataset D = {G1, G2, …, Gn}, find
subgraph(s) G such that:
support(G) ≥ minSup
where support(G) is the number (or the
percentage) of graphs in D containing G and
minSup is a selected threshold.
• Frequent graph: one that satisfies minSup (a minimum
support threshold).
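To make the support definition concrete, the sketch below counts support by brute-force subgraph matching over tiny labeled graphs (all names are illustrative; edge labels are ignored in the match for brevity, and real miners avoid this exhaustive search):

```python
from itertools import permutations

# A graph here is a dict: {"nodes": {v: label}, "edges": {(u, v): label}}
# with undirected edges stored as (u, v), u < v.
def edge(u, v):
    return (u, v) if u < v else (v, u)

def contains(g, pattern):
    """True if some injective, label-preserving map embeds pattern in g."""
    gn, pn = list(g["nodes"]), list(pattern["nodes"])
    if len(pn) > len(gn):
        return False
    for perm in permutations(gn, len(pn)):
        m = dict(zip(pn, perm))
        if any(pattern["nodes"][v] != g["nodes"][m[v]] for v in pn):
            continue  # vertex labels must agree
        if all(edge(m[u], m[v]) in g["edges"] for (u, v) in pattern["edges"]):
            return True
    return False

def support(pattern, db):
    """Number of graphs in db containing pattern."""
    return sum(contains(g, pattern) for g in db)

triangle = {"nodes": {1: "A", 2: "B", 3: "C"},
            "edges": {(1, 2): None, (2, 3): None, (1, 3): None}}
path = {"nodes": {1: "A", 2: "B"}, "edges": {(1, 2): None}}
db = [triangle, path]
print(support(path, db))  # A-B occurs in both graphs -> 2
```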
Frequent Subgraph Mining
[Figure: a graph database and the frequent patterns mined from it]
• Enumerate all subgraphs occurring more
than 3 times
Frequent Subgraph Example
[Figure: three example graphs (1)-(3) over vertex labels A, B, C and a
candidate subgraph; the supports shown are 1, 3, and 3]
Finding frequent subgraphs
Elementary edit operations
Graph edit distance (example 1)
Graph edit distance (example 2)
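The edit-distance examples above can be sketched in code. This is a simplified graph edit distance for two tiny graphs with equal vertex counts, using unit costs for node relabeling and edge insertion/deletion (a full GED also allows vertex insertion/deletion; names are illustrative):

```python
from itertools import permutations

def ged(nodes1, edges1, nodes2, edges2):
    """Minimal edit distance over all vertex bijections: unit cost for
    each node relabel and each edge inserted or deleted. Assumes both
    graphs have the same number of vertices (a simplifying assumption)."""
    v1, v2 = list(nodes1), list(nodes2)
    assert len(v1) == len(v2), "sketch assumes equal vertex counts"
    best = float("inf")
    for perm in permutations(v2):
        m = dict(zip(v1, perm))
        relabel = sum(nodes1[v] != nodes2[m[v]] for v in v1)
        mapped = {frozenset((m[u], m[v])) for (u, v) in edges1}
        target = {frozenset(e) for e in edges2}
        # symmetric difference = edges to delete + edges to insert
        best = min(best, relabel + len(mapped ^ target))
    return best

# Triangle vs. path on three vertices: one edge must be deleted.
print(ged({1: "a", 2: "a", 3: "a"}, {(1, 2), (2, 3), (1, 3)},
          {4: "a", 5: "a", 6: "a"}, {(4, 5), (5, 6)}))  # -> 1
```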
Key Challenges in Subgraph Mining
• Graph isomorphism: two graphs are identical in
structure
• Graph representation (Canonical Labeling)
• A canonical label is a unique code of a given
graph.
• Canonical label should be the same no matter
how graphs are represented, as long as graphs
have the same topological structure and the
same labeling of edges and vertices.
• Subgraph candidate generation
• generate candidate frequent subgraphs from
datasets
Graph Isomorphism
• Two graphs are isomorphic if they are topologically
equivalent to each other
[Figure: two isomorphic graphs with vertex labels A and B]
Graph Isomorphism
[Figure: two isomorphic graphs; the vertex mapping ƒ below witnesses the isomorphism]
ƒ(a) = 1, ƒ(b) = 6, ƒ(c) = 8, ƒ(d) = 3,
ƒ(g) = 5, ƒ(h) = 2, ƒ(i) = 4, ƒ(j) = 7
Graph Isomorphism
• Use canonical labeling to handle isomorphism
• Map each graph into an ordered string
representation (known as its code) such that two
isomorphic graphs will be mapped to the same
canonical encoding
• Example:
• Lexicographically largest adjacency matrix
• Find the permutations of the vertices so that
the adjacency matrix is lexicographically
maximized when read off from left to right,
one row at a time.
Canonical label of graph
Lexicographically largest (or smallest) string obtained by
concatenating upper triangular entries of adj. matrix (after
symmetric permutation)
Uniquely identifies a graph and its isomorphs
Two isomorphic graphs will get same canonical label
Graph representation: adjacency matrix
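The lexicographically-largest-code idea above can be sketched directly: try every vertex permutation and keep the largest string of upper-triangular adjacency entries. This brute force is feasible only for tiny graphs (names are illustrative):

```python
from itertools import permutations

def canonical_code(n, edges):
    """Canonical label of an undirected graph on vertices 0..n-1.
    edges: {(i, j): edge_label_char}; '0' marks a missing edge.
    Returns the lexicographically largest upper-triangular string
    over all vertex permutations, so isomorphic graphs get equal codes."""
    def entry(p, i, j):
        e = (min(p[i], p[j]), max(p[i], p[j]))
        return edges.get(e, "0")
    best = ""
    for p in permutations(range(n)):
        code = "".join(entry(p, i, j)
                       for i in range(n) for j in range(i + 1, n))
        best = max(best, code)
    return best

# Two isomorphic 3-vertex paths (edge labels p, q) get the same code.
g1 = {(0, 1): "p", (1, 2): "q"}   # centre vertex 1
g2 = {(0, 1): "p", (0, 2): "q"}   # same structure, centre vertex 0
print(canonical_code(3, g1), canonical_code(3, g2))  # both 'qp0'
```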
Candidate Generation
•In Apriori:
•Merging two frequent k-itemsets will
produce a candidate (k+1)-itemset
•In frequent subgraph mining (vertex/edge
growing)
• Merging two frequent k-subgraphs may
produce more than one candidate (k+1)-subgraph
Multiplicity of Candidates
• Case 1: identical vertex labels
[Figure: joining two subgraphs over vertex labels a, b and edge label e
can produce more than one (k+1)-candidate]
Multiplicity of Candidates
• Case 2: Core contains identical labels
[Figure: joining two subgraphs whose core vertices all carry the same
label a produces several candidates]
Core: the (k-1)-subgraph that is common
between the joined graphs
Multiplicity of Candidates
• Case 3: Core multiplicity
[Figure: the two k-subgraphs share more than one possible core, so the
join produces several candidates]
Candidate generation (join)
based on core detection
[Figure: joining two subgraphs via a detected core]
Candidate Generation Based On
Core Detection (cont.)
There may be multiple cores
between two
(k-1)-subgraphs
[Figure: two alternative cores (first core, second core) shared by the
same pair of subgraphs]
Vertex Growing
[Figure: G1 and G2 share a core of three a-labeled vertices;
G3 = join(G1, G2) is obtained by vertex growing, merging the cores and
keeping both extra vertices: e (reached by a q-edge) and d (reached by
an r-edge)]

In terms of adjacency matrices (edge labels p, q, r; 0 = no edge):

M_G1 =
  0 p p q
  p 0 r 0
  p r 0 0
  q 0 0 0

M_G2 =
  0 p p 0
  p 0 r 0
  p r 0 r
  0 0 r 0

M_G3 =
  0 p p q 0
  p 0 r 0 0
  p r 0 0 r
  q 0 0 0 0
  0 0 r 0 0
Edge Growing
[Figure: G1 and G2 share a common core; edge growing joins them on the
common (k-1)-edge core to form G3 = join(G1, G2), adding one new edge;
vertex labels a, f and edge labels p, q, r]
Apriori-Based, Breadth-First Search
◼ Methodology: breadth-first search, joining two graphs
◼ AGM (Inokuchi et al.)
◼ generates new graphs with one more node
◼ FSG (Kuramochi and Karypis)
◼ generates new graphs with one more edge
Apriori-based method
➢ Graph candidate generation example
Apriori-Based Approach
k-edge frequent subgraphs G1, G2, …, Gn are JOINed pairwise to form
(k+1)-edge candidates G′, G″, …; each candidate then goes through
pruning, support counting, and elimination of infrequent candidates.
[Figure: join of k-edge graphs into (k+1)-edge candidates, followed by
pruning, support counting, and elimination]
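The join / support-counting / elimination loop above can be sketched on a toy domain where each graph is just a set of labeled edges and isomorphism is ignored (a real miner must handle isomorphism and connectivity; all names are illustrative):

```python
def support(pattern, db):
    """Count database graphs whose edge set contains the pattern."""
    return sum(pattern <= g for g in db)

def apriori_fsm(db, min_sup):
    """Level-wise Apriori-style search over edge sets (toy sketch)."""
    edges = {e for g in db for e in g}
    level = [frozenset([e]) for e in edges
             if support(frozenset([e]), db) >= min_sup]
    frequent = list(level)
    k = 1
    while level:
        # JOIN: merge two frequent k-edge patterns sharing k-1 edges
        candidates = {g1 | g2 for g1 in level for g2 in level
                      if len(g1 | g2) == k + 1}
        # SUPPORT COUNTING + ELIMINATION of infrequent candidates
        level = [c for c in candidates if support(c, db) >= min_sup]
        frequent.extend(level)
        k += 1
    return frequent

db = [frozenset({"ab", "bc", "cd"}),
      frozenset({"ab", "bc"}),
      frozenset({"ab", "cd"})]
print(len(apriori_fsm(db, min_sup=2)))  # -> 5 frequent patterns
```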
Pattern Growth Approach
A k-edge graph G is grown one edge at a time into (k+1)-edge and then
(k+2)-edge graphs G1, G2, …, Gn; different extension orders can produce
the same graph, so duplicate graphs must be detected and pruned.
[Figure: extending a k-edge graph edge by edge; one extension path
leads to a duplicate graph]
Pattern Growth Approach
➢ Pattern Growth (free extension)
Pattern Growth Approach
•Duplicate Graphs
Pattern Growth Approach
•Free extension
Pattern Growth Approach
•Right most extension
Pattern Growth Approach
• Examples (cont.)
gSpan
gSpan Advantages
•Lower memory requirements.
•Faster than naïve FSG by an order of
magnitude.
•No candidate generation.
•Lexicographic ordering minimizes search
tree.
•False positives pruning.
Graph Pattern Explosion Problem
• If a graph is frequent, all of its subgraphs
are frequent (the Apriori property).
• An n-edge frequent graph may have 2^n
subgraphs.
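A quick sanity check of the 2^n figure: every subset of a graph's n edges gives a (possibly disconnected) subgraph, and there are exactly 2^n such subsets:

```python
from itertools import combinations

# Count all edge subsets of a 10-edge graph: sum of C(10, k) = 2^10.
n = 10
edges = list(range(n))
count = sum(1 for k in range(n + 1) for _ in combinations(edges, k))
print(count)  # -> 1024, i.e. 2**10
```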