Mining graph patterns
Davide Mottin, Konstantina Lazaridou
Hasso Plattner Institute
Graph Mining course Winter Semester 2016
Infos
§ Who attends one of the following courses:
• IT-Recht
• Randomisierte Algorithmen
• Live and reactive programming
§ Would you be fine presenting also on January 24?
• Only for those who do not fit into the other two days
§ The project can be done in groups of 1-2 people [as opposed to
what was said in the first lecture]
GRAPH MINING WS 2016 2
Lecture road
Subgraph mining
Mining Frequent Subgraphs
Small graphs
Mining Frequent Subgraphs
Large graphs
Why Frequent patterns
§ Frequent pattern: a pattern (a set of items, subsequences,
substructures, etc.) that occurs frequently in a data set
§ Motivation: Finding inherent regularities in data
• What products were often purchased together?
• What are the subsequent purchases after buying a PC?
• What kinds of DNA are sensitive to this new drug?
• Can we classify web documents using frequent patterns?
The Apriori principle
§ Also called Downward closure Property
§ All subsets of a frequent itemset must also be frequent
• Because any transaction that contains X must also contain every subset of X.
§ If we have already verified that X is infrequent, there is no need
to count X's supersets, because they MUST be infrequent too.
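The downward-closure argument can be sketched on a toy itemset database (all data hypothetical; support is an absolute count):

```python
from itertools import combinations

# Toy transaction database; min_sup is an absolute count.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
min_sup = 3

def support(itemset):
    """Number of transactions containing the itemset."""
    return sum(1 for t in transactions if itemset <= t)

# Level 1: frequent single items.
items = sorted({i for t in transactions for i in t})
frequent = {frozenset({i}) for i in items if support({i}) >= min_sup}

# Level 2: join, then drop candidates with an infrequent subset
# (downward closure) before counting.
candidates = {a | b for a in frequent for b in frequent if len(a | b) == 2}
frequent2 = {c for c in candidates
             if all(frozenset(s) in frequent for s in combinations(c, 1))
             and support(c) >= min_sup}
```

Here the subset check is trivial (level 2), but at deeper levels it prunes candidates before the expensive counting scan.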
Graph Pattern Mining
§ Frequent subgraphs
• A (sub)graph is frequent if its support (occurrence frequency)
in a given dataset is no less than a minimum support threshold
§ Applications of graph pattern mining:
• Mining biochemical structures
• Program control flow analysis
• Mining XML structures or Web communities
• Building blocks for graph classification, clustering,
compression, comparison, and correlation analysis
Frequent Subgraph Mining
Problem
Find all subgraphs of G that appear at least 𝜎 times
[Figure: an example graph G with edge labels a, b, c]
Suppose 𝜎 = 2; the frequent subgraphs are (only edge labels):
• a, b, c
• a-a, a-c, b-c, c-c
• a-c-a …
Exponential number of patterns!!!
How to mine frequent subgraphs?
§ Apriori-based approaches
• Start with small-size subgraphs and proceed in a bottom-up manner
• Join two patterns to create bigger size patterns (through Apriori principle)
• Several approaches
⁃ AGM/AcGM [Inokuchi et al.,PAKDD’00]
⁃ FSG [Kuramochi et al., ICDM’01]
⁃ PATH# [Vanetik et al., ICDM’02/ICDM’04]
§ Pattern-growth approaches
• Extend existing frequent graphs by adding one edge
• Several approaches:
⁃ MoFa [Borgelt et al., ICDM’02]
⁃ gSpan [Yan et al., ICDM’02]
⁃ Gaston [Nijssen et al., KDD’04]
⁃ FFSM [Huan et al., ICDM’03]
⁃ SPIN [Huan et al., KDD’04]
Apriori-Based Approach
Join two frequent k-graph patterns from the previous level to generate (k+1)-graph candidates.
[Figure: frequent k-graphs G1, G2, …, Gn are joined pairwise into (k+1)-graph candidates G′, G′′, …]
Problems? The join operation among graphs is extremely expensive.
Pattern Growth Method
Generate patterns by expanding existing ones.
[Figure: a k-graph G is extended one edge at a time into (k+1)-graphs G1, …, Gn, and these into (k+2)-graphs]
Problems? Duplicate graphs are generated.
Other Mining Functions
§ Maximal frequent subgraph mining
• A frequent subgraph is maximal if none of its supergraphs is frequent
§ Closed frequent subgraph mining
• A frequent subgraph is closed, if all its supergraphs have a (strictly) smaller
frequency
• Algorithms: CloseGraph, SPIN, MARGIN
§ Significant subgraph mining
• Mining subgraphs using a significance test (e.g., G-test, p-value)
• Algorithms: gBoost , gPLS, Leap, GraphSig
§ Representative orthogonal graphs mining
• Mining subgraphs with bounded similarity and overlap with respect to other
patterns
• Algorithms: ORIGAMI
Lecture road
Subgraph mining
Mining Frequent Subgraphs
Small graphs
Mining Frequent Subgraphs
Large graphs
Graph Pattern Mining - Set of graphs
[Figure: graph database G1, G2, G3, G4 with a highlighted frequent subgraph; min support = 3]
Support: number (or fraction) of times a subgraph appears in a set of graphs
Apriori principle (for graphs):
If a graph is frequent, all of its subgraphs are frequent
Subgraph Mining problem (on undirected, labeled sets of graphs)
§ Support: Given a set of labeled graphs 𝐷 = {𝐺₁, 𝐺₂, …, 𝐺ₙ}, 𝐺ᵢ = ⟨𝑉ᵢ, 𝐸ᵢ, ℓᵢ⟩, and a
subgraph G, the supporting set of G is 𝐷_G = {𝐺ᵢ | 𝐺 ⊑ 𝐺ᵢ, 𝐺ᵢ ∈ 𝐷}, where 𝐺 ⊑ 𝐺ᵢ
indicates that G is subgraph isomorphic to 𝐺ᵢ. The support is
defined as 𝜎(𝐺) = |𝐷_G| / |𝐷|
§ Input
• Set of labeled graphs 𝐷 = {𝐺₁, 𝐺₂, …, 𝐺ₙ}, 𝐺ᵢ = ⟨𝑉ᵢ, 𝐸ᵢ, ℓᵢ⟩
• Minimum support min_sup
§ Output:
• Every subgraph G with 𝜎(𝐺) ≥ min_sup
• Each subgraph is connected.
Support is anti-monotone: for all 𝐺′ ⊑ 𝐺, 𝜎(𝐺′) ≥ 𝜎(𝐺)
Finding Frequent Subgraphs: Input and
Output
§ Input
• Set of graphs (graph database)
• Undirected simple graphs (no loops, no multiple edges).
• Each graph has labels associated with its vertices and edges.
• Graphs may not be connected.
• Minimum support threshold min_supp.
§ Output
• Frequent subgraphs that satisfy the minimum support constraint.
• Each frequent subgraph is connected.
[Figure: a graph database of three graphs and their frequent connected subgraphs, with supports 100%, 66%, 66%]
Mining approaches: agenda
§ Apriori-based approaches:
• FSG
§ Pattern-growth approaches:
• gSpan
§ Greedy approach:
• Subdue
FSG Algorithm
Notation: k-subgraph is a subgraph with k edges.
Init: Scan the transactions to find F1, the set of all frequent
1-subgraphs and 2-subgraphs, together with their counts;
For (k = 3; Fk-1 ≠ ∅; k++)
1. Candidate generation - build Ck, the set of candidate k-subgraphs, from Fk-1,
the set of frequent (k-1)-subgraphs;
2. Candidate pruning - a necessary condition for a candidate to be
frequent is that each of its (k-1)-subgraphs is frequent;
3. Frequency counting - scan the graph database to count the
occurrences of subgraphs in Ck;
4. Fk = { c ∈ Ck | c has count no less than min_sup }
5. Return F1 ∪ F2 ∪ … ∪ Fk (= F)
Kuramochi, M. and Karypis, G., 2001. Frequent subgraph discovery. ICDM 2001.
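A minimal sketch of this level-wise loop, under a strong simplification: each "graph" is reduced to a set of distinct labeled edges, so joining, pruning, and counting need no isomorphism tests (which real FSG cannot avoid). All names and data are hypothetical:

```python
from itertools import combinations

# Toy database: each graph simplified to a set of distinct labeled edges.
db = [{"a-b", "a-c", "b-c"}, {"a-b", "a-c"}, {"a-b", "b-c"}]

def fsg(db, min_sup):
    def count(pattern):
        return sum(1 for g in db if pattern <= g)

    edges = sorted({e for g in db for e in g})
    Fk = {frozenset({e}) for e in edges if count({e}) >= min_sup}
    result = list(Fk)
    k = 2
    while Fk:
        # 1. candidate generation: join two frequent (k-1)-patterns
        Ck = {a | b for a in Fk for b in Fk if len(a | b) == k}
        # 2. candidate pruning: every (k-1)-subpattern must be frequent
        Ck = {c for c in Ck
              if all(frozenset(s) in Fk for s in combinations(c, k - 1))}
        # 3. frequency counting over the database
        Fk = {c for c in Ck if count(c) >= min_sup}
        result.extend(Fk)
        k += 1
    return result
```

In the real algorithm, steps 1-3 each hide a (sub)graph isomorphism problem; the next slides show how FSG mitigates them.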
Simple operations?
§ Candidate generation
• To determine two candidates for joining, we need to check for
graph isomorphism.
§ Candidate pruning
• To check downward closure property, we need graph
isomorphism.
§ Frequency counting
• Subgraph isomorphism for checking containment of a frequent
subgraph.
Recall that subgraph isomorphism is NP-complete!!!
Candidate generation (join) based on core detection - Issues
The same core can produce different (k+1)-patterns:
• by vertex labeling
• by multiple automorphisms (different traversal orders for the same graph) on a single core
[Figure: joining two (k-1)-subgraphs over a shared core yields several distinct (k+1)-candidates]
Candidate Generation Based On Core Detection - Issues (2)
There can be multiple cores between two (k-1)-subgraphs.
[Figure: two (k-1)-subgraphs sharing both a first and a second core]
Candidate pruning: downward closure
property
§ Every (k-1)-subgraph must be frequent.
§ For all the (k-1)-subgraphs of a given k-candidate, check if
downward closure property holds
[Figure: 3-candidates and the 4-candidates generated from them]
[Figure: level-wise generation: frequent 1-subgraphs → frequent 2-subgraphs → 3-candidates → frequent 3-subgraphs → 4-candidates → … → frequent 4-subgraphs]
Candidate Pruning: Canonical Labeling
Idea: Use the adjacency matrix to construct a hashable string representation of
the graph
• Concatenate the columns (if undirected, only of the upper triangular part of the matrix)
[Figure: graph G with two adjacency matrices M1 and M2, obtained from different vertex orders]
Code(M1) = “aabyzx”
Code(M2) = “abaxyz”
Canonical-Code(G) = min{ code(M) | M is an adjacency matrix of G }
String comparison!
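The min-over-codes idea can be sketched by brute force over all vertex orders (exponential, in line with the hardness discussed on the next slide). The graph encoding (label dicts keyed by vertex and by edge) is an assumption for illustration:

```python
from itertools import permutations

def matrix_code(vlabel, elabel, order):
    """Vertex labels in `order`, then the upper-triangle entries of the
    adjacency matrix read column by column ('0' where there is no edge)."""
    s = "".join(vlabel[v] for v in order)
    for j in range(len(order)):
        for i in range(j):
            s += elabel.get(frozenset((order[i], order[j])), "0")
    return s

def canonical_code(vlabel, elabel):
    """Minimum code over every vertex ordering: a canonical form, but
    exponential to compute, as expected."""
    return min(matrix_code(vlabel, elabel, p) for p in permutations(vlabel))
```

Two relabelings of the same labeled triangle then produce the same canonical string, so isomorphism reduces to string comparison.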
Canonical labeling is NP-hard
§ Intuitively: Find a unique canonical form (relabeling) or
automorphism (a mapping among nodes in the same graph)
such that the canonical form for two isomorphic graphs is the
same.
§ The problem is as complex as graph isomorphism, but FSG
suggests some heuristics to speed it up such as:
• Vertex Invariants (e.g. degree)
• Neighbor lists
• Iterative partitioning
Frequency counting
Idea: For each pattern f1, …, fk keep a TID list of the graphs that contain the pattern
• When joining two patterns, compute the intersection between their lists
• If the size of the intersection is < min_supp, remove the pattern
G1 = {f1, f2, f3}   TID(f1) = {G1, G2}
G2 = {f1}           TID(f2) = {G1, G3}
G3 = {f2}
Candidate c = join(f1, f2)
TID(c) ⊆ TID(f1) ∩ TID(f2)
Perform subgraph isomorphism only on G1
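The TID-list step, as a minimal sketch with the slide's toy values:

```python
# TID lists from the example: which database graphs contain each pattern.
tid_f1 = {"G1", "G2"}
tid_f2 = {"G1", "G3"}
min_supp = 1

# Joining f1 and f2: only graphs in BOTH TID lists can contain the candidate.
cand_tids = tid_f1 & tid_f2
candidate_pruned = len(cand_tids) < min_supp
# If not pruned, the (expensive) subgraph isomorphism test runs only on
# the graphs in cand_tids, i.e. only on G1 here.
```

The set intersection is cheap, so the costly isomorphism test is confined to the few graphs that could possibly contain the candidate.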
Simple operations?
§ Candidate generation
• To determine two candidates for joining, we need to check for
graph isomorphism.
• Solution: use Core detection
§ Candidate pruning
• To check downward closure property, we need graph
isomorphism.
• Solution: use Canonical labeling
§ Frequency counting
• Subgraph isomorphism for checking containment of a frequent
subgraph.
• Solution: use TID lists
Simpler? Yes, for some graphs, but in general still NP-complete
Mining approaches: agenda
§ Apriori-based approaches:
• FSG
§ Pattern-growth approaches:
• gSpan
§ Greedy approach:
• Subdue
gSpan
§ One of the earliest and most used approaches for subgraph
mining
§ Makes use of the properties of (Depth First Search) DFS
traversals to define canonical codes called DFS-codes
§ Reduces the redundancy in the generation of the patterns
Yan, X. and Han, J., 2002. gSpan: Graph-based substructure pattern mining. ICDM 2002.
Motivation: DFS exploration, as with itemsets
The itemset search space is prefix-based; can we do prefix-based exploration in graphs?
The itemset lattice (explored by DFS):
abcde
abcd abce abde acde bcde
abc abd abe acd ace ade bcd bce bde cde
ab ac ad ae bc bd be cd ce de
a b c d e
Why using a prefix tree?
§ Canonical representation of itemset is obtained by a complete
order over the items.
§ Each possible itemset appears in the tree exactly once - no
duplications or omissions.
• The algorithm is complete and correct
§ Properties of the tree search space
• for each k-itemset, its parent is its (k-1)-prefix
• siblings are in ascending lexicographic order.
DFS Code representation
§ Map each graph (2-Dim) to a sequential DFS Code (1-Dim)
§ Lexicographically ordered codes
§ Construct Tree Search Space based on the lexicographic order.
DFS-edge: (i, j, L_i, L_ij, L_j)
• i, j - vertices by discovery time; i < j is a forward edge, i > j a backward edge
• L_i, L_j - vertex labels of v_i, v_j
• L_ij - edge label between v_i and v_j

Edge#  Code
0      (0,1,X,a,Y)
1      (1,2,Y,b,X)
2      (2,0,X,a,X)
3      (2,3,X,c,Z)
4      (3,1,Z,b,Y)
5      (1,4,Y,d,Z)

[Figure: the example graph with vertices v0–v4 labeled X, Y, X, Z, Z, annotated with their discovery times]
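Constructing one DFS code can be sketched as below. The graph representation (adjacency dict plus label dicts) and the sorted-neighbor visiting order are assumptions for illustration; a real miner enumerates visiting orders to find the minimum code. The test graph is the X, Y, X triangle formed by the table's first three edges:

```python
def dfs_code(adj, vlabel, elabel, start):
    """One DFS code: a list of (i, j, L_i, L_ij, L_j) tuples with i, j
    vertex discovery times. Neighbors are visited in sorted order,
    which fixes ONE of the many possible DFS codes of the graph."""
    disc, code, seen = {}, [], set()

    def visit(u):
        disc[u] = len(disc)
        for v in sorted(adj[u]):
            e = frozenset((u, v))
            if e in seen:
                continue
            seen.add(e)
            if v in disc:   # backward edge: i > j, v already discovered
                code.append((disc[u], disc[v], vlabel[u], elabel[e], vlabel[v]))
            else:           # forward edge: i < j, discovers v
                code.append((disc[u], len(disc), vlabel[u], elabel[e], vlabel[v]))
                visit(v)

    visit(start)
    return code
```

On the triangle below this reproduces the first three rows of the table above.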
DFS Code construction
§ Given a graph G, for each Depth First Search over graph G,
construct the corresponding DFS-Code.
[Figure: panels (a)–(g) show one DFS over G; each panel adds the next DFS-edge]
(0,1,X,a,Y) (1,2,Y,b,X) (2,0,X,a,X) (2,3,X,c,Z) (3,1,Z,b,Y) (1,4,Y,d,Z)
Single graph, several DFS-Codes
     (a)            (b)            (c)
1  (0,1,X,a,Y)   (0,1,Y,a,X)   (0,1,X,a,X)
2  (1,2,Y,b,X)   (1,2,X,a,X)   (1,2,X,a,Y)
3  (2,0,X,a,X)   (2,0,X,b,Y)   (2,0,Y,b,X)
4  (2,3,X,c,Z)   (2,3,X,c,Z)   (2,3,Y,b,Z)
5  (3,1,Z,b,Y)   (3,0,Z,b,Y)   (3,0,Z,c,X)
6  (1,4,Y,d,Z)   (0,4,Y,d,Z)   (2,4,Y,d,Z)
[Figure: the graph G and the three DFS traversals (a), (b), (c) that produce these codes]
Valid DFS-Code Edge ordering
Define a specific order on edges corresponding to the DFS
traversal
§ e1 = (i1, j1), e2 = (i2, j2)
§ e1 ≺ e2 ⟹ e1 appears before e2 in the code
Ordering rules
1. If i1 = i2 and j1 < j2 ⟹ e1 ≺ e2 (backward edges)
2. If i1 < j1 and j1 = i2 ⟹ e1 ≺ e2 (forward edges)
3. If e1 ≺ e2 and e2 ≺ e3 ⟹ e1 ≺ e3 (transitivity)
Why enforce an edge ordering in the DFS code?
To ensure that the code (and so the DFS exploration) is
produced in a fixed order for later comparisons
Multiple codes?
§ Having multiple codes for the same graph is not good, since we
don’t have an effective way to compare two graphs
§ Idea: find (define) a total order between DFS-codes, so we can
find the minimum (or the maximum)
• If we compare the minimum DFS-codes of two graphs, we are sure that if they are
equal then the graphs are isomorphic
Single graph - single Min DFS-code
     (a)            (b)            (c)
1  (0,1,X,a,Y)   (0,1,Y,a,X)   (0,1,X,a,X)
2  (1,2,Y,b,X)   (1,2,X,a,X)   (1,2,X,a,Y)
3  (2,0,X,a,X)   (2,0,X,b,Y)   (2,0,Y,b,X)
4  (2,3,X,c,Z)   (2,3,X,c,Z)   (2,3,Y,b,Z)
5  (3,1,Z,b,Y)   (3,0,Z,b,Y)   (3,0,Z,c,X)
6  (1,4,Y,d,Z)   (0,4,Y,d,Z)   (2,4,Y,d,Z)
[Figure: graph G and the three DFS traversals producing codes (a), (b), (c); column (c), which starts with the smallest first edge (0,1,X,a,X), is the minimum DFS-code of G]
What is minimum?
§ In order to define a minimum we have to define an order on the
DFS codes.
§ Assume you have an order on the edge/node labels
§ Given two DFS codes 𝛼, 𝛽 for two graphs, how can we
determine if 𝛼 ≺ 𝛽?
• 𝛼 = (a0, a1, …, am), 𝛽 = (b0, b1, …, bn)
• Assume n ≥ m
§ 𝛼 ≤ 𝛽 iff either of the following is true:
• ∃t, 0 ≤ t ≤ min(m, n), such that ak = bk for k < t and at ≺e bt
• ak = bk for 0 ≤ k ≤ m
How do we define the order ≺e on the DFS-edges?
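The sequence comparison can be sketched with Python's built-in list/tuple ordering standing in for the exact DFS-lexicographic order (a simplification; the true order compares single edges with ≺e). Toy codes:

```python
# Two DFS codes as lists of edge tuples. Built-in comparison matches the
# scheme above: the first differing edge decides, and a proper prefix of a
# code is smaller than the full code.
alpha = [(0, 1, "X", "a", "X"), (1, 2, "X", "a", "Y")]
beta = [(0, 1, "X", "a", "Y"), (1, 2, "Y", "b", "X")]

first_edge_decides = alpha < beta        # "X" < "Y" in the first edge
prefix_is_smaller = alpha[:1] < alpha    # a_k = b_k for 0 <= k <= m
```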
Defining at ≺e bt
§ at = (ia, ja, labels of a), bt = (ib, jb, labels of b)
§ at ≺e bt if
1. Both are forward edges and
⁃ ib < ia (the edge starting from the later-visited vertex comes first)
⁃ or ib = ia and the labels of a are lexicographically less than the labels of b, in tuple order
(If both are forward edges then ja = jb: since the previous edges are equal, both discover a new node)
2. Both are backward edges (ia = ib for the same reason) and
⁃ ja < jb (the edge connecting to the earlier-discovered vertex comes first)
⁃ or ja = jb and the edge label of a is lexicographically less than the one of b
▪ Question: Why not the node labels?
3. at is a backward edge and bt is a forward edge
Minimum DFS-Code
The minimum DFS code min-DFS(G) of a graph G, in DFS lexicographic order, is a
canonical representation of graph G
Theorem
Graphs G1 and G2 are isomorphic iff
min-DFS(G1) = min-DFS(G2)
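The theorem can be illustrated with a brute-force minimum DFS code: run a DFS for every vertex preference order and keep the smallest code. Plain tuple comparison stands in for the exact DFS-lexicographic order; since isomorphic graphs generate the same set of DFS codes, their minima still coincide under any fixed total order. All data and names are hypothetical:

```python
from itertools import permutations

def min_dfs_code(adj, vlabel, elabel):
    """Brute-force minimum DFS code: one DFS per vertex preference order,
    keep the lexicographically smallest code (tuple order as a stand-in
    for the exact DFS-lexicographic order)."""
    best = None
    for pref in permutations(adj):
        rank = {v: r for r, v in enumerate(pref)}
        disc, code, seen = {}, [], set()

        def visit(u):
            disc[u] = len(disc)
            for v in sorted(adj[u], key=rank.get):
                e = frozenset((u, v))
                if e in seen:
                    continue
                seen.add(e)
                if v in disc:   # backward edge
                    code.append((disc[u], disc[v], vlabel[u], elabel[e], vlabel[v]))
                else:           # forward edge
                    code.append((disc[u], len(disc), vlabel[u], elabel[e], vlabel[v]))
                    visit(v)

        visit(pref[0])
        if best is None or code < best:
            best = list(code)
    return best
```

Two relabeled copies of the same labeled triangle then get equal minimum codes, while changing one edge label changes the minimum code.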
DFS-Code Tree: parent-child relation
§ If min(G1) = {a0, a1, …, an}
and min(G2) = {a0, a1, …, an, b}
• G1 is the parent of G2
• G2 is a child of G1
§ A valid DFS code requires that b (which by definition is larger
than any ai) grows from a vertex on the rightmost path (a
property inherited from the DFS search).
• Forward edge extensions of a DFS code must start from a vertex on the rightmost path
• Backward edge extensions must start from the rightmost vertex.
• Why? A vertex NOT on the rightmost path has already been fully explored by the DFS
[Figure: graph G1 with vertices v0–v4 labeled X, Y, X, Z, Z]
Min(G1) = (0,1,X,a,Y) (1,2,Y,b,X) (2,0,X,a,X) (2,3,X,c,Z) (3,1,Z,b,Y) (1,4,Y,d,Z)
A child of graph G1 must grow an edge from the rightmost path of
G1 (necessary condition)
[Figure: graph G2 = G1 plus a new vertex v5; valid forward-edge extensions start only from vertices on the rightmost path, valid backward-edge extensions only from the rightmost vertex; all other extensions are wrong]
Search space: DFS code Tree
§ Organize the DFS codes in a tree via the parent-child relation
§ Nodes are DFS codes, except for the first level of the tree, in
which a node represents a (frequent) vertex label
§ Each level of the tree adds an edge to the DFS code
§ Sibling nodes are organized in ascending DFS lexicographic order.
§ A pre-order traversal follows the DFS lexicographic order!
§ Backward edges are expanded first, why? Think about the ordering rules
for the DFS-codes ;)
Two pruning rules:
1. The code is not minimum
2. The support is < min_support
[Figure: a DFS-code tree over vertex labels A, B, C; a node S holding a minimum DFS-code is extended, while its duplicate S′ is not minimum and its whole subtree is PRUNED]
gSpan Algorithm
§ Traverse the DFS code tree for the given label sets
• Prune using support and minimality of codes
§ Input: graph database D, min_sup
§ Output: frequent subgraph set S
§ Procedure:
1. S ← frequent one-edge subgraphs in D (using DFS codes)
2. Sort S in lexicographic order
3. N ← S (S will be modified)
4. for each 𝑛 ∈ 𝑁 do:
⁃ gSpan_Extend(D, n, min_sup, S)
5. Remove n from all graphs in D (so subgraphs already enumerated are not considered again)
§ Strategy: grow minimal DFS codes that occur frequently in D
gSpan_Extend
§ Input: Graph database D, min_sup, DFS code n
§ Output: frequent subgraph set S
§ Procedure
1. If n is not minimal, return
2. Otherwise
⁃ Add n to S
⁃ for each single-edge rightmost expansion e of n:
if 𝜎(𝑒) ≥ min_sup then gSpan_Extend(D, e, min_sup, S)
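The recursion can be sketched in a simplified setting where each graph is a set of distinct labeled edges and a sorted tuple plays the role the minimum DFS code plays in gSpan: a pattern reached twice via different growth orders is recognized and skipped. Hypothetical data:

```python
# Pattern-growth sketch: grow one edge at a time, prune duplicates via a
# canonical form and prune infrequent patterns via support.
db = [{"a-b", "a-c", "b-c"}, {"a-b", "a-c"}, {"a-b", "b-c"}]
min_sup = 2
all_edges = sorted({e for g in db for e in g})

def support(p):
    return sum(1 for g in db if p <= g)

def extend(pattern, found):
    canon = tuple(sorted(pattern))            # stand-in canonical code
    if canon in found or support(pattern) < min_sup:
        return                                # prune: duplicate or infrequent
    found[canon] = support(pattern)
    for e in all_edges:                       # single-edge extensions
        if e not in pattern:
            extend(pattern | {e}, found)

found = {}
for e in all_edges:                           # start from one-edge patterns
    extend(frozenset({e}), found)
```

The anti-monotonicity of support makes the early return safe: no extension of an infrequent pattern can become frequent again.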
Mining approaches: agenda
§ Apriori-based approaches:
• FSG
§ Pattern-growth approaches:
• gSpan
§ Greedy approach (in brief):
• Subdue
Subdue algorithm
§ A greedy algorithm for finding some of the most prevalent
subgraphs.
§ The method is not complete, i.e., it may not find all frequent
subgraphs, in exchange for fast execution.
§ Based on the description length: compresses graphs using graph
patterns:
• Use the patterns that give the maximum compression
§ Based on beam search - like BFS, it progresses level by level.
Unlike BFS, however, beam search moves downward only
through the best W (beam width) nodes at each level; the
other nodes are ignored.
Lecture road
Subgraph mining
Mining Frequent Subgraphs
Small graphs
Mining Frequent Subgraphs
Large graphs
Single large graph - Motivation
§ Often the input is a single large graph.
§ Examples:
• The web or portions of it.
• A social network (e.g. a network of users communicating by email at BGU).
• A large XML database such as DBLP or Movies database.
§ Mining large graph databases is very useful.
Support in a large graph
§ A support measure is admissible if for any pattern P and any
sub-pattern 𝑄 ⊏ 𝑃 the support of P is not larger than the support
of Q, i.e., 𝜎(𝑄) ≥ 𝜎(𝑃). This is also referred to as the anti-monotonicity
property.
Problem: the number of occurrences of a pattern is not an admissible support measure
[Figure: a graph G containing patterns 𝑃1 ⊏ 𝑃2 with 1 = 𝜎(𝑃1) < 𝜎(𝑃2) = 2]
How can we solve this problem?
Admissible support measures
A number of admissible support measures have been proposed in
recent years:
1. Maximum Independent Set support (MIS)
• Based on maximum number of non-overlapping matches
2. Harmful overlap support (HO)
• Based on the number of patterns for which no subgraph is identical
3. Minimum Image Based support (MNI)
• Based on the number of times a node in the pattern is mapped into a distinct node
in the graph
Maximum independent set support (MIS)
§ Idea: Count how many times a pattern is mapped in a non-
overlapping subgraph
MIS computation
1. Construct an overlap graph G_O = (V_O, E_O), in which the set of
nodes V_O is the set of embeddings (matches) of a pattern P
in a graph G, and E_O = {(f1, f2) | f1, f2 ∈ V_O ∧ f1 ≠ f2 ∧ f1 ⊓ f2 ≠ ∅},
i.e., E_O has an edge between every pair of overlapping embeddings.
2. 𝜎_MIS = size of the maximum independent set of nodes in the
overlap graph.
• An independent set of nodes in a graph G = (V, E) is a subset I ⊆ V of pairwise
non-adjacent nodes, i.e., ∀𝑢, 𝑣 ∈ 𝐼: (𝑢, 𝑣) ∉ 𝐸
• The largest possible independent set is called the maximum independent set
Finding a maximum independent set is NP-hard
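A brute-force sketch of 𝜎_MIS on hypothetical embeddings (exponential, as expected: the exact problem is NP-hard):

```python
from itertools import combinations

# Embeddings of a pattern, each given by its set of graph vertices
# (toy data: the first two embeddings overlap in vertex 2).
embeddings = [{1, 2}, {2, 3}, {4, 5}]

# Overlap graph: one node per embedding, an edge whenever two embeddings
# share a graph vertex.
overlap = {(i, j)
           for i in range(len(embeddings))
           for j in range(i + 1, len(embeddings))
           if embeddings[i] & embeddings[j]}

def mis_support(n_embeddings, overlap):
    """Size of a maximum independent set in the overlap graph, found by
    trying subsets from largest to smallest."""
    for r in range(n_embeddings, 0, -1):
        for subset in combinations(range(n_embeddings), r):
            if all((i, j) not in overlap
                   for i in subset for j in subset if i < j):
                return r
    return 0
```

Here only the first two embeddings conflict, so at most one of them counts together with the third.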
Maximum independent set support (MIS)
[Figure: overlapping embeddings of a pattern in a graph and the resulting overlap graph; the maximum number of pairwise non-adjacent nodes gives 𝜎_MIS = 2]
Harmful overlap support (HO)
§ MIS support can be very restrictive: it also counts overlaps on
single nodes
[Figure: pattern and graph with two embeddings overlapping in a single node]
§ The harmful overlap support 𝜎_HO considers harmful only those
embeddings that share a common subgraph
[Figure: pattern and graph illustrating a non-harmful overlap and a harmful overlap]
Fiedler, M. and Borgelt, C., 2007. Subgraph support in a single large graph. ICDMW 2007.
Minimum Image Based support (MNI)
§ Simpler support based on the minimum number of times a
node of the pattern is mapped to the graph
Let f1, …, fm be the isomorphisms of a pattern P = (V_P, E_P, ℓ_P) in a graph G. Let
F(v) = {f1(v), …, fm(v)} be the set of (distinct) nodes in G to which the
functions f1, …, fm map a node v ∈ V_P. The Minimum Image based support
𝜎_MNI(P) of P in G is defined as 𝜎_MNI(P) = min{|F(v)| : v ∈ V_P}
[Figure: pattern with vertices v1, v2, v3 and a graph where |F(v1)| = 1, |F(v2)| = 1, |F(v3)| = 3,
so 𝜎_MNI(P) = min{|F(v1)|, |F(v2)|, |F(v3)|} = 1]
Chen, C., Yan, X., Zhu, F. and Han, J. gApprox: Mining frequent approximate patterns from a massive network. ICDM 2007.
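A sketch of 𝜎_MNI on toy isomorphisms mirroring the example above (all numbers hypothetical):

```python
# Isomorphisms of a pattern into the graph, each mapping pattern vertex ->
# graph vertex: v1 and v2 always hit the same graph nodes, v3 hits three
# distinct ones.
isos = [
    {"v1": 1, "v2": 2, "v3": 5},
    {"v1": 1, "v2": 2, "v3": 6},
    {"v1": 1, "v2": 2, "v3": 7},
]

def mni_support(isos):
    """sigma_MNI: minimum, over pattern vertices, of the number of
    distinct graph vertices they are mapped to. Cheap once the matches
    are known, unlike MIS and HO."""
    return min(len({f[v] for f in isos}) for v in isos[0])
```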
Properties of the support measures
§ MIS, HO, and MNI are anti-monotone (proof omitted)
§ Computation time:
• MIS and HO are NP-hard to compute
• MNI can be computed in polynomial time!
§ What is the relationship among the support sets? And
among the support values?
• 𝜎_MIS(P) ≤ 𝜎_HO(P) ≤ 𝜎_MNI(P)
• The same relationship holds for the support sets (the MIS support
set is a subset of the HO support set, which is a subset of the MNI support set)
Frequent subgraph mining in large graphs
§ Having an anti-monotonic support measure means
that we can apply any previous technique (e.g., gSpan)
just changing how the support is computed and
leaving the rest unchanged.
§ Steps to find frequent subgraphs (Grow and store
approaches):
1. Add all frequent edges to the list of frequent subgraphs.
2. Extend frequent subgraphs to larger candidates.
3. Compute candidate frequency (find all occurrences)
4. Repeat from step 2 until no more frequent subgraphs are found
Approaches
§ Maximum Independent Set support (MIS)
• HSIGRAM, VSIGRAM [Kuramochi et al., 2005]
§ Harmful overlap support (HO)
• Fiedler et al., 2007
§ Minimum Image Based support (MNI)
• GrowStore – based on gSpan
• GraMi [Elseidy et al., 2014]
In the next episode …
Graph indexing and query optimization
Node classification
And much more …
Questions?
References
§ A. Inokuchi, T. Washio and H. Motoda, An Apriori-Based Algorithm for Mining Frequent
Substructures from Graph Data, Proceedings of the 4th PKDD'00, 2000, pages 13-23.
§ M. Kuramochi and G. Karypis, An Efficient Algorithm for Discovering Frequent Subgraphs,
Tech. report, Department of Computer Science/Army HPC Research Center, 2002.
§ N. Vanetik, E. Gudes, and S. E. Shimony, Computing Frequent Graph Patterns from
Semistructured Data, Proceedings of the 2002 IEEE ICDM'02.
§ X. Yan and J. Han, gSpan: Graph-Based Substructure Pattern Mining, Tech. report,
University of Illinois at Urbana-Champaign, 2002.
§ J. Huan, W. Wang and J. Prins, Efficient Mining of Frequent Subgraphs in the Presence of
Isomorphism, Proceedings of the 3rd IEEE ICDM'03, p. 549.
§ M. Cohen and E. Gudes, Diagonally Subgraphs Pattern Mining, DMKD 2004, pages 51-58.
§ D. J. Cook and L. B. Holder, Graph-Based Data Mining, Tech. report, Department of CS
Engineering, 1998.
§ M. Kuramochi and G. Karypis, Finding Frequent Patterns in a Large Sparse Graph, Data
Mining and Knowledge Discovery, 11(3), pages 243-271, 2005.
§ M. Fiedler and C. Borgelt, Subgraph Support in a Single Large Graph, ICDMW 2007.