Data Mining Graphs and Networks

The document discusses two primary methods for frequent substructure mining: the Apriori-based approach and the Pattern-growth approach, detailing their algorithms and processes. It also covers constraint-based substructure mining, network analysis, link mining, and community mining, emphasizing the challenges and techniques involved in analyzing relationships within large datasets. The text highlights the importance of links in networks for effective data mining and the complexities of handling multi-relational data.


There are two methods for frequent substructure mining.


The Apriori-based approach: This approach finds frequent graphs starting from graphs of small size and advances in a bottom-up way, creating candidates with one extra vertex or edge at each step. The algorithm is called AprioriGraph. Let Qk denote the frequent substructure set of size k. The approach uses a level-wise mining technique: at each level, candidate generation must be done first, by joining two frequent subgraphs that are the same except for a slight variation. After the new substructures are formed, their frequency in the graph data set is checked, and the graphs found frequent are used to create the candidates of the next level. This candidate-generation step is complex for substructures, whereas generating candidates from itemsets is easy and effortless.
Let’s consider an example of two itemsets of size three, pqr and pqs. The itemset derived by joining them would be pqrs. But when it comes to substructures, there is more than one way to join two substructures.
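The itemset join above can be sketched in a few lines of Python. This is a minimal illustration of the Apriori join rule, not code from the text; the function name is made up.

```python
def join_itemsets(a, b):
    """Apriori join: two frequent k-itemsets that share their first
    k-1 items (in sorted order) combine into one (k+1)-candidate."""
    a, b = sorted(a), sorted(b)
    if a[:-1] == b[:-1] and a[-1] != b[-1]:
        return tuple(sorted(set(a) | set(b)))
    return None  # the pair does not join

print(join_itemsets(("p", "q", "r"), ("p", "q", "s")))  # ('p', 'q', 'r', 's')
```

For graphs no such single canonical join exists, which is exactly why substructure candidate generation is the hard step.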
Algorithm:
This approach performs Apriori-based frequent substructure mining.
Input:
F = a graph data set
min_support = minimum support threshold
Output:
Q1, Q2, Q3, ..., Qk, the frequent substructure sets with sizes ranging from 1 to k.

Q1 <- all the frequent 1-subgraphs in F;
k <- 2;
while Qk-1 ≠ ∅ do
  Qk <- ∅;
  Gk <- candidate_generation(Qk-1);
  foreach candidate l ∈ Gk do
    l.count <- 0;
    foreach Fi ∈ F do
      if subgraph_isomorphism(l, Fi) then
        l.count <- l.count + 1;
      end
    end
    if l.count ≥ min_support(F) ∧ l ∉ Qk then
      Qk <- Qk ∪ {l};
    end
  end
  k <- k + 1;
end
It is an iterative method in which candidate generation takes place first, followed by support computation. Support is counted by testing subgraph isomorphism against each graph in the data set, and the subgraphs found frequent drive the next iteration, which makes the approach effective for FSM (frequent substructure mining). The Apriori approach uses BFS (breadth-first search) because of its iterative, level-wise generation of candidates: to mine the subgraphs of size k+1, all subgraphs up to size k must already have been mined.
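The level-wise loop can be sketched in Python. This toy version represents each graph as a frozenset of edges and grows candidates one edge at a time, using edge-set containment as a stand-in for the full subgraph-isomorphism test; all names and the sample data are illustrative, not from the text.

```python
def apriori_graph(dataset, min_support):
    """Toy level-wise frequent substructure mining.
    dataset: list of graphs, each a frozenset of edges (2-tuples).
    Edge-set containment stands in for subgraph isomorphism."""
    all_edges = set().union(*dataset)
    # Q1: frequent single edges
    level = {frozenset([e]) for e in all_edges
             if sum(e in g for g in dataset) >= min_support}
    frequent = set(level)
    while level:
        # candidate generation: extend each frequent pattern by one edge
        candidates = {pat | {e} for pat in level for e in all_edges
                      if e not in pat}
        # support counting over the whole data set
        level = {c for c in candidates
                 if sum(c <= g for g in dataset) >= min_support}
        frequent |= level
    return frequent

graphs = [frozenset({(1, 2), (2, 3)}),
          frozenset({(1, 2), (2, 3), (3, 4)}),
          frozenset({(1, 2), (3, 4)})]
print(len(apriori_graph(graphs, min_support=2)))  # 5 frequent patterns
```

Note how the `while` loop cannot reach size k+1 before level k is complete, which is the BFS character mentioned above.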
The Pattern-growth approach: The pattern-growth approach can use either BFS or DFS (depth-first search); DFS is preferred because it consumes less memory. Consider a graph h. A new graph can be formed by adding an edge e. The edge may introduce a new vertex, but it does not have to: the extension is called forward if e introduces a new vertex and backward if it connects two existing vertices. The pattern-growth method is simple but not very efficient, because it may generate a graph that has already been created, which leads to computational inefficiency. The duplicate graphs can be removed afterwards, but that increases time and work. To avoid creating duplicates in the first place, frequent graphs must be extended carefully and conservatively, which calls for more refined algorithms.
Algorithm:
The algorithm below is a simplistic pattern-growth-based frequent substructure mining. To search without generating duplicates, a more refined algorithm such as gSpan must be used.
Input:
q = a frequent graph
F = a graph data set
min_support = minimum support threshold
Output:
P = the frequent graph set

P <- ∅;
Call patterngrow_graph(q, F, min_support, P);

procedure patterngrow_graph(q, F, min_support, P)
  if q ∈ P then return;
  else insert q into P;
  scan F once, and find all the edges e such that q can be extended to q -> e;
  for each frequent q -> e do
    patterngrow_graph(q -> e, F, min_support, P);
  return;

An edge e is used to extend the old graph q into a new graph, denoted q -> e. The extension can be either backward or forward.
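The recursive growth step can be sketched in Python, again using edge-set containment as a stand-in for subgraph isomorphism. The seen-set `P` is what discards the duplicate patterns the text mentions; all names and the sample graphs are illustrative.

```python
def pattern_grow(q, dataset, min_support, P):
    """Recursively extend pattern q (a frozenset of edges) by one
    frequent edge at a time, depth-first."""
    if q in P:          # duplicate pattern: already explored
        return
    P.add(q)
    all_edges = set().union(*dataset)
    for e in all_edges - q:
        extended = q | {e}
        support = sum(extended <= g for g in dataset)
        if support >= min_support:
            pattern_grow(extended, dataset, min_support, P)

graphs = [frozenset({(1, 2), (2, 3)}),
          frozenset({(1, 2), (2, 3), (3, 4)}),
          frozenset({(1, 2), (3, 4)})]
P = set()
for e in set().union(*graphs):            # seed with frequent 1-edge patterns
    if sum(e in g for g in graphs) >= 2:
        pattern_grow(frozenset([e]), graphs, 2, P)
print(len(P))  # 5, the same patterns the level-wise version finds
```

The depth-first recursion needs only the current branch in memory, which is why DFS is preferred here over the level-wise BFS of the Apriori approach.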
Constraint-based substructure mining
The constraints a user specifies change the mining process. If we generalize and categorize them into specific classes, mining can be handled easily by pushing the constraints into the given frameworks; this constraint-pushing strategy is used in pattern-growth mining tasks. Let’s see some important constraint categories.
 Subgraph containment constraint: This constraint is used when a user requests patterns that contain specified subgraphs; it is also called a set-containment constraint. The given set of subgraphs is taken as a query, and mining proceeds on the chosen data by extending patterns from the subgraph set. This technique can be used when the user requests patterns with specific sets of edges or vertices.
 Value-sum constraint: Here, the constraint is on the sum of the weights on the edges, with a high and a low bound. Assuming positive edge weights, the two conditions can be designated as sum(h) ≥ low and sum(h) ≤ high. The first condition is called a monotonic constraint, because once it is satisfied it remains satisfied no matter how many further edges are added. The latter condition is called an anti-monotonic constraint, because once it is violated, no further extension can ever satisfy it. With these properties, the constraint-pushing technique works out well.
 Geometric constraint: Here the constraint requires the angle between every pair of connected edges to lie within a given range. Consider a graph h; the constraint can be written as

A_h = { min_angle ≤ angle(E1, V, E2) ≤ max_angle },

where E1 and E2 are edges connected at the vertex V and connected at their other ends to two other vertices V1 and V2. A_h is an anti-monotonic constraint, because if any angle formed by two connected edges fails to satisfy it, the graph will never satisfy A_h at any later level. It can therefore be pushed into the edge-extension process to eliminate any extension that does not satisfy A_h.
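Constraint pushing can be sketched for the value-sum case: the anti-monotone bound (sum ≤ high) is checked at every extension and prunes the whole subtree immediately, while the monotone bound (sum ≥ low) only decides whether a pattern is reported. The edge weights and bounds below are invented for illustration.

```python
def grow_with_constraints(pattern, weights, edges, low, high, out):
    """Grow edge patterns while pushing a value-sum constraint:
    prune on the anti-monotone bound (sum > high can never recover),
    report on the monotone bound (sum >= low stays satisfied)."""
    total = sum(weights[e] for e in pattern)
    if total > high:        # anti-monotone: cut off this whole subtree
        return
    if total >= low:        # monotone: this pattern qualifies
        out.add(frozenset(pattern))
    for e in edges - pattern:
        grow_with_constraints(pattern | {e}, weights, edges, low, high, out)

weights = {"a": 2, "b": 3, "c": 5}
out = set()
grow_with_constraints(frozenset(), weights, set(weights), low=4, high=8, out=out)
print(sorted("".join(sorted(p)) for p in out))  # ['ab', 'ac', 'bc', 'c']
```

The pattern abc (sum 10) is never even enumerated past its first check, which is the saving that constraint pushing buys.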
Network Analysis
In network analysis, the relationships between units are called links, and the units and their relationships form a graph. From the data mining outlook, this is called link mining or link analysis. A network is a multi-relational data set represented as a graph, usually very large, with nodes as objects and edges as links denoting the relationships between the nodes or objects. Telephone networks and the WWW (World Wide Web) are very good examples. Network analysis also helps in filtering data sets and providing customer-preferred services. Every network consists of numerous nodes and the data sets are enormous, so studying and mining useful information from such a wide body of data helps in solving problems and in the effective transmission of data.

Link Mining

Conventional machine-learning methods take homogeneous objects from a single relation. In networks this is not applicable, because of the large number of nodes and their multi-relational, heterogeneous nature. Link mining has thus emerged as a new field after much research: it is the convergence of work in graph mining, networks, hypertext, logic programming, link analysis, and predictive analysis and modeling. Links are simply the relationships between nodes in a network, and with their help the mining process can be carried out efficiently. This calls for several kinds of tasks.
 Link-based object classification: In link mining, attributes alone are not enough; the links and the traits of the linked nodes are also necessary. One good example is web-based classification, in which the system predicts the category of a web page based on the words that occur on the page and on the anchor text, the clickable text of the hyperlinks that point to the page. These two things act as attributes in web-based classification, and the attributes can be anything related to the links and the linked pages.
 Link type prediction: Based on the properties of the objects involved, the system predicts the type or purpose of a link. In organizations, it helps suggest interactive communication sessions between employees where needed. In the online retail market, it helps predict what a customer prefers to buy, which can increase sales and improve recommendations.
 Object type prediction: Here the prediction of an object’s type is based on its attributes and properties and on the links and traits of the objects linked to it. For example, in the restaurant domain this method can predict whether a customer prefers ordering food or visiting the restaurant directly. It also helps predict the method of communication a customer prefers, whether by phone or by mail.
 Link cardinality estimation: This task has two forms. The first predicts the number of links attached to an object. For example, the authority of a web page can be estimated from the number of links pointing to it, called in-links; web pages that act as hubs, meaning pages that point to a set of other pages on the same topic, can be identified using out-links. Likewise, when a pandemic strikes, tracing the links of an affected patient can lead us to other patients, which helps control transmission. The second form predicts the number of objects reachable along a path from a given object; this is crucial for estimating the number of objects a query will return.
 Predicting link existence: While link type prediction predicts the type of a link, here the system predicts whether a link exists between two objects at all. For instance, this task can be used to predict whether a link exists between two web pages.
 Object reconciliation: The task here is to predict whether two objects are in fact the same, on the basis of their attributes, traits, or links. This problem is also called identity uncertainty or record linkage. The same procedure appears in citation matching, information extraction, duplicate elimination, and object consolidation. For instance, this task can tell whether one website mirrors another.
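The in-link and out-link counts from the link-cardinality task can be computed directly from a link table. The pages and links below are invented for illustration.

```python
from collections import Counter

# each pair is (source page, destination page)
links = [("hub.example", "a.example"),
         ("hub.example", "b.example"),
         ("hub.example", "c.example"),
         ("a.example", "c.example")]

out_links = Counter(src for src, _ in links)   # hub-ness: pages pointed to
in_links = Counter(dst for _, dst in links)    # authority: pages pointing in

print(out_links["hub.example"], in_links["c.example"])  # 3 2
```

Here hub.example is a hub (three out-links) and c.example carries the most authority (two in-links), matching the in-link/out-link distinction in the text.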

Challenges in Link Mining

 Statistical compared to logical dependencies: Graph link structures denote the logical relationships between objects, while probabilistic dependencies denote the statistical relationships. Handling these two kinds of dependency coherently is difficult in multi-relational data mining: one must be careful to find the logical dependencies between objects along with the probabilistic relationships between attributes. These dependencies take a large amount of space, which complicates the mathematical models deployed.
 Collective classification and consolidation: Consider a model trained on class-labeled objects. In conventional classification, each object is classified based only on its own attributes. When the trained model is applied to unlabeled objects, the correlations among linked objects complicate classification. This calls for a supplementary iterative step that consolidates the label of each object based on the labels of the objects linked to it; this is where collective classification takes place.
 Constructive use of labeled and unlabeled data: One emerging technique is to merge labeled and unlabeled data. Unlabeled data assist in identifying the distribution of attributes; the links present among unlabeled data help in extracting the linked objects’ attributes; and the links between unlabeled and labeled data help in establishing dependencies, which increases the efficiency of inference.
 Open compared to closed-world assumptions: Conventional methods assume that we know all the potential objects and entities in the domain; this is the closed-world assumption. But the closed-world assumption is impractical in real applications. This calls for specific languages that can describe probability distributions over relational structures containing a varying set of objects.

Community Mining

Network analysis includes finding groups of objects that share similar attributes; this process is known as community mining. In web-page linkage, a community is a group of web pages that follow a common theme. Many community mining algorithms assume that there is only one network and try to establish homogeneous relationships, but real-world web pages form multiple networks with heterogeneous relationships. This proves the need for multi-relational community mining.
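As a minimal single-network illustration of the idea, the crudest notion of a community is a connected component of the link graph; real community mining uses far richer criteria and, as noted above, multiple relations. The graph below is invented.

```python
def communities(adj):
    """Connected components of an undirected link graph,
    the simplest single-relation notion of a community."""
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:                 # iterative DFS over the component
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adj[node])
        seen |= group
        groups.append(group)
    return groups

adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "x": ["y"], "y": ["x"]}
print(sorted(sorted(g) for g in communities(adj)))  # [['a', 'b', 'c'], ['x', 'y']]
```

A multi-relational miner would instead combine several such link graphs (one per relationship type) before deciding which groups form genuine communities.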
