2009
In the early years of data mining and knowledge discovery in databases, method development focused on rigidly and plainly structured data. Most often efforts were even confined to data that can be represented as a simple table, which describes a set of sample cases by attribute-value pairs. Recent years, however, have seen a constantly growing interest in the analysis of more complex data, with a less rigid and/or more sophisticated structure.
ACM SIGKDD Explorations Newsletter, 2003
The need for mining structured data has increased in the past few years. Graphs are among the best-studied data structures in computer science and discrete mathematics. It is therefore no surprise that graph-based data mining has become quite popular. This article introduces the theoretical basis of graph-based data mining and surveys the state of the art. Brief descriptions of some representative approaches are provided as well.
IEEE Intelligent Systems, 2000
The large amount of data collected today is quickly overwhelming researchers' abilities to interpret the data and discover interesting patterns in it. In response to this problem, researchers have developed techniques and systems for discovering concepts in databases [1-3]. Much of the collected data, however, has an explicit or implicit structural component (spatial or temporal), which few discovery systems are designed to handle [4]. So, in addition to the need to accelerate data mining of large databases, there is an urgent need to develop scalable tools for discovering concepts in structural databases. One method for discovering knowledge in structural data is the identification of common substructures within the data. Substructure discovery is the process of identifying concepts describing interesting and repetitive substructures within structural data. The discovered substructure concepts allow abstraction from the detailed data structure and provide relevant attributes for interpreting the data. The substructure discovery method is the basis of Subdue, which performs data mining on databases represented as graphs. The system performs two key data-mining techniques: unsupervised pattern discovery and supervised concept learning from examples. Our test applications have demonstrated the scalability and effectiveness of these techniques on a variety of structural databases.
2016
Graph theory is becoming progressively more important as it is applied to other fields of mathematics, science and technology. It is actively used in areas as varied as biochemistry, electrical engineering, computer science and operations research. The main application of graph theory in data mining is graph mining. The need for mining structured data has increased in the past few years. Graphs are among the best-studied data structures in computer science and discrete mathematics. Graph mining explains the relational aspect of data. Its main aim is to provide new principles and effective algorithms for mining topological substructures embedded in graph data. This article provides a brief review of four theory-based approaches to graph-based data mining. A brief description of applications of graph mining is also provided.
Advanced Information and Knowledge Processing
We describe an approach to learning patterns in relational data represented as a graph. The approach, implemented in the Subdue system, searches for patterns that maximally compress the input graph. Subdue can be used for supervised learning, as well as unsupervised pattern discovery and clustering. Mining graph-based data raises challenges not found in linear attribute-value data. However, additional requirements can further complicate the problem. In particular, we describe how Subdue can incrementally process structured data that arrives as streaming data. We also employ these techniques to learn structural concepts from examples embedded in a single large connected graph.
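The idea of searching for patterns that compress the input graph can be illustrated with a rough sketch. It uses a simplified size-based proxy (vertices plus edges) rather than a true description-length computation, and it assumes non-overlapping pattern instances that are each collapsed into a single node. The function name and the toy numbers are hypothetical and are not taken from Subdue itself.

```python
def compression_value(g_nodes, g_edges, s_nodes, s_edges, instances):
    """Size-based compression score of a pattern S in a graph G (higher is better).

    Assumption: every instance of S is non-overlapping and is replaced by
    one placeholder node, so each instance removes (s_nodes - 1) nodes and
    s_edges edges from G.
    """
    size_g = g_nodes + g_edges
    size_s = s_nodes + s_edges
    compressed_nodes = g_nodes - instances * (s_nodes - 1)
    compressed_edges = g_edges - instances * s_edges
    size_g_given_s = compressed_nodes + compressed_edges
    return size_g / (size_s + size_g_given_s)


# Toy graph with 20 nodes and 30 edges (hypothetical numbers).
triangle_score = compression_value(20, 30, s_nodes=3, s_edges=3, instances=4)
edge_score = compression_value(20, 30, s_nodes=2, s_edges=1, instances=5)
best = "triangle" if triangle_score > edge_score else "edge"
```

Under these assumptions the triangle pattern scores higher than the single-edge pattern, so a compression-driven search would prefer it.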
Mathematical Problems in Engineering, 2014
Due to the rapid development of Internet technology and new scientific advances, the number of applications that model data as graphs is increasing, because graphs are highly expressive and can model complicated structures. Graph mining is a well-explored area of research that is gaining popularity in the data mining community. A graph is a general model for representing data and has been used in many domains, such as cheminformatics, web information management systems, computer networks, and bioinformatics, to name a few. In graph mining, frequent subgraph discovery is a challenging task. Frequent subgraph mining is concerned with discovering those subgraphs that occur frequently within a given graph dataset. A large number of frequent subgraph mining algorithms have been proposed in the literature; these include FSG, AGM, gSpan, CloseGraph, SPIN, Gaston, and MoFa. The objective of this research work is to perform a quantitative comparison of the techniques listed above. The performance of these techniques has been evaluated through a number of experiments based on three different state-of-the-art graph datasets. This novel work will provide a basis for anyone working to design a new frequent subgraph discovery technique.
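To make the notion of support concrete, here is a minimal sketch that counts support for the simplest possible patterns, single labeled edges, across a small graph dataset. It is not an implementation of any of the algorithms named above; the data layout, the function name `frequent_edges`, and the toy graphs are assumptions made for illustration.

```python
from collections import Counter


def frequent_edges(graphs, min_support):
    """Return one-edge patterns that occur in at least min_support graphs.

    Each graph is a pair (node_labels, edges), where node_labels maps a node
    id to its label and edges is a list of (u, v, edge_label) triples.
    Graphs are treated as undirected.
    """
    support = Counter()
    for node_labels, edges in graphs:
        # Collect each pattern once per graph so that support counts
        # graphs, not occurrences within a graph.
        patterns = set()
        for u, v, edge_label in edges:
            a, b = sorted((node_labels[u], node_labels[v]))
            patterns.add((a, edge_label, b))
        support.update(patterns)
    return {p: s for p, s in support.items() if s >= min_support}


# Hypothetical toy dataset of three molecule-like graphs.
graphs = [
    ({0: "C", 1: "O", 2: "C"}, [(0, 1, "single"), (0, 2, "single")]),
    ({0: "C", 1: "O"}, [(0, 1, "double")]),
    ({0: "O", 1: "C", 2: "N"}, [(0, 1, "single"), (1, 2, "single")]),
]
result = frequent_edges(graphs, min_support=2)
```

Larger patterns require candidate generation and subgraph isomorphism checks, which is where the listed algorithms differ.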
Proceedings of the 3rd workshop on Ph.D. students in information and knowledge management - PIKM '10, 2010
Frequent subgraph mining is an important problem in data mining with wide application in science. For instance, graphs can be used to represent structural relationships in problems related to network topology, chemical compounds, protein structures, and so on. Searching for patterns in graph databases is difficult since graph-related operations generally have higher time complexity than equivalent operations on frequent itemsets. From a practical standpoint, databases keep growing, creating both the opportunity and the need to mine graphs. Even though there is a significant body of work on graph mining, most techniques work outside the database system. Programming frequent graph mining in SQL is more difficult than traditional approaches because the graph must be represented as a table and algorithmic steps must be written as relational queries. In our research, we study three fundamental problems under a database approach: graph storage and indexing, frequent subgraph search, and identifying subgraph isomorphism. We outline the main research issues and our solutions to them. We also present preliminary experimental validation focusing on query optimizations and time complexity.
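As an illustration of representing a graph as tables and writing a mining step as a relational query, the sketch below stores a small graph dataset in SQLite and uses a self-join to count the support of directed two-edge label paths. The schema (`node`, `edge`), the function name, and the data are assumptions for illustration and are not the storage scheme proposed by the authors.

```python
import sqlite3


def frequent_paths(nodes, edges, min_support):
    """Return label paths a -> b -> c found in at least min_support graphs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE node (graph_id INTEGER, node_id INTEGER, label TEXT)")
    conn.execute("CREATE TABLE edge (graph_id INTEGER, src INTEGER, dst INTEGER)")
    conn.executemany("INSERT INTO node VALUES (?, ?, ?)", nodes)
    conn.executemany("INSERT INTO edge VALUES (?, ?, ?)", edges)
    # Self-join on edge extends a one-edge pattern by one more edge.
    # Note: this matches walks, so a 2-cycle would also match a -> b -> a.
    query = """
        SELECT n1.label, n2.label, n3.label,
               COUNT(DISTINCT e1.graph_id) AS support
        FROM edge AS e1
        JOIN edge AS e2 ON e2.graph_id = e1.graph_id AND e2.src = e1.dst
        JOIN node AS n1 ON n1.graph_id = e1.graph_id AND n1.node_id = e1.src
        JOIN node AS n2 ON n2.graph_id = e1.graph_id AND n2.node_id = e1.dst
        JOIN node AS n3 ON n3.graph_id = e2.graph_id AND n3.node_id = e2.dst
        GROUP BY n1.label, n2.label, n3.label
        HAVING COUNT(DISTINCT e1.graph_id) >= ?
        ORDER BY n1.label, n2.label, n3.label
    """
    rows = conn.execute(query, (min_support,)).fetchall()
    conn.close()
    return rows


# Hypothetical dataset: two small directed graphs.
nodes = [(1, 0, "A"), (1, 1, "B"), (1, 2, "C"),
         (2, 0, "A"), (2, 1, "B"), (2, 2, "C"), (2, 3, "D")]
edges = [(1, 0, 1), (1, 1, 2),
         (2, 0, 1), (2, 1, 2), (2, 1, 3)]
paths = frequent_paths(nodes, edges, min_support=2)
```

Each additional pattern edge adds another join, which hints at why query optimization matters in a database-centric approach.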
Proc. Workshop on Data Mining for …, 2007
Diverse types of data are associated with proteins, including network and categorical data. While graph mining techniques have long focused on data with no more than one label per node, generalizations have recently been developed. We show that existing generalizations are not well suited to typical biological networks and are likely to return few or no results on protein regulatory networks. They are, furthermore, ill-suited to graphs that are dense or show the small world property, which are typical features of biological networks. A graph-relational edge disjoint instance mining algorithm (GR-EDI) is presented that resolves these problems. Our algorithm treats bipartite edges separately and only constrains unipartite edges to be disjoint. We introduce a new pattern constraint that recovers the downward closure property. The algorithm uses a search lattice traversal strategy that allows more effective mining of graphs that cannot be considered as sparse due to hubs. Effectiveness is demonstrated for a real biological example. While existing techniques return few or no patterns, GR-EDI is able to extract many patterns.
Data Engineering, 2009. ICDE'09. IEEE …, 2009
Graphs are being increasingly used to model a wide range of scientific data. Such widespread usage of graphs has generated considerable interest in mining patterns from graph databases. While an array of techniques exists to mine frequent patterns, we still lack a scalable approach to mine statistically significant patterns, specifically patterns with low p-values, that occur at low frequencies. We propose a highly scalable technique, called GraphSig, to mine significant subgraphs from large graph databases. We convert each graph into a set of feature vectors where each vector represents a region within the graph. Domain knowledge is used to select a meaningful feature set. Prior probabilities of features are computed empirically to evaluate statistical significance of patterns in the feature space. Following analysis in the feature space, only a small portion of the exponential search space is accessed for further analysis. This enables the use of existing frequent subgraph mining techniques to mine significant patterns in a scalable manner even when they are infrequent. Extensive experiments are carried out on the proposed techniques, and empirical results demonstrate that GraphSig is effective and efficient for mining significant patterns. To further demonstrate the power of significant patterns, we develop a classifier using patterns mined by GraphSig. Experimental results show that the proposed classifier achieves superior performance, both in terms of quality and computation cost, over state-of-the-art classifiers.
2012
Association rule mining is a function of the data mining research domain, and frequent pattern mining is an essential part of it. Most previous studies on mining frequent patterns are based on an Apriori approach, which requires many database scans and operations to count pattern supports in the database. Because each set of transactions may be massive, traditional data mining tasks become difficult to perform. This research proposes a graph structure that captures only the itemsets needed to reduce a sufficiently large dataset to a submatrix representing important weights, leaving no room for outliers. We have devised a strategy that covers the significant facts in the data by drilling down the large data into a succinct adjacency matrix at different stages of the mining process. The graph structure is designed so that it can be easily maintained and the trade-off in compressing the large data values is reduced. Experimental ...
Seventh International Conference on Digital Information Management (ICDIM 2012), 2012
Data mining comprises many data analysis techniques. Its basic objective is to discover hidden and useful patterns in very large data sets. Graph mining, which
Advances in Database Systems, 2010
Graph mining and management has become an important topic of research recently because of numerous applications to a wide variety of data mining problems in computational biology, chemical data analysis, drug discovery and communication networking. Traditional data mining and management algorithms such as clustering, classification, frequent pattern mining and indexing have now been extended to the graph scenario. This book contains a number of chapters which are carefully chosen in order to discuss the broad research issues in graph management and mining. In addition, a number of important applications of graph mining are also covered in the book. The purpose of this chapter is to provide an overview of the different kinds of graph processing and mining techniques, and the coverage of these topics in this book.
Graphs have become increasingly important in modeling complicated structures, such as circuits, images, chemical compounds, protein structures, biological networks, social networks, the web, workflows, and XML documents. Many graph search algorithms have been developed in chemical informatics, computer vision, video indexing and text retrieval. With the increasing demand for the analysis of large amounts of structured data, graph mining has become an active and important theme in data mining.
INTERNATIONAL JOURNAL OF ENGINEERING DEVELOPMENT AND RESEARCH (IJEDR) (ISSN:2321-9939), 2014
The aim of data mining is to extract significant and useful knowledge from data. Data stored in a database may be of any type, such as text, images, or video. As data grows in size, storing it becomes more complex, and data mining algorithms face challenges in storing such data. Graphs have become more important for storing and visualizing this complicated data (e.g., chemical datasets, biological datasets, XML datasets, social network datasets, and web datasets). This paper discusses the different algorithms and techniques used for graph mining.
2011
The Eighth Workshop on Mining and Learning with Graphs (MLG) was held at KDD 2010 in Washington DC. It brought together a variety of researchers interested in analyzing data that is best represented as a graph. Examples include the WWW, social networks, biological networks, communication networks, and many others. The importance of being able to effectively mine and learn from such data is growing, as more and more structured and semi-structured data is becoming available.
2008
This paper presents our investigation into graph mining methods to help users understand large graphs. Our approach is a two-step process: first calculate subgraph labels, and then calculate distribution statistics on these labels. Our approach is flexible in that it can identify a range of patterns from very abstract to very specific (e.g., isomorphisms). The statistics that we calculate can be used to find rare and common patterns, patterns that are (dis)similar to the distribution of induced subgraphs of the same size, patterns that are (dis)similar to each other, as well as variance of graph patterns given a specific set of input node types. We also investigate a method to understand structural characteristics by analyzing clusters that are created by "collapsing" overlapping instances of user-specified patterns. We evaluated our approach on two publicly available networks: the Texas CS website from WebKB and the Internet Movie Database.
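The two-step process of labeling and then computing distribution statistics might look roughly like the sketch below. The labeling scheme used here (a node's type plus the sorted types of its neighbors) is an assumption chosen for brevity, not the labeling defined in the paper, and the toy graph is hypothetical.

```python
from collections import Counter


def neighborhood_labels(node_types, edges):
    """Step 1: label each node by its type and the sorted types of its neighbors."""
    neighbors = {n: [] for n in node_types}
    for u, v in edges:
        neighbors[u].append(node_types[v])
        neighbors[v].append(node_types[u])
    return {n: (node_types[n], tuple(sorted(neighbors[n]))) for n in node_types}


def label_distribution(labels):
    """Step 2: relative frequency of each label across the graph."""
    counts = Counter(labels.values())
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Hypothetical toy graph of a university website.
node_types = {1: "course", 2: "student", 3: "student", 4: "faculty", 5: "course"}
edges = [(1, 2), (1, 3), (1, 4), (5, 4)]

labels = neighborhood_labels(node_types, edges)
distribution = label_distribution(labels)
most_common = max(distribution, key=distribution.get)
```

Sorting the distribution by frequency then separates common patterns from rare ones; more abstract or more specific labels can be obtained by changing only the labeling function.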
International Journal on Artificial Intelligence Tools, 2005
Much of current data mining research is focused on discovering sets of attributes that discriminate data entities into classes, such as shopping trends for a particular demographic group. In contrast, we are working to develop data mining techniques to discover patterns consisting of complex relationships between entities. Our research is particularly applicable to domains in which the data is event-driven or relationally structured. In this paper we present approaches to address two related challenges: the need to assimilate incremental data updates and the need to mine monolithic datasets. Many realistic problems are continuous in nature and therefore require a data mining approach that can evolve discovered knowledge over time. Similarly, many problems present data sets that are too large to fit into dynamic memory on conventional computer systems. We address incremental data mining by introducing a mechanism for summarizing discoveries from previous data increments so that the g...
2012
Graph mining is an important research area within the domain of data mining. The field of study concentrates on the identification of frequent subgraphs within graph data sets. The research goals are directed at: (i) effective mechanisms for generating candidate subgraphs (without generating duplicates) and (ii) how best to process the generated candidate subgraphs so as to identify the desired frequent subgraphs in a way that is computationally efficient and procedurally effective.
We discuss the problem of extending data mining approaches to cases in which data points arise in the form of individual graphs. Being able to find the intrinsic low-dimensionality in ensembles of graphs can be useful in a variety of modeling contexts, especially when coarse-graining the detailed graph information is of interest. One of the main challenges in mining graph data is the definition of a suitable pairwise similarity metric in the space of graphs. We explore two practical solutions to this problem: one based on finding subgraph densities, and one using spectral information. The approach is illustrated on three test data sets (ensembles of graphs); two of these are obtained from standard graph generating algorithms, while the graphs in the third example are sampled as dynamic snapshots from an evolving network simulation.
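A minimal sketch of the subgraph-density idea follows, assuming that each graph is summarized only by its edge density and triangle density and that two graphs are compared by the Euclidean distance between these feature vectors. The choice of subgraphs, the distance, and the function names are assumptions for illustration; the paper's actual feature set may differ.

```python
from itertools import combinations
from math import comb, dist


def density_features(n, edges):
    """Return (edge density, triangle density) for an undirected graph on nodes 0..n-1.

    Requires n >= 3 so that both normalizing counts are nonzero.
    """
    edge_set = {frozenset(e) for e in edges}
    triangles = sum(
        1
        for a, b, c in combinations(range(n), 3)
        if {frozenset((a, b)), frozenset((a, c)), frozenset((b, c))} <= edge_set
    )
    return (len(edge_set) / comb(n, 2), triangles / comb(n, 3))


def graph_distance(g1, g2):
    """Euclidean distance between the density feature vectors of two graphs."""
    return dist(density_features(*g1), density_features(*g2))


# Hypothetical examples: a complete graph and a path, both on 4 nodes.
complete = (4, list(combinations(range(4), 2)))
path = (4, [(0, 1), (1, 2), (2, 3)])
d = graph_distance(complete, path)
```

The brute-force triangle count is cubic in the number of nodes, which is acceptable for a sketch but would need replacing for large graphs. A matrix of such pairwise distances is the kind of input that dimensionality-reduction methods typically operate on.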
IFIP The International Federation for Information Processing, 2007
Mining frequent subgraphs is an area of research where we have a given set of graphs, and where we search for (connected) subgraphs contained in many of these graphs. Each graph can be seen as a transaction, or as a molecule, as the techniques applied in this paper are used in (bio)chemical analysis. In this work we will discuss an application that enables the user to further explore the results from a frequent subgraph mining algorithm. Such an algorithm gives the frequent subgraphs, also referred to as fragments, in the graphs in the dataset. Next to frequent subgraphs the algorithm also provides a lattice that models sub- and supergraph relations among the fragments, which can be explored with our application. The lattice can also be used to group fragments by means of clustering algorithms, and the user can easily browse from group to group. The application can also display only a selection of groups that occur in almost the same set of molecules, or on the contrary in different molecules. This allows one to see which patterns cover different or similar parts of the dataset.