Academia.edu no longer supports Internet Explorer.
To browse Academia.edu and the wider internet faster and more securely, please take a few seconds to upgrade your browser.
2009, e-Science, 2009. e-Science'09. …
Complex networks from domains like Biology or Sociology are present in many e-Science data sets. Dealing with networks can often form a workflow bottleneck as several related algorithms are computationally hard. One example is detecting characteristic patterns or "network motifs" -a problem involving subgraph mining and graph isomorphism. This paper provides a review and runtime comparison of current motif detection algorithms in the field. We present the strategies and the corresponding algorithms in pseudo-code yielding a framework for comparison. We categorize the algorithms outlining the main differences and advantages of each strategy. We finally implement all strategies in a common platform to allow a fair and objective efficiency comparison using a set of benchmark networks. We hope to inform the choice of strategy and critically discuss future improvements in motif detection.
BMC …, 2009
Background: Complex networks are studied across many fields of science and are particularly important to understand biological processes. Motifs in networks are small connected sub-graphs that occur significantly in higher frequencies than in random networks. They have recently gathered much attention as a useful concept to uncover structural design principles of complex networks. Existing algorithms for finding network motifs are extremely costly in CPU time and memory consumption and have practically restrictions on the size of motifs.
Frontiers of Computer Science in China, 2009
Despite several algorithms for searching subgraphs in motif detection presented in the literature, no effort has been done for characterizing their performance till now. This paper presents a methodology to evaluate the performance of three algorithms: edge sampling algorithm (ESA), enumerate subgraphs (ESU) and randomly enumerate subgraphs (RAND-ESU). A series of experiments are performed to test sampling speed and sampling quality. The results show that RAND-ESU is more efficient and has less computational cost than other algorithms for large-size motif detection, and ESU has its own advantage in small-size motif detection.
Briefings in Bioinformatics, 2012
Network motifs are statistically overrepresented sub-structures (sub-graphs) in a network, and have been recognized as 'the simple building blocks of complex networks'. Study of biological network motifs may reveal answers to many important biological questions. The main difficulty in detecting larger network motifs in biological networks lies in the facts that the number of possible sub-graphs increases exponentially with the network or motif size (node counts, in general), and that no known polynomial-time algorithm exists in deciding if two graphs are topologically equivalent. This article discusses the biological significance of network motifs, the motivation behind solving the motif-finding problem, and strategies to solve the various aspects of this problem. A simple classification scheme is designed to analyze the strengths and weaknesses of several existing algorithms. Experimental results derived from a few comparative studies in the literature are discussed, with conclusions that lead to future research directions. Exact Search MAVisto [28, 42] Grochow and Kellis [21] NeMoFinder [43] Kavosh [20] Sampling MFinder [39, 44] MODA [27] FANMOD [40, 45]
Journal of Data Mining in Genomics & Proteomics, 2016
Network motif is a pattern of inter-connections occurring in complex network in numbers that are significantly higher than those in similar randomized network. The basic premise of finding network motifs lie in the ability to compute the frequency of the subgraphs. In order to discover network motif, one has to compute a subgraph census on the original network that calculates the frequency of all the subgraphs of certain type. Then there is a need to compute the frequency of a set of subgraphs on the randomized similar network. The bottleneck of the entire motif discovery process is therefore to compute the subgraph frequencies and this is the core computational problem. The proposed work is to present the Suffix-Graph, a data structure that store graphs efficiently and to design an algorithm to retrieve subgraph efficiently that detects network motifs and apply them to transcriptional interactions in Escherichia coli.
Proceedings of the 2010 ACM Symposium on …, 2010
In this paper we propose a novel specialized data structure that we call g-trie, designed to deal with collections of subgraphs. The main conceptual idea is akin to a prefix tree in the sense that we take advantage of common topology by constructing a multiway tree where the descendants of a node share a common substructure. We give algorithms to construct a g-trie, to list all stored subgraphs, and to find occurrences on another graph of the subgraphs stored in the g-trie. We evaluate the implementation of this structure and its associated algorithms on a set of representative benchmark biological networks in order to find network motifs. To assess the efficiency of our algorithms we compare their performance with other known network motif algorithms also implemented in the same common platform. Our results show that indeed, g-tries are a feasible, adequate and very efficient data structure for network motifs discovery, clearly outperforming previous algorithms and data structures.
—Algorithms for mining very large graphs, such as those representing online social networks, to discover the relative frequency of small subgraphs within them are of high interest to sociologists, computer scientists and marketeers alike. However, the computation of these network motif statistics via naive enumeration is infeasible for either its prohibitive computational costs or access restrictions on the full graph data. Methods to estimate the motif statistics based on random walks by sampling only a small fraction of the subgraphs in the large graph address both of these challenges. In this paper, we present a new algorithm, called the Waddling Random Walk (WRW), which estimates the concentration of motifs of any size. It derives its name from the fact that it sways a little to the left and to the right, thus also sampling nodes not directly on the path of the random walk. The WRW algorithm achieves its computational efficiency by not trying to enumerate subgraphs around the random walk but instead using a randomized protocol to sample subgraphs in the neighborhood of the nodes visited by the walk. In addition, WRW achieves significantly higher accuracy (measured by the closeness of its estimate to the correct value) and higher precision (measured by the low variance in its estimations) than the current state-of-the-art algorithms for mining subgraph statistics. We illustrate these advantages in speed, accuracy and precision using simulations on well-known and widely used graph datasets representing real networks.
Finding motifs in biological, social, technological, and other types of networks has become a widespread method to gain more knowledge about these networks' structure and function. However, this task is very computationally demanding, because it is highly associated with the graph isomorphism which is an NP problem (not known to belong to P or NPcomplete subsets yet). Accordingly, this research is endeavoring to decrease the need to call NAUTY isomorphism detection method, which is the most time-consuming step in many existing algorithms. The work provides an extremely fast motif detection algorithm called QuateXelero, which has a Quaternary Tree data structure in the heart. The proposed algorithm is based on the well-known ESU (FANMOD) motif detection algorithm. The results of experiments on some standard model networks approve the overal superiority of the proposed algorithm, namely QuateXelero, compared with two of the fastest existing algorithms, G-Tries and Kavosh. QuateXelero is especially fastest in constructing the central data structure of the algorithm from scratch based on the input network.
Advances in Science, Technology and Engineering Systems Journal, 2018
Network motif analysis has several applications in many different fields such as biological study and social network modeling, yet motif detection tools are still limited by the intensive computation. Currently, there are two categories for network motif detection method: network-centric and motif-centric approach. While most network-centric algorithms excel in enumerating all potential motifs of a given size, the runtime is infeasible for larger size of motifs. Researchers who are interested in larger motifs and have established a set of potential motif patterns could utilize motif-centric tools to check whether such patterns are truly network motifs by mapping them to the target network and counting their frequency. In the paper, we present NemoMap (Network Motif Mapping algorithm) which is an improvement of the motif-centric algorithm, GK (by Grochow and Kellis) and MODA (Motif Detection Algorithm). Experimental results on three different protein-protein interaction networks show that NemoMap is more efficient in mapping complex motif patterns, while GK and MODA is much faster in analyzing simpler patterns with fewer edges. We also compare the performance of NemoMap and ParaMODA (introduced previously to improve MODA), and the result shows that NemoMap yields better runtime due to the implementation of Grochow-Kellis' symmetry-breaking technique and the better node selection process.
Computational and Mathematical Methods in Medicine, 2012
Network motifs, overrepresented small local connection patterns, are assumed to act as functional meaningful building blocks of a network and, therefore, received considerable attention for being useful for understanding design principles and functioning of networks. We present an extension of the original approach to network motif detection in single, directed networks without vertex labeling to the case of a sample of directed networks with pairwise different vertex labels. A characteristic feature of this approach to network motif detection is that subnetwork counts are derived from the whole sample and the statistical tests are adjusted accordingly to assign significance to the counts. The associated computations are efficient since no simulations of random networks are involved. The motifs obtained by this approach also comprise the vertex labeling and its associated information and are characteristic of the sample. Finally, we apply this approach to describe the intricate topo...
BMC Bioinformatics, 2013
Background: Motif discovery is the problem of finding recurring patterns in biological data. Patterns can be sequential, mainly when discovered in DNA sequences. They can also be structural (e.g. when discovering RNA motifs). Finding common structural patterns helps to gain a better understanding of the mechanism of action (e.g. post-transcriptional regulation). Unlike DNA motifs, which are sequentially conserved, RNA motifs exhibit conservation in structure, which may be common even if the sequences are different. Over the past few years, hundreds of algorithms have been developed to solve the sequential motif discovery problem, while less work has been done for the structural case. Methods: In this paper, we survey, classify, and compare different algorithms that solve the structural motif discovery problem, where the underlying sequences may be different. We highlight their strengths and weaknesses. We start by proposing a benchmark dataset and a measurement tool that can be used to evaluate different motif discovery approaches. Then, we proceed by proposing our experimental setup. Finally, results are obtained using the proposed benchmark to compare available tools. To the best of our knowledge, this is the first attempt to compare tools solely designed for structural motif discovery. Results: Results show that the accuracy of discovered motifs is relatively low. The results also suggest a complementary behavior among tools where some tools perform well on simple structures, while other tools are better for complex structures. Conclusions: We have classified and evaluated the performance of available structural motif discovery tools. In addition, we have proposed a benchmark dataset with tools that can be used to evaluate newly developed tools.
2012 IEEE 12th International Conference on Data Mining Workshops, 2012
Unexpectedly frequent subgraphs, known as motifs, can help in characterizing the structure of complex networks. Most of the existing methods for finding motifs are designed for unweighted networks, where only the existence of connection between nodes is considered, and not their strength or capacity. However, in many real world networks, edges contain more information than just simple node connectivity. In this paper, we propose a new method to incorporate edge weight information in motif mining. We think of a motif as a subgraph that contains unexpected information, and we define a new significance measurement to assess this subgraph exceptionality. The proposed metric embeds the weight distribution in subgraphs and it is based on weight entropy. We use the gtrie data structure to find instances of k-sized subgraphs and to calculate its significance score. Following a statistical approach, the random entropy of subgraphs is then calculated, avoiding the time consuming step of random network generation. The discrimination power of the derived motif profile by the proposed method is assessed against the results of the traditional unweighted motifs through a graph classification problem. We use a set of labeled ego networks of co-authorship in the biology and mathematics fields. The new proposed method is shown to be feasible, achieving even slightly better accuracy. Since it does not require the generation of random networks, it is also computationally faster, and because we are able to use the weight information in computing the motif importance, we can avoid converting weighted networks into unweighted ones.
Briefings in bioinformatics, 2014
Network motif detection is the search for statistically overrepresented subgraphs present in a larger target network. They are thought to represent key structure and control mechanisms. Although the problem is exponential in nature, several algorithms and tools have been developed for efficiently detecting network motifs. This work analyzes 11 network motif detection tools and algorithms. Detailed comparisons and insightful directions for using these tools and algorithms are discussed. Key aspects of network motif detection are investigated. Network motif types and common network motifs as well as their biological functions are discussed. Applications of network motifs are also presented. Finally, the challenges, future improvements and future research directions for network motif detection are also discussed.
Lecture Notes in Computer Science, 2007
The study of biological networks and network motifs can yield significant new insights into systems biology. Previous methods of discovering network motifs -network-centric subgraph enumeration and sampling -have been limited to motifs of 6 to 8 nodes, revealing only the smallest network components. New methods are necessary to identify larger network sub-structures and functional motifs.
Bioinformatics and Biology Insights, 2015
The detection of network motifs has recently become an important part of network analysis across all disciplines. In this work, we detected and analyzed network motifs from undirected and directed networks of several different disciplines, including biological network, social network, ecological network, as well as other networks such as airlines, power grid, and co-purchase of political books networks. Our analysis revealed that undirected networks are similar at the basic three and four nodes, while the analysis of directed networks revealed the distinction between networks of different disciplines. The study showed that larger motifs contained the three-node motif as a subgraph. Topological analysis revealed that similar networks have similar small motifs, but as the motif size increases, differences arise. Pearson correlation coefficient showed strong positive relationship between some undirected networks but inverse relationship between some directed networks. The study suggests that the three-node motif is a building block of larger motifs. It also suggests that undirected networks share similar low-level structures. Moreover, similar networks share similar small motifs, but larger motifs define the unique structure of individuals. Pearson correlation coefficient suggests that protein structure networks, dolphin social network, and co-authorships in network science belong to a superfamily. In addition, yeast protein-protein interaction network, primary school contact network, Zachary's karate club network, and copurchase of political books network can be classified into a superfamily.
IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2014
Network motif algorithms have been a topic of research mainly after the 2002-seminal paper from Milo et al. [1], which provided motifs as a way to uncover the basic building blocks of most networks. Motifs have been mainly applied in Bioinformatics, regarding gene regulation networks. Motif detection is based on induced subgraph counting. This paper proposes an algorithm to count subgraphs of size k + 2 based on the set of induced subgraphs of size k. The general technique was applied to detect 3, 4 and 5-sized motifs in directed graphs. Such algorithms have time complexity O(a(G)m), O(m 2) and O(nm 2), respectively, where a(G) is the arboricity of G(V, E). The computational experiments in public datasets show that the proposed technique was one order of magnitude faster than Kavosh and FANMOD. When compared to NetMODE, acc-Motif had a slightly improved performance.
2011
Many natural structures can be naturally represented by complex networks. Discovering network motifs, which are overrepresented patterns of inter-connections, is a computationally hard task related to graph isomorphism. Sequential methods are hindered by an exponential execution time growth when we increase the size of motifs and networks.
2018
Network motif provides a way to uncover the basic building blocks of most complex networks. This task usually demands high computer processing, specially for motif with 5 or more vertices. This paper presents an extended methodology with the following features: (i) search for motifs up to 6 vertices, (ii) multithread processing, and a (iii) new enumeration algorithm with lower complexity. The algorithm to compute motifs solve isomorphism in $O(1)$ with the use of hash table. Concurrent threads evaluates distinct graphs. The enumeration algorithm has smaller computational complexity. The experiments shows better performance with respect to other methods available in literature, allowing bioinformatic researchers to efficiently identify motifs of size 3, 4, 5, and 6.
Physical Review X
Many real world networks contain a statistically surprising number of certain subgraphs, called network motifs. In the prevalent approach to motif analysis, network motifs are detected by comparing subgraph frequencies in the original network with a statistical null model. In this paper we propose an alternative approach to motif analysis where network motifs are defined to be connectivity patterns that occur in a subgraph cover that represents the network using minimal total information. A subgraph cover is defined to be a set of subgraphs such that every edge of the graph is contained in at least one of the subgraphs in the cover. Some recently introduced random graph models that can incorporate significant densities of motifs have natural formulations in terms of subgraph covers and the presented approach can be used to match networks with such models. To prove the practical value of our approach we also present a heuristic for the resulting NP-hard optimization problem and give results for several real world networks.
Data Mining and Knowledge Discovery, 2017
Network motif discovery is the problem of finding subgraphs of a network that occur more frequently than expected, according to some reasonable null hypothesis. Such subgraphs may indicate small scale interaction features in genomic inter-Responsible editor: Srinivasan Parthasarathy.
Loading Preview
Sorry, preview is currently unavailable. You can download the paper by clicking the button above.