2014, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences
Tree data structures are vital in numerous applications ranging from programming languages to computational biology. The efficient extraction of repeating patterns in trees is a significant computational problem. Recent work has built on algorithms to compute subtree repeats in linear time and space by focusing on full subtrees. This paper presents an optimal algorithm tailored for computing all subtree repeats in trees with performance guarantees.
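To make the notion of a repeating full subtree concrete, the sketch below groups the nodes of a small labeled tree by a canonical encoding of the subtree hanging below each node. It is only an illustration: the `(label, children)` tuple representation, the function name `subtree_repeats`, and the assumption that labels contain no parentheses are all hypothetical, and the string-sorting approach used here is not the linear-time algorithm the paper describes.

```python
from collections import defaultdict


def subtree_repeats(tree):
    """Group nodes of a rooted labeled tree by their full subtree.

    `tree` is a (label, children) pair, where children is a list of such
    pairs. Labels are assumed to contain no parentheses. Returns a dict
    mapping a canonical encoding to the paths (tuples of child indices
    from the root) of all nodes whose full subtrees are identical when
    child order is ignored. Encodings that occur only once are dropped.
    """
    groups = defaultdict(list)

    def encode(node, path):
        label, children = node
        # Sorting the child encodings makes the comparison order-independent.
        parts = sorted(encode(c, path + (i,)) for i, c in enumerate(children))
        code = "(" + label + "".join(parts) + ")"
        groups[code].append(path)
        return code

    encode(tree, ())
    return {code: paths for code, paths in groups.items() if len(paths) > 1}


example = ("r", [
    ("a", [("b", []), ("c", [])]),
    ("a", [("c", []), ("b", [])]),
    ("b", []),
])
repeats = subtree_repeats(example)
```

In this example the two `a` subtrees are reported as a repeat even though their children appear in a different order, and the leaf `b` is reported three times.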
Proceedings. 20th International Conference on Data Engineering
Frequent structure mining (FSM) aims to discover and extract patterns frequently occurring in structural data, such as trees and graphs. FSM finds many applications in bioinformatics, XML processing, Web log analysis, and so on. In this paper we present a new FSM technique for finding patterns in rooted unordered labeled trees. The patterns of interest are cousin pairs in these trees. A cousin pair is a pair of nodes sharing the same parent, the same grandparent, or the same great-grandparent, etc.
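The cousin-pair idea can be sketched in a few lines under a simplified reading of the definition above: two nodes form a cousin pair of distance d when their closest shared ancestor lies exactly d + 1 levels above both of them (d = 0 for siblings, d = 1 for nodes sharing a grandparent, and so on). The child-to-parent dictionary, the function name `cousin_pairs`, and the `max_distance` parameter are assumptions made for illustration; the paper's own definition and mining procedure may differ.

```python
from itertools import combinations


def cousin_pairs(parent, max_distance):
    """Enumerate cousin pairs in a rooted tree.

    `parent` maps each non-root node to its parent. Returns a set of
    (u, v, d) triples with u < v, where d is the cousin distance:
    the closest shared ancestor is d + 1 levels above both u and v.
    """
    def ancestors(node):
        # Chain of ancestors, nearest first.
        chain = []
        while node in parent:
            node = parent[node]
            chain.append(node)
        return chain

    nodes = set(parent) | set(parent.values())
    chains = {n: ancestors(n) for n in nodes}
    pairs = set()
    for u, v in combinations(sorted(nodes), 2):
        limit = min(len(chains[u]), len(chains[v]), max_distance + 1)
        for d in range(limit):
            if chains[u][d] == chains[v][d]:
                pairs.add((u, v, d))
                break  # keep only the closest shared ancestor
    return pairs


tree = {"b": "a", "c": "a", "d": "b", "e": "b", "f": "c"}
found = cousin_pairs(tree, max_distance=1)
```

Counting how often each pair of labels occurs at each distance across a collection of trees would then give the frequencies an FSM method looks for.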
Sequence Data and Phylogenetic Trees
Molecular Phylogeny
Understanding evolutionary relationships between different organisms is a fundamental aspect of modern-day biology. Tree structures are generally used to depict these relationships. In the days of Charles Darwin, rough tree sketches were based on fossil records, morphology and geographical distribution [1]. This is no longer the case. With the advent of sequencing technologies [2] and the realization that both DNA and amino acid sequences could be used to accurately determine the relationship between different organisms [3], a plethora of tree-producing algorithms has emerged [4], along with a branch of science referred to as molecular phylogeny. Molecular phylogeny is the science of estimating evolutionary histories using DNA and amino acid sequences. The first step in producing an evolutionary history is the identification of homologous sequences. These are sequences that share a common ancestry [5]. There are different types of homology, which include orthology and paralogy. Orthologous sequences share similarities because they originated from a common ancestor. Paralogous sequences, on the other hand, share similarities due to gene duplication events within an individual species. To infer the evolutionary history between different organisms, orthologous sequences are required. These can be aligned, after which trees representing the evolutionary relationships between the sequences can be inferred. To improve the accuracy of the evolutionary relationships within the tree, models of sequence evolution are incorporated. Once a tree has been created, there are many programs available for viewing and analysing the tree topology. In this chapter a few of the many aspects of molecular phylogeny will be discussed.
Global Alignments
Orthologous HIV sequences can be obtained from the Los Alamos HIV sequence database using the search interface provided at http://www.hiv.lanl.gov.
Before a tree can be inferred, the sequences must be aligned. In 1970 Needleman and Wunsch published an algorithm for performing a global pairwise alignment on two sequences [6]. The algorithm matches together as many characters as possible between two input sequences regardless of their lengths. It uses a process referred to as dynamic programming and is guaranteed to find the alignment with the highest score. The score between two sequences provides information about their evolutionary relationship to each other. When more than two sequences are present, the scores between all combinations of sequence pairs form the starting point for producing a progressive multiple alignment. The most famous programs implementing this progressive approach are the Clustal series of programs [7-10] and the more recent Muscle [11].
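The dynamic-programming idea behind global pairwise alignment can be sketched as follows. This is a minimal illustration rather than the implementation used by any particular program: the function name `needleman_wunsch` and the scoring values (match +1, mismatch -1, linear gap penalty -1) are assumptions chosen for simplicity, whereas real tools use substitution matrices and more elaborate gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align sequences a and b; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two characters
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


result = needleman_wunsch("ACGT", "AGT")
```

With these scores, aligning `ACGT` against `AGT` yields three matches and one gap, for a total score of 2.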
Proceedings of The National Academy of Sciences, 1979
Evolutionary trees are usually calculated from comparisons of protein or nucleic acid sequences from present-day organisms by use of algorithms that use only the difference matrix, where the difference matrix is constructed from the sequence differences between pairs of sequences from the organisms. The difference matrix alone cannot define uniquely the correct position of the ancestor of the present-day organisms (root of the tree). Furthermore, methods using the difference matrix alone often fail to give the correct pattern of tree branching (topology) when the different sequences evolve at different rates. Only for equal rates of evolution can the difference matrix (when used with the so-called matrix method) yield exactly the correct topology and root. In this paper we present a method for calculating evolutionary trees from sequence data that uses, along with the difference matrix, the rate of evolution of the various sequences from their common ancestor. It is proven analytically that this method uniquely determines both the correct tree topology and root in theory for unequal rates of sequence evolution. How one would estimate an ancestral sequence to be used in the method is discussed in particular for the 5S RNA sequences from prokaryotes and eukaryotes and for ferredoxin sequences.
Authors include Firas Swidan and Michal Ziv-Ukelson (Ben-Gurion University of the Negev).
Algorithms for Molecular Biology
Background: The supertree problem, i.e., the task of finding a common refinement of a set of rooted trees, is an important topic in mathematical phylogenetics. The special case of a common leaf set L is known to be solvable in linear time. Existing approaches refine one input tree using information of the others and then test whether the results are isomorphic. Results: An O(k|L|) algorithm for constructing the common refinement T of k input trees with a common leaf set L is proposed that explicitly computes the parent function of T in a bottom-up approach. Conclusion: The algorithm is simpler to implement than other asymptotically optimal algorithms for the problem and outperforms the alternatives in empirical comparisons. Availability: An implementation in Python is freely available at https://github.com/david-schaller/tralda.
ArXiv, 2021
The problem of finding a common refinement of a set of rooted trees with common leaf set L appears naturally in mathematical phylogenetics whenever poorly resolved information on the same taxa from different sources is to be reconciled. This constitutes a special case of the well-studied supertree problem, where the leaf sets of the input trees may differ. Algorithms that solve the rooted tree compatibility problem are of course applicable to this special case. However, they require sophisticated auxiliary data structures and have a running time of at least O(k|L| log(k|L|)) for k input trees. Here, we show that the problem can be solved in O(k|L|) time using a simple bottom-up algorithm called LinCR. An implementation of LinCR in Python is freely available at https://github.com/david-schaller/tralda.
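What a common refinement is can be illustrated with a deliberately naive sketch based on clusters (the leaf sets below each node). A tree refines another when it contains all of its clusters, so a common refinement exists exactly when the union of all input clusters is still a hierarchy, meaning that any two clusters are nested or disjoint. The code below is not LinCR and does not use the tralda API: the nested-tuple tree representation and the function names are assumptions, and the pairwise check is quadratic rather than O(k|L|).

```python
def clusters(tree):
    """Return the set of clusters (frozensets of leaves) of a nested-tuple tree."""
    found = set()

    def leaves_below(node):
        if isinstance(node, tuple):
            # Inner node: its cluster is the union of its children's clusters.
            leaf_set = frozenset().union(*(leaves_below(c) for c in node))
        else:
            # Anything that is not a tuple is treated as a leaf label.
            leaf_set = frozenset([node])
        found.add(leaf_set)
        return leaf_set

    leaves_below(tree)
    return found


def common_refinement(trees):
    """Return the clusters of the least resolved common refinement, or None."""
    merged = set().union(*(clusters(t) for t in trees))
    ordered = sorted(merged, key=len)
    for i, small in enumerate(ordered):
        for large in ordered[i + 1:]:
            # Clusters of a single tree must be nested or disjoint.
            if small & large and not small <= large:
                return None
    return merged


t1 = (("a", "b"), "c", "d")
t2 = ("a", "b", ("c", "d"))
t3 = (("a", "c"), "b", "d")
refined = common_refinement([t1, t2])
conflict = common_refinement([t1, t3])
```

Here `t1` and `t2` resolve different parts of the same tree, so their clusters combine into a single hierarchy, while `t1` and `t3` disagree on the placement of `a` and admit no common refinement.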
Zeitschrift für Naturforschung C, 1979
An algorithm for the construction of phylogenetic trees is analyzed.
SIAM Journal on Discrete Mathematics, 2016
Manuscript distributed under the terms of the GNU Free Documentation License. 31 pp., 2005
1. Introduction
2. Implicit enumeration for nt terminals (nt >= 2)
3. Find optimal binary trees using branch and bound, for nt terminals (nt >= 2)
4. Build a tree by stepwise addition (n terminals, n >= 3)
5. Branch swapping
5.a. Introduction
5.b. A tree search strategy using SPR rearrangements of given trees
5.c. A tree search strategy using TBR rearrangements of given trees
6. Ratcheting
7. Tree drifting
8. Tree fusing
9. Static approximation
10. An integrated approach
11. Some quick comments on time complexity
References
PLoS Computational Biology, 2013
Phylogenetic trees are used to analyze and visualize evolution. However, trees can be imperfect datatypes when summarizing multiple trees. This is especially problematic when accommodating biological phenomena such as horizontal gene transfer, incomplete lineage sorting, and hybridization, as well as topological conflict between datasets. Additionally, researchers may want to combine information from sets of trees that have partially overlapping taxon sets. To address the problem of analyzing sets of trees with conflicting relationships and partially overlapping taxon sets, we introduce methods for aligning, synthesizing and analyzing rooted phylogenetic trees within a graph, called a tree alignment graph (TAG). The TAG can be queried and analyzed to explore uncertainty and conflict. It can also be synthesized to construct trees, presenting an alternative to supertree approaches. We demonstrate these methods with two empirical datasets. In order to explore uncertainty, we constructed a TAG of the bootstrap trees from the Angiosperm Tree of Life project. Analysis of the resulting graph demonstrates that areas of the dataset that are unresolved in majority-rule consensus tree analyses can be understood in more detail within the context of a graph structure, using measures incorporating node degree and adjacency support. As an exercise in synthesis (i.e., summarization of a TAG constructed from the alignment trees), we also construct a TAG consisting of the taxonomy and source trees from a recent comprehensive bird study. We synthesized this graph into a tree that can be reconstructed in a repeatable fashion and where the underlying source information can be updated. The methods presented here are tractable for large-scale analyses and serve as a basis for an alternative to consensus tree and supertree methods. Furthermore, the exploration of these graphs can expose structures and patterns within the dataset that are otherwise difficult to observe.
2004
The aim of the conference was to provide a common forum for academic and private research institutions to present and discuss their latest results and developments in Bioinformatics and Computational Biology. The scope of the conference covered nearly all aspects of basic and applied research in the field, such as functional and structural genomics and proteomics, comparative genomics and molecular evolution, algorithmic development and computational methods. The conference program included seven sessions of contributed papers, on Functional Analysis (chaired by Joaquín Dopazo), Comparative Genomics (chaired by Andrés Moya), Molecular Evolution (chaired by José Castresana), Structural Analysis and Modeling (chaired by Xavier Avilés), Biomedical Informatics (chaired by Ferran Sanz), Computational Methods (chaired by Gabriel Valiente), and Sequence Analysis (chaired by Roderic Guigó). The Proceedings of the 5th Annual Spanish Bioinformatics Conference consist of two parts. The first part comprises the 22 contributed papers, out of 45 submissions, that were allotted a 20-minute presentation at the conference. The second part comprises the remaining 23 contributed papers (in submission order) together with the 15 contributed posters (also in submission order) that were presented at the poster session. The acceptance ratio was 22/45 = 49%. We would like to thank the members of the program committee for their help in the selection process. Moreover, we would like to express our gratitude to the local committee members Rosa Badia, Marc Cid, Juan José Porta, and Romà Roset, as well as to the organizing committee members M. Mar Albà, Xavier Avilés, and Julio Rozas.
Journal of Computational Biology, 2006
A new problem in phylogenetic inference is presented, based on recent biological findings indicating a strong association between reversals (aka inversions) and repeats. These biological findings are formalized here in a new mathematical model, called repeat-annotated phylogenetic trees (RAPT). We show that, under RAPT, the evolutionary process (including both the tree topology and the internal node genome orders) is uniquely determined, a property that is of major significance both in theory and in practice. Furthermore, the repeats are employed to provide linear-time algorithms for reconstructing both the genomic orders and the phylogeny, which are NP-hard problems under the classical model of sorting by reversals (SBR).
2006
Abstract: Phylogenetic trees are graph-like structures whose topology describes the inferred pattern of relationships among a set of biological entities, such as species or DNA sequences. Inference of these phylogenies typically involves evaluating large numbers of possible solutions and choosing the optimal topology, or set of topologies, from among all evaluated solutions. Such analyses are computationally intensive, especially when the pattern of relationships among a large number of entities is being sought.
1997
Abstract: Phylogenetics is the study and identification of evolutionary patterns and structures in nature; this thesis explores the mathematics of these structures. The basic objects of study are the leaf-labelled tree and its substructures: quartets, splits, clusters and rooted triples, among others. We present fundamental theorems and characterisations, as well as efficient algorithms for a range of phylogenetic problems. It is often possible to deduce phylogenetic information not in the original data.
Lecture Notes in Computer Science, 2009
Gene trees are leaf-labeled trees inferred from molecular sequences. Due to duplication events arising in genome evolution, gene trees usually have multiple copies of some labels, i.e. species. Inferring a species tree from a set of multi-labeled gene trees (MUL trees) is a well-known problem in computational biology. We propose a novel approach to tackle this problem, namely to transform a collection of MUL trees into a collection of evolutionary trees, each containing single copies of labels. To that aim, we provide several algorithmic building blocks and describe how they fit within a general species tree inference process. Most algorithms have a linear-time complexity, except for an FPT algorithm proposed for a problem that we show to be intractable.
1989
[...] inferring evolutionary trees from DNA sequence data was developed by Felsenstein (1981). In evaluating the extent to which the maximum likelihood tree is a significantly better representation of the true tree, it is important to estimate the variance of the difference between the log likelihoods of different tree topologies. Bootstrap resampling can be used for this purpose (Hasegawa et al. 1988; Hasegawa and Kishino 1989), but it imposes a great computational burden. To overcome this difficulty, we developed a new method for estimating the variance by expressing it explicitly. The method was applied to DNA sequence data from primates in order to evaluate the maximum likelihood branching order among Hominoidea. It was shown that, although the orangutan is convincingly placed as an outgroup of a human and African apes clade, the branching order among human, chimpanzee, and gorilla cannot be determined confidently from the DNA sequence data presently available when evolutionary rate constancy is not assumed.
bioRxiv (Cold Spring Harbor Laboratory), 2018
Over the last 20 years, TreeBASE has acquired a substantial body of phylogenetic data, including more than 20,000 published phylogenies. Given latency issues and limited options when it comes to querying the database remotely, a simplified and consolidated version of the database, here called TreeBASEdmp, is made available for download, allowing biologists to design custom analyses of the data on their local computers. The database is indexed to support searching for phylogenetic topologies using nested sets and closure tables. Here we propose a new approach to find broadly-defined phylogenetic patterns, a method we call Generic Topological Querying, which allows the user to find hypotheses of relationship without being constrained to use particular sets of specific taxa. Additionally, we normalize as many leaf nodes as possible to an equivalent species rank identifier to assist in supertree synthesis. Our example script rapidly assembles sets of trees and generates a matrix representation of them for subsequent supertree generation.
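Nested sets and closure tables are two general ways of indexing a tree so that ancestor queries do not require walking the tree. The sketch below shows both on a toy tree; it illustrates the generic techniques only and makes no claim about the actual TreeBASEdmp schema. The dictionary-based tree and the names `index_tree` and `is_ancestor` are hypothetical.

```python
def index_tree(children, root):
    """Build a nested-set index and a closure table for a rooted tree.

    `children` maps a node to the list of its children. Returns:
      nested  - dict node -> (left, right) visit numbers from a depth-first walk
      closure - set of (ancestor, descendant) pairs, including (n, n) for each node
    """
    nested = {}
    closure = set()
    counter = 0

    def visit(node, ancestors):
        nonlocal counter
        counter += 1
        left = counter
        closure.add((node, node))
        for anc in ancestors:
            closure.add((anc, node))
        for child in children.get(node, []):
            visit(child, ancestors + [node])
        counter += 1
        nested[node] = (left, counter)

    visit(root, [])
    return nested, closure


def is_ancestor(nested, a, d):
    """True if a is a proper ancestor of d: d's interval lies strictly inside a's."""
    left_a, right_a = nested[a]
    left_d, right_d = nested[d]
    return left_a < left_d and right_d < right_a


toy = {"root": ["A", "B"], "A": ["x", "y"], "B": ["z"]}
nested, closure = index_tree(toy, "root")
```

With the nested-set numbers, a subtree query becomes a range comparison, while the closure table answers the same question with a simple membership lookup at the cost of storing one row per ancestor-descendant pair.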
Lecture Notes in Computer Science, 2015
Tree alignment graphs (TAGs) provide an intuitive data structure for storing phylogenetic trees that exhibits the relationships of the individual input trees and can potentially account for nested taxonomic relationships. This paper provides a theoretical foundation for the use of TAGs in phylogenetics. We provide a formal definition of TAG that, unlike previous definitions, does not depend on the order in which input trees are provided. In the consensus case, when all input trees have the same leaf labels, we describe algorithms for constructing majority-rule and strict consensus trees using the TAG. When the input trees do not have identical sets of leaf labels, we describe how to determine if the input trees are compatible and, if they are compatible, to construct a supertree that contains the input trees.
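For readers unfamiliar with the consensus case, the sketch below computes the clusters of the strict and majority-rule consensus trees directly from the input trees, by counting how many trees contain each cluster. This is the textbook cluster-counting formulation and not the TAG-based construction the paper describes; the nested-tuple tree representation, the function names, and the `threshold` parameter are assumptions.

```python
from collections import Counter


def clusters(tree):
    """Return the set of clusters (frozensets of leaves) of a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, tuple):
            leaf_set = frozenset().union(*(walk(c) for c in node))
        else:
            leaf_set = frozenset([node])
        found.add(leaf_set)
        return leaf_set

    walk(tree)
    return found


def consensus_clusters(trees, threshold=0.5):
    """Return (strict, majority) cluster sets for trees on the same leaves.

    strict   - clusters present in every input tree
    majority - clusters present in more than `threshold` of the input trees
    """
    counts = Counter(c for t in trees for c in clusters(t))
    strict = {c for c, n in counts.items() if n == len(trees)}
    majority = {c for c, n in counts.items() if n > threshold * len(trees)}
    return strict, majority


inputs = [
    (("a", "b"), "c", "d"),
    (("a", "b"), ("c", "d")),
    (("a", "c"), "b", "d"),
]
strict, majority = consensus_clusters(inputs)
```

In this example the cluster {a, b} appears in two of the three trees, so it survives in the majority-rule consensus but not in the strict consensus, which keeps only the leaves and the full leaf set.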
2009
Abstract: In the past, research efforts on computational phylogenetic analysis were dedicated to the design of heuristics that can quickly find near-optimal trees under a specific optimization criterion. However, all criteria are over-simplified and cannot realistically model the real evolutionary process. Thus, all existing algorithms for phylogenetic analysis have their limitations. This has become a serious issue for many important real-life applications, which often demand accurate results from phylogenetic analysis.