An optimal algorithm for computing all subtree repeats in trees

Tomáš Flouri

2014, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences

Abstract

Tree data structures are vital in numerous applications ranging from programming languages to computational biology. The efficient extraction of repeating patterns in trees is a significant computational problem. Recent work has built on algorithms to compute subtree repeats in linear time and space by focusing on full subtrees. This paper presents an optimal algorithm tailored for computing all subtree repeats in trees with performance guarantees.

Sequence Data and Phylogenetic Trees

Molecular Phylogeny

Understanding evolutionary relationships between different organisms is a fundamental aspect of modern-day biology. Tree structures are generally used to depict these relationships. In the days of Charles Darwin, rough tree sketches were based on fossil records, morphology and geographical distribution [1]. This is no longer the case. With the advent of sequencing technologies [2] and the realization that both DNA and amino acid sequences could be used to accurately determine the relationships between different organisms [3], a plethora of tree-producing algorithms has emerged [4], along with a branch of science referred to as molecular phylogeny. Molecular phylogeny is the science of estimating evolutionary histories using DNA and amino acid sequences.

The first step in producing an evolutionary history is the identification of homologous sequences. These are sequences that share a common ancestry [5]. There are different types of homology, which include orthology and paralogy. Orthologous sequences share similarities because they originated from a common ancestor. Paralogous sequences, on the other hand, share similarities due to gene duplication events within an individual species. To infer the evolutionary history between different organisms, orthologous sequences are required. These can be aligned, after which trees representing the evolutionary relationships between the sequences can be inferred. To improve the accuracy of the evolutionary relationships within the tree, models of sequence evolution are incorporated. Once a tree has been created, there are many programs available for viewing and analysing the tree topology. In this chapter a few of the many aspects of molecular phylogeny will be discussed.

Global Alignments

Orthologous HIV sequences can be obtained from the Los Alamos HIV sequence database using the search interface provided at http://www.hiv.lanl.gov.
Before a tree can be inferred, the sequences must be aligned. In 1970 Needleman and Wunsch published an algorithm for performing a global pairwise alignment of two sequences [6]. The algorithm matches together as many characters as possible between two input sequences regardless of their lengths. It uses a process referred to as dynamic programming and is guaranteed to find the alignment with the highest score. The score between two sequences provides information about their evolutionary relationship to each other. When more than two sequences are present, the scores between all combinations of sequence pairs form the starting point for producing a multiple alignment. The most famous programs implementing this approach are the Clustal series of programs [7-10] and the more recent Muscle [11].
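The dynamic programming idea behind a global pairwise alignment can be illustrated with a minimal sketch. The scoring values used here (match = 1, mismatch = -1, linear gap penalty = -1) and the function name `needleman_wunsch` are assumptions chosen for illustration and are not taken from the text; real tools typically use substitution matrices and more elaborate gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global pairwise alignment of strings a and b.

    Scoring parameters are illustrative assumptions, not values from the text.
    Returns (score, aligned_a, aligned_b), with '-' marking gaps.
    """
    n, m = len(a), len(b)

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap          # b[:j] aligned against gaps only

    def sub(i, j):
        # Substitution score for pairing a[i-1] with b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: each cell takes the best of three possible moves
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # pair the two characters
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    total, top, bottom = needleman_wunsch("ACGT", "AGT")
    print(total)
    print(top)
    print(bottom)
```

Under these assumed scores, the pairwise score returned for every pair of sequences could serve as the starting point for a multiple alignment, as described above. When several alignments tie for the best score, the traceback returns only one of them.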
