Tandy Warnow

University of Illinois at Urbana-Champaign, Computer Science, Faculty Member

Papers by Tandy Warnow

MRL and SuperFine+MRL: new supertree methods

by Tandy Warnow and Siavash Mirarab

Algorithms for Molecular Biology, 2012

Background: Supertree methods combine trees on subsets of the full taxon set together to produce a tree on the entire set of taxa. Of the many supertree methods, the most popular is MRP (Matrix Representation with Parsimony), a method that operates by first encoding the input set of source trees by a large matrix (the "MRP matrix") over {0, 1, ?}, and then running maximum parsimony heuristics on the MRP matrix. Experimental studies evaluating MRP in comparison to other supertree methods have established that for large datasets, MRP generally produces trees of equal or greater accuracy than other methods, and can run on larger datasets. A recent development in supertree methods is SuperFine+MRP, a method that combines MRP with a divide-and-conquer approach, and produces more accurate trees in less time than MRP. In this paper we consider a new approach for supertree estimation, called MRL (Matrix Representation with Likelihood). MRL begins with the same MRP matrix, but then analyzes the MRP matrix using heuristics (such as RAxML) for 2-state Maximum Likelihood.
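
The matrix encoding described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes rooted source trees written as nested Python tuples of taxon names, and it uses the common rooted coding in which each internal, non-root clade becomes one column: 1 for taxa inside the clade, 0 for taxa elsewhere in the same source tree, and ? for taxa missing from that tree. The function names (`leaves`, `clades`, `mrp_matrix`) are hypothetical.

```python
def leaves(tree):
    """Return the set of taxon names under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def clades(tree, is_root=True):
    """Yield the leaf set of every internal, non-root node of the tree."""
    if isinstance(tree, str):
        return
    if not is_root:
        yield leaves(tree)
    for child in tree:
        yield from clades(child, is_root=False)


def mrp_matrix(source_trees):
    """Encode source trees as rows over {0, 1, ?}, one column per clade."""
    all_taxa = sorted(set().union(*(leaves(t) for t in source_trees)))
    rows = {taxon: [] for taxon in all_taxa}
    for tree in source_trees:
        present = leaves(tree)
        for clade in clades(tree):
            for taxon in all_taxa:
                if taxon not in present:
                    rows[taxon].append("?")   # taxon absent from this source tree
                elif taxon in clade:
                    rows[taxon].append("1")   # taxon inside the clade
                else:
                    rows[taxon].append("0")   # taxon in the tree, outside the clade
    return {taxon: "".join(chars) for taxon, chars in rows.items()}


if __name__ == "__main__":
    trees = [(("A", "B"), ("C", "D")), (("B", "C"), "E")]
    for taxon, row in mrp_matrix(trees).items():
        print(taxon, row)
```

The resulting rows could then be handed to a parsimony heuristic (MRP) or a 2-state likelihood heuristic (MRL); that step is outside the scope of this sketch.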

ASTRAL-II: coalescent-based species tree estimation with many hundreds of taxa and thousands of genes

by Siavash Mirarab and Tandy Warnow

Bioinformatics (Oxford, England), Jan 15, 2015

The estimation of species phylogenies requires multiple loci, since different loci can have different trees due to incomplete lineage sorting, modeled by the multi-species coalescent model. We recently developed a coalescent-based method, ASTRAL, which is statistically consistent under the multi-species coalescent model and which is more accurate than other coalescent-based methods on the datasets we examined. ASTRAL runs in polynomial time, by constraining the search space using a set of allowed 'bipartitions'. Despite the limitation to allowed bipartitions, ASTRAL is statistically consistent. We present a new version of ASTRAL, which we call ASTRAL-II. We show that ASTRAL-II has substantial advantages over ASTRAL: it is faster, can analyze much larger datasets (up to 1000 species and 1000 genes) and has substantially better accuracy under some conditions. ASTRAL's running time is [Formula: see text], and ASTRAL-II's running time is [Formula: see text], where n is t...

Ultra-large alignments using phylogeny-aware profiles

by Siavash Mirarab, Keerthana Kumar, and Tandy Warnow

Genome biology, Jan 16, 2015

Many biological questions, including the estimation of deep evolutionary histories and the detection of remote homology between protein sequences, rely upon multiple sequence alignments and phylogenetic trees of large datasets. However, accurate large-scale multiple sequence alignment is very difficult, especially when the dataset contains fragmentary sequences. We present UPP, a multiple sequence alignment method that uses a new machine learning technique - the Ensemble of Hidden Markov Models - that we propose here. UPP produces highly accurate alignments for both nucleotide and amino acid sequences, even on ultra-large datasets or datasets containing fragmentary sequences. UPP is available at https://github.com/smirarab/sepp .

Weighted Statistical Binning: Enabling Statistically Consistent Genome-Scale Phylogenetic Analyses

by Tandy Warnow, Siavash Mirarab, and Bastien Boussau

PloS one, 2015

Because biological processes can result in different loci having different evolutionary histories, species tree estimation requires multiple loci from across multiple genomes. While many processes can result in discord between gene trees and species trees, incomplete lineage sorting (ILS), modeled by the multi-species coalescent, is considered to be a dominant cause for gene tree heterogeneity. Coalescent-based methods have been developed to estimate species trees, many of which operate by combining estimated gene trees, and so are called "summary methods". Because summary methods are generally fast (and much faster than more complicated coalescent-based methods that co-estimate gene trees and species trees), they have become very popular techniques for estimating species trees from multiple loci. However, recent studies have established that summary methods can have reduced accuracy in the presence of gene tree estimation error, and also that many biological datasets have...

An Empirical Comparison of Phylogenetic Methods on Chloroplast Gene Order Data in Campanulaceae

by L. Wang, Tandy Warnow, and B. Moret

Computational Biology, 2000

The first heuristic for reconstructing phylogenetic trees from gene order data was introduced by Blanchette et al. It sought to reconstruct the breakpoint phylogeny and was applied to a variety of datasets. We present a new heuristic for estimating the breakpoint phylogeny which, although not polynomial-time, is much faster in practice than BPAnalysis. We use this heuristic to conduct a phylogenetic analysis of chloroplast genomes in the flowering plant family Campanulaceae. We also present and discuss the results of experimentation on this real dataset with three methods: our new method, BPAnalysis, and the neighbor-joining method, using breakpoint distances, inversion distances, and inversion plus transposition distances.
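
The breakpoint distance mentioned in this abstract can be sketched as follows. This is an illustration under stated assumptions rather than code from the paper: genomes are linear, every gene occurs exactly once in each genome, and a gene order is a list of signed integers whose sign gives the strand. An adjacency (a, b) counts as conserved if the other genome contains either a b or -b -a consecutively. The function names are hypothetical.

```python
def adjacencies(genome):
    """Return the adjacencies of a linear signed gene order.

    Reading a genome from the other strand turns (a, b) into (-b, -a),
    so each adjacency is stored as the smaller of those two tuples.
    """
    adj = set()
    for a, b in zip(genome, genome[1:]):
        adj.add(min((a, b), (-b, -a)))
    return adj


def breakpoint_distance(g1, g2):
    """Count the adjacencies of g1 that are not conserved in g2."""
    if sorted(map(abs, g1)) != sorted(map(abs, g2)):
        raise ValueError("genomes must contain the same genes")
    return len(adjacencies(g1) - adjacencies(g2))


if __name__ == "__main__":
    reference = [1, 2, 3, 4, 5]
    inverted = [1, -3, -2, 4, 5]   # genes 2..3 inverted
    print(breakpoint_distance(reference, inverted))
```

A single inversion creates at most two breakpoints, which is why breakpoint and inversion distances are closely related for lightly rearranged genomes.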

Reconstructing Optimal Phylogenetic Trees: A Challenge in Experimental Algorithmics

by Tandy Warnow and B. Moret

Lecture Notes in Computer Science, 2002

The benefits of experimental algorithmics and algorithm engineering need to be extended to applications in the computational sciences. In this paper, we focus on one such application: the reconstruction of evolutionary histories (phylogenies) from molecular data such as DNA sequences. Our presentation is not a survey of past and current work in the area, but rather a discussion of what we see as some of the important challenges in experimental algorithmics that arise from computational phylogenetics.

Rec-I-DCM3: a fast algorithmic technique for reconstructing large phylogenetic trees

by Usman Roshan, Tandy Warnow, and B. Moret

Proceedings / IEEE Computational Systems Bioinformatics Conference, CSB. IEEE Computational Systems Bioinformatics Conference, 2004

Phylogenetic trees are commonly reconstructed based on hard optimization problems such as maximum parsimony (MP) and maximum likelihood (ML). Conventional MP heuristics for producing phylogenetic trees produce good solutions within reasonable time on small datasets (up to a few thousand sequences), while ML heuristics are limited to smaller datasets (up to a few hundred sequences). However, since MP (and presumably ML) is NP-hard, such approaches do not scale when applied to large datasets. In this paper, we present a new technique called Recursive-Iterative-DCM3 (Rec-I-DCM3), which belongs to our family of Disk-Covering Methods (DCMs). We tested this new technique on ten large biological datasets ranging from 1,322 to 13,921 sequences and obtained dramatic speedups as well as significant improvements in accuracy (better than 99.99%) in comparison to existing approaches. Thus, high-quality reconstructions can be obtained for datasets at least ten times larger than was previously pos...

Advances in phylogeny reconstruction from gene order and content data

by Tandy Warnow and B. Moret

Methods in enzymology, 2005

Genomes can be viewed in terms of their gene content and the order in which the genes appear along each chromosome. Evolutionary events that affect the gene order or content are "rare genomic events" (rarer than events that affect the composition of the nucleotide sequences) and have been advocated by systematists for inferring deep evolutionary histories. This chapter surveys recent developments in the reconstruction of phylogenies from gene order and content, focusing on their performance under various stochastic models of evolution. Because such methods are quite restricted in the type of data they can analyze, we also present research aimed at handling the full range of whole-genome data.

A new implementation and detailed study of breakpoint analysis

by Tandy Warnow and B. Moret

Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing, 2001

Phylogenies derived from gene order data may prove crucial in answering some fundamental open questions in biomolecular evolution. Yet very few techniques are available for such phylogenetic reconstructions. One method is breakpoint analysis, developed by Blanchette and Sankoff for solving the "breakpoint phylogeny." Our earlier studies confirmed the usefulness of this approach, but also found that BPAnalysis, the implementation developed by Sankoff and Blanchette, was too slow to use on all but very small datasets. We report here on a reimplementation of BPAnalysis using the principles of algorithmic engineering. Our faster (by 2 to 3 orders of magnitude) and flexible implementation allowed us to conduct studies on the characteristics of breakpoint analysis, in terms of running time, quality, and robustness, as well as to analyze datasets that had so far been considered out of reach. We report on these findings and also discuss future directions for our new implementation.

Toward New Software for Computational Phylogenetics

by L. Wang, Tandy Warnow, and B. Moret

IEEE Computer, 2002

Systematists study how a group of genes or organisms evolved. These biologists now have set their sights on the Tree of Life challenge: to reconstruct the evolutionary history of all known living organisms. A typical phylogenetic reconstruction starts with biomolecular data, such as DNA sequences for modern organisms, and builds a tree, or phylogeny, for these sequences that represents a

A New Fast Heuristic for Computing the Breakpoint Phylogeny and Experimental Phylogenetic Analyses of Real and Synthetic Data

by L. Wang, Tandy Warnow, and B. Moret

Intelligent Systems in Molecular Biology, 2000

The breakpoint phylogeny is an optimization problem proposed by Blanchette et al. for reconstructing evolutionary trees from gene order data. These same authors also developed and implemented BPAnalysis (3), a heuristic method (based upon solving many instances of the travelling salesman problem) for estimating the breakpoint phylogeny. We present a new heuristic for this purpose; although not polynomial-time, our heuristic

Fast phylogenetic methods for the analysis of genome rearrangement data: an empirical study

by Tandy Warnow and B. Moret

Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing, 2002

Evolution operates on whole genomes through mutations that change the order and strandedness of genes within the genomes. Thus analyses of gene-order data present new opportunities for discoveries about deep evolutionary events, provided that sufficiently accurate methods can be developed to reconstruct evolutionary trees. In this paper we present two new methods of character coding for parsimony-based analysis of genomic rearrangements: one called MPBE-2, and a new parsimony-based method which we call MPME (based on an encoding of Bryant), both variants of the MPBE method. We then conduct computer simulations to compare this class of methods to distance-based methods (NJ under various distance measures). Our empirical results show that two of our new methods return highly accurate estimates of the true tree, outperforming the other methods significantly, especially when close to saturation.

Session Introduction

by Tandy Warnow and B. Moret

Pacific Symposium on Biocomputing, 2008

Multiple sequence alignment (MSA) has long been a mainstay of bioinformatics, particularly in the alignment of well conserved protein and DNA sequences and in phylogenetic reconstruction for such data. Sequence datasets with low percentage identity, on the other hand, typically yield poor alignments. Now that researchers want to produce alignments among widely divergent genomes, including both coding and noncoding sequences, it is necessary to revisit sequence alignment and phylogenetic reconstruction under more ambitious models of sequence evolution that take into account the plethora of genomic events that have been observed.

Performance study of phylogenetic methods: (unweighted) quartet methods and neighbor-joining

by Tandy Warnow and B. Moret

ACM-SIAM Symposium on Discrete Algorithms, 2001

We present the results of a large-scale experimental study of quartet-based methods (quartet cleaning and puzzling) for phylogeny reconstruction. Our experiments include a broad range of problem sizes and evolutionary rates, and were carefully designed to yield statistically robust results despite the size of the sample space. We measure outcomes in terms of numbers of edges of the true tree correctly inferred by each method (true positives). Our
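
The "true positives" measure named in this abstract can be illustrated with a small sketch. It is an assumption-laden illustration, not the study's code: trees are nested Python tuples of taxon names (the rooting is arbitrary and ignored), each internal edge is identified with the bipartition of the taxa it induces, and a true positive is a non-trivial bipartition of the true tree that also appears in the estimated tree. The function names are hypothetical.

```python
def leaf_set(tree):
    """Return the taxa under a nested-tuple tree as a frozenset."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Return the non-trivial bipartitions (internal edges) of a tree.

    Each bipartition is stored as the side that does NOT contain a fixed
    anchor taxon, so the same edge is represented identically in any tree
    on the same taxa regardless of rooting.
    """
    taxa = leaf_set(tree)
    anchor = min(taxa)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return
        side = leaf_set(node)
        # Skip the root and edges leading to a single leaf on either side.
        if 2 <= len(side) <= len(taxa) - 2:
            splits.add(side if anchor not in side else taxa - side)
        for child in node:
            visit(child)

    visit(tree)
    return splits


def true_positives(true_tree, estimated_tree):
    """Count the internal edges of the true tree found in the estimate."""
    if leaf_set(true_tree) != leaf_set(estimated_tree):
        raise ValueError("trees must be on the same taxon set")
    return len(bipartitions(true_tree) & bipartitions(estimated_tree))


if __name__ == "__main__":
    truth = ((("A", "B"), "C"), ("D", ("E", "F")))
    estimate = ((("A", "B"), "C"), ("E", ("D", "F")))
    print(true_positives(truth, estimate), "of", len(bipartitions(truth)))
```

Dividing the count by the number of internal edges of the true tree gives the fraction of edges recovered, which makes results comparable across trees of different sizes.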

Distance-Based Genome Rearrangement Phylogeny

by L. Wang, Tandy Warnow, and B. Moret

Evolution operates on whole genomes through direct rearrangements of genes, such as inversions, transpositions, and inverted transpositions, as well as through operations, such as duplications, losses, and transfers, that also affect the gene content of the genomes. Because these events are rare relative to nucleotide substitutions, gene order data offer the possibility of resolving ancient branches in the

TIPP: taxonomic identification and phylogenetic profiling

Bioinformatics, 2014

Abundance profiling (also called 'phylogenetic profiling') is a crucial step in understanding the diversity of a metagenomic sample, and one of the basic techniques used for this is taxonomic identification of the metagenomic reads. We present taxon identification and phylogenetic profiling (TIPP), a new marker-based taxon identification and abundance profiling method. TIPP combines SATé-enabled phylogenetic placement, a phylogenetic placement method, with statistical techniques to control the classification precision and recall, and results in improved abundance profiles. TIPP is highly accurate even in the presence of high indel errors and novel genomes, and matches or improves on previous approaches, including NBC, mOTU, PhymmBL, MetaPhyler and MetaPhlAn.

Statistically based postprocessing of phylogenetic analysis by clustering

by L. Wang and Tandy Warnow

Intelligent Systems in Molecular Biology, 2002

Motivation: Phylogenetic analyses often produce thousands of candidate trees. Biologists resolve the conflict by computing the consensus of these trees. Single-tree consensus as postprocessing methods can be unsatisfactory due to their inherent limitations. Results: In this paper we present an alternative approach by using clustering algorithms on the set of candidate trees. We propose bicriterion problems, in particular

The Accuracy of Fast Phylogenetic Methods for Large Datasets

by Usman Roshan, Tandy Warnow, and B. Moret

Pacific Symposium on Biocomputing, 2002

Whole-genome phylogenetic studies require various sources of phylogenetic signals to produce an accurate picture of the evolutionary history of a group of genomes. In particular, sequence-based reconstruction will play an important role, especially in resolving more recent events. But using sequences at the level of whole genomes means working with very large amounts of data (large numbers of sequences) as well

Designing fast converging phylogenetic methods

by Tandy Warnow and Usman Roshan

Intelligent Systems in Molecular Biology, 2001

Absolute fast converging phylogenetic reconstruction methods are provably guaranteed to recover the true tree with high probability from sequences that grow only polynomially in the number of leaves, once the edge lengths are bounded arbitrarily from above and below. Only a few methods have been determined to be absolute fast converging; these have all been developed in just the last

Estimating Large Distances in Phylogenetic Reconstruction

Lecture Notes in Computer Science, 1999

A major computational problem in biology is the reconstruction of evolutionary (a.k.a. “phylogenetic”) trees from biomolecular sequences. Most polynomial time phylogenetic reconstruction methods are distance-based, and take as input an estimation of the evolutionary distance between every pair of biomolecular sequences in the dataset. The estimation of evolutionary distances is standardized except when the set of biomolecular sequences is “saturated”,
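
The saturation problem this abstract refers to can be seen in a standard distance correction. The sketch below assumes the Jukes-Cantor model, which the abstract does not name and which is used here only as a familiar example: the corrected distance is d = -(3/4) ln(1 - 4p/3), where p is the fraction of aligned sites that differ, and it is undefined once p reaches 3/4. The function names and the choice to return infinity for saturated pairs are assumptions made for illustration.

```python
import math


def p_distance(seq1, seq2):
    """Fraction of aligned sites at which two equal-length sequences differ."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)


def jukes_cantor_distance(seq1, seq2):
    """Jukes-Cantor corrected distance, or infinity if the pair is saturated."""
    p = p_distance(seq1, seq2)
    if p >= 0.75:
        # The logarithm's argument is zero or negative: the correction
        # is undefined, which is the saturated case.
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    print(jukes_cantor_distance("ACGTACGTAC", "ACGTACGTAA"))
    print(jukes_cantor_distance("AAAA", "CCCC"))
```

A distance-based method needs a finite value for every pair, so saturated pairs have to be replaced by some estimate before tree construction can proceed.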

MRL and SuperFine+MRL: new supertree methods

by Tandy Warnow and Siavash Mirarab

Algorithms for Molecular Biology, 2012

Background: Supertree methods combine trees on subsets of the full taxon set together to produce ... more Background: Supertree methods combine trees on subsets of the full taxon set together to produce a tree on the entire set of taxa. Of the many supertree methods, the most popular is MRP (Matrix Representation with Parsimony), a method that operates by first encoding the input set of source trees by a large matrix (the "MRP matrix") over {0,1, ?}, and then running maximum parsimony heuristics on the MRP matrix. Experimental studies evaluating MRP in comparison to other supertree methods have established that for large datasets, MRP generally produces trees of equal or greater accuracy than other methods, and can run on larger datasets. A recent development in supertree methods is SuperFine+MRP, a method that combines MRP with a divide-and-conquer approach, and produces more accurate trees in less time than MRP. In this paper we consider a new approach for supertree estimation, called MRL (Matrix Representation with Likelihood). MRL begins with the same MRP matrix, but then analyzes the MRP matrix using heuristics (such as RAxML) for 2-state Maximum Likelihood.

ASTRAL-II: coalescent-based species tree estimation with many hundreds of taxa and thousands of genes

by Siavash Mirarab and Tandy Warnow

Bioinformatics (Oxford, England), Jan 15, 2015

The estimation of species phylogenies requires multiple loci, since different loci can have diffe... more The estimation of species phylogenies requires multiple loci, since different loci can have different trees due to incomplete lineage sorting, modeled by the multi-species coalescent model. We recently developed a coalescent-based method, ASTRAL, which is statistically consistent under the multi-species coalescent model and which is more accurate than other coalescent-based methods on the datasets we examined. ASTRAL runs in polynomial time, by constraining the search space using a set of allowed 'bipartitions'. Despite the limitation to allowed bipartitions, ASTRAL is statistically consistent. We present a new version of ASTRAL, which we call ASTRAL-II. We show that ASTRAL-II has substantial advantages over ASTRAL: it is faster, can analyze much larger datasets (up to 1000 species and 1000 genes) and has substantially better accuracy under some conditions. ASTRAL's running time is [Formula: see text], and ASTRAL-II's running time is [Formula: see text], where n is t...

Ultra-large alignments using phylogeny-aware profiles

by Siavash Mirarab, Keerthana Kumar, and Tandy Warnow

Genome biology, Jan 16, 2015

Many biological questions, including the estimation of deep evolutionary histories and the detect... more Many biological questions, including the estimation of deep evolutionary histories and the detection of remote homology between protein sequences, rely upon multiple sequence alignments and phylogenetic trees of large datasets. However, accurate large-scale multiple sequence alignment is very difficult, especially when the dataset contains fragmentary sequences. We present UPP, a multiple sequence alignment method that uses a new machine learning technique - the Ensemble of Hidden Markov Models - that we propose here. UPP produces highly accurate alignments for both nucleotide and amino acid sequences, even on ultra-large datasets or datasets containing fragmentary sequences. UPP is available at https://github.com/smirarab/sepp .

Weighted Statistical Binning: Enabling Statistically Consistent Genome-Scale Phylogenetic Analyses

by Tandy Warnow, Siavash Mirarab, and Bastien Boussau

PloS one, 2015

Because biological processes can result in different loci having different evolutionary histories... more Because biological processes can result in different loci having different evolutionary histories, species tree estimation requires multiple loci from across multiple genomes. While many processes can result in discord between gene trees and species trees, incomplete lineage sorting (ILS), modeled by the multi-species coalescent, is considered to be a dominant cause for gene tree heterogeneity. Coalescent-based methods have been developed to estimate species trees, many of which operate by combining estimated gene trees, and so are called "summary methods". Because summary methods are generally fast (and much faster than more complicated coalescent-based methods that co-estimate gene trees and species trees), they have become very popular techniques for estimating species trees from multiple loci. However, recent studies have established that summary methods can have reduced accuracy in the presence of gene tree estimation error, and also that many biological datasets have...

An Empirical Comparison of Phylogenetic Methods on Chloroplast Gene Order Data in Campanulaceae

by L. Wang, Tandy Warnow, and B. Moret

Computational Biology, 2000

The first heuristic for reconstructing phylogenetic trees from gene order data was introduced by ... more The first heuristic for reconstructing phylogenetic trees from gene order data was introduced by Blanchette et al.. It sought to reconstruct the breakpoint phylogeny and was applied to a variety of datasets. We present a new heuristic for estimating the breakpoint phylogeny which, although not polynomial-time, is much faster in practice than BP-Analysis. We use this heuristic to conduct a phylogenetic analysis of chloroplast genomes in the flowering plant family Campanulaceae. We also present and discuss the results of experimentation on this real dataset with three methods: our new method, BPAnalysis, and the neighbor-joining method, using breakpoint distances, inversion distances, and inversion plus transposition distances.

Reconstructing Optimal Phylogenetic Trees: A Challenge in Experimental Algorithmics

by Tandy Warnow and B. Moret

Lecture Notes in Computer Science, 2002

The benefits of experimental algorithmics and algorithm engineering need to be extended to applic... more The benefits of experimental algorithmics and algorithm engineering need to be extended to applications in the computational sciences. In this paper, we present on one such application: the reconstruction of evolutionary histories (phylogenies) from molecular data such as DNA sequences. Our presentation is not a survey of past and current work in the area, but rather a discussion of what we see as some of the important challenges in experimental algorithmics that arise from computational phylogenetics.

Rec-I-DCM3: a fast algorithmic technique for reconstructing large phylogenetic trees

by Usman Roshan, Tandy Warnow, and B. Moret

Proceedings / IEEE Computational Systems Bioinformatics Conference, CSB. IEEE Computational Systems Bioinformatics Conference, 2004

Phylogenetic trees are commonly reconstructed based on hard optimization problems such as maximum... more Phylogenetic trees are commonly reconstructed based on hard optimization problems such as maximum parsimony (MP) and maximum likelihood (ML). Conventional MP heuristics for producing phylogenetic trees produce good solutions within reasonable time on small datasets (up to a few thousand sequences), while ML heuristics are limited to smaller datasets (up to a few hundred sequences). However, since MP (and presumably ML) is NP-hard, such approaches do not scale when applied to large datasets. In this paper, we present a new technique called Recursive-Iterative-DCM3 (Rec-I-DCM3), which belongs to our family of Disk-Covering Methods (DCMs). We tested this new technique on ten large biological datasets ranging from 1,322 to 13,921 sequences and obtained dramatic speedups as well as significant improvements in accuracy (better than 99.99%) in comparison to existing approaches. Thus, high-quality reconstructions can be obtained for datasets at least ten times larger than was previously pos...

Advances in phylogeny reconstruction from gene order and content data

by Tandy Warnow and B. Moret

Methods in enzymology, 2005

Genomes can be viewed in terms of their gene content and the order in which the genes appear alon... more Genomes can be viewed in terms of their gene content and the order in which the genes appear along each chromosome. Evolutionary events that affect the gene order or content are "rare genomic events" (rarer than events that affect the composition of the nucleotide sequences) and have been advocated by systematists for inferring deep evolutionary histories. This chapter surveys recent developments in the reconstruction of phylogenies from gene order and content, focusing on their performance under various stochastic models of evolution. Because such methods are quite restricted in the type of data they can analyze, we also present research aimed at handling the full range of whole-genome data.

A new implementation and detailed study of breakpoint analysis

by Tandy Warnow and B. Moret

Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing, 2001

Phylogenies derived from gene order data may prove crucial in answering some fundamental open que... more Phylogenies derived from gene order data may prove crucial in answering some fundamental open questions in biomolecular evolution. Yet very few techniques are available for such phylogenetic reconstructions. One method is breakpoint analysis, developed by Blanchette and Sankoff for solving the "breakpoint phylogeny." Our earlier studies confirmed the usefulness of this approach, but also found that BPAnalysis, the implementation developed by Sankoff and Blanchette, was too slow to use on all but very small datasets. We report here on a reimplementation of BPAnalysis using the principles of algorithmic engineering. Our faster (by 2 to 3 orders of magnitude) and flexible implementation allowed us to conduct studies on the characteristics of breakpoint analysis, in terms of running time, quality, and robustness, as well as to analyze datasets that had so far been considered out of reach. We report on these findings and also discuss future directions for our new implementation.

Toward New Software for Computational Phylogenetics

by L. Wang, Tandy Warnow, and B. Moret

IEEE Computer, 2002

Systematists study how a group of genes or organisms evolved. These biologists now have set their... more Systematists study how a group of genes or organisms evolved. These biologists now have set their sights on the Tree of Life challenge: to reconstruct the evolutionary history of all known living organisms. A typical phylogenetic reconstruction starts with biomolecular data, such as DNA sequences for modern organisms, and builds a tree, or phylogeny, for these sequences that represents a

A New Fast Heuristic for Computing the Breakpoint Phylogeny and Experimental Phylogenetic Analyses of Real and Synthetic Data

by L. Wang, Tandy Warnow, and B. Moret

Intelligent Systems in Molecular Biology, 2000

The breakpoint phylogeny is an optimization problem proposed by Blanchette et al. for reconstruct... more The breakpoint phylogeny is an optimization problem proposed by Blanchette et al. for reconstructing evolutionary trees from gene order data. These same authors also developed and implemented BPAnalysis (3), a heuristic method (based upon solving many instances of the travelling salesman problem) for estimating the breakpoint phylogeny. We present a new heuristic for this purpose; although not polynomial-time, our heuristic

Fast phylogenetic methods for the analysis of genome rearrangement data: an empirical study

by Tandy Warnow and B. Moret

Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing, 2002

Evolution operates on whole genomes through mutations that change the order and strandedness of g... more Evolution operates on whole genomes through mutations that change the order and strandedness of genes within the genomes. Thus analyses of gene-order data present new opportunities for discoveries about deep evolutionary events, provided that sufficiently accurate methods can be developed to reconstruct evolutionary trees. In this paper we present two new methods of character coding for parsimony-based analysis of genomic rearrangements: one called MPBE-2, and a new parsimony-based method which we call MPME (based on an encoding of Bryant), both variants of the MPBE method. We then conduct computer simulations to compare this class of methods to distance-based methods (NJ under various distance measures). Our empirical results show that two of our new methods return highly accurate estimates of the true tree, outperforming the other methods significantly, especially when close to saturation.

Session Introduction

by Tandy Warnow and B. Moret

Pacific Symposium on Biocomputing, 2008

Multiple sequence alignment (MSA) has long been a mainstay of bioinformatics, particularly in the alignment of well-conserved protein and DNA sequences and in phylogenetic reconstruction for such data. Sequence datasets with low percentage identity, on the other hand, typically yield poor alignments. Now that researchers want to produce alignments among widely divergent genomes, including both coding and noncoding sequences, it is necessary to revisit sequence alignment and phylogenetic reconstruction under more ambitious models of sequence evolution that take into account the plethora of genomic events that have been observed.

Performance study of phylogenetic methods: (unweighted) quartet methods and neighbor-joining

by Tandy Warnow and B. Moret

ACM-SIAM Symposium on Discrete Algorithms, 2001

We present the results of a large-scale experimental study of quartet-based methods (quartet cleaning and puzzling) for phylogeny reconstruction. Our experiments include a broad range of problem sizes and evolutionary rates, and were carefully designed to yield statistically robust results despite the size of the sample space. We measure outcomes in terms of numbers of edges of the true tree correctly inferred by each method (true positives). Our

Distance-Based Genome Rearrangement Phylogeny

by L. Wang, Tandy Warnow, and B. Moret

Evolution operates on whole genomes through direct rearrangements of genes, such as inversions, transpositions, and inverted transpositions, as well as through operations, such as duplications, losses, and transfers, that also affect the gene content of the genomes. Because these events are rare relative to nucleotide substitutions, gene order data offer the possibility of resolving ancient branches in the

TIPP: taxonomic identification and phylogenetic profiling

Bioinformatics, 2014

Abundance profiling (also called 'phylogenetic profiling') is a crucial step in understanding the diversity of a metagenomic sample, and one of the basic techniques used for this is taxonomic identification of the metagenomic reads. We present taxon identification and phylogenetic profiling (TIPP), a new marker-based taxon identification and abundance profiling method. TIPP combines SATé-enabled phylogenetic placement, a phylogenetic placement method, with statistical techniques to control the classification precision and recall, and results in improved abundance profiles. TIPP is highly accurate even in the presence of high indel errors and novel genomes, and matches or improves on previous approaches, including NBC, mOTU, PhymmBL, MetaPhyler and MetaPhlAn.

Statistically based postprocessing of phylogenetic analysis by clustering

by L. Wang and Tandy Warnow

Intelligent Systems in Molecular Biology, 2002

Motivation: Phylogenetic analyses often produce thousands of candidate trees. Biologists resolve the conflict by computing the consensus of these trees. Single-tree consensus as postprocessing methods can be unsatisfactory due to their inherent limitations. Results: In this paper we present an alternative approach by using clustering algorithms on the set of candidate trees. We propose bicriterion problems, in particular

The Accuracy of Fast Phylogenetic Methods for Large Datasets

by Usman Roshan, Tandy Warnow, and B. Moret

Pacific Symposium on Biocomputing, 2002

Whole-genome phylogenetic studies require various sources of phylogenetic signals to produce an accurate picture of the evolutionary history of a group of genomes. In particular, sequence-based reconstruction will play an important role, especially in resolving more recent events. But using sequences at the level of whole genomes means working with very large amounts of data (large numbers of sequences) as well

Designing fast converging phylogenetic methods

by Tandy Warnow and Usman Roshan

Intelligent Systems in Molecular Biology, 2001

Absolute fast converging phylogenetic reconstruction methods are provably guaranteed to recover the true tree with high probability from sequences that grow only polynomially in the number of leaves, once the edge lengths are bounded arbitrarily from above and below. Only a few methods have been determined to be absolute fast converging; these have all been developed in just the last

Estimating Large Distances in Phylogenetic Reconstruction

Lecture Notes in Computer Science, 1999

A major computational problem in biology is the reconstruction of evolutionary (a.k.a. “phylogenetic”) trees from biomolecular sequences. Most polynomial time phylogenetic reconstruction methods are distance-based, and take as input an estimation of the evolutionary distance between every pair of biomolecular sequences in the dataset. The estimation of evolutionary distances is standardized except when the set of biomolecular sequences is “saturated”,