2002
AI
This study presents a large-scale experimental investigation of quartet-based methods for phylogenetic reconstruction, particularly focusing on quartet cleaning and puzzling techniques. The research indicates a significant accuracy deficit in these methods compared to the widely used neighbor-joining (NJ) method, especially with short to medium length sequences. The findings suggest that quartet-based methods are unlikely to produce reliable phylogenetic trees with less than exponentially long sequences, emphasizing the need for initial comparisons to NJ before further exploration of new reconstruction methods.
ACM-SIAM Symposium on Discrete Algorithms, 2001
We present the results of a large-scale experimental study of quartet-based methods (quartet cleaning and puzzling) for phylogeny reconstruction. Our experiments include a broad range of problem sizes and evolutionary rates, and were carefully designed to yield statistically robust results despite the size of the sample space. We measure outcomes in terms of numbers of edges of the true tree correctly inferred by each method (true positives). Our
2006
Recently we developed a new quartet-based algorithm for phylogenetic analysis [22]. This algorithm constructs a limited number of trees for a given set of DNA or protein sequences, and the initial experimental results show that the probability for the correct tree to be included in this small set of trees is very high. In this paper we further extend the idea. We first discuss a revision to the original algorithm to reduce the number of trees generated, while keeping the high probability for the correct tree to be included.
2001
We present fast new algorithms for constructing phylogenetic trees from quartets (resolved trees on four leaves). The problem is central to divide-and-conquer approaches to phylogenetic analysis and has been receiving considerable attention from the computational biology community. Most formulations of the problem are NP-hard. Here we consider a number of constrained versions that have polynomial time solutions.
2005
In this paper we introduce a new quartet-based method. This method makes use of the Bayes (or quartet) weights of quartets, like those used in quartet puzzling. However, all the weights from the related quartets are accumulated to form a global quartet weight matrix. This matrix provides integrated information and lets us recursively merge small sub-trees into larger ones until the final single tree is obtained. The experimental results show that the probability for the correct tree to be among a very small number of trees constructed using our method is very high. These significant results open a new research direction to further investigate more efficient algorithms for phylogenetic inference.
2007
The problem is to construct an optimal weight tree from the $3\binom{n}{4}$ weighted quartet topologies on n objects, where optimality means that the summed weight of the embedded quartet topologies is optimal (so it can be the case that the optimal tree embeds all quartets as nonoptimal topologies). We present a Monte Carlo heuristic, based on randomized hill climbing, for approximating the optimal weight tree, given the quartet topology weights. The method repeatedly transforms a bifurcating tree, with all objects involved as leaves, achieving a monotonic approximation to the exact single globally optimal tree. The method has been extensively used, and is implemented and available, as part of the CompLearn package. We compare performance and running time with those of UPGMA, BioNJ, and NJ, as implemented in the SplitsTree package. Index Terms: evolutionary tree, global optimization, Monte Carlo method, quartet method, randomized hill-climbing. I. INTRODUCTION We present a quartet method for phylogenetic construction in biology, and more generally for hierarchical clustering in non-biological areas. It is a Monte Carlo method, as opposed to deterministic methods like local search. Our method is based on a novel fast randomized hill-climbing heuristic of a new global optimization criterion. The algorithm does not address the problem of how to obtain the quartet topology weights from sequence data [21], [27], Rudi Cilibrasi is with the Center for Mathematics and Computer Science (CWI). Address: CWI,
2007
Extending the idea of our previous algorithm [17, 18], we developed a new sequential quartet-based phylogenetic tree construction method. This new algorithm reconstructs the phylogenetic tree iteratively by examining, at each merge step, every possible super-quartet formed by four subtrees, rather than the simple quartets used in our previous algorithm.
Lecture Notes in Computer Science, 1999
A critical step in all quartet methods for constructing evolutionary trees is the inference of the topology for each set of four sequences (i.e. quartet). It is a well-known fact that all quartet topology inference methods make mistakes that result in the incorrect inference of quartet topology. These mistakes are called quartet errors. In this paper, two efficient algorithms for correcting bounded numbers of quartet errors are presented. These "quartet cleaning" algorithms are shown to be optimal in that no algorithm can correct more quartet errors. An extensive simulation study reveals that sets of quartet topologies inferred by three popular methods (Neighbor Joining [15], Ordinal Quartet [14] and Maximum Parsimony [10]) almost always contain quartet errors and that a large portion of these quartet errors are corrected by the quartet cleaning algorithms.
Journal of Computational Biology, 2009
Despite the continued development of advanced algorithms for phylogeny reconstruction, the assessment of topological accuracy remains a challenging problem. New tools are needed to assist researchers in the prediction and evaluation of phylogenetic performance, particularly when short alignments are considered. We present a probabilistic analysis of quartet accuracy by the Four-Point-Method for the Jukes-Cantor model for nucleotide substitution, developing a sharp error estimate as a function of the quartet edge lengths and the number of nucleotide positions available. Our Multivariate Product (MVP) estimate offers significant improvements over existing bounds and performs well even for short sequence lengths.
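As a rough illustration of the setting this abstract analyzes, the sketch below applies the Four-Point Method to one quartet: it converts observed mismatch fractions to Jukes-Cantor distances and picks the pairing with the smallest sum of within-pair distances. The function names and the dictionary-based distance input are assumptions made for this example; the sketch does not implement the paper's MVP error estimate.

```python
import math
from typing import Dict, FrozenSet, Tuple

Pairing = Tuple[Tuple[str, str], Tuple[str, str]]


def jc_distance(p: float) -> float:
    """Jukes-Cantor distance from the observed fraction p of differing sites.

    Only defined for p < 0.75; larger values saturate the correction.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("Jukes-Cantor distance requires 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def four_point_method(dist: Dict[FrozenSet[str], float],
                      a: str, b: str, c: str, d: str) -> Pairing:
    """Return the quartet pairing whose summed within-pair distance is smallest."""
    def D(x: str, y: str) -> float:
        return dist[frozenset((x, y))]

    sums = {
        ((a, b), (c, d)): D(a, b) + D(c, d),
        ((a, c), (b, d)): D(a, c) + D(b, d),
        ((a, d), (b, c)): D(a, d) + D(b, c),
    }
    return min(sums, key=sums.get)


# Hypothetical additive distances for the tree ab|cd
# (leaf edges of length 1, internal edge of length 2).
example = {
    frozenset(("a", "b")): 2.0, frozenset(("c", "d")): 2.0,
    frozenset(("a", "c")): 4.0, frozenset(("a", "d")): 4.0,
    frozenset(("b", "c")): 4.0, frozenset(("b", "d")): 4.0,
}
topology = four_point_method(example, "a", "b", "c", "d")
```

With exact additive distances the correct pairing always wins; with distances estimated from short alignments it can lose, which is the error probability the abstract sets out to bound.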
Algorithms for Molecular Biology, 2011
Background: Supertree methods represent one of the major ways by which the Tree of Life can be estimated, but despite many recent algorithmic innovations, matrix representation with parsimony (MRP) remains the main algorithmic supertree method.
Proceedings of the third annual international conference on Computational molecular biology - RECOMB '99, 1999
We present fast new algorithms for phylogenetic reconstruction from distance data or weighted quartets. The methods are conservative: they will only return edges that are well supported by the input data. This approach is not only philosophically attractive; the conservative tree estimate can be used as a basis for further tree refinement or divide and conquer algorithms. The capability to process quartet data allows these algorithms to be used in tandem with ordinal or qualitative phylogenetic analysis methods. We provide algorithms for three standard conservative phylogenetic constructions: the Buneman tree, the Refined Buneman tree, and split decomposition. We introduce and exploit combinatorial formalisms involving trees, quartets, and splits, and make particular use of an attractive duality between unrooted trees, splits, and dissimilarities on one hand, and rooted trees, clusters, and similarity measures on the other. Using these techniques, we achieve O(n) improvements in the time complexity of the best previously published algorithms (where n is the number of studied species). Our algorithms will be included in the next edition of the popular SplitsTree software package.
Journal of Classification, 2010
We present a new distance-based quartet method for phylogenetic tree reconstruction, called Minimum Tree Cost Quartet Puzzling. Starting from a distance matrix computed from natural data, the algorithm incrementally constructs a tree by adding one taxon at a time to the intermediary tree using a cost function based on the relaxed 4-point condition for weighting quartets. Different input orders of taxa lead to trees having distinct topologies which can be evaluated using a maximum likelihood or weighted least squares optimality criterion. Using reduced sets of quartets and a simple heuristic tree search strategy we obtain an overall complexity of $O(n^5 \log^2 n)$ for the algorithm. We evaluate the performance of the method through comparative tests and show that our method outperforms NJ when a weighted least squares optimality criterion is employed. We also discuss the theoretical boundaries of the algorithm.
2006
This paper describes a parallel implementation of our recently developed algorithm for phylogenetic analysis on the IBM BlueGene/L cluster. This algorithm constructs evolutionary trees for a given set of DNA or protein sequences based on the topological information of every possible quartet tree. Our experimental results showed that it has several advantages over many popular algorithms.
Molecular Biology and Evolution, 1999
A new quartet method is described for building phylogenetic trees, making use of a numerical measure of local inconsistency. For each quartet consisting of four species, the user chooses numbers indicating evidence for each of the three possible completely resolved trees. These numbers may be, for example, tree lengths or likelihoods. From these numbers, I describe how to measure the "local inconsistency" that results from placing a new species into a particular position in a phylogenetic tree. The best placements are those with low local inconsistency. A phylogenetic tree for a collection of taxa may be constructed by picking a random order of species and adding the species in this order, each time using the placement with the lowest local inconsistency. To summarize the results, one may select a majority-rule consensus tree or the tree most frequently obtained. Alternatively, taxa can be added in the order that maximizes the signal strength. Advantages of the method may include flexibility and better resolution. Studies are performed for artificial data sets for which long-branch attractions are a serious problem; comparisons show performance much superior to maximum parsimony and somewhat superior to quartet puzzling. A case study with real data also illustrates the method.
IEEE/ACM transactions on computational biology and bioinformatics, 2016
Quartet trees displayed by larger phylogenetic trees have long been used as inputs for species tree and supertree reconstruction. Computational constraints prevent the use of all displayed quartets in many practical problems with large numbers of taxa. We introduce the notion of an Efficient Quartet System (EQS) to represent a phylogenetic tree with a subset of the quartets displayed by the tree. We show mathematically that the set of quartets obtained from a tree via an EQS contains all of the combinatorial information of the tree itself. Using performance tests on simulated datasets, we also demonstrate that using an EQS to reduce the number of quartets in both summary method pipelines for species tree inference as well as methods for supertree inference results in only small reductions in accuracy.
Lecture Notes in Computer Science, 2002
The benefits of experimental algorithmics and algorithm engineering need to be extended to applications in the computational sciences. In this paper, we present one such application: the reconstruction of evolutionary histories (phylogenies) from molecular data such as DNA sequences. Our presentation is not a survey of past and current work in the area, but rather a discussion of what we see as some of the important challenges in experimental algorithmics that arise from computational phylogenetics.
Molecular Biology and Evolution, 2001
From the DNA sequences for N taxa, the (generally unknown) phylogenetic tree T that gave rise to them is to be reconstructed. Various methods give rise, for each quartet J consisting of exactly four taxa, to a predicted tree L(J) based only on the sequences in J, and these are then used to reconstruct T. The author defines an "error-correcting map" (Ec), which replaces each L(J) with a new tree, Ec(L)(J), which has been corrected using other trees, L(K), in the list L. The "quartet distance" between two trees is defined as the number of quartets J on which the two trees differ, and two distinct trees are shown to always have quartet distance of at least N − 3. If L has quartet distance at most (N − 4)/2 from T, then Ec(L) will coincide with the correct list for T; and this result cannot be improved. In general, Ec can correct many more errors in L. Iteration of the map Ec may produce still more accurate lists. Simulations are reported which often show improvement even when the quartet distance considerably exceeds (N − 4)/2. Moreover, the Buneman tree for Ec(L) is shown to refine the Buneman tree for L, so that strongly supported edges for L remain strongly supported for Ec(L). Simulations show that if methods such as the C-tree or hypercleaning are applied to Ec(L), the resulting trees often have more resolution than when the methods are applied only to L. "most" choices of J and then corrects individual L(J)
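The quartet distance defined in this abstract can be sketched in a few lines. The representation below is an assumption chosen for brevity: each tree is given as a list of its nontrivial splits, with each split stored as one side of the bipartition. The brute-force loop over all four-taxon subsets is only meant for small examples, and the error-correcting map Ec itself is not implemented here.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Optional

Split = FrozenSet[str]  # one side of a bipartition; the other side is implied


def induced_quartet(splits: List[Split], quartet) -> Optional[frozenset]:
    """Topology the tree induces on four taxa, or None if unresolved."""
    a, b, c, d = quartet
    for (x, y), (z, w) in (((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))):
        for side in splits:
            # The split separates {x, y} from {z, w} if each pair sits
            # together and the two pairs sit on opposite sides.
            if (x in side) == (y in side) and (z in side) == (w in side) \
                    and (x in side) != (z in side):
                return frozenset((frozenset((x, y)), frozenset((z, w))))
    return None


def quartet_distance(taxa: Iterable[str],
                     splits1: List[Split], splits2: List[Split]) -> int:
    """Number of four-taxon subsets on which the two trees differ."""
    return sum(
        induced_quartet(splits1, q) != induced_quartet(splits2, q)
        for q in combinations(sorted(taxa), 4)
    )


# Two hypothetical binary trees on N = 5 taxa: ((a,b),c,(d,e)) and ((a,c),b,(d,e)).
taxa = ["a", "b", "c", "d", "e"]
tree1 = [frozenset("ab"), frozenset("de")]
tree2 = [frozenset("ac"), frozenset("de")]
distance = quartet_distance(taxa, tree1, tree2)
```

In this example the two trees differ on exactly the quartets {a,b,c,d} and {a,b,c,e}, so the distance is 2, which equals the N − 3 lower bound stated in the abstract for N = 5.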
New Achievements in Evolutionary Computation, 2010
Communications of The Korean Mathematical Society, 2010
Among the distance based algorithms in phylogenetic tree reconstruction, the neighbor-joining algorithm has been a widely used and effective method. We propose a new algorithm which counts the number of consistent quartets for cherry picking with tie breaking. We show that the success rate of the new algorithm is almost equal to that of neighbor-joining. This gives an explanation of the qualitative nature of neighbor-joining and that of dissimilarity maps from DNA sequence data. Moreover, the new algorithm always reconstructs correct trees from quartet consistent dissimilarity maps.
4OR, 2018
The minimum quartet tree cost (MQTC) problem is a graph combinatorial optimization problem where, given a set of n ≥ 4 data objects and their pairwise costs (or distances), one wants to construct an optimal tree from the $3\binom{n}{4}$ quartet topologies on n, where optimality means that the sum of the costs of the embedded (or consistent) quartet topologies is minimal. The MQTC problem is the foundation of the quartet method of hierarchical clustering, a novel hierarchical clustering method for non tree-like (non-phylogeny) data in various domains, or for heterogeneous data across domains. The MQTC problem is NP-complete and some heuristics have already been proposed in the literature. The aim of this paper is to present a first exact solution approach for the MQTC problem. Although the algorithm is able to get exact solutions only for relatively small problem instances, due to the high problem complexity, it can be used as a benchmark for validating the performance of any heuristic proposed for the MQTC problem.
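To make the size of the MQTC input concrete, the sketch below enumerates the quartet topologies on n objects: each of the C(n, 4) four-object subsets can be paired in three ways, giving three times C(n, 4) topologies in total. The function name is an assumption for this example, and the sketch only counts topologies; it does not solve the MQTC problem.

```python
from itertools import combinations
from math import comb


def quartet_topologies(objects):
    """Yield every quartet topology uv|wx as a pair of pairs."""
    for a, b, c, d in combinations(objects, 4):
        yield ((a, b), (c, d))
        yield ((a, c), (b, d))
        yield ((a, d), (b, c))


n = 6
topologies = list(quartet_topologies(range(n)))
expected_count = 3 * comb(n, 4)  # three pairings per four-object subset
```

For n = 6 this gives 45 topologies, and the count grows as n to the fourth power, which is one reason exact approaches are limited to relatively small instances.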
Molecular Biology and Evolution, 1998
The maximum-likelihood (ML) method for inferring molecular phylogenies (Felsenstein 1981) is being used extensively in the wide field of phylogenetics. The method has a sound statistical basis (e.g., Felsenstein 1983; Goldman 1990; Yang 1994) and has proved to be powerful in recovering correct tree topologies in computer simulation studies (e.g.