2005
In this paper we introduce a new quartet-based method. The method uses the Bayes (or quartet) weights of quartets, like those used in quartet puzzling. However, the weights from all related quartets are accumulated to form a global quartet weight matrix. This matrix provides integrated information and lets us recursively merge small sub-trees into larger ones until a single final tree is obtained. The experimental results show that, with very high probability, the correct tree is among a very small number of trees constructed using our method. These results open a new research direction toward more efficient algorithms for phylogenetic inference.
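The accumulation step described above can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the abstract does not state the accumulation rule, so the rule used here (add the weight of topology ab|cd to the matrix entries for the pairs (a, b) and (c, d)) and the names `accumulate_quartet_weights` and `quartet_weights` are assumptions.

```python
from itertools import combinations


def accumulate_quartet_weights(taxa, quartet_weights):
    """Sum per-quartet topology weights into one symmetric pairwise matrix.

    ASSUMPTION: quartet_weights maps each 4-tuple (a, b, c, d), in the order
    produced by itertools.combinations(taxa, 4), to three weights for the
    topologies ab|cd, ac|bd and ad|bc. The accumulation rule itself is a
    hypothetical choice made for illustration.
    """
    index = {t: i for i, t in enumerate(taxa)}
    n = len(taxa)
    matrix = [[0.0] * n for _ in range(n)]
    for quartet in combinations(taxa, 4):
        a, b, c, d = quartet
        w_ab_cd, w_ac_bd, w_ad_bc = quartet_weights[quartet]
        topologies = (
            ((a, b), (c, d), w_ab_cd),
            ((a, c), (b, d), w_ac_bd),
            ((a, d), (b, c), w_ad_bc),
        )
        for left, right, weight in topologies:
            # Each topology supports grouping both of its pairs.
            for p, q in (left, right):
                i, j = index[p], index[q]
                matrix[i][j] += weight
                matrix[j][i] += weight
    return matrix
```

Under this rule, pairs of taxa that are neighbors in many well-supported quartets receive the largest entries, which is one plausible signal for deciding which sub-trees to merge first.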
ACM-SIAM Symposium on Discrete Algorithms, 2001
We present the results of a large-scale experimental study of quartet-based methods (quartet cleaning and puzzling) for phylogeny reconstruction. Our experiments include a broad range of problem sizes and evolutionary rates, and were carefully designed to yield statistically robust results despite the size of the sample space. We measure outcomes in terms of numbers of edges of the true tree correctly inferred by each method (true positives). Our
Molecular Biology and Evolution, 1999
A new quartet method is described for building phylogenetic trees, making use of a numerical measure of local inconsistency. For each quartet consisting of four species, the user chooses numbers indicating evidence for each of the three possible completely resolved trees. These numbers may be, for example, tree lengths or likelihoods. From these numbers, I describe how to measure the "local inconsistency" that results from placing a new species into a particular position in a phylogenetic tree. The best placements are those with low local inconsistency. A phylogenetic tree for a collection of taxa may be constructed by picking a random order of species and adding the species in this order, each time using the placement with the lowest local inconsistency. To summarize the results, one may select a majority-rule consensus tree or the tree most frequently obtained. Alternatively, taxa can be added in the order that maximizes the signal strength. Advantages of the method may include flexibility and better resolution. Studies are performed for artificial data sets for which long-branch attractions are a serious problem; comparisons show performance much superior to maximum parsimony and somewhat superior to quartet puzzling. A case study with real data also illustrates the method.
Molecular Biology and Evolution, 2001
From the DNA sequences for N taxa, the (generally unknown) phylogenetic tree T that gave rise to them is to be reconstructed. Various methods give rise, for each quartet J consisting of exactly four taxa, to a predicted tree L(J) based only on the sequences in J, and these are then used to reconstruct T. The author defines an "error-correcting map" (Ec), which replaces each L(J) with a new tree, Ec(L)(J), which has been corrected using other trees, L(K), in the list L. The "quartet distance" between two trees is defined as the number of quartets J on which the two trees differ, and two distinct trees are shown to always have quartet distance of at least N − 3. If L has quartet distance at most (N − 4)/2 from T, then Ec(L) will coincide with the correct list for T; and this result cannot be improved. In general, Ec can correct many more errors in L. Iteration of the map Ec may produce still more accurate lists. Simulations are reported which often show improvement even when the quartet distance considerably exceeds (N − 4)/2. Moreover, the Buneman tree for Ec(L) is shown to refine the Buneman tree for L, so that strongly supported edges for L remain strongly supported for Ec(L). Simulations show that if methods such as the C-tree or hypercleaning are applied to Ec(L), the resulting trees often have more resolution than when the methods are applied only to L. "most" choices of J and then corrects individual L(J)
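The quartet distance defined above can be computed by brute force for small trees. The sketch below is an illustration under stated assumptions, not code from the paper: trees are given as undirected edge lists, the induced topology of each quartet is read off from path lengths (the pairing with the strictly smallest sum of within-pair distances), and the helper names `tree_from_edges`, `leaf_distances`, `quartet_topology` and `quartet_distance` are hypothetical. It does not implement the error-correcting map Ec.

```python
from collections import deque
from itertools import combinations


def tree_from_edges(edges):
    """Build an adjacency map from a list of undirected edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def leaf_distances(adj, leaves):
    """Path length (number of edges) between every ordered pair of leaves."""
    dist = {}
    for source in leaves:
        seen = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        for target in leaves:
            dist[source, target] = seen[target]
    return dist


def quartet_topology(dist, a, b, c, d):
    """Return the induced split as a set of two pairs, or None if unresolved."""
    sums = {
        frozenset([frozenset([a, b]), frozenset([c, d])]): dist[a, b] + dist[c, d],
        frozenset([frozenset([a, c]), frozenset([b, d])]): dist[a, c] + dist[b, d],
        frozenset([frozenset([a, d]), frozenset([b, c])]): dist[a, d] + dist[b, c],
    }
    best = min(sums.values())
    winners = [split for split, total in sums.items() if total == best]
    return winners[0] if len(winners) == 1 else None


def quartet_distance(adj1, adj2, leaves):
    """Count the quartets on which the two trees induce different topologies."""
    d1 = leaf_distances(adj1, leaves)
    d2 = leaf_distances(adj2, leaves)
    return sum(
        1
        for q in combinations(sorted(leaves), 4)
        if quartet_topology(d1, *q) != quartet_topology(d2, *q)
    )
```

For example, the five-taxon trees ((A,B),C,(D,E)) and ((A,C),B,(D,E)) differ only on the quartets {A,B,C,D} and {A,B,C,E}, so their quartet distance is 2, which meets the lower bound N − 3 for N = 5. The enumeration visits all quartets and is only practical for small N.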
2009
Abstract: In the past, research efforts in computational phylogenetic analysis were dedicated to designing heuristics that can quickly find near-optimal trees under a specific optimization criterion. However, all criteria are oversimplified and cannot realistically model the real evolutionary process. Thus all existing algorithms for phylogenetic analysis have their limitations. This has become a serious issue for many important real-life applications, which often demand accurate results from phylogenetic analysis.
2006
Abstract: Recently we developed a new quartet-based algorithm for phylogenetic analysis [22]. This algorithm constructs a limited number of trees for a given set of DNA or protein sequences, and the initial experimental results show that the probability for the correct tree to be included in this small set of trees is very high. In this paper we extend the idea further. We first discuss a revision to the original algorithm that reduces the number of trees generated while keeping the high probability for the correct tree to be included.
Communications of The Korean Mathematical Society, 2010
Among the distance-based algorithms in phylogenetic tree reconstruction, the neighbor-joining algorithm has been a widely used and effective method. We propose a new algorithm which counts the number of consistent quartets for cherry picking with tie breaking. We show that the success rate of the new algorithm is almost equal to that of neighbor-joining. This gives an explanation of the qualitative nature of neighbor-joining and that of dissimilarity maps from DNA sequence data. Moreover, the new algorithm always reconstructs correct trees from quartet-consistent dissimilarity maps.
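One way to picture cherry picking by quartet counting is sketched below. It is an assumption-laden illustration rather than the published algorithm: a quartet is taken to support a pair when that pair belongs to the pairing with the strictly smallest distance sum, and ties between candidate cherries are broken by lexicographic order, since the abstract does not state the paper's tie-breaking rule. The name `pick_cherry` is hypothetical.

```python
from itertools import combinations


def pick_cherry(taxa, d):
    """Pick the pair of taxa supported as neighbors by the most quartets.

    ASSUMPTIONS: taxa is a sorted list; d maps each sorted pair (x, y) to a
    dissimilarity; ties between pairs are broken lexicographically, which is
    a placeholder for the tie-breaking rule the abstract does not specify.
    """
    counts = {pair: 0 for pair in combinations(taxa, 2)}
    for a, b, c, e in combinations(taxa, 4):
        pairings = [((a, b), (c, e)), ((a, c), (b, e)), ((a, e), (b, c))]
        sums = [d[p] + d[q] for p, q in pairings]
        best = min(sums)
        # Only a strictly smallest sum identifies a resolved quartet.
        if sums.count(best) == 1:
            p, q = pairings[sums.index(best)]
            counts[p] += 1
            counts[q] += 1
    # max() returns the first maximal element, so sorted order breaks ties.
    cherry = max(sorted(counts), key=counts.get)
    return cherry, counts
```

With dissimilarities estimated from real sequences, exact ties in floating-point sums are unlikely, so a tolerance-based comparison may be preferable to the strict equality used here.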
American Journal of Bioinformatics Research, 2012
Phylogenetics enables us to use various techniques to extract evolutionary relationships from sequence analysis. Most phylogenetic analysis techniques produce phylogenetic trees that represent the relationships among a set of species or their evolutionary history. This article presents a comprehensive survey of the applications and algorithms for inferring very large phylogenetic trees, and gives the reader an overview of the methods currently employed for phylogenetic tree inference. A comprehensive comparison of the methods and algorithms is also presented.
2006
Abstract: This paper describes a parallel implementation of our recently developed algorithm for phylogenetic analysis on the IBM BlueGene/L cluster. This algorithm constructs evolutionary trees for a given set of DNA or protein sequences based on the topological information of every possible quartet tree. Our experimental results showed that it has several advantages over many popular algorithms.
2007
Extending the idea of our previous algorithm [17, 18], we developed a new sequential quartet-based phylogenetic tree construction method. This new algorithm reconstructs the phylogenetic tree iteratively by examining, at each merge step, every possible super-quartet formed by four subtrees, instead of the simple quartets used in our previous algorithm.
1999
Abstract: We present fast new algorithms for phylogenetic reconstruction from distance data or weighted quartets. The methods are conservative: they will only return edges that are well supported by the input data. This approach is not only philosophically attractive; the conservative tree estimate can be used as a basis for further tree refinement or divide and conquer algorithms. The capability to process quartet data allows these algorithms to be used in tandem with ordinal or qualitative phylogenetic analysis methods.
Molecular Phylogenetics and Evolution, 2009
We review recent models to estimate phylogenetic trees under the multispecies coalescent. Although the distinction between gene trees and species trees has come to the fore of phylogenetics, only recently have methods been developed that explicitly estimate species trees. Of the several factors that can cause gene tree heterogeneity and discordance with the species tree, deep coalescence due to random genetic drift in branches of the species tree has been modeled most thoroughly. Bayesian approaches to estimating species trees utilize two likelihood functions, one of which has been widely used in traditional phylogenetics and involves the model of nucleotide substitution, and the second of which is less familiar to phylogeneticists and involves the probability distribution of gene trees given a species tree. Other recent parametric and nonparametric methods for estimating species trees involve parsimony criteria, summary statistics, supertree and consensus methods. Species tree approaches are an appropriate goal for systematics, appear to work well in some cases where concatenation can be misleading, and suggest that sampling many independent loci will be paramount. Such methods can also be challenging to implement because of the complexity of the models and computational time. In addition, further elaboration of the simplest of coalescent models will be required to incorporate commonly known issues such as deviation from the molecular clock, gene flow and other genetic forces.
Molecular Biology and Evolution, 1995
We used simulated data to investigate a number of properties of maximum-likelihood (ML) phylogenetic tree estimation for the case of four taxa. Simulated data were generated under a broad range of conditions, including wide variation in branch lengths, differences in the ratio of transition and transversion substitutions, and the absence or presence of gamma-distributed site-to-site rate variation. Data were analyzed in the ML framework with two different substitution models, and we compared the ability of the two models to reconstruct the correct topology. Although both models were inconsistent for some branch-length combinations in the presence of site-to-site variation, the models were efficient predictors of topology under most simulation conditions. We also examined the performance of the likelihood ratio (LR) test for significant positive interior branch length. This test was found to be misleading under many simulation conditions, rejecting too often under some simulation conditions. Under the null hypothesis of zero length internal branch, LR statistics are assumed to be asymptotically distributed $\chi^2_1$; with limited data, the distribution of LR statistics under the null hypothesis varies from $\chi^2_1$.
2009
We review phylogenetic inference methods with a special emphasis on inference from molecular data. We begin with a general comment on phylogenetic inference using DNA sequences, followed by a clear statement of the relevance of a good alignment of sequences. Then we provide a general description of models of sequence evolution, including evolutionary models that account for rate heterogeneity along the DNA sequences or complex secondary structure (i.e., ribosomal genes). We then present an overall description of the most relevant inference methods, focusing on key concepts of general interest. We point out the most relevant traits of methods such as maximum parsimony (MP), distance methods, maximum likelihood (ML) and Bayesian inference (BI). Finally, we discuss different measures of support for the estimated phylogeny and discuss how this relates to confidence in particular nodes of a phylogeny reconstruction.
Journal of Computational Biology, 2009
Despite the continued development of advanced algorithms for phylogeny reconstruction, the assessment of topological accuracy remains a challenging problem. New tools are needed to assist researchers in the prediction and evaluation of phylogenetic performance, particularly when short alignments are considered. We present a probabilistic analysis of quartet accuracy by the Four-Point-Method for the Jukes-Cantor model for nucleotide substitution, developing a sharp error estimate as a function of the quartet edge lengths and the number of nucleotide positions available. Our Multivariate Product (MVP) estimate offers significant improvements over existing bounds and performs well even for short sequence lengths.
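For readers unfamiliar with the setting, the Four-Point Method on Jukes-Cantor distances can be sketched as follows. This is a generic textbook-style illustration and does not implement the MVP error estimate from the paper; the function names `jc_distance` and `four_point_method`, and the convention of returning infinity for saturated pairs, are assumptions made here.

```python
import math


def jc_distance(s1, s2):
    """Jukes-Cantor distance: d = -(3/4) * ln(1 - (4/3) * p),
    where p is the fraction of aligned sites at which the sequences differ."""
    if len(s1) != len(s2) or not s1:
        raise ValueError("sequences must be aligned, equal-length and non-empty")
    p = sum(x != y for x, y in zip(s1, s2)) / len(s1)
    if p >= 0.75:
        # ASSUMPTION: treat saturated pairs as infinitely distant.
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def four_point_method(seqs):
    """Return the pairing of four taxa with the smallest sum of within-pair
    distances. seqs maps exactly four taxon names to aligned sequences."""
    if len(seqs) != 4:
        raise ValueError("expected exactly four taxa")
    a, b, c, d = sorted(seqs)

    def dist(x, y):
        return jc_distance(seqs[x], seqs[y])

    candidates = {
        ((a, b), (c, d)): dist(a, b) + dist(c, d),
        ((a, c), (b, d)): dist(a, c) + dist(b, d),
        ((a, d), (b, c)): dist(a, d) + dist(b, c),
    }
    return min(candidates, key=candidates.get)
```

On short alignments the estimated distances are noisy, so the selected pairing may be wrong; this is the kind of error that the abstract's probabilistic analysis is meant to quantify.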
Molecular Biology and Evolution, 2006
We present QNet, a method for constructing split networks from weighted quartet trees. QNet can be viewed as a quartet analogue of the distance-based Neighbor-Net (NNet) method for network construction. Just as NNet, QNet works by agglomeratively computing a collection of circular weighted splits of the taxa set which is subsequently represented by a planar split network. To illustrate the applicability of QNet, we apply it to a previously published Salmonella data set. We conclude that QNet can provide a useful alternative to NNet if distance data are not available or a character-based approach is preferred. Moreover, it can be used as an aid for determining when a quartet-based tree-building method may or may not be appropriate for a given data set. QNet is freely available for download.
Journal of Molecular Evolution, 1991
The efficiency of obtaining the correct tree by the maximum likelihood method (Felsenstein 1981) for inferring trees from DNA sequence data was compared with that of distance methods. It was shown that the maximum likelihood method is superior to distance methods in efficiency, particularly when the evolutionary rate differs among lineages.
BMC Bioinformatics, 2012
Background: The frequent exchange of genetic material among prokaryotes means that extracting a majority or plurality phylogenetic signal from many gene families, and the identification of gene families that are in significant conflict with the plurality signal is a frequent task in comparative genomics, and especially in phylogenomic analyses. Decomposition of gene trees into embedded quartets (unrooted trees each with four taxa) is a convenient and statistically powerful technique to address this challenging problem. This approach was shown to be useful in several studies of completely sequenced microbial genomes. Results: We present here a web server that takes a collection of gene phylogenies, decomposes them into quartets, generates a Quartet Spectrum, and draws a split network. Users are also provided with various data download options for further analyses. Each gene phylogeny is to be represented by an assessment of phylogenetic information content, such as sets of trees reconstructed from bootstrap replicates or sampled from a posterior distribution. The Quartet Decomposition server is accessible at . Conclusions: The Quartet Decomposition server presented here provides a convenient means to perform Quartet Decomposition analyses and will empower users to find statistically supported phylogenetic conflicts.
Systematic Biology, 1993