1
Smith MR. Using information theory to detect rogue taxa and improve consensus trees. Syst Biol 2021; 71:1088-1094. [PMID: 34951650 PMCID: PMC9366444 DOI: 10.1093/sysbio/syab099] [Received: 07/15/2021] [Revised: 11/29/2021] [Accepted: 12/17/2021]
Abstract
“Rogue” taxa of uncertain affinity can confound attempts to summarize the results of phylogenetic analyses. Rogues reduce resolution and support values in consensus trees, potentially obscuring strong evidence for relationships between other taxa. Information theory provides a principled means of assessing the congruence between a set of trees and their consensus, allowing rogue taxa to be identified more effectively than when using ad hoc measures of tree quality. A basic implementation of this approach in R recovers reduced consensus trees that are better resolved, more accurate, and more informative than those generated by existing methods. [Consensus trees; information theory; phylogenetic software; Rogue taxa.]
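The abstract does not spell out which information-theoretic measure is used. As a rough illustration only, the sketch below computes one standard quantity, the phylogenetic information content of a single split: the negative log of the proportion of unrooted binary trees that contain it. The function names are hypothetical, and this is not presented as the paper's implementation.

```python
import math


def double_factorial(k: int) -> int:
    """Return k!! for odd k; values of k <= 1 give 1, so (-1)!! = 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def split_information(a: int, b: int) -> float:
    """Bits of information in a split separating a leaves from b leaves.

    Assumes unrooted binary trees: (2n - 5)!! trees exist on n = a + b
    leaves, of which (2a - 3)!! * (2b - 3)!! contain the split.
    """
    n = a + b
    trees_with_split = double_factorial(2 * a - 3) * double_factorial(2 * b - 3)
    all_trees = double_factorial(2 * n - 5)
    # Subtract logs of integers to avoid overflow for large trees.
    return math.log2(all_trees) - math.log2(trees_with_split)
```

Under this measure an even split carries more information than an uneven one on the same number of leaves, and a split that isolates a single leaf carries none. This gives some intuition for why a taxon that breaks up otherwise well-supported splits can lower the information in a consensus tree.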
Affiliation(s)
- Martin R Smith
- Department of Earth Sciences, Durham University, Lower Mountjoy, Durham, DH1 3LE, UK
2
Affiliation(s)
- Mark Wilkinson
- Department of Life Sciences, The Natural History Museum, London SW7 5BD, UK
- Marco Crotti
- Department of Life Sciences, The Natural History Museum, London SW7 5BD, UK
3
Akanni WA, Wilkinson M, Creevey CJ, Foster PG, Pisani D. Implementing and testing Bayesian and maximum-likelihood supertree methods in phylogenetics. Royal Society Open Science 2015; 2:140436. [PMID: 26361544 PMCID: PMC4555849 DOI: 10.1098/rsos.140436] [Received: 11/10/2014] [Accepted: 07/06/2015]
Abstract
Since their advent, supertrees have been increasingly used in large-scale evolutionary studies requiring a phylogenetic framework and substantial efforts have been devoted to developing a wide variety of supertree methods (SMs). Recent advances in supertree theory have allowed the implementation of maximum likelihood (ML) and Bayesian SMs, based on using an exponential distribution to model incongruence between input trees and the supertree. Such approaches are expected to have advantages over commonly used non-parametric SMs, e.g. matrix representation with parsimony (MRP). We investigated new implementations of ML and Bayesian SMs and compared these with some currently available alternative approaches. Comparisons include hypothetical examples previously used to investigate biases of SMs with respect to input tree shape and size, and empirical studies based either on trees harvested from the literature or on trees inferred from phylogenomic scale data. Our results provide no evidence of size or shape biases and demonstrate that the Bayesian method is a viable alternative to MRP and other non-parametric methods. Computation of input tree likelihoods allows the adoption of standard tests of tree topologies (e.g. the approximately unbiased test). The Bayesian approach is particularly useful in providing support values for supertree clades in the form of posterior probabilities.
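The exponential model of incongruence mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: trees are rooted nested tuples, the distance is a Robinson-Foulds count on clusters after pruning the supertree to each input tree's leaves, `beta` is a hypothetical rate parameter, and the normalising constant of the exponential distribution is omitted.

```python
def clusters(tree):
    """Return the leaf sets (clusters) below every internal node of a nested-tuple tree."""
    found = set()

    def walk(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def rf_distance(input_tree, supertree):
    """Count clusters found in only one of: the input tree, the pruned supertree."""
    input_clusters = {c for c in clusters(input_tree) if len(c) >= 2}
    leaves = frozenset().union(*input_clusters)
    # Pruning the supertree to `leaves` intersects each of its clusters with `leaves`.
    pruned = {c & leaves for c in clusters(supertree) if len(c & leaves) >= 2}
    return len(input_clusters ^ pruned)


def log_likelihood(supertree, input_trees, beta=1.0):
    """Unnormalised log-likelihood: each input tree contributes -beta * distance."""
    return -beta * sum(rf_distance(tree, supertree) for tree in input_trees)
```

A supertree that agrees with every input tree scores 0, and each conflicting cluster lowers the score. Because the normalising constant is left out, scores from this sketch are only comparable if that constant is assumed to be the same for all candidate supertrees.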
Affiliation(s)
- Wasiu A. Akanni
- Department of Biology, The National University of Ireland, Maynooth, Co. Kildare, Republic of Ireland
- Department of Life Science, The Natural History Museum, London SW7 5BD, UK
- Mark Wilkinson
- Department of Life Science, The Natural History Museum, London SW7 5BD, UK
- Christopher J. Creevey
- Institute of Biological, Environmental and Rural Sciences (IBERS), Aberystwyth University, Aberystwyth, Ceredigion SY23 3FG, UK
- Peter G. Foster
- Department of Life Science, The Natural History Museum, London SW7 5BD, UK
- Davide Pisani
- School of Biological Sciences and School of Earth Sciences, University of Bristol, Life Sciences Building, 24 Tyndall Avenue, Bristol BS8 1TG, UK
- Author for correspondence: Davide Pisani
4
Akanni WA, Creevey CJ, Wilkinson M, Pisani D. L.U.St: a tool for approximated maximum likelihood supertree reconstruction. BMC Bioinformatics 2014; 15:183. [PMID: 24925766 PMCID: PMC4073192 DOI: 10.1186/1471-2105-15-183] [Received: 01/30/2014] [Accepted: 06/02/2014]
Abstract
Background: Supertrees combine disparate, partially overlapping trees to generate a synthesis that provides a high level perspective that cannot be attained from the inspection of individual phylogenies. Supertrees can be seen as meta-analytical tools that can be used to make inferences based on results of previous scientific studies. Their meta-analytical application has increased in popularity since it was realised that the power of statistical tests for the study of evolutionary trends critically depends on the use of taxon-dense phylogenies. Further to that, supertrees have found applications in phylogenomics where they are used to combine gene trees and recover species phylogenies based on genome-scale data sets. Results: Here, we present the L.U.St package, a Python tool for approximate maximum likelihood supertree inference, and illustrate its application using a genomic data set for the placental mammals. L.U.St allows the calculation of the approximate likelihood of a supertree, given a set of input trees, performs heuristic searches to look for the supertree of highest likelihood, and performs statistical tests of two or more supertrees. To this end, L.U.St implements a winning sites test allowing ranking of a collection of a-priori selected hypotheses, given as a collection of input supertree topologies. It also outputs a file of input-tree-wise likelihood scores that can be used as input to CONSEL for calculation of standard tests of two trees (e.g. Kishino-Hasegawa, Shimodaira-Hasegawa and Approximately Unbiased tests). Conclusion: This is the first fully parametric implementation of a supertree method; it has clearly understood properties and provides several advantages over currently available supertree approaches. It is easy to implement and works on any platform that has Python installed. Availability: bitBucket page - https://afro-juju@bitbucket.org/afro-juju/l.u.st.git. Contact: Davide.Pisani@bristol.ac.uk.
Affiliation(s)
- Davide Pisani
- Department of Biology, The National University of Ireland, Maynooth, Maynooth, Kildare, Ireland.
5
Berry V, Bininda-Emonds ORP, Semple C. Amalgamating source trees with different taxonomic levels. Syst Biol 2012. [PMID: 23179602 DOI: 10.1093/sysbio/sys090]
Abstract
Supertree methods combine a collection of source trees into a single parent tree or supertree. For almost all such methods, the terminal taxa across the source trees have to be non-nested for the output supertree to make sense. Motivated by Page, the first supertree method for combining rooted source trees where the taxa can be hierarchically nested is called AncestralBuild. In addition to taxa labeling the leaves, this method allows the rooted source trees to have taxa labeling some of the interior nodes at a higher taxonomic level than their descendants (e.g., genera vs. species). However, the utility of AncestralBuild is somewhat restricted as it is mostly intended to decide if a collection of rooted source trees is compatible. If the initial collection is not compatible, then no tree is returned. To overcome this restriction, we introduce here the MultiLevelSupertree (MLS) supertree method whose input is the same as that for AncestralBuild, but which accommodates incompatibilities among rooted source trees using a MinCut-like procedure. We show that MLS has several desirable properties including the preservation of common subtrees among the source trees, the preservation of ancestral relationships whenever they are compatible, as well as running in polynomial time. Furthermore, application to a small test data set (the mammalian carnivore family Phocidae) indicates that the method correctly places nested taxa at different taxonomic levels (reflecting vertical signal), even in cases where the input trees harbor a significant level of conflict between their clades (i.e., in their horizontal signal).
Affiliation(s)
- Vincent Berry
- Méthodes et Algorithmes pour la Bioinformatique MAB team, Université Montpellier 2, L.I.R.M.M. - C.N.R.S., 161 rue Ada, 34095 Montpellier Cedex 5, France
6
Lin HT, Burleigh JG, Eulenstein O. Consensus properties for the deep coalescence problem and their application for scalable tree search. BMC Bioinformatics 2012; 13 Suppl 10:S12. [PMID: 22759417 PMCID: PMC3382448 DOI: 10.1186/1471-2105-13-s10-s12]
Abstract
Background: To infer a species phylogeny from unlinked genes, phylogenetic inference methods must confront the biological processes that create incongruence between gene trees and the species phylogeny. Intra-specific gene variation in ancestral species can result in deep coalescence, also known as incomplete lineage sorting, which creates incongruence between gene trees and the species tree. One approach to account for deep coalescence in phylogenetic analyses is the deep coalescence problem, which takes a collection of gene trees and seeks the species tree that implies the fewest deep coalescence events. Although this approach is promising for phylogenetics, the consensus properties of this problem are mostly unknown and analyses of large data sets may be computationally prohibitive. Results: We prove that the deep coalescence consensus tree problem satisfies the highly desirable Pareto property for clusters (clades). That is, in all instances, each cluster that is present in all of the input gene trees, called a consensus cluster, will also be found in every optimal solution. Moreover, we introduce a new divide and conquer method for the deep coalescence problem based on the Pareto property. This method refines the strict consensus of the input gene trees, thereby, in practice, often greatly reducing the complexity of the tree search and guaranteeing that the estimated species tree will satisfy the Pareto property. Conclusions: Analyses of both simulated and empirical data sets demonstrate that the divide and conquer method can greatly improve upon the speed of heuristics that do not consider the Pareto consensus property, while also guaranteeing that the proposed solution fulfills the Pareto property. The divide and conquer method extends the utility of the deep coalescence problem to data sets with enormous numbers of taxa.
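The Pareto property for clusters described above can be checked directly on small examples. The sketch below is illustrative only and assumes rooted gene trees on the same leaf set, written as nested tuples. The function names are hypothetical, and it does not implement the divide and conquer search itself.

```python
def clusters(tree):
    """Return the leaf sets (clusters) below every internal node of a nested-tuple tree."""
    found = set()

    def walk(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def consensus_clusters(gene_trees):
    """Clusters present in every input gene tree (the strict consensus clusters)."""
    return set.intersection(*(clusters(tree) for tree in gene_trees))


def satisfies_pareto(candidate, gene_trees):
    """True if the candidate species tree contains every consensus cluster."""
    return consensus_clusters(gene_trees) <= clusters(candidate)
```

Any candidate that fails `satisfies_pareto` lacks a cluster shared by all gene trees, so, given the result stated in the abstract, it cannot be an optimal solution. This is what allows a search to be limited to refinements of the strict consensus.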
Affiliation(s)
- Harris T Lin
- Department of Computer Science, Iowa State University, Ames, IA, USA
7
Swenson MS, Suri R, Linder CR, Warnow T. SuperFine: Fast and Accurate Supertree Estimation. Syst Biol 2011; 61:214-27. [DOI: 10.1093/sysbio/syr092]
Affiliation(s)
- M. Shel Swenson
- Department of Computer Science, The University of Texas at Austin, Austin, TX, USA
- Rahul Suri
- Department of Computer Science, The University of Texas at Austin, Austin, TX, USA
- C. Randal Linder
- Section of Integrative Biology, School of Biological Sciences, The University of Texas at Austin, Austin, TX, USA
- Tandy Warnow
- Department of Computer Science, The University of Texas at Austin, Austin, TX, USA
8
Kupczok A. Consequences of different null models on the tree shape bias of supertree methods. Syst Biol 2011; 60:218-25. [PMID: 21252387 DOI: 10.1093/sysbio/syq086]
Affiliation(s)
- Anne Kupczok
- Center for Integrative Bioinformatics Vienna, Max F. Perutz Laboratories, University of Vienna, Medical University of Vienna, University of Veterinary Medicine Vienna, Dr. Bohr-Gasse 9, A-1030 Vienna, Austria.
9
Affiliation(s)
- F R McMorris
- Department of Applied Mathematics, Illinois Institute of Technology, Chicago, IL 60616-3793, USA.
10
Baker WJ, Savolainen V, Asmussen-Lange CB, Chase MW, Dransfield J, Forest F, Harley MM, Uhl NW, Wilkinson M. Complete Generic-Level Phylogenetic Analyses of Palms (Arecaceae) with Comparisons of Supertree and Supermatrix Approaches. Syst Biol 2009; 58:240-56. [PMID: 20525581 DOI: 10.1093/sysbio/syp021]
Affiliation(s)
- Vincent Savolainen
- Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3AB, UK
- Imperial College London, Silwood Park Campus, Buckhurst Road, Ascot, Berkshire SL5 7PY, UK
- Conny B. Asmussen-Lange
- Department of Ecology, University of Copenhagen, Rolighedsvej 21, DK-1958 Frederiksberg C, Denmark
- Mark W. Chase
- Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3AB, UK
- Félix Forest
- Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3AB, UK
- Natalie W. Uhl
- Department of Plant Biology, Cornell University, 412 Mann Library Building, Ithaca, NY 14853, USA
- Mark Wilkinson
- Department of Zoology, Natural History Museum, Cromwell Road, London SW7 5BD, UK
11
Willson SJ. Robustness of topological supertree methods for reconciling dense incompatible data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2009; 6:62-75. [PMID: 19179699 DOI: 10.1109/tcbb.2008.51]
Abstract
Given a collection of rooted phylogenetic trees with overlapping sets of leaves, a compatible supertree S is a single tree whose set of leaves is the union of the input sets of leaves and such that S agrees with each input tree when restricted to the leaves of the input tree. Typically with trees from real data, no compatible supertree exists, and various methods may be utilized to reconcile the incompatibilities in the input trees. This paper focuses on a measure of robustness of a supertree method called its "radius" R. The larger the value of R is, the further the data set can be from a natural correct tree T and yet the method will still output T. It is shown that the maximal possible radius for a method is R = 1/2. Many familiar methods, both for supertrees and consensus trees, are shown to have R = 0, indicating that they need not output a tree T that would seem to be the natural correct answer. A polynomial-time method Normalized Triplet Supertree (NTS) with the maximal possible R = 1/2 is defined. A geometric interpretation is given, and NTS is shown to solve an optimization problem. Additional properties of NTS are described.
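The definition of a compatible supertree given above can be tested on small cases using rooted triplets. The sketch below is a minimal illustration, not the NTS method. It assumes trees are nested tuples with sortable leaf labels, and it reads "agrees with" as "has exactly the same resolved triplets on the input tree's leaves", which is an interpretation rather than something the abstract states.

```python
from itertools import combinations


def clusters(tree):
    """Return the leaf sets (clusters) below every internal node of a nested-tuple tree."""
    found = set()

    def walk(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def leaf_set(tree):
    """All leaves of the tree (the union of its clusters)."""
    return frozenset().union(*clusters(tree))


def triplets(tree):
    """Resolved rooted triplets xy|z: some cluster contains x and y but not z."""
    tree_clusters = clusters(tree)
    resolved = set()
    for trio in combinations(sorted(leaf_set(tree)), 3):
        for z in trio:
            pair = frozenset(trio) - {z}
            if any(pair <= c and z not in c for c in tree_clusters):
                resolved.add((pair, z))
    return resolved


def is_compatible_supertree(supertree, input_trees):
    """Check the leaf-union condition and triplet agreement with each input tree."""
    if leaf_set(supertree) != frozenset().union(*(leaf_set(t) for t in input_trees)):
        return False
    super_triplets = triplets(supertree)
    for tree in input_trees:
        leaves = leaf_set(tree)
        restricted = {(p, z) for (p, z) in super_triplets if p <= leaves and z in leaves}
        if restricted != triplets(tree):
            return False
    return True
```

For example, the input trees `(("a", "b"), "c")` and `(("a", "c"), "b")` resolve the same three leaves differently, so no tree passes this check for both. This is the situation the abstract describes as typical of real data.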
Affiliation(s)
- Stephen J Willson
- Department of Mathematics, Iowa State University, Ames, IA 50011, USA.
12
Ranwez V, Berry V, Criscuolo A, Fabre PH, Guillemot S, Scornavacca C, Douzery EJP. PhySIC: A Veto Supertree Method with Desirable Properties. Syst Biol 2007; 56:798-817. [PMID: 17918032 DOI: 10.1080/10635150701639754]
Affiliation(s)
- Vincent Ranwez
- Institut des Sciences de l'Evolution (ISEM, UMR 5554 CNRS), Université Montpellier II, Place E. Bataillon, CC 064, 34095, Montpellier, Cedex 5, France
- Vincent Berry
- Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM, UMR 5506, CNRS), Université Montpellier II, 161 rue Ada, 34392, Montpellier, Cedex 5, France
- Alexis Criscuolo
- Institut des Sciences de l'Evolution (ISEM, UMR 5554 CNRS), Université Montpellier II, Place E. Bataillon, CC 064, 34095, Montpellier, Cedex 5, France
- Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM, UMR 5506, CNRS), Université Montpellier II, 161 rue Ada, 34392, Montpellier, Cedex 5, France
- Pierre-Henri Fabre
- Institut des Sciences de l'Evolution (ISEM, UMR 5554 CNRS), Université Montpellier II, Place E. Bataillon, CC 064, 34095, Montpellier, Cedex 5, France
- Sylvain Guillemot
- Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM, UMR 5506, CNRS), Université Montpellier II, 161 rue Ada, 34392, Montpellier, Cedex 5, France
- Celine Scornavacca
- Institut des Sciences de l'Evolution (ISEM, UMR 5554 CNRS), Université Montpellier II, Place E. Bataillon, CC 064, 34095, Montpellier, Cedex 5, France
- Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM, UMR 5506, CNRS), Université Montpellier II, 161 rue Ada, 34392, Montpellier, Cedex 5, France
- Emmanuel J. P. Douzery
- Institut des Sciences de l'Evolution (ISEM, UMR 5554 CNRS), Université Montpellier II, Place E. Bataillon, CC 064, 34095, Montpellier, Cedex 5, France
13
Wilkinson M, Cotton JA, Lapointe FJ, Pisani D. Properties of supertree methods in the consensus setting. Syst Biol 2007; 56:330-7. [PMID: 17464887 DOI: 10.1080/10635150701245370]
Affiliation(s)
- Mark Wilkinson
- Department of Zoology, The Natural History Museum, London, SW7 5BD, UK.
14
Cotton JA, Slater CSC, Wilkinson M. Discriminating supported and unsupported relationships in supertrees using triplets. Syst Biol 2006; 55:345-50. [PMID: 16611604 DOI: 10.1080/10635150500481556]
Affiliation(s)
- James A Cotton
- Department of Zoology, The Natural History Museum, Cromwell Road, SW7 5BD, London, UK.
15
Wilkinson M, Pisani D, Cotton JA, Corfe I. Measuring support and finding unsupported relationships in supertrees. Syst Biol 2006; 54:823-31. [PMID: 16243766 DOI: 10.1080/10635150590950362]
Affiliation(s)
- Mark Wilkinson
- Department of Zoology, The Natural History Museum, London SW7 5BD, UK.
16
Affiliation(s)
- Olaf R P Bininda-Emonds
- Lehrstuhl für Tierzucht, Technical University of Munich, Hochfeldweg 1, 85354 Freising-Weihenstephan, Germany.
17
Wilkinson M, Cotton JA, Creevey C, Eulenstein O, Harris SR, Lapointe FJ, Levasseur C, McInerney JO, Pisani D, Thorley JL. The Shape of Supertrees to Come: Tree Shape Related Properties of Fourteen Supertree Methods. Syst Biol 2005; 54:419-31. [PMID: 16012108 DOI: 10.1080/10635150590949832]
Abstract
Using a simple example and simulations, we explore the impact of input tree shape upon a broad range of supertree methods. We find that input tree shape can affect how conflict is resolved by several supertree methods and that input tree shape effects may be substantial. Standard and irreversible matrix representation with parsimony (MRP), MinFlip, duplication-only Gene Tree Parsimony (GTP), and an implementation of the average consensus method have a tendency to resolve conflict in favor of relationships in unbalanced trees. Purvis MRP and the average dendrogram method appear to have an opposite tendency. Biases with respect to tree shape are correlated with objective functions that are based upon unusual asymmetric tree-to-tree distance or fit measures. Split, quartet, and triplet fit, most similar supertree, and MinCut methods (provided the latter are interpreted as Adams consensus-like supertrees) are not revealed to have any bias with respect to tree shape by our example, but whether this holds more generally is an open problem. Future development and evaluation of supertree methods should consider explicitly the undesirable biases and other properties that we highlight. In the meantime, use of a single, arbitrarily chosen supertree method is discouraged. Use of multiple methods and/or weighting schemes may allow practical assessment of the extent to which inferences from real data depend upon methodological biases with respect to input tree shape or size.
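To make "balanced" and "unbalanced" concrete, the sketch below computes the Colless imbalance index for a rooted binary tree written as nested tuples. The abstract does not say which measure of tree shape the authors used. Colless is chosen here only as a common example, and the function name is hypothetical.

```python
def colless_index(tree):
    """Sum, over internal nodes, of |leaves in left subtree - leaves in right subtree|.

    Assumes a strictly binary tree written as nested 2-tuples.
    """
    total = 0

    def count_leaves(node):
        nonlocal total
        if not isinstance(node, tuple):
            return 1
        left, right = (count_leaves(child) for child in node)
        total += abs(left - right)
        return left + right

    count_leaves(tree)
    return total
```

A fully balanced four-leaf tree such as `(("a", "b"), ("c", "d"))` scores 0, while the fully unbalanced (pectinate) tree `((("a", "b"), "c"), "d")` scores 3, the maximum for four leaves.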
Affiliation(s)
- Mark Wilkinson
- Department of Zoology, The Natural History Museum, London SW7 5BD, United Kingdom.
18
Wilkinson M, Cotton J, Thorley J. The Information Content of Trees and Their Matrix Representations. Syst Biol 2004; 53:989-1001. [PMID: 15764566 DOI: 10.1080/10635150490522737]
Affiliation(s)
- Mark Wilkinson
- Department of Zoology, Natural History Museum, London SW7 5BD, United Kingdom
22
Using Supertrees to Investigate Species Richness in Grasses and Flowering Plants. Computational Biology 2004. [DOI: 10.1007/978-1-4020-2330-9_22]