1
Berling L, Klawitter J, Bouckaert R, Xie D, Gavryushkin A, Drummond AJ. Accurate Bayesian phylogenetic point estimation using a tree distribution parameterized by clade probabilities. PLoS Comput Biol 2025; 21:e1012789. [PMID: 39937844 PMCID: PMC11835378 DOI: 10.1371/journal.pcbi.1012789] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2024] [Revised: 02/18/2025] [Accepted: 01/13/2025] [Indexed: 02/14/2025] Open
Abstract
Bayesian phylogenetic analysis with MCMC algorithms generates an estimate of the posterior distribution of phylogenetic trees in the form of a sample of phylogenetic trees and related parameters. The high dimensionality and non-Euclidean nature of tree space complicate summarizing the central tendency and variance of the posterior distribution in tree space. Here we introduce a new tractable tree distribution and associated point estimator that can be constructed from a posterior sample of trees. Through simulation studies we show that this point estimator performs at least as well as, and often better than, standard methods of producing Bayesian posterior summary trees. We also show that the method of summary that performs best depends on the sample size and dimensionality of the problem in non-trivial ways.
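For orientation, the kind of standard summary that the abstract compares against can be sketched as follows. This is a minimal illustration of a maximum clade credibility (MCC) style summary, not the new estimator introduced in the paper. It assumes each sampled tree is represented simply as a frozenset of clades, with each clade a frozenset of taxon names; the function names are hypothetical.

```python
from math import log

def clade_frequencies(trees):
    """Estimate clade probabilities as frequencies in the posterior sample."""
    counts = {}
    for tree in trees:
        for clade in tree:
            counts[clade] = counts.get(clade, 0) + 1
    return {clade: n / len(trees) for clade, n in counts.items()}

def mcc_tree(trees):
    """Return the sampled tree whose clades have the highest product of
    sample frequencies (computed as a sum of logs for numerical stability)."""
    freq = clade_frequencies(trees)
    return max(trees, key=lambda tree: sum(log(freq[c]) for c in tree))

# Toy sample of three trees on taxa A, B, C (hypothetical data).
AB, BC, ABC = frozenset("AB"), frozenset("BC"), frozenset("ABC")
t1, t2 = frozenset({AB, ABC}), frozenset({BC, ABC})
sample = [t1, t1, t2]
summary = mcc_tree(sample)
```

Note that this baseline can only return a tree that appears in the sample, which is one reason alternative point estimators are of interest.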
Affiliation(s)
- Lars Berling
  - School of Mathematics and Statistics, University of Canterbury, Aotearoa, New Zealand
  - Biomathematics Research Centre, University of Canterbury, Aotearoa, New Zealand
- Jonathan Klawitter
  - Centre for Computational Evolution, University of Auckland, Aotearoa, New Zealand
- Remco Bouckaert
  - Centre for Computational Evolution, University of Auckland, Aotearoa, New Zealand
- Dong Xie
  - Centre for Computational Evolution, University of Auckland, Aotearoa, New Zealand
- Alex Gavryushkin
  - School of Mathematics and Statistics, University of Canterbury, Aotearoa, New Zealand
  - Biomathematics Research Centre, University of Canterbury, Aotearoa, New Zealand
- Alexei J Drummond
  - Centre for Computational Evolution, University of Auckland, Aotearoa, New Zealand
2
Delaye L, Román-Padilla L. Untangling the Evolution of the Receptor-Binding Motif of SARS-CoV-2. J Mol Evol 2024; 92:329-337. [PMID: 38777906 PMCID: PMC11168982 DOI: 10.1007/s00239-024-10175-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Accepted: 05/04/2024] [Indexed: 05/25/2024]
Abstract
The spike protein determines the host-range specificity of coronaviruses. In particular, the Receptor-Binding Motif in the spike protein from SARS-CoV-2 contains the amino acids involved in molecular recognition of the host Angiotensin Converting Enzyme 2. Therefore, to understand how SARS-CoV-2 acquired its capacity to infect humans, it is necessary to reconstruct the evolution of this important motif. Early during the pandemic, it was proposed that the SARS-CoV-2 Receptor-Binding Domain was acquired via recombination with a pangolin-infecting coronavirus. This proposal was challenged by an alternative explanation suggesting that the Receptor-Binding Domain from SARS-CoV-2 did not originate via recombination with a coronavirus from a pangolin. Instead, this alternative hypothesis proposed that the Receptor-Binding Motif from the bat coronavirus RaTG13 was acquired via recombination with an unidentified coronavirus. As a consequence of this event, the Receptor-Binding Domain from the pangolin coronavirus appeared phylogenetically closer to SARS-CoV-2. Recently, the genomes of coronaviruses from Cambodia (bat_RShST182/200) and Laos (BANAL-20-52/103/247), which are closely related to SARS-CoV-2, were reported. However, no detailed analysis of the evolution of the Receptor-Binding Motif from these coronaviruses was reported. Here we revisit the evolution of the Receptor-Binding Domain and Motif in the light of the novel coronavirus genome sequences. Specifically, we wanted to test whether the above coronaviruses from Cambodia and Laos were the source of the Receptor-Binding Domain from RaTG13. We found that the Receptor-Binding Motif from these coronaviruses is phylogenetically closer to SARS-CoV-2 than to RaTG13. Therefore, the source of the Receptor-Binding Domain from RaTG13 is still unidentified.
In accordance with previous studies, our results are consistent with the hypothesis that the Receptor-Binding Motif from SARS-CoV-2 evolved by vertical inheritance from a bat-infecting population of coronaviruses.
Affiliation(s)
- Luis Delaye
  - Departamento de Ingeniería Genética, Cinvestav Unidad Irapuato, Km 9.6 Libramiento Norte Carretera Irapuato-León, C.P. 36824, Irapuato, Gto., Mexico
- Lizbeth Román-Padilla
  - Departamento de Ingeniería Genética, Cinvestav Unidad Irapuato, Km 9.6 Libramiento Norte Carretera Irapuato-León, C.P. 36824, Irapuato, Gto., Mexico
3
Bouckaert RR. Variational Bayesian phylogenies through matrix representation of tree space. PeerJ 2024; 12:e17276. [PMID: 38699195 PMCID: PMC11064865 DOI: 10.7717/peerj.17276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2023] [Accepted: 04/01/2024] [Indexed: 05/05/2024] Open
Abstract
In this article, we study the distance matrix as a representation of a phylogeny by way of hierarchical clustering. Defining a multivariate normal distribution on (a subset of) the entries in a matrix allows us to represent a distribution over rooted time trees. Here, we demonstrate that tree distributions can be represented accurately this way for a number of published tree distributions. Though such a representation does not map to unique trees, restriction to a subspace, in particular one we call a "cube", makes the representation bijective at the cost of not being able to represent all possible trees. We introduce an algorithm, "cubeVB", specifically for cubes and show through a well-calibrated simulation study that it is possible to recover parameters of interest such as tree height and length. Although a cube cannot represent all of tree space, it is a great improvement over a single summary tree, and it opens up exciting new opportunities for scaling up Bayesian phylogenetic inference. We also demonstrate how to use a matrix representation of a tree distribution to get better summary trees than commonly used maximum clade credibility trees. An open source implementation of the cubeVB algorithm is available from https://github.com/rbouckaert/cubevb as the cubevb package for BEAST 2.
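The idea of reading a rooted time tree off a distance matrix by hierarchical clustering can be sketched as below. This is only an illustration and is not taken from the cubevb package: it assumes single-linkage clustering and sets each internal node height to half the merging distance, both of which are assumptions made here for concreteness. The function name and the nested-tuple tree format are hypothetical.

```python
def single_link_tree(names, dist):
    """Build a rooted tree from a symmetric distance matrix by single-linkage
    clustering. Leaves are taxon names; an internal node is a tuple
    (left_subtree, right_subtree, height), with height = merge distance / 2."""
    # Each cluster is (subtree, list of leaf indices it contains).
    clusters = [(name, [i]) for i, name in enumerate(names)]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of leaves.
                d = min(dist[i][j] for i in clusters[a][1] for j in clusters[b][1])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merged = ((clusters[a][0], clusters[b][0], d / 2),
                  clusters[a][1] + clusters[b][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters[0][0]

# Toy matrix (hypothetical): A and B are close, C is distant from both.
names = ["A", "B", "C"]
dist = [[0, 2, 6],
        [2, 0, 6],
        [6, 6, 0]]
tree = single_link_tree(names, dist)
```

Because different matrices can yield the same tree, a map like this is many-to-one, which is consistent with the abstract's remark that the representation is not unique without further restriction.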
Affiliation(s)
- Remco R. Bouckaert
  - School of Computer Science, University of Auckland, Auckland, New Zealand
4
Pardo-De la Hoz CJ, Magain N, Piatkowski B, Cornet L, Dal Forno M, Carbone I, Miadlikowska J, Lutzoni F. Ancient Rapid Radiation Explains Most Conflicts Among Gene Trees and Well-Supported Phylogenomic Trees of Nostocalean Cyanobacteria. Syst Biol 2023; 72:694-712. [PMID: 36827095 DOI: 10.1093/sysbio/syad008] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/18/2022] [Revised: 02/12/2023] [Accepted: 02/22/2023] [Indexed: 02/25/2023] Open
Abstract
Prokaryotic genomes are often considered to be mosaics of genes that do not necessarily share the same evolutionary history due to widespread horizontal gene transfers (HGTs). Consequently, representing evolutionary relationships of prokaryotes as bifurcating trees has long been controversial. However, studies reporting conflicts among gene trees derived from phylogenomic data sets have shown that these conflicts can be the result of artifacts or evolutionary processes other than HGT, such as incomplete lineage sorting, low phylogenetic signal, and systematic errors due to substitution model misspecification. Here, we present the results of an extensive exploration of phylogenetic conflicts in the cyanobacterial order Nostocales, for which previous studies have inferred strongly supported conflicting relationships when using different concatenated phylogenomic data sets. We found that most of these conflicts are concentrated in deep clusters of short internodes of the Nostocales phylogeny, where the great majority of individual genes have low resolving power. We then inferred phylogenetic networks to detect HGT events while also accounting for incomplete lineage sorting. Our results indicate that most conflicts among gene trees are likely due to incomplete lineage sorting linked to an ancient rapid radiation, rather than to HGTs. Moreover, the short internodes of this radiation fit the expectations of the anomaly zone, i.e., a region of the tree parameter space where a species tree is discordant with its most likely gene tree. We demonstrated that concatenation of different sets of loci can recover up to 17 distinct and well-supported relationships within the putative anomaly zone of Nostocales, corresponding to the observed conflicts among well-supported trees based on concatenated data sets from previous studies. 
Our findings highlight the important role of rapid radiations as a potential cause of strongly conflicting phylogenetic relationships when using phylogenomic data sets of bacteria. We propose that polytomies may be the most appropriate phylogenetic representation of these rapid radiations that are part of anomaly zones, especially when all possible genomic markers have been considered to infer these phylogenies. [Anomaly zone; bacteria; horizontal gene transfer; incomplete lineage sorting; Nostocales; phylogenomic conflict; rapid radiation; Rhizonema.].
Affiliation(s)
- Nicolas Magain
  - Evolution and Conservation Biology, InBioS Research Center, Université de Liège, Liège 4000, Belgium
- Bryan Piatkowski
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37830, USA
- Luc Cornet
  - Evolution and Conservation Biology, InBioS Research Center, Université de Liège, Liège 4000, Belgium
  - BCCM/IHEM, Mycology and Aerobiology, Sciensano, Brussels, Belgium
- Ignazio Carbone
  - Department of Entomology and Plant Pathology, North Carolina State University, Raleigh, NC 27606, USA
5
Dornburg A, Mallik R, Wang Z, Bernal MA, Thompson B, Bruford EA, Nebert DW, Vasiliou V, Yohe LR, Yoder JA, Townsend JP. Placing human gene families into their evolutionary context. Hum Genomics 2022; 16:56. [PMID: 36369063 PMCID: PMC9652883 DOI: 10.1186/s40246-022-00429-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/28/2022] [Accepted: 10/12/2022] [Indexed: 11/13/2022] Open
Abstract
Following the draft sequence of the first human genome over 20 years ago, we have achieved unprecedented insights into the rules governing its evolution, often with direct translational relevance to specific diseases. However, staggering sequence complexity has also challenged the development of a more comprehensive understanding of human genome biology. In this context, interspecific genomic studies between humans and other animals have played a critical role in our efforts to decode human gene families. In this review, we focus on how the rapid surge of genome sequencing of both model and non-model organisms now provides a broader comparative framework poised to empower novel discoveries. We begin with a general overview of how comparative approaches are essential for understanding gene family evolution in the human genome, followed by a discussion of analyses of gene expression. We show how homology can provide insights into the genes and gene families associated with immune response, cancer biology, vision, chemosensation, and metabolism, by revealing similarity in processes among distant species. We then explain methodological tools that provide critical advances and show the limitations of common approaches. We conclude with a discussion of how these investigations position us to gain fundamental insights into the evolution of gene families among living organisms in general. We hope that our review catalyzes additional excitement and research on the emerging field of comparative genomics, while aiding the placement of the human genome into its existentially evolutionary context.
Affiliation(s)
- Alex Dornburg
  - Department of Bioinformatics and Genomics, UNC-Charlotte, Charlotte, NC, USA
- Rittika Mallik
  - Department of Bioinformatics and Genomics, UNC-Charlotte, Charlotte, NC, USA
- Zheng Wang
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Moisés A Bernal
  - Department of Biological Sciences, College of Science and Mathematics, Auburn University, Auburn, AL, USA
- Brian Thompson
  - Department of Environmental Health Sciences, Yale School of Public Health, New Haven, CT, USA
- Elspeth A Bruford
  - Department of Haematology, University of Cambridge School of Clinical Medicine, Cambridge, UK
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Daniel W Nebert
  - Department of Environmental Health, Center for Environmental Genetics, University of Cincinnati Medical Center, P.O. Box 670056, Cincinnati, OH, 45267, USA
  - Department of Pediatrics and Molecular Developmental Biology, Division of Human Genetics, Cincinnati Children's Hospital, Cincinnati, OH, 45229, USA
- Vasilis Vasiliou
  - Department of Environmental Health Sciences, Yale School of Public Health, New Haven, CT, USA
- Laurel R Yohe
  - Department of Bioinformatics and Genomics, UNC-Charlotte, Charlotte, NC, USA
- Jeffrey A Yoder
  - Department of Molecular Biomedical Sciences, College of Veterinary Medicine, North Carolina State University, Raleigh, NC, USA
- Jeffrey P Townsend
  - Department of Bioinformatics and Genomics, UNC-Charlotte, Charlotte, NC, USA
  - Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, USA
6
Porto DS, Dahdul WM, Lapp H, Balhoff JP, Vision TJ, Mabee PM, Uyeda J. Assessing Bayesian Phylogenetic Information Content of Morphological Data Using Knowledge from Anatomy Ontologies. Syst Biol 2022; 71:1290-1306. [PMID: 35285502 PMCID: PMC9558846 DOI: 10.1093/sysbio/syac022] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/03/2021] [Revised: 02/09/2022] [Accepted: 03/05/2022] [Indexed: 11/18/2022] Open
Abstract
Morphology remains a primary source of phylogenetic information for many groups of organisms, and the only one for most fossil taxa. Organismal anatomy is not a collection of randomly assembled and independent “parts”, but instead a set of dependent and hierarchically nested entities resulting from ontogeny and phylogeny. How do we make sense of these dependent and at times redundant characters? One promising approach is using ontologies—structured controlled vocabularies that summarize knowledge about different properties of anatomical entities, including developmental and structural dependencies. Here, we assess whether evolutionary patterns can explain the proximity of ontology-annotated characters within an ontology. To do so, we measure phylogenetic information across characters and evaluate if it matches the hierarchical structure given by ontological knowledge—in much the same way as across-species diversity structure is given by phylogeny. We implement an approach to evaluate the Bayesian phylogenetic information (BPI) content and phylogenetic dissonance among ontology-annotated anatomical data subsets. We applied this to data sets representing two disparate animal groups: bees (Hexapoda: Hymenoptera: Apoidea, 209 chars) and characiform fishes (Actinopterygii: Ostariophysi: Characiformes, 463 chars). For bees, we find that BPI is not substantially explained by anatomy since dissonance is often high among morphologically related anatomical entities. For fishes, we find substantial information for two clusters of anatomical entities instantiating concepts from the jaws and branchial arch bones, but among-subset information decreases and dissonance increases substantially moving to higher-level subsets in the ontology. We further applied our approach to address particular evolutionary hypotheses with an example of morphological evolution in miniature fishes. 
While we show that phylogenetic information does match ontology structure for some anatomical entities, additional relationships and processes, such as convergence, likely play a substantial role in explaining BPI and dissonance, and merit future investigation. Our work demonstrates how complex morphological data sets can be interrogated with ontologies by allowing one to access how information is spread hierarchically across anatomical concepts, how congruent this information is, and what sorts of processes may play a role in explaining it: phylogeny, development, or convergence. [Apidae; Bayesian phylogenetic information; Ostariophysi; Phenoscape; phylogenetic dissonance; semantic similarity.]
Affiliation(s)
- Diego S Porto
  - Department of Biological Sciences, Virginia Polytechnic Institute and State University, 926 West Campus Drive, Blacksburg, VA 24061, USA
- Wasila M Dahdul
  - UCI Libraries, University of California, Irvine, Irvine, CA 92623, USA
  - Department of Biology, University of South Dakota, 414 East Clark Street, Vermillion, SD 57069, USA
- Hilmar Lapp
  - Center for Genomic and Computational Biology, Duke University, 101 Science Drive, Durham, NC 27708, USA
- James P Balhoff
  - Renaissance Computing Institute, University of North Carolina, 100 Europa Drive, Suite 540, Chapel Hill, NC 27517, USA
- Todd J Vision
  - Department of Biology and School of Information and Library Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Paula M Mabee
  - Department of Biology, University of South Dakota, 414 East Clark Street, Vermillion, SD 57069, USA
  - Battelle, National Ecological Observatory Network, Boulder, CO 80301, USA
- Josef Uyeda
  - Department of Biological Sciences, Virginia Polytechnic Institute and State University, 926 West Campus Drive, Blacksburg, VA 24061, USA
7
Bansal MS. Deciphering Microbial Gene Family Evolution Using Duplication-Transfer-Loss Reconciliation and RANGER-DTL. Methods Mol Biol 2022; 2569:233-252. [PMID: 36083451 DOI: 10.1007/978-1-0716-2691-7_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
Phylogenetic reconciliation has emerged as a principled, highly effective technique for investigating the origin, spread, and evolutionary history of microbial gene families. Proper application of phylogenetic reconciliation requires a clear understanding of potential pitfalls and sources of error, and knowledge of the most effective reconciliation-based tools and protocols to use to maximize accuracy. In this book chapter, we provide a brief overview of Duplication-Transfer-Loss (DTL) reconciliation, the standard reconciliation model used to study microbial gene families and provide a step-by-step computational protocol to maximize the accuracy of DTL reconciliation and minimize false-positive evolutionary inferences.
Affiliation(s)
- Mukul S Bansal
  - Department of Computer Science & Engineering, University of Connecticut, Storrs, CT, USA
8
Harrington SM, Wishingrad V, Thomson RC. Properties of Markov Chain Monte Carlo Performance across Many Empirical Alignments. Mol Biol Evol 2021; 38:1627-1640. [PMID: 33185685 PMCID: PMC8042746 DOI: 10.1093/molbev/msaa295] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 11/18/2022] Open
Abstract
Nearly all current Bayesian phylogenetic applications rely on Markov chain Monte Carlo (MCMC) methods to approximate the posterior distribution for trees and other parameters of the model. These approximations are only reliable if Markov chains adequately converge and sample from the joint posterior distribution. Although several studies of phylogenetic MCMC convergence exist, these have focused on simulated data sets or select empirical examples. Therefore, much that is considered common knowledge about MCMC in empirical systems derives from a relatively small family of analyses under ideal conditions. To address this, we present an overview of commonly applied phylogenetic MCMC diagnostics and an assessment of patterns of these diagnostics across more than 18,000 empirical analyses. Many analyses appeared to perform well, and failures in convergence were most likely to be detected using the average standard deviation of split frequencies, a diagnostic that compares topologies among independent chains. Different diagnostics yielded different information about failed convergence, demonstrating that multiple diagnostics must be employed to reliably detect problems. The number of taxa and average branch lengths in analyses have clear impacts on MCMC performance, with more taxa and shorter branches leading to more difficult convergence. We show that the usage of models that include both Γ-distributed among-site rate variation and a proportion of invariable sites is not broadly problematic for MCMC convergence but is also unnecessary. Changes to heating and the usage of model-averaged substitution models can both offer improved convergence in some cases, but neither is a panacea.
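The average standard deviation of split frequencies (ASDSF) mentioned in the abstract can be sketched as follows. This is a minimal illustration rather than the implementation used in any particular package: it assumes each sampled tree is given as a set of splits (each split a frozenset of taxon names), uses the sample standard deviation across chains, and ignores splits whose frequency is below a threshold in every chain. The `min_freq=0.1` default and the function name are assumptions made here.

```python
from math import sqrt

def asdsf(chains, min_freq=0.1):
    """chains: list of independent runs; each run is a list of trees;
    each tree is a set of splits (frozensets of taxon names)."""
    # Per-chain frequency of every split observed in that chain.
    freqs = []
    for trees in chains:
        counts = {}
        for tree in trees:
            for split in tree:
                counts[split] = counts.get(split, 0) + 1
        freqs.append({s: c / len(trees) for s, c in counts.items()})

    sds = []
    for split in set().union(*freqs):
        f = [chain_freq.get(split, 0.0) for chain_freq in freqs]
        if max(f) < min_freq:
            continue  # assumed cutoff: skip splits that are rare in all chains
        mean = sum(f) / len(f)
        var = sum((x - mean) ** 2 for x in f) / (len(f) - 1)
        sds.append(sqrt(var))
    return sum(sds) / len(sds) if sds else 0.0

# Toy runs (hypothetical): run_a and run_b agree, run_c disagrees completely.
ab, cd = frozenset({"A", "B"}), frozenset({"C", "D"})
run_a = [{ab}, {ab}]
run_b = [{ab}, {ab}]
run_c = [{cd}, {cd}]
```

Values near zero indicate that independent chains sample splits at similar frequencies; larger values suggest the chains have not converged on the same topological distribution.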
Affiliation(s)
- Van Wishingrad
  - School of Life Sciences, University of Hawai'i, Honolulu, HI
9
Shi D, Chen MH, Kuo L, Lewis PO. New partition based measures for data compatibility and information gain. Stat Med 2021; 40:3560-3581. [PMID: 33853200 DOI: 10.1002/sim.8982] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/04/2020] [Revised: 03/19/2021] [Accepted: 03/20/2021] [Indexed: 11/09/2022]
Abstract
It is of great practical importance to compare and combine data from different studies in order to carry out appropriate and more powerful statistical inference. We propose a partition based measure to quantify the compatibility of two datasets using their respective posterior distributions. We further propose an information gain measure to quantify the information increase (or decrease) in combining two datasets. These measures are well calibrated and efficient computational algorithms are provided for their calculations. We use examples in a benchmark dose toxicology study, a six cities pollution data and a melanoma clinical trial to illustrate how these two measures are useful in combining current data with historical data and missing data.
Affiliation(s)
- Daoyuan Shi
  - Department of Statistics, University of Connecticut, Storrs, Connecticut, USA
- Ming-Hui Chen
  - Department of Statistics, University of Connecticut, Storrs, Connecticut, USA
- Lynn Kuo
  - Department of Statistics, University of Connecticut, Storrs, Connecticut, USA
- Paul O Lewis
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut, USA
10
Porto DS, Almeida EAB, Pennell MW. Investigating Morphological Complexes Using Informational Dissonance and Bayes Factors: A Case Study in Corbiculate Bees. Syst Biol 2021; 70:295-306. [PMID: 32722788 PMCID: PMC7882150 DOI: 10.1093/sysbio/syaa059] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 11/22/2019] [Revised: 07/16/2020] [Accepted: 07/17/2020] [Indexed: 11/22/2022] Open
Abstract
It is widely recognized that different regions of a genome often have different evolutionary histories and that ignoring this variation when estimating phylogenies can be misleading. However, the extent to which this is also true for morphological data is still largely unknown. Discordance among morphological traits might plausibly arise due to either variable convergent selection pressures or else phenomena such as hemiplasy. Here, we investigate patterns of discordance among 282 morphological characters, which we scored for 50 bee species particularly targeting corbiculate bees, a group that includes the well-known eusocial honeybees and bumblebees. As a starting point for selecting the most meaningful partitions in the data, we grouped characters as morphological modules, highly integrated trait complexes that as a result of developmental constraints or coordinated selection we expect to share an evolutionary history and trajectory. In order to assess conflict and coherence across and within these morphological modules, we used recently developed approaches for computing Bayesian phylogenetic information allied with model comparisons using Bayes factors. We found that despite considerable conflict among morphological complexes, accounting for among-character and among-partition rate variation with individual gamma distributions, rate multipliers, and linked branch lengths can lead to coherent phylogenetic inference using morphological data. We suggest that evaluating information content and dissonance among partitions is a useful step in estimating phylogenies from morphological data, just as it is with molecular data. Furthermore, we argue that adopting emerging approaches for investigating dissonance in genomic datasets may provide new insights into the integration and evolution of anatomical complexes. [Apidae; entropy; morphological modules; phenotypic integration; phylogenetic information.].
Affiliation(s)
- Diego S Porto
  - Laboratório de Biologia Comparada e Abelhas (LBCA), Departamento de Biologia, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto (FFCLRP), Universidade de São Paulo, 14040-901 Ribeirão Preto, SP, Brazil
  - Department of Zoology and Biodiversity Research Centre, University of British Columbia, Vancouver BC V6T 1Z4, Canada
  - Department of Biological Sciences, Virginia Polytechnic Institute and State University, 926 West Campus Drive, Blacksburg, VA 24061 USA
- Eduardo A B Almeida
  - Laboratório de Biologia Comparada e Abelhas (LBCA), Departamento de Biologia, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto (FFCLRP), Universidade de São Paulo, 14040-901 Ribeirão Preto, SP, Brazil
- Matthew W Pennell
  - Department of Zoology and Biodiversity Research Centre, University of British Columbia, Vancouver BC V6T 1Z4, Canada
11
Simon C. An Evolving View of Phylogenetic Support. Syst Biol 2020; 71:921-928. [PMID: 32915964 DOI: 10.1093/sysbio/syaa068] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 02/14/2020] [Revised: 08/04/2020] [Accepted: 08/15/2020] [Indexed: 01/09/2023] Open
Abstract
If all nucleotide sites evolved at the same rate within molecules and throughout the history of lineages, if all nucleotides were in equal proportion, if any nucleotide or amino acid evolved to any other with equal probability, if all taxa could be sampled, if diversification happened at well-spaced intervals, and if all gene segments had the same history, then tree building would be easy. But of course none of those conditions are true. Hence the need for evaluating the information content and accuracy of phylogenetic trees. The symposium for which this historical essay and presentation were developed focused on the importance of phylogenetic support, specifically branch support for individual clades. Here I present a timeline and review significant events in the history of systematics that set the stage for the development of the sophisticated measures of branch support and examinations of the information content of data highlighted in this symposium.
Affiliation(s)
- Chris Simon
  - Department of Ecology and Evolutionary Biology, 75 N. Eagleville Road, University of Connecticut, Storrs, CT
12
Fourment M, Magee AF, Whidden C, Bilge A, Matsen FA, Minin VN. 19 Dubious Ways to Compute the Marginal Likelihood of a Phylogenetic Tree Topology. Syst Biol 2020; 69:209-220. [PMID: 31504998 DOI: 10.1093/sysbio/syz046] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 11/28/2018] [Revised: 06/27/2019] [Accepted: 07/02/2019] [Indexed: 11/12/2022] Open
Abstract
The marginal likelihood of a model is a key quantity for assessing the evidence provided by the data in support of a model. The marginal likelihood is the normalizing constant for the posterior density, obtained by integrating the product of the likelihood and the prior with respect to model parameters. Thus, the computational burden of computing the marginal likelihood scales with the dimension of the parameter space. In phylogenetics, where we work with tree topologies that are high-dimensional models, standard approaches to computing marginal likelihoods are very slow. Here, we study methods to quickly compute the marginal likelihood of a single fixed tree topology. We benchmark the speed and accuracy of 19 different methods to compute the marginal likelihood of phylogenetic topologies on a suite of real data sets under the JC69 model. These methods include several new ones that we develop explicitly to solve this problem, as well as existing algorithms that we apply to phylogenetic models for the first time. Altogether, our results show that the accuracy of these methods varies widely, and that accuracy does not necessarily correlate with computational burden. Our newly developed methods are orders of magnitude faster than standard approaches, and in some cases, their accuracy rivals the best established estimators.
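The definition in the abstract, the marginal likelihood as the integral of likelihood times prior over the model parameters, can be made concrete with a toy example. The sketch below is not one of the phylogenetic estimators benchmarked in the paper; it assumes a one-parameter binomial model with a uniform prior on the success probability, for which the integral has the closed form 1/(n + 1), and compares that with a naive Monte Carlo average of the likelihood over prior draws. All function names are hypothetical.

```python
import random
from math import comb

def likelihood(theta, k, n):
    """Binomial likelihood of k successes in n trials given success probability theta."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)

def naive_mc_marginal(k, n, draws=100_000, seed=0):
    """Estimate p(D) = integral of p(D | theta) p(theta) d(theta) by averaging
    the likelihood over draws from the (uniform) prior."""
    rng = random.Random(seed)
    return sum(likelihood(rng.random(), k, n) for _ in range(draws)) / draws

def exact_marginal(n):
    """Closed form under a Uniform(0, 1) prior: 1 / (n + 1) for any k."""
    return 1.0 / (n + 1)

estimate = naive_mc_marginal(k=3, n=10)
truth = exact_marginal(10)
```

Even in this one-dimensional case the estimator needs many prior draws to be accurate; with a phylogenetic model, where every branch length adds a dimension, sampling from the prior becomes far less efficient, which is the scaling problem the abstract describes.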
Affiliation(s)
- Mathieu Fourment
  - University of Technology Sydney, ithree Institute, Ultimo NSW 2007, Australia
- Andrew F Magee
  - Department of Biology, University of Washington, Seattle, WA 98195, USA
- Chris Whidden
  - Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Arman Bilge
  - Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Vladimir N Minin
  - Department of Statistics, University of California, Irvine, CA 92697, USA
13
Prasanna AN, Gerber D, Kijpornyongpan T, Aime MC, Doyle VP, Nagy LG. Model Choice, Missing Data, and Taxon Sampling Impact Phylogenomic Inference of Deep Basidiomycota Relationships. Syst Biol 2020; 69:17-37. [PMID: 31062852 DOI: 10.1093/sysbio/syz029] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 10/19/2017] [Revised: 04/21/2019] [Accepted: 04/26/2019] [Indexed: 11/12/2022] Open
Abstract
Resolving deep divergences in the tree of life is challenging even for analyses of genome-scale phylogenetic data sets. Relationships between Basidiomycota subphyla, the rusts and allies (Pucciniomycotina), smuts and allies (Ustilaginomycotina), and mushroom-forming fungi and allies (Agaricomycotina) were found particularly recalcitrant both to traditional multigene and genome-scale phylogenetics. Here, we address basal Basidiomycota relationships using concatenated and gene tree-based analyses of various phylogenomic data sets to examine the contribution of several potential sources of bias. We evaluate the contribution of biological causes (hard polytomy, incomplete lineage sorting) versus unmodeled evolutionary processes and factors that exacerbate their effects (e.g., fast-evolving sites and long-branch taxa) to inferences of basal Basidiomycota relationships. Bayesian Markov Chain Monte Carlo and likelihood mapping analyses reject the hard polytomy with confidence. In concatenated analyses, fast-evolving sites and oversimplified models of amino acid substitution favored the grouping of smuts with mushroom-forming fungi, often leading to maximal bootstrap support in both concatenation and coalescent analyses. On the contrary, the most conserved data subsets grouped rusts and allies with mushroom-forming fungi, although this relationship proved labile, sensitive to model choice, to different data subsets and to missing data. Excluding putative long-branch taxa, genes with high proportions of missing data and/or with strong signal failed to reveal a consistent trend toward one or the other topology, suggesting that additional sources of conflict are at play. 
While concatenated analyses yielded strong but conflicting support, individual gene trees mostly provided poor support for any resolution of rusts, smuts, and mushroom-forming fungi, suggesting that the true Basidiomycota tree might be in a part of tree space that is difficult to access using both concatenation and gene tree-based approaches. Inference-based assessments of absolute model fit strongly reject best-fit models for the vast majority of genes, indicating a poor fit of even the most commonly used models. While this is consistent with previous assessments of site-homogeneous models of amino acid evolution, this does not appear to be the sole source of confounding signal. Our analyses suggest that topologies uniting smuts with mushroom-forming fungi can arise as a result of inappropriate modeling of amino acid sites that might be prone to systematic bias. We speculate that improved models of sequence evolution could shed more light on basal splits in the Basidiomycota, which, for now, remain unresolved despite the use of whole genome data.
Affiliation(s)
- Arun N Prasanna
- Synthetic and Systems Biology Unit, Institute of Biochemistry, BRC-HAS, Szeged 6726, Hungary
| | - Daniel Gerber
- Synthetic and Systems Biology Unit, Institute of Biochemistry, BRC-HAS, Szeged 6726, Hungary; Institute of Archaeology, Research Centre for the Humanities, Hungarian Academy of Sciences, Budapest 1097, Hungary
| | | | - M Catherine Aime
- Department of Botany and Plant Pathology, Purdue University, West Lafayette, IN 47907, USA
| | - Vinson P Doyle
- Department of Plant Pathology and Crop Physiology, Louisiana State University AgCenter, Baton Rouge, LA 70803, USA
| | - Laszlo G Nagy
- Synthetic and Systems Biology Unit, Institute of Biochemistry, BRC-HAS, Szeged 6726, Hungary
| |
|
14
|
Neupane S, Fučíková K, Lewis LA, Kuo L, Chen MH, Lewis PO. Assessing Combinability of Phylogenomic Data Using Bayes Factors. Syst Biol 2020; 68:744-754. [PMID: 30726954 DOI: 10.1093/sysbio/syz007] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 01/18/2018] [Revised: 01/26/2019] [Accepted: 02/04/2019] [Indexed: 11/14/2022] Open
Abstract
With the rapid reduction in sequencing costs of high-throughput genomic data, it has become commonplace to use hundreds of genes to infer phylogeny of any study system. While sampling a large number of genes has given us a tremendous opportunity to uncover previously unknown relationships and improve phylogenetic resolution, it also presents us with new challenges when the phylogenetic signal is confused by differences in the evolutionary histories of sampled genes. Given the incorporation of accurate marginal likelihood estimation methods into popular Bayesian software programs, it is natural to consider using the Bayes Factor (BF) to compare different partition models in which genes within any given partition subset share both tree topology and edge lengths. We explore using marginal likelihood to assess data subset combinability when data subsets have varying levels of phylogenetic discordance due to deep coalescence events among genes (simulated within a species tree), and compare the results with our recently described phylogenetic informational dissonance index (D) estimated for each data set. BF effectively detects phylogenetic incongruence and provides a way to assess the statistical significance of D values. We use BFs to assess data combinability using an empirical data set comprising 56 plastid genes from the green algal order Volvocales. We also discuss the potential need for calibrating BFs and demonstrate that BFs used in this study are correctly calibrated.
Affiliation(s)
- Suman Neupane
- Department of Biological Sciences, Virginia Tech University, 4076 Derring Hall, 926 West Campus Drive, Blacksburg, VA 24061, USA; Department of Ecology and Evolutionary Biology, University of Connecticut, 75 N. Eagleville Road, Unit 3043, Storrs, CT 06269, USA
| | - Karolina Fučíková
- Department of Natural Sciences, Assumption College, 500 Salisbury St., Worcester, MA 01609, USA
| | - Louise A Lewis
- Department of Ecology and Evolutionary Biology, University of Connecticut, 75 N. Eagleville Road, Unit 3043, Storrs, CT 06269, USA
| | - Lynn Kuo
- Department of Statistics, University of Connecticut, 215 Glenbrook Road, Unit 4120, Storrs, CT 06269, USA
| | - Ming-Hui Chen
- Department of Statistics, University of Connecticut, 215 Glenbrook Road, Unit 4120, Storrs, CT 06269, USA
| | - Paul O Lewis
- Department of Ecology and Evolutionary Biology, University of Connecticut, 75 N. Eagleville Road, Unit 3043, Storrs, CT 06269, USA
| |
|
15
|
Rangel LT, Marden J, Colston S, Setubal JC, Graf J, Gogarten JP. Identification and characterization of putative Aeromonas spp. T3SS effectors. PLoS One 2019; 14:e0214035. [PMID: 31163020 PMCID: PMC6548356 DOI: 10.1371/journal.pone.0214035] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 03/02/2019] [Accepted: 05/21/2019] [Indexed: 11/23/2022] Open
Abstract
The genetic determinants of bacterial pathogenicity are highly variable between species and strains. However, a factor that is commonly associated with virulent Gram-negative bacteria, including many Aeromonas spp., is the type 3 secretion system (T3SS), which is used to inject effector proteins into target eukaryotic cells. In this study, we developed a bioinformatics pipeline to identify T3SS effector proteins, applied this approach to the genomes of 105 Aeromonas strains isolated from environmental, mutualistic, or pathogenic contexts, and evaluated the cytotoxicity of the identified effectors through their heterologous expression in yeast. The developed pipeline uses a two-step approach, where candidate Aeromonas gene families are initially selected using Hidden Markov Model (HMM) profile searches against the Virulence Factors DataBase (VFDB), followed by strict comparisons against positive and negative control datasets, greatly reducing the number of false positives. This approach identified 21 likely Aeromonas T3SS effector families, of which 8 represent known or characterized effectors, while the remaining 13 have not previously been described in Aeromonas. We experimentally validated our in silico findings by assessing the cytotoxicity of representative effectors in Saccharomyces cerevisiae BY4741, with 15 out of 21 assayed proteins eliciting a cytotoxic effect in yeast. The results of this study demonstrate the utility of our approach, combining a novel in silico search method with in vivo experimental validation, and will be useful in future research aimed at identifying and authenticating bacterial effector proteins from other genera.
Affiliation(s)
- Luiz Thiberio Rangel
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, Connecticut, United States of America
- Interunidades em Bioinformática, Universidade de São Paulo, São Paulo, Brasil
- Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, Brasil
| | - Jeremiah Marden
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, Connecticut, United States of America
| | - Sophie Colston
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, Connecticut, United States of America
| | - João Carlos Setubal
- Interunidades em Bioinformática, Universidade de São Paulo, São Paulo, Brasil
- Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, Brasil
| | - Joerg Graf
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, Connecticut, United States of America
- Institute for Systems Genomics, University of Connecticut, Storrs, Connecticut, United States of America
| | - Johann Peter Gogarten
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, Connecticut, United States of America
- Institute for Systems Genomics, University of Connecticut, Storrs, Connecticut, United States of America
| |
|
16
|
Brown DG, Owen M. Mean and Variance of Phylogenetic Trees. Syst Biol 2019; 69:139-154. [DOI: 10.1093/sysbio/syz041] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 08/02/2017] [Revised: 05/13/2019] [Accepted: 05/24/2019] [Indexed: 11/13/2022] Open
Abstract
We describe the use of the Fréchet mean and variance in the Billera–Holmes–Vogtmann (BHV) treespace to summarize and explore the diversity of a set of phylogenetic trees. We show that the Fréchet mean is comparable to other summary methods, and, despite its stickiness property, is more likely to be binary than the majority-rule consensus tree. We show that the Fréchet variance is faster and more precise than commonly used variance measures. The Fréchet mean and variance are more theoretically justified, and more robust, than previous estimates of this type and can be estimated reasonably efficiently, providing a foundation for building more advanced statistical methods and leading to applications such as mean hypothesis testing and outlier detection.
Affiliation(s)
- Daniel G Brown
- David R. Cheriton School of Computer Science, University of Waterloo, 200 University Ave. W, Waterloo ON N2L 3G1, Canada
| | - Megan Owen
- Department of Mathematics, Lehman College, City University of New York, 250 Bedford Park Blvd West, Bronx, New York, NY 10468, USA
| |
|
17
|
Fučíková K, Lewis PO, Neupane S, Karol KG, Lewis LA. Order, please! Uncertainty in the ordinal-level classification of Chlorophyceae. PeerJ 2019; 7:e6899. [PMID: 31143537 PMCID: PMC6525593 DOI: 10.7717/peerj.6899] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 08/27/2018] [Accepted: 04/02/2019] [Indexed: 11/20/2022] Open
Abstract
Background: Chlorophyceae is one of the three most species-rich green algal classes and also the only class in core Chlorophyta whose monophyly remains uncontested as gene and taxon sampling improves. However, some key relationships within Chlorophyceae are less clear-cut and warrant further investigation. The present study combined genome-scale chloroplast data and rich sampling in an attempt to resolve the ordinal classification in Chlorophyceae. The traditional division into Sphaeropleales and Volvocales (SV), and a clade containing Oedogoniales, Chaetopeltidales, and Chaetophorales (OCC) was of particular interest with the addition of deeply branching members of these groups, as well as the placement of several incertae sedis taxa.
Methods: We sequenced 18 chloroplast genomes across Chlorophyceae to compile a data set of 58 protein-coding genes of a total of 68 chlorophycean taxa. We analyzed the concatenated nucleotide and amino acid datasets in the Bayesian and Maximum Likelihood frameworks, supplemented by analyses to examine potential discordant signal among genes. We also examined gene presence and absence data across Chlorophyceae.
Results: Concatenated analyses yielded at least two well-supported phylogenies: nucleotide data supported the traditional classification with the inclusion of the enigmatic Treubarinia into Sphaeropleales sensu lato. However, amino acid data yielded equally strong support for Sphaeropleaceae as sister to Volvocales, with the rest of the taxa traditionally classified in Sphaeropleales in a separate clade, and Treubarinia as sister to all of the above. Single-gene and other supplementary analyses indicated that the data have low phylogenetic signal at these critical nodes. Major clades were supported by genomic structural features such as gene losses and trans-spliced intron insertions in the plastome.
Discussion: While the sequence and gene order data support the deep split between the SV and OCC lineages, multiple phylogenetic hypotheses are possible for Sphaeropleales s.l. Given this uncertainty as well as the higher-taxonomic disorder seen in other algal groups, dwelling on well-defined, strongly supported Linnaean orders is not currently practical in Chlorophyceae and a less formal clade system may be more useful in the foreseeable future. For example, we identify two strongly and unequivocally supported clades: Treubarinia and Scenedesminia, as well as other smaller groups that could serve a practical purpose as named clades. This system does not preclude future establishment of new orders, or emendation of the current ordinal classification if new data support such conclusions.
Affiliation(s)
- Karolina Fučíková
- Department of Natural Sciences, Assumption College, Worcester, MA, United States of America
| | - Paul O Lewis
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT, United States of America
| | - Suman Neupane
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT, United States of America
| | - Kenneth G Karol
- The Lewis B. and Dorothy Cullman Program for Molecular Systematics, New York Botanical Garden, Bronx, NY, United States of America
| | - Louise A Lewis
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT, United States of America
| |
|
18
|
Denton JSS, Goolsby EW. Measuring inferential importance of taxa using taxon influence indices. Ecol Evol 2018; 8:4484-4494. [PMID: 29760889 PMCID: PMC5938459 DOI: 10.1002/ece3.3941] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 08/04/2017] [Revised: 01/14/2018] [Accepted: 01/31/2018] [Indexed: 11/30/2022] Open
Abstract
Assessing the importance of different taxa for inferring evolutionary history is a critical, but underutilized, aspect of systematics. Quantifying the importance of all taxa within a dataset provides an empirical measurement that can establish a ranking of extant taxa for ecological study and/or quantify the relative importance of newly announced or redescribed specimens to enable the disentangling of novelty and inferential influence. Here, we illustrate the use of taxon influence indices through analysis of both molecular and morphological datasets, introducing a modified Bayesian approach to the taxon influence index that accounts for model and topological uncertainty. Quantification of taxon influence using the Bayesian approach produced clear rankings for both dataset types. Bayesian taxon rankings differed from maximum likelihood (ML)‐derived rankings from a mitogenomic dataset, and the highest ranking taxa exhibited the largest interquartile range in influence estimate, suggesting variance in the estimate must be taken into account when the ranking of taxa is the feature of interest. Application of the Bayesian taxon influence index to a recent morphological analysis of the Tully Monster (Tullimonstrum) reveals that it exhibits consistently low inferential importance across two recent treatments of the taxon with alternative character codings. These results lend support to the idea that taxon influence indices may be robust to character coding and therefore effective for morphological analyses. These results underscore a need for the development of approaches to, and application of, taxon influence analyses both for the purpose of establishing robust rankings for future inquiry and for explicitly quantifying the importance of individual taxa. 
Quantifying the importance of individual taxa refocuses debates in morphological studies from questions of character choice/significance and taxon sampling to explicitly analytical techniques, and guides discussion of the context of new discoveries.
Affiliation(s)
- John S S Denton
- Department of Vertebrate Paleontology, American Museum of Natural History, New York, NY, USA
| | - Eric W Goolsby
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, USA
| |
|
19
|
Brown JW, Smith SA. The Past Sure is Tense: On Interpreting Phylogenetic Divergence Time Estimates. Syst Biol 2018; 67:340-353. [PMID: 28945912 DOI: 10.1093/sysbio/syx074] [Citation(s) in RCA: 47] [Impact Index Per Article: 6.7] [Received: 03/07/2017] [Accepted: 09/04/2017] [Indexed: 11/12/2022] Open
Abstract
Divergence time estimation (the calibration of a phylogeny to geological time) is an integral first step in modeling the tempo of biological evolution (traits and lineages). However, despite increasingly sophisticated methods to infer divergence times from molecular genetic sequences, the estimated ages of many nodes across the tree of life contrast significantly and consistently with timeframes conveyed by the fossil record. This is perhaps best exemplified by crown angiosperms, where molecular clock (Triassic) estimates predate the oldest (Early Cretaceous) undisputed angiosperm fossils by tens of millions of years or more. While the incompleteness of the fossil record is a common concern, issues of data limitation and model inadequacy are viable (if underexplored) alternative explanations. In this vein, Beaulieu et al. (2015) convincingly demonstrated how methods of divergence time inference can be misled by both (i) extreme state-dependent molecular substitution rate heterogeneity and (ii) biased sampling of representative major lineages. These results demonstrate the impact of (potentially common) model violations. Here, we suggest another potential challenge: that the configuration of the statistical inference problem (i.e., the parameters, their relationships, and associated priors) alone may preclude the reconstruction of the paleontological timeframe for the crown age of angiosperms. We demonstrate, through sampling from the joint prior (formed by combining the tree (diversification) prior with the calibration densities specified for fossil-calibrated nodes), that with no data present at all, an Early Cretaceous age for crown angiosperms is rejected (i.e., has essentially zero probability). More worrisome, however, is that for the 24 nodes calibrated by fossils, almost all have indistinguishable marginal prior and posterior age distributions when employing routine lognormal fossil calibration priors. 
These results indicate that there is inadequate information in the data to over-rule the joint prior. Given that these calibrated nodes are strategically placed in disparate regions of the tree, they act to anchor the tree scaffold, and so the posterior inference for the tree as a whole is largely determined by the pseudodata present in the (often arbitrary) calibration densities. We recommend, as for any Bayesian analysis, that marginal prior and posterior distributions be carefully compared to determine whether signal is coming from the data or prior belief, especially for parameters of direct interest. This recommendation is not novel. However, given how rarely such checks are carried out in evolutionary biology, it bears repeating. Our results demonstrate the fundamental importance of prior/posterior comparisons in any Bayesian analysis, and we hope that they further encourage both researchers and journals to consistently adopt this crucial step as standard practice. Finally, we note that the results presented here do not refute the biological modeling concerns identified by Beaulieu et al. (2015). Both sets of issues remain apposite to the goals of accurate divergence time estimation, and only by considering them in tandem can we move forward more confidently.
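The prior/posterior comparison recommended in this abstract can be illustrated with a small sketch. The abstract does not prescribe a particular metric, so the histogram overlap coefficient used here, the function name `histogram_overlap`, and the bin count are all assumptions chosen for illustration; the inputs are meant to be node-age samples drawn once from the joint prior (no data) and once from the posterior.

```python
def histogram_overlap(prior, posterior, bins=20):
    """Overlap coefficient between two samples on shared histogram bins.

    Returns a value in [0, 1]: close to 1 means the marginal prior and
    posterior are nearly indistinguishable (little signal from the data),
    close to 0 means the data moved the estimate away from the prior.
    This metric is an illustrative choice, not the one used in the paper.
    """
    lo = min(min(prior), min(posterior))
    hi = max(max(prior), max(posterior))
    if hi == lo:
        return 1.0  # every value is identical, so the samples coincide
    width = (hi - lo) / bins

    def proportions(sample):
        counts = [0] * bins
        for x in sample:
            # Clamp the maximum value into the last bin.
            idx = min(int((x - lo) / width), bins - 1)
            counts[idx] += 1
        return [c / len(sample) for c in counts]

    p, q = proportions(prior), proportions(posterior)
    return sum(min(a, b) for a, b in zip(p, q))


# Hypothetical node-age samples (in Myr), for illustration only.
prior_ages = [120.0, 125.0, 130.0, 135.0, 140.0]
posterior_ages = [121.0, 126.0, 129.0, 136.0, 139.0]
print(histogram_overlap(prior_ages, posterior_ages, bins=5))
```

An overlap near 1 for a calibrated node would correspond to the situation the abstract describes, where the marginal prior and posterior age distributions are almost indistinguishable.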
Affiliation(s)
- Joseph W Brown
- Department of Ecology & Evolutionary Biology, University of Michigan, 830 North University Avenue, Ann Arbor, MI 48109, USA
| | - Stephen A Smith
- Department of Ecology & Evolutionary Biology, University of Michigan, 830 North University Avenue, Ann Arbor, MI 48109, USA
| |
|
20
|
Steel M, Leuenberger C. The optimal rate for resolving a near-polytomy in a phylogeny. J Theor Biol 2017; 420:174-179. [PMID: 28263815 DOI: 10.1016/j.jtbi.2017.02.037] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 11/30/2016] [Revised: 02/25/2017] [Accepted: 02/28/2017] [Indexed: 11/29/2022]
Abstract
The reconstruction of phylogenetic trees from discrete character data typically relies on models that assume the characters evolve under a continuous-time Markov process operating at some overall rate λ. When λ is too high or too low, it becomes difficult to distinguish a short interior edge from a polytomy (the tree that results from collapsing the edge). In this note, we investigate the rate that maximizes the expected log-likelihood ratio (i.e. the Kullback-Leibler separation) between the four-leaf unresolved (star) tree and a four-leaf binary tree with interior edge length ϵ. For a simple two-state model, we show that as ϵ converges to 0 the optimal rate also converges to zero when the four pendant edges have equal length. However, when the four pendant branches have unequal length, two local optima can arise, and it is possible for the globally optimal rate to converge to a non-zero constant as ϵ→0. Moreover, in the setting where the four pendant branches have equal lengths and either (i) we replace the two-state model by an infinite-state model or (ii) we retain the two-state model and replace the Kullback-Leibler separation by Euclidean distance as the maximization goal, then the optimal rate also converges to a non-zero constant.
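The quantity studied in this abstract can be sketched numerically. The code below is a minimal illustration, not the authors' derivation. It assumes a symmetric two-state model in which the two ends of an edge of length t differ with probability (1 - exp(-2λt))/2, four equal pendant edges, strictly positive λ and pendant length, and the direction KL(resolved || star); the abstract does not state these details, so they are all assumptions, as are the function names.

```python
import itertools
import math


def change_prob(rate, length):
    """Probability that the two ends of an edge carry different states
    under a symmetric two-state model (assumed parameterization)."""
    return 0.5 * (1.0 - math.exp(-2.0 * rate * length))


def pattern_probs(rate, pendant, interior):
    """Site-pattern distribution on the four-leaf tree ab|cd.

    Leaves a, b hang off internal node u; leaves c, d hang off internal
    node v; u and v are joined by the interior edge. Setting interior=0
    gives the star tree.
    """
    p = change_prob(rate, pendant)
    q = change_prob(rate, interior)
    probs = {}
    for leaves in itertools.product((0, 1), repeat=4):
        total = 0.0
        for u, v in itertools.product((0, 1), repeat=2):
            term = 0.5  # uniform distribution on the state at u
            term *= q if u != v else 1.0 - q
            for leaf, parent in zip(leaves, (u, u, v, v)):
                term *= p if leaf != parent else 1.0 - p
            total += term
        probs[leaves] = total
    return probs


def kl_separation(rate, pendant, interior):
    """KL divergence from the resolved tree to the star tree
    (direction chosen here as an assumption)."""
    resolved = pattern_probs(rate, pendant, interior)
    star = pattern_probs(rate, pendant, 0.0)
    return sum(r * math.log(r / star[k]) for k, r in resolved.items() if r > 0)


# Scan a few rates to see how the separation rises and then falls.
for rate in (0.01, 0.1, 1.0, 10.0):
    print(rate, kl_separation(rate, pendant=1.0, interior=0.1))
```

Scanning `rate` for a fixed, small `interior` gives a rough picture of the trade-off described above: at very high rates the pattern distribution saturates and the two trees become indistinguishable, so the separation is largest at some intermediate or low rate.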
Affiliation(s)
- Mike Steel
- Biomathematics Research Centre, University of Canterbury, 8041, Christchurch, New Zealand.
| | - Christoph Leuenberger
- Département de mathématiques, Université de Fribourg, Chemin du Musée 3, 1705 Fribourg, Switzerland.
| |
|