1
Manuel C, Sakalli E, Schmidt HA, Viñas C, von Haeseler A, Elgert C. When the Past Fades: Detecting Phylogenetic Signal with SatuTe. Mol Biol Evol 2025; 42:msaf090. [PMID: 40423578 PMCID: PMC12108095 DOI: 10.1093/molbev/msaf090] [Received: 09/16/2024] [Revised: 03/10/2025] [Accepted: 03/26/2025] [Indexed: 05/28/2025]
Abstract
In phylogenetics, the phenomenon of saturation is well known, although its influence on tree reconstruction lacks a systematic and well-founded method. Here, we propose a new measure of the phylogenetic information shared between two subtrees connected by a branch in a phylogeny. This measure generalizes the concept of saturation between two sequences to a theory of saturation between subtrees, whose implementation we provide as the versatile program SatuTe. We describe different usages of SatuTe, identifying which branches in a tree are phylogenetically informative and which alignment regions support a given branch. As an example, we discuss the Tree of Life reconstruction from ribosomal proteins and the 16S rRNA gene, with emphasis on the two-domain versus three-domain hypotheses. For the branch leading to Eukaryota, we show that most ribosomal proteins contain a strong phylogenetic signal, whereas some regions of the 16S rRNA gene have lost phylogenetic information. Thus, SatuTe opens new insights into phylogenetic inference and complements standard phylogenetic analysis.
Affiliation(s)
- Cassius Manuel
  - Center for Integrative Bioinformatics Vienna, Max Perutz Labs, University of Vienna, Medical University of Vienna, Dr. Bohr Gasse 9, Vienna A-1030, Austria
- Enes Sakalli
  - Center for Integrative Bioinformatics Vienna, Max Perutz Labs, University of Vienna, Medical University of Vienna, Dr. Bohr Gasse 9, Vienna A-1030, Austria
  - Vienna BioCenter PhD Program, Doctoral School of the University of Vienna and Medical University of Vienna, Vienna A-1030, Austria
- Heiko A Schmidt
  - Center for Integrative Bioinformatics Vienna, Max Perutz Labs, University of Vienna, Medical University of Vienna, Dr. Bohr Gasse 9, Vienna A-1030, Austria
- Carme Viñas
  - Faculty of Mathematics and Statistics, Polytechnic University of Catalonia, Barcelona, Spain
- Arndt von Haeseler
  - Center for Integrative Bioinformatics Vienna, Max Perutz Labs, University of Vienna, Medical University of Vienna, Dr. Bohr Gasse 9, Vienna A-1030, Austria
  - Ludwig Boltzmann Institute for Network Medicine, University of Vienna, Augasse 2-6, Vienna A-1090, Austria
- Christiane Elgert
  - Center for Integrative Bioinformatics Vienna, Max Perutz Labs, University of Vienna, Medical University of Vienna, Dr. Bohr Gasse 9, Vienna A-1030, Austria
2
Atherton S, Hulterström J, Guidetti R, Jönsson KI. Three new species of Mesobiotus (Eutardigrada: Macrobiotidae) from Sweden with an updated phylogeny of the genus. Sci Rep 2025; 15:4535. [PMID: 39915525 PMCID: PMC11802830 DOI: 10.1038/s41598-025-88063-8] [Received: 07/23/2024] [Accepted: 01/23/2025] [Indexed: 02/09/2025]
Abstract
Three new species of Mesobiotus (Tardigrada: Eutardigrada: Macrobiotidae) are described from Skåne County in the southernmost region of Sweden. All three species are distinguished morphologically and through differences in DNA sequences as supported by PTP and mPTP analyses. With the addition of Mesobiotus bockebodicus sp. nov., M. skanensis sp. nov., and M. zelmae sp. nov., the number of nominal species of Macrobiotidae in Sweden has increased to 26, 73% of which have been documented from Skåne. Finally, new morphological details and DNA sequences are presented for Mesobiotus emiliae, a new record of M. mandalori from Sweden is presented, and the phylogenetic relationships within the genus are reconstructed using previously published and new 18S and COI gene sequences.
Affiliation(s)
- Sarah Atherton
  - Department of Zoology, Naturhistoriska riksmuseet, Box 50007, Stockholm, 104 05, Sweden
- Jens Hulterström
  - Department of Zoology, Naturhistoriska riksmuseet, Box 50007, Stockholm, 104 05, Sweden
- Roberto Guidetti
  - Department of Life Sciences, University of Modena and Reggio Emilia, Modena, 41124, Italy
- K Ingemar Jönsson
  - Department of Environmental Science, Kristianstad University, Kristianstad, SE-291 88, Sweden
3
Schlüter HM, Uhler C. Integrating representation learning, permutation, and optimization to detect lineage-related gene expression patterns. Nat Commun 2025; 16:1062. [PMID: 39870610 PMCID: PMC11772648 DOI: 10.1038/s41467-025-56388-7] [Received: 03/23/2024] [Accepted: 01/17/2025] [Indexed: 01/29/2025]
Abstract
Recent barcoding technologies allow reconstructing lineage trees while capturing paired single-cell RNA-sequencing (scRNA-seq) data. Such datasets provide opportunities to compare gene expression memory maintenance through lineage branching and pinpoint critical genes in these processes. Here we develop Permutation, Optimization, and Representation learning based single Cell gene Expression and Lineage ANalysis (PORCELAN) to identify lineage-informative genes or subtrees where lineage and expression are tightly coupled. We validate our method using synthetic data and apply it to recent paired lineage and scRNA-seq data of lung cancer in a mouse model and embryogenesis of mouse and C. elegans. Our method pinpoints subtrees giving rise to metastases or new cell states, and genes identified as most informative about lineage overlap with known pathways involved in lung cancer progression. Furthermore, our method highlights differences in how gene expression memory is maintained through divisions in cancer and embryogenesis, thereby providing a tool for studying cell state memory through divisions across biological systems.
Affiliation(s)
- Hannah M Schlüter
  - Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Eric and Wendy Schmidt Center, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Caroline Uhler
  - Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Eric and Wendy Schmidt Center, Broad Institute of MIT and Harvard, Cambridge, MA, USA
4
Penn MJ, Scheidwasser N, Donnelly CA, Duchêne DA, Bhatt S. Bayesian Inference of Phylogenetic Distances: Revisiting the Eigenvalue Approach. Bull Math Biol 2025; 87:32. [PMID: 39847307 PMCID: PMC11759294 DOI: 10.1007/s11538-024-01403-z] [Received: 04/17/2024] [Accepted: 12/13/2024] [Indexed: 01/24/2025]
Abstract
Using genetic data to infer evolutionary distances between molecular sequence pairs based on a Markov substitution model is a common procedure in phylogenetics, in particular for selecting a good starting tree to improve upon. Many evolutionary patterns can be accurately modelled using substitution models that are available in closed form, including the popular general time reversible model (GTR) for DNA data. For more complex biological phenomena, such as variations in lineage-specific evolutionary rates over time (heterotachy), other approaches such as the GTR with rate variation (GTR + Γ) are required, but do not admit analytical solutions and do not automatically allow for likelihood calculations crucial for Bayesian analysis. In this paper, we derive a hybrid approach between these two methods, incorporating Γ(α, α)-distributed rate variation and heterotachy into a hierarchical Bayesian GTR-style framework. Our approach is differentiable and amenable to both stochastic gradient descent for optimisation and Hamiltonian Markov chain Monte Carlo for Bayesian inference. We show the utility of our approach by studying hypotheses regarding the origins of the eukaryotic cell within the context of a universal tree of life and find evidence for a two-domain theory.
Affiliation(s)
- Matthew J Penn
  - Department of Statistics, University of Oxford, Oxford, UK
- Neil Scheidwasser
  - Section of Epidemiology, University of Copenhagen, Copenhagen, Denmark
- Christl A Donnelly
  - Department of Statistics, University of Oxford, Oxford, UK
  - Pandemic Sciences Institute, University of Oxford, Oxford, UK
- David A Duchêne
  - Section of Epidemiology, University of Copenhagen, Copenhagen, Denmark
- Samir Bhatt
  - Section of Epidemiology, University of Copenhagen, Copenhagen, Denmark
  - MRC Centre for Global Infectious Disease Analysis, Imperial College London, London, UK
5
Yao N, Zhang Z, Yu L, Hazarika R, Yu C, Jang H, Smith LM, Ton J, Liu L, Stachowicz JJ, Reusch TBH, Schmitz RJ, Johannes F. An evolutionary epigenetic clock in plants. Science 2023; 381:1440-1445. [PMID: 37769069 DOI: 10.1126/science.adh9443] [Received: 03/24/2023] [Accepted: 08/08/2023] [Indexed: 09/30/2023]
Abstract
Molecular clocks are the basis for dating the divergence between lineages over macroevolutionary timescales (~10^5 to 10^8 years). However, classical DNA-based clocks tick too slowly to inform us about the recent past. Here, we demonstrate that stochastic DNA methylation changes at a subset of cytosines in plant genomes display a clocklike behavior. This "epimutation clock" is orders of magnitude faster than DNA-based clocks and enables phylogenetic explorations on a scale of years to centuries. We show experimentally that epimutation clocks recapitulate known topologies and branching times of intraspecies phylogenetic trees in the self-fertilizing plant Arabidopsis thaliana and the clonal seagrass Zostera marina, which represent two major modes of plant reproduction. This discovery will open new possibilities for high-resolution temporal studies of plant biodiversity.
Affiliation(s)
- N Yao
  - Department of Genetics, University of Georgia, Athens, GA, USA
- Z Zhang
  - Plant Epigenomics, Technical University of Munich, Freising, Germany
- L Yu
  - Marine Evolutionary Ecology, GEOMAR Helmholtz Centre for Ocean Research Kiel, Kiel, Germany
- R Hazarika
  - Plant Epigenomics, Technical University of Munich, Freising, Germany
- C Yu
  - Plant Epigenomics, Technical University of Munich, Freising, Germany
- H Jang
  - Department of Genetics, University of Georgia, Athens, GA, USA
- L M Smith
  - School of Biosciences, University of Sheffield, Sheffield, UK
- J Ton
  - School of Biosciences, University of Sheffield, Sheffield, UK
- L Liu
  - Department of Statistics, University of Georgia, Athens, GA, USA
- J J Stachowicz
  - Department of Evolution and Ecology, University of California, Davis, CA, USA
- T B H Reusch
  - Marine Evolutionary Ecology, GEOMAR Helmholtz Centre for Ocean Research Kiel, Kiel, Germany
- R J Schmitz
  - Department of Genetics, University of Georgia, Athens, GA, USA
- F Johannes
  - Plant Epigenomics, Technical University of Munich, Freising, Germany
6
Yao N, Zhang Z, Yu L, Hazarika R, Yu C, Jang H, Smith LM, Ton J, Liu L, Stachowicz J, Reusch T, Schmitz RJ, Johannes F. An evolutionary epigenetic clock in plants. bioRxiv 2023:2023.03.15.532766. [PMID: 36993545 PMCID: PMC10055040 DOI: 10.1101/2023.03.15.532766] [Indexed: 06/19/2023]
Abstract
Molecular clocks are the basis for dating the divergence between lineages over macro-evolutionary timescales (~10^5 to 10^8 years). However, classical DNA-based clocks tick too slowly to inform us about the recent past. Here, we demonstrate that stochastic DNA methylation changes at a subset of cytosines in plant genomes possess a clock-like behavior. This 'epimutation-clock' is orders of magnitude faster than DNA-based clocks and enables phylogenetic explorations on a scale of years to centuries. We show experimentally that epimutation-clocks recapitulate known topologies and branching times of intra-species phylogenetic trees in the selfing plant A. thaliana and the clonal seagrass Z. marina, which represent two major modes of plant reproduction. This discovery will open new possibilities for high-resolution temporal studies of plant biodiversity.
Affiliation(s)
- N Yao
  - Department of Genetics, University of Georgia, Athens, USA
- Z Zhang
  - Plant Epigenomics, Technical University of Munich, Freising, Germany
- L Yu
  - Marine Evolutionary Ecology, GEOMAR Helmholtz Centre for Ocean Research Kiel, Kiel, Germany
- R Hazarika
  - Plant Epigenomics, Technical University of Munich, Freising, Germany
- C Yu
  - Plant Epigenomics, Technical University of Munich, Freising, Germany
- H Jang
  - Department of Genetics, University of Georgia, Athens, USA
- L M Smith
  - School of Biosciences, University of Sheffield, UK
- J Ton
  - School of Biosciences, University of Sheffield, UK
- L Liu
  - Department of Statistics, University of Georgia, Athens, USA
- J Stachowicz
  - Department of Evolution and Ecology, University of California, Davis, USA
- T B H Reusch
  - Marine Evolutionary Ecology, GEOMAR Helmholtz Centre for Ocean Research Kiel, Kiel, Germany
- R J Schmitz
  - Department of Genetics, University of Georgia, Athens, USA
- F Johannes
  - Plant Epigenomics, Technical University of Munich, Freising, Germany
7
Vecchi M, Tsvetkova A, Stec D, Ferrari C, Calhim S, Tumanov D. Expanding Acutuncus: Phylogenetics and morphological analyses reveal a considerably wider distribution for this tardigrade genus. Mol Phylogenet Evol 2023; 180:107707. [PMID: 36681365 DOI: 10.1016/j.ympev.2023.107707] [Received: 10/15/2022] [Revised: 01/11/2023] [Accepted: 01/12/2023] [Indexed: 01/20/2023]
Abstract
The tardigrade genus Acutuncus has long been thought to be an Antarctic endemic, well adapted to this harsh environment. The Antarctic endemicity of Acutuncus was recently dispelled with the description of Acutuncus mariae Zawierucha, 2020, found in the Svalbard archipelago. Integrated analyses of two newly found Acutuncus populations from the UK and Italy, and of a population of Acutuncus antarcticus found close to its type locality, allowed us to expand the climatic and geographic range of the genus Acutuncus. These findings also allowed us to re-evaluate the morphological diagnosis of Acutuncus and accommodate it in the newly proposed monotypic family Acutuncidae fam. nov. Two new Acutuncus species morphogroups are instituted based on egg morphology: one (Acutuncus antarcticus morphogroup) including the Antarctic Acutuncus taxa, characterized by eggs with long pillars within the chorion and eggs laid freely in the environment; the other (Acutuncus mariae morphogroup) including the European species, characterized by eggs with short pillars within the chorion and eggs laid in the exuvium. Finally, we describe two new Acutuncus species from Europe: Acutuncus mecnuffi sp. nov. and Acutuncus giovanniniae sp. nov.
Affiliation(s)
- Matteo Vecchi
  - Department of Biological and Environmental Science, University of Jyvaskyla, PO Box 35, FI-40014 Jyvaskyla, Finland
- Alexandra Tsvetkova
  - Department of Invertebrate Zoology, Faculty of Biology, Saint Petersburg State University, 199034, Universitetskaya nab. 7/9, Saint Petersburg, Russia
- Daniel Stec
  - Institute of Systematics and Evolution of Animals, Polish Academy of Sciences, Sławkowska 17, 31-016 Kraków, Poland
- Claudio Ferrari
  - Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parco Area delle Scienze 33/A, 43124 Parma, Italy
- Sara Calhim
  - Department of Biological and Environmental Science, University of Jyvaskyla, PO Box 35, FI-40014 Jyvaskyla, Finland
- Denis Tumanov
  - Department of Invertebrate Zoology, Faculty of Biology, Saint Petersburg State University, 199034, Universitetskaya nab. 7/9, Saint Petersburg, Russia
  - Zoological Institute of the Russian Academy of Sciences, 199034, Universitetskaja nab. 1, Saint Petersburg, Russia
8
Lin Q, Braukmann TWA, Soto Gomez M, Mayer JLS, Pinheiro F, Merckx VSFT, Stefanović S, Graham SW. Mitochondrial genomic data are effective at placing mycoheterotrophic lineages in plant phylogeny. New Phytol 2022; 236:1908-1921. [PMID: 35731179 DOI: 10.1111/nph.18335] [Received: 02/26/2022] [Accepted: 06/13/2022] [Indexed: 05/03/2023]
Abstract
Fully mycoheterotrophic plants can be difficult to place in plant phylogeny due to elevated substitution rates associated with photosynthesis loss. This potentially limits the effectiveness of downstream analyses of mycoheterotrophy that depend on accurate phylogenetic inference. Although mitochondrial genomic data sets are rarely used in plant phylogenetics, theory predicts that they should be resilient to long-branch artefacts, thanks to their generally slow evolution, coupled with limited rate elevation in heterotrophs. We examined the utility of mitochondrial genomes for resolving contentious higher-order placements of mycoheterotrophic lineages in two test cases: monocots (focusing on Dioscoreales) and Ericaceae. We find Thismiaceae to be distantly related to Burmanniaceae in the monocot order Dioscoreales, conflicting with current classification schemes based on few gene data sets. We confirm that the unusual Afrothismia is related to Taccaceae-Thismiaceae, with a corresponding independent loss of photosynthesis. In Ericaceae we recovered the first well supported relationships among its five major lineages: mycoheterotrophic Ericaceae are not monophyletic, as pyroloids are inferred to be sister to core Ericaceae, and monotropoids to arbutoids. Genes recovered from mitochondrial genomes collectively resolved previously ambiguous mycoheterotroph higher-order relationships. We propose that mitochondrial genomic data should be considered in standardised gene panels for inferring overall plant phylogeny.
Affiliation(s)
- Qianshi Lin
  - Department of Botany, University of British Columbia, 6270 University Boulevard, Vancouver, BC, V6T 1Z4, Canada
  - Department of Biology, University of Toronto Mississauga, Mississauga, ON, L5L 1C6, Canada
  - Ecology and Evolutionary Biology, University of Toronto, Toronto, ON, M5S 2Z9, Canada
- Thomas W A Braukmann
  - Ecology and Evolutionary Biology, University of Toronto, Toronto, ON, M5S 2Z9, Canada
  - Department of Pathology, Stanford University, Stanford, CA, 94305, USA
- Marybel Soto Gomez
  - Department of Botany, University of British Columbia, 6270 University Boulevard, Vancouver, BC, V6T 1Z4, Canada
  - Royal Botanic Gardens, Kew, Richmond, Surrey, TW9 3AB, UK
- Juliana Lischka Sampaio Mayer
  - Departamento de Biologia Vegetal, Universidade Estadual de Campinas, 255 Rua Monteiro Lobato, Campinas, São Paulo, 13.083-862, Brazil
- Fábio Pinheiro
  - Departamento de Biologia Vegetal, Universidade Estadual de Campinas, 255 Rua Monteiro Lobato, Campinas, São Paulo, 13.083-862, Brazil
- Vincent S F T Merckx
  - Naturalis Biodiversity Center, Vondellaan 55, 2332 AA, Leiden, the Netherlands
  - Department of Evolutionary and Population Biology, Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, PO Box 94240, 1090 GE, Amsterdam, the Netherlands
- Saša Stefanović
  - Department of Biology, University of Toronto Mississauga, Mississauga, ON, L5L 1C6, Canada
  - Ecology and Evolutionary Biology, University of Toronto, Toronto, ON, M5S 2Z9, Canada
- Sean W Graham
  - Department of Botany, University of British Columbia, 6270 University Boulevard, Vancouver, BC, V6T 1Z4, Canada
9
Lozano-Fernandez J. A Practical Guide to Design and Assess a Phylogenomic Study. Genome Biol Evol 2022; 14:evac129. [PMID: 35946263 PMCID: PMC9452790 DOI: 10.1093/gbe/evac129] [Accepted: 08/03/2022] [Indexed: 11/13/2022]
Abstract
Over the last decade, molecular systematics has undergone a change of paradigm as high-throughput sequencing now makes it possible to reconstruct evolutionary relationships using genome-scale datasets. The advent of "big data" molecular phylogenetics provided a battery of new tools for biologists but simultaneously brought new methodological challenges. The increase in analytical complexity comes at the price of highly specific training in computational biology and molecular phylogenetics, resulting very often in a polarized accumulation of knowledge (technical on one side and biological on the other). Interpreting the robustness of genome-scale phylogenetic studies is not straightforward, particularly as new methodological developments have consistently shown that the general belief of "more genes, more robustness" often does not apply, and because there is a range of systematic errors that plague phylogenomic investigations. This is particularly problematic because phylogenomic studies are highly heterogeneous in their methodology, and best practices are often not clearly defined. The main aim of this article is to present what I consider as the ten most important points to take into consideration when planning a well-thought-out phylogenomic study and while evaluating the quality of published papers. The goal is to provide a practical step-by-step guide that can be easily followed by nonexperts and phylogenomic novices in order to assess the technical robustness of phylogenomic studies or improve the experimental design of a project.
Affiliation(s)
- Jesus Lozano-Fernandez
  - Department of Genetics, Microbiology and Statistics, Biodiversity Research Institute (IRBio), University of Barcelona, Avd. Diagonal 643, 08028 Barcelona, Spain
  - Institute of Evolutionary Biology (CSIC – Universitat Pompeu Fabra), Passeig marítim de la Barcelona 37-49, 08003 Barcelona, Spain
10
Behl A, Nair A, Mohagaonkar S, Yadav P, Gambhir K, Tyagi N, Sharma RK, Butola BS, Sharma N. Threat, challenges, and preparedness for future pandemics: A descriptive review of phylogenetic analysis based predictions. Infect Genet Evol 2022; 98:105217. [PMID: 35065303 DOI: 10.1016/j.meegid.2022.105217] [Received: 03/28/2021] [Revised: 12/01/2021] [Accepted: 01/14/2022] [Indexed: 11/27/2022]
Abstract
For centuries the world has been confronted with many infectious diseases, with a potential to turn into a pandemic posing a constant threat to human lives. Some of these pandemics occurred due to the emergence of new diseases or the re-emergence of previously known diseases with a few mutations. In such scenarios their optimal prevention and control options were not adequately developed. Most of these diseases are highly contagious, and for their timely control, knowledge about the pathogens and disease progression is the basic necessity. In this review, we have presented a documented chronology of the earlier pandemics, evolutionary analysis of the infectious diseases with pandemic potential, the role of RNA, difficulties in controlling pandemics, and the likely pathogens that could trigger future pandemics. In this study, the evolutionary history of the pathogens was identified by carrying out phylogenetic analysis. The percentage similarity between different infectious diseases is critically analysed for the identification of their correlation using online sequence matcher tools. The Baltimore classification system was used for finding the genomic nature of the viruses. It was observed that most of the infectious pathogens arise from their animal hosts with some mutations in their genome composition. The phylogenetic tree shows that the single-stranded RNA diseases have a common origin and many of them have a high similarity percentage. The outcomes of this study will help in the identification of potential pathogens that can cause future pandemics. This information will be helpful in the development of early detection techniques, devising preventive mechanisms to limit their spread, prophylactic measures, infection control and therapeutic options, thereby strengthening our approach towards global preparedness against future pandemics.
Affiliation(s)
- Amanpreet Behl
  - Department of Molecular Medicine, Jamia Hamdard University, Hamdard Nagar, New Delhi, Delhi 110062, India
- Ashrit Nair
  - Department of Textile and Fibre Engineering, Indian Institute of Technology, Hauz Khas, New Delhi-110016, India
- Sanika Mohagaonkar
  - Department of Metabolism, Digestion and Reproduction, Imperial College, London, United Kingdom
- Pooja Yadav
  - Department of Medical Elementology and Toxicology, Jamia Hamdard, Hamdard Nagar, New Delhi 110062, India
- Kirtida Gambhir
  - Stem cell and Gene Therapy Research Group, Institute of Nuclear Medicine and Allied Sciences, Defence Research and Development Organisation, Delhi 110054, India
- Nishant Tyagi
  - Stem cell and Gene Therapy Research Group, Institute of Nuclear Medicine and Allied Sciences, Defence Research and Development Organisation, Delhi 110054, India
- Rakesh Kumar Sharma
  - Saveetha Institute of Medical and Technical Sciences, 162, Poonamallee High Road, Chennai 600077, Tamil Nadu, India
- Bhupendra Singh Butola
  - Department of Textile and Fibre Engineering, Indian Institute of Technology, Hauz Khas, New Delhi-110016, India
- Navneet Sharma
  - Department of Textile and Fibre Engineering, Indian Institute of Technology, Hauz Khas, New Delhi-110016, India
11
Tumanov DV. End of a mystery: Integrative approach reveals the phylogenetic position of an enigmatic Antarctic tardigrade genus Ramajendas (Tardigrada, Eutardigrada). Zool Scr 2021. [DOI: 10.1111/zsc.12521] [Indexed: 11/30/2022]
Affiliation(s)
- Denis V. Tumanov
  - Department of Invertebrate Zoology, Faculty of Biology, Saint Petersburg State University, Saint Petersburg, Russia
  - Marine Research Laboratory, Zoological Institute of the Russian Academy of Sciences, Saint Petersburg, Russia
12
Stringer DN, Bertozzi T, Meusemann K, Delean S, Guzik MT, Tierney SM, Mayer C, Cooper SJB, Javidkar M, Zwick A, Austin AD. Development and evaluation of a custom bait design based on 469 single-copy protein-coding genes for exon capture of isopods (Philosciidae: Haloniscus). PLoS One 2021; 16:e0256861. [PMID: 34534224 PMCID: PMC8448321 DOI: 10.1371/journal.pone.0256861] [Received: 12/03/2020] [Accepted: 08/17/2021] [Indexed: 12/02/2022]
Abstract
Transcriptome-based exon capture approaches, along with next-generation sequencing, are allowing for the rapid and cost-effective production of extensive and informative phylogenomic datasets from non-model organisms for phylogenetics and population genetics research. These approaches generally employ a reference genome to infer the intron-exon structure of targeted loci and preferentially select longer exons. However, in the absence of an existing and well-annotated genome, we applied this exon capture method directly, without initially identifying intron-exon boundaries for bait design, to a group of highly diverse Haloniscus (Philosciidae), paraplatyarthrid and armadillid isopods, and examined the performance of our methods and bait design for phylogenetic inference. Here, we identified an isopod-specific set of single-copy protein-coding loci, and a custom bait design to capture targeted regions from 469 genes, and analysed the resulting sequence data with a mapping approach and newly-created post-processing scripts. We effectively recovered a large and informative dataset comprising both short (<100 bp) and longer (>300 bp) exons, with high uniformity in sequencing depth. We were also able to successfully capture exon data from up to 16-year-old museum specimens along with more distantly related outgroup taxa, and efficiently pool multiple samples prior to capture. Our well-resolved phylogenies highlight the overall utility of this methodological approach and custom bait design, which offer enormous potential for application to future isopod, as well as broader crustacean, molecular studies.
Affiliation(s)
- Danielle N. Stringer
  - Australian Centre for Evolutionary Biology and Biodiversity, School of Biological Sciences, The University of Adelaide, Adelaide, South Australia, Australia
  - South Australian Museum, Adelaide, South Australia, Australia
- Terry Bertozzi
  - Australian Centre for Evolutionary Biology and Biodiversity, School of Biological Sciences, The University of Adelaide, Adelaide, South Australia, Australia
  - South Australian Museum, Adelaide, South Australia, Australia
- Karen Meusemann
  - Evolutionary Biology and Ecology, Institute for Biology I, University of Freiburg, Freiburg, Germany
  - Australian National Insect Collection, CSIRO National Research Collections Australia, Acton, Australian Capital Territory, Australia
  - Center for Molecular Biodiversity Research, Zoological Research Museum Alexander Koenig, Bonn, Germany
- Steven Delean
  - School of Biological Sciences and the Environment Institute, The University of Adelaide, Adelaide, South Australia, Australia
- Michelle T. Guzik
  - Australian Centre for Evolutionary Biology and Biodiversity, School of Biological Sciences, The University of Adelaide, Adelaide, South Australia, Australia
- Simon M. Tierney
  - Australian Centre for Evolutionary Biology and Biodiversity, School of Biological Sciences, The University of Adelaide, Adelaide, South Australia, Australia
  - Hawkesbury Institute for the Environment, Western Sydney University, Richmond, New South Wales, Australia
- Christoph Mayer
  - Center for Molecular Biodiversity Research, Zoological Research Museum Alexander Koenig, Bonn, Germany
- Steven J. B. Cooper
  - Australian Centre for Evolutionary Biology and Biodiversity, School of Biological Sciences, The University of Adelaide, Adelaide, South Australia, Australia
  - South Australian Museum, Adelaide, South Australia, Australia
- Mohammad Javidkar
  - Australian Centre for Evolutionary Biology and Biodiversity, School of Biological Sciences, The University of Adelaide, Adelaide, South Australia, Australia
- Andreas Zwick
  - Australian National Insect Collection, CSIRO National Research Collections Australia, Acton, Australian Capital Territory, Australia
- Andrew D. Austin
  - Australian Centre for Evolutionary Biology and Biodiversity, School of Biological Sciences, The University of Adelaide, Adelaide, South Australia, Australia
  - South Australian Museum, Adelaide, South Australia, Australia
13
|
Duchêne DA, Mather N, Van Der Wal C, Ho SYW. Excluding loci with substitution saturation improves inferences from phylogenomic data. Syst Biol 2021; 71:676-689. [PMID: 34508605 PMCID: PMC9016599 DOI: 10.1093/sysbio/syab075] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 04/19/2020] [Accepted: 09/07/2021] [Indexed: 11/21/2022] Open
Abstract
The historical signal in nucleotide sequences becomes eroded over time by substitutions occurring repeatedly at the same sites. This phenomenon, known as substitution saturation, is recognized as one of the primary obstacles to deep-time phylogenetic inference using genome-scale data sets. We present a new test of substitution saturation and demonstrate its performance in simulated and empirical data. For some of the 36 empirical phylogenomic data sets that we examined, we detect substitution saturation in around 50% of loci. We found that saturation tends to be flagged as problematic in loci with highly discordant phylogenetic signals across sites. Within each data set, the loci with smaller numbers of informative sites are more likely to be flagged as containing problematic levels of saturation. The entropy saturation test proposed here is sensitive to high evolutionary rates relative to the evolutionary timeframe, while also being sensitive to several factors known to mislead phylogenetic inference, including short internal branches relative to external branches, short nucleotide sequences, and tree imbalance. Our study demonstrates that excluding loci with substitution saturation can be an effective means of mitigating the negative impact of multiple substitutions on phylogenetic inferences. [Phylogenetic model performance; phylogenomics; substitution model; substitution saturation; test statistics.]
Affiliation(s)
- David A Duchêne
  - Centre for Evolutionary Hologenomics, University of Copenhagen, 1352 Copenhagen, Denmark
- Niklas Mather
  - School of Life and Environmental Sciences, University of Sydney, Sydney, NSW 2006, Australia
- Cara Van Der Wal
  - School of Life and Environmental Sciences, University of Sydney, Sydney, NSW 2006, Australia
- Simon Y W Ho
  - School of Life and Environmental Sciences, University of Sydney, Sydney, NSW 2006, Australia
|
14
|
Li X, Teasdale LC, Bayless KM, Ellis AG, Wiegmann BM, Lamas CJE, Lambkin CL, Evenhuis NL, Nicholls JA, Hartley D, Shin S, Trautwein M, Zwick A, Lessard BD, Yeates DK. Phylogenomics reveals accelerated late Cretaceous diversification of bee flies (Diptera: Bombyliidae). Cladistics 2021; 37:276-297. [PMID: 34478201 DOI: 10.1111/cla.12436] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 12/23/2019] [Revised: 07/07/2020] [Accepted: 08/18/2020] [Indexed: 02/06/2023] Open
Abstract
Bombyliidae is a very species-rich and widespread family of parasitoid flies with more than 250 genera classified into 17 extant subfamilies. However, little is known about their evolutionary history or how their present-day diversity was shaped. Transcriptomes of 15 species and anchored hybrid enrichment (AHE) sequence captures of 86 species, representing 94 bee fly species and 14 subfamilies, were used to reconstruct the phylogeny of Bombyliidae. We integrated data from transcriptomes across each of the main lineages in our AHE tree to build a data set with more genes (550 loci versus 216 loci) and higher support levels. Our overall results show strong congruence with the current classification of the family, with 11 out of 14 included subfamilies recovered as monophyletic. Heterotropinae and Mythicomyiinae are successive sister groups to the remainder of the family. We examined the evolution of key morphological characters through our phylogenetic hypotheses and show that neither the "sand chamber subfamilies" nor the "Tomophthalmae" are monophyletic in our phylogenomic analyses. Based on our results, we reinstate two tribes at the subfamily level (Phthiriinae stat. rev. and Ecliminae stat. rev.) and we include the genus Sericosoma Macquart (previously incertae sedis) in the subfamily Oniromyiinae, bringing the total number of bee fly subfamilies to 19. Our dating analyses indicate a Jurassic origin of the family (165-194 Ma), with the sand chamber evolving early in bee fly evolution, in the late Jurassic or mid-Cretaceous (100-165 Ma). We hypothesize that the angiosperm radiation and the hothouse climate established during the late Cretaceous accelerated the diversification of bee flies, by providing an expanded range of resources for the parasitoid larvae and nectarivorous adults.
Affiliation(s)
- Xuankun Li
  - Australian National Insect Collection, CSIRO National Research Collections Australia, Canberra, ACT, 2601, Australia
  - Research School of Biology, Australian National University, Canberra, ACT, 2601, Australia
- Luisa C Teasdale
  - Australian National Insect Collection, CSIRO National Research Collections Australia, Canberra, ACT, 2601, Australia
- Keith M Bayless
  - Australian National Insect Collection, CSIRO National Research Collections Australia, Canberra, ACT, 2601, Australia
- Allan G Ellis
  - Botany and Zoology Department, Stellenbosch University, Private Bag X1, Matieland, 7602, South Africa
- Brian M Wiegmann
  - Department of Entomology & Plant Pathology, North Carolina State University, Raleigh, NC, 27695, USA
- Carlos José E Lamas
  - Museu de Zoologia da Universidade de São Paulo. Avenida Nazaré, 481 Ipiranga 04263-000, São Paulo, SP, Brazil
- Neal L Evenhuis
  - J. Linsley Gressitt Center for Research in Entomology, Bishop Museum, 1525 Bernice Street, Honolulu, HI, 96817, USA
- James A Nicholls
  - Australian National Insect Collection, CSIRO National Research Collections Australia, Canberra, ACT, 2601, Australia
- Diana Hartley
  - Australian National Insect Collection, CSIRO National Research Collections Australia, Canberra, ACT, 2601, Australia
- Seunggwan Shin
  - Department of Biological Sciences, University of Memphis, Memphis, TN, 38152, USA
  - School of Biological Sciences, Seoul National University, Seoul, 08826, Korea
- Michelle Trautwein
  - Entomology Department, Institute for Biodiversity Science and Sustainability, California Academy of Sciences, San Francisco, CA, 94118, USA
- Andreas Zwick
  - Australian National Insect Collection, CSIRO National Research Collections Australia, Canberra, ACT, 2601, Australia
- Bryan D Lessard
  - Australian National Insect Collection, CSIRO National Research Collections Australia, Canberra, ACT, 2601, Australia
- David K Yeates
  - Australian National Insect Collection, CSIRO National Research Collections Australia, Canberra, ACT, 2601, Australia
|
15
|
Mongiardino Koch N. Phylogenomic Subsampling and the Search for Phylogenetically Reliable Loci. Mol Biol Evol 2021; 38:4025-4038. [PMID: 33983409 DOI: 10.1101/2021.02.13.431075] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 05/21/2023] Open
Abstract
Phylogenomic subsampling is a procedure by which small sets of loci are selected from large genome-scale data sets and used for phylogenetic inference. This step is often motivated by either computational limitations associated with the use of complex inference methods or as a means of testing the robustness of phylogenetic results by discarding loci that are deemed potentially misleading. Although many alternative methods of phylogenomic subsampling have been proposed, little effort has gone into comparing their behavior across different data sets. Here, I calculate multiple gene properties for a range of phylogenomic data sets spanning animal, fungal, and plant clades, uncovering a remarkable predictability in their patterns of covariance. I also show how these patterns provide a means for ordering loci by both their rate of evolution and their relative phylogenetic usefulness. This method of retrieving phylogenetically useful loci is found to be among the top performing when compared with alternative subsampling protocols. Relatively common approaches such as minimizing potential sources of systematic bias or increasing the clock-likeness of the data are found to fare worse than selecting loci at random. Likewise, the general utility of rate-based subsampling is found to be limited: loci evolving at both low and high rates are among the least effective, and even those evolving at optimal rates can still widely differ in usefulness. This study shows that many common subsampling approaches introduce unintended effects in off-target gene properties and proposes an alternative multivariate method that simultaneously optimizes phylogenetic signal while controlling for known sources of bias.
|
17
|
Literman R, Schwartz R. Genome-Scale Profiling Reveals Noncoding Loci Carry Higher Proportions of Concordant Data. Mol Biol Evol 2021; 38:2306-2318. [PMID: 33528497 PMCID: PMC8136493 DOI: 10.1093/molbev/msab026] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/11/2022] Open
Abstract
Many evolutionary relationships remain controversial despite whole-genome sequencing data. These controversies arise, in part, due to challenges associated with accurately modeling the complex phylogenetic signal coming from genomic regions experiencing distinct evolutionary forces. Here, we examine how different regions of the genome support or contradict well-established relationships among three mammal groups using millions of orthologous parsimony-informative biallelic sites (PIBS) distributed across primate, rodent, and Pecora genomes. We compared PIBS concordance percentages among locus types (e.g. coding sequences (CDS), introns, intergenic regions), and contrasted PIBS utility over evolutionary timescales. Sites derived from noncoding sequences provided more data and proportionally more concordant sites compared with those from CDS in all clades. CDS PIBS were also predominant drivers of tree incongruence in two cases of topological conflict. PIBS derived from most locus types provided surprisingly consistent support for splitting events spread across the timescales we examined, although we find evidence that CDS and intronic PIBS may, respectively and to a limited degree, inform disproportionately about older and younger splits. In this era of accessible whole-genome sequence data, these results 1) suggest benefits to more intentionally focusing on noncoding loci as robust data for tree inference and 2) reinforce the importance of accurate modeling, especially when using CDS data.
Affiliation(s)
- Robert Literman
  - Department of Biological Sciences, University of Rhode Island, South Kingstown, RI, USA
  - Center for Food Safety and Applied Nutrition, Office of Regulatory Science, U.S. Food and Drug Administration, College Park, MD, USA
- Rachel Schwartz
  - Department of Biological Sciences, University of Rhode Island, South Kingstown, RI, USA
|
18
|
Shapiro JT, Víquez-R L, Leopardi S, Vicente-Santos A, Mendenhall IH, Frick WF, Kading RC, Medellín RA, Racey P, Kingston T. Setting the Terms for Zoonotic Diseases: Effective Communication for Research, Conservation, and Public Policy. Viruses 2021; 13:1356. [PMID: 34372562 PMCID: PMC8310020 DOI: 10.3390/v13071356] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 06/04/2021] [Revised: 06/29/2021] [Accepted: 07/04/2021] [Indexed: 12/19/2022] Open
Abstract
Many of the world's most pressing issues, such as the emergence of zoonotic diseases, can only be addressed through interdisciplinary research. However, the findings of interdisciplinary research are susceptible to miscommunication among both professional and non-professional audiences due to differences in training, language, experience, and understanding. Such miscommunication contributes to the misunderstanding of key concepts or processes and hinders the development of effective research agendas and public policy. These misunderstandings can also provoke unnecessary fear in the public and have devastating effects for wildlife conservation. For example, inaccurate communication and subsequent misunderstanding of the potential associations between certain bats and zoonoses has led to persecution of diverse bats worldwide and even government calls to cull them. Here, we identify four types of miscommunication driven by the use of terminology regarding bats and the emergence of zoonotic diseases that we have categorized based on their root causes: (1) incorrect or overly broad use of terms; (2) terms that have unstable usage within a discipline, or different usages among disciplines; (3) terms that are used correctly but spark incorrect inferences about biological processes or significance in the audience; (4) incorrect inference drawn from the evidence presented. We illustrate each type of miscommunication with commonly misused or misinterpreted terms, providing a definition, caveats and common misconceptions, and suggest alternatives as appropriate. While we focus on terms specific to bats and disease ecology, we present a more general framework for addressing miscommunication that can be applied to other topics and disciplines to facilitate more effective research, problem-solving, and public policy.
Affiliation(s)
- Julie Teresa Shapiro
  - Department of Life Sciences, Ben-Gurion University of the Negev, Be’er Sheva 8410501, Israel
- Luis Víquez-R
  - Institute of Evolutionary Ecology and Conservation Genomics, University of Ulm, 89069 Ulm, Germany
- Stefania Leopardi
  - Laboratory of Emerging Viral Zoonoses, Istituto Zooprofilattico Sperimentale delle Venezie, 35020 Legnaro, Italy
- Amanda Vicente-Santos
  - Graduate Program in Population Biology, Ecology and Evolution, Emory University, Atlanta, GA 30322, USA
- Ian H. Mendenhall
  - Programme in Emerging Infectious Diseases, Duke-NUS Medical School, Singapore 169857, Singapore
- Winifred F. Frick
  - Bat Conservation International, Austin, TX 78746, USA
  - Department of Ecology and Evolution, University of California, Santa Cruz, CA 95060, USA
- Rebekah C. Kading
  - Department of Microbiology, Immunology and Pathology, Colorado State University, Fort Collins, CO 80523, USA
- Rodrigo A. Medellín
  - Institute of Ecology, National Autonomous University of Mexico (UNAM), Mexico City 04510, Mexico
- Paul Racey
  - The Centre for Ecology and Conservation, University of Exeter, Exeter TR10 9FE, UK
- Tigga Kingston
  - Department of Biological Sciences, Texas Tech University, Lubbock, TX 79409, USA
|
19
|
Vankan M, Ho SYW, Duchêne DA. Evolutionary Rate Variation Among Lineages in Gene Trees has a Negative Impact on Species-Tree Inference. Syst Biol 2021; 71:490-500. [PMID: 34255084 PMCID: PMC8830059 DOI: 10.1093/sysbio/syab051] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 02/03/2021] [Revised: 06/18/2021] [Indexed: 11/12/2022] Open
Abstract
Phylogenetic analyses of genomic data provide a powerful means of reconstructing the evolutionary relationships among organisms, yet such analyses are often hindered by conflicting phylogenetic signals among loci. Identifying the signals that are most influential to species-tree estimation can help to inform the choice of data for phylogenomic analysis. We investigated this in an analysis of 30 phylogenomic data sets. For each data set, we examined the association between several branch-length characteristics of gene trees and the distance between these gene trees and the corresponding species trees. We found that the distance of each gene tree to the species tree inferred from the full data set was positively associated with variation in root-to-tip distances and negatively associated with mean branch support. However, no such associations were found for gene-tree length, a measure of the overall substitution rate at each locus. We further explored the usefulness of the best-performing branch-based characteristics for selecting loci for phylogenomic analyses. We found that loci that yield gene trees with high variation in root-to-tip distances have a disproportionately distant signal of tree topology compared with the complete data sets. These results suggest that rate variation across lineages should be taken into consideration when exploring and even selecting loci for phylogenomic analysis. [Branch support; data filtering; nucleotide substitution model; phylogenomics; substitution rate; summary coalescent methods.]
Affiliation(s)
- Mezzalina Vankan
  - School of Life and Environmental Sciences, University of Sydney, NSW 2006, Australia
  - Research School of Biology, Australian National University, ACT 2601, Australia
- Simon Y W Ho
  - School of Life and Environmental Sciences, University of Sydney, NSW 2006, Australia
- David A Duchêne
  - Research School of Biology, Australian National University, ACT 2601, Australia
  - Centre for Evolutionary Hologenomics, University of Copenhagen, Copenhagen 1352, Denmark
|
20
|
Vázquez-Miranda H, Barker FK. Autosomal, sex-linked and mitochondrial loci resolve evolutionary relationships among wrens in the genus Campylorhynchus. Mol Phylogenet Evol 2021; 163:107242. [PMID: 34224849 DOI: 10.1016/j.ympev.2021.107242] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/27/2020] [Revised: 06/14/2021] [Accepted: 06/29/2021] [Indexed: 01/18/2023]
Abstract
Although there is general consensus that sampling of multiple genetic loci is critical in accurate reconstruction of species trees, the exact numbers and the best types of molecular markers remain an open question. In particular, the phylogenetic utility of sex-linked loci is underexplored. Here, we sample all species and 70% of the named diversity of the New World wren genus Campylorhynchus using sequences from 23 loci, to evaluate the effects of linkage on efficiency in recovering a well-supported tree for the group. At a tree-wide level, we found that most loci supported fewer than half the possible clades and that sex-linked loci produced similar resolution to slower-coalescing autosomal markers, controlling for locus length. By contrast, we did find evidence that linkage affected the efficiency of recovery of individual relationships; as few as two sex-linked loci were necessary to resolve a selection of clades with long to medium subtending branches, whereas 4-6 autosomal loci were necessary to achieve comparable results. These results support an expanded role for sampling of the avian Z chromosome in phylogenetic studies, including target enrichment approaches. Our concatenated and species tree analyses represent significant improvements in our understanding of diversification in Campylorhynchus, and suggest a relatively complex scenario for its radiation across the Miocene/Pliocene boundary, with multiple invasions of South America.
Affiliation(s)
- Hernán Vázquez-Miranda
  - Departamento de Zoología, Instituto de Biología, Universidad Nacional Autónoma de México, Ciudad de México C.P. 04510, Mexico
- F Keith Barker
  - Department of Ecology, Evolution and Behavior, Bell Museum of Natural History, University of Minnesota, 40 Gortner Laboratory, 1479 Gortner Avenue, Saint Paul, MN 55108, USA
|
21
|
Takezaki N. Resolving the Early Divergence Pattern of Teleost Fish Using Genome-Scale Data. Genome Biol Evol 2021; 13:6178791. [PMID: 33739405 PMCID: PMC8103497 DOI: 10.1093/gbe/evab052] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Accepted: 03/10/2021] [Indexed: 12/13/2022] Open
Abstract
Regarding the phylogenetic relationship of the three primary groups of teleost fishes, Osteoglossomorpha (bonytongues and others), Elopomorpha (eels and relatives), Clupeocephala (the remaining teleost fish), early morphological studies hypothesized the first divergence of Osteoglossomorpha, whereas the recent prevailing view is the first divergence of Elopomorpha. Molecular studies supported all the possible relationships of the three primary groups. This study analyzed genome-scale data from four previous studies: 1) 412 genes from 12 species, 2) 772 genes from 15 species, 3) 1,062 genes from 30 species, and 4) 491 UCE loci from 27 species. The effects of the species, loci, and models used on the constructed tree topologies were investigated. In the analyses of the data sets (1)–(3), although the first divergence of Clupeocephala that left the other two groups in a sister relationship was supported by concatenated sequences and gene trees of all the species and genes, the first divergence of Elopomorpha among the three groups was supported using species and/or genes with low divergence of sequence and amino-acid frequencies. This result corresponded to that of the UCE data set (4), whose sequence divergence was low, which supported the first divergence of Elopomorpha with high statistical significance. The increase in accuracy of the phylogenetic construction by using species and genes with low sequence divergence was predicted by a phylogenetic informativeness approach and confirmed by computer simulation. These results supported that Elopomorpha was the first basal group of teleost fish to have diverged, consistent with the prevailing view of recent morphological studies.
Affiliation(s)
- Naoko Takezaki
  - Life Science Research Center, Kagawa University, Mikicho, Kitagun, Kagawa, Japan
|
22
|
Spasojevic T, Broad GR, Sääksjärvi IE, Schwarz M, Ito M, Korenko S, Klopfstein S. Mind the Outgroup and Bare Branches in Total-Evidence Dating: a Case Study of Pimpliform Darwin Wasps (Hymenoptera, Ichneumonidae). Syst Biol 2021; 70:322-339. [PMID: 33057674 PMCID: PMC7875445 DOI: 10.1093/sysbio/syaa079] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 10/28/2019] [Revised: 10/02/2020] [Accepted: 10/02/2020] [Indexed: 01/16/2023] Open
Abstract
Taxon sampling is a central aspect of phylogenetic study design, but it has received limited attention in the context of total-evidence dating, a widely used dating approach that directly integrates molecular and morphological information from extant and fossil taxa. We here assess the impact of commonly employed outgroup sampling schemes and missing morphological data in extant taxa on age estimates in a total-evidence dating analysis under the uniform tree prior. Our study group is Pimpliformes, a highly diverse, rapidly radiating group of parasitoid wasps of the family Ichneumonidae. We analyze a data set comprising 201 extant and 79 fossil taxa, including the oldest fossils of the family from the Early Cretaceous and the first unequivocal representatives of extant subfamilies from the mid-Paleogene. Based on newly compiled molecular data from ten nuclear genes and a morphological matrix that includes 222 characters, we show that age estimates become both older and less precise with the inclusion of more distant and more poorly sampled outgroups. These outgroups not only lack morphological and temporal information but also sit on long terminal branches and considerably increase the evolutionary rate heterogeneity. In addition, we discover an artifact that might be detrimental for total-evidence dating: "bare-branch attraction," namely high attachment probabilities of certain fossils to terminal branches for which morphological data are missing. Using computer simulations, we confirm the generality of this phenomenon and show that a large phylogenetic distance to any of the extant taxa, rather than just older age, increases the risk of a fossil being misplaced due to bare-branch attraction. After restricting outgroup sampling and adding morphological data for the previously attracting, bare branches, we recover a Jurassic origin for Pimpliformes and Ichneumonidae. 
This first age estimate for the group not only suggests an older origin than previously thought but also that diversification of the crown group happened well before the Cretaceous-Paleogene boundary. Our case study demonstrates that in order to obtain robust age estimates, total-evidence dating studies need to be based on a thorough and balanced sampling of both extant and fossil taxa, with the aim of minimizing evolutionary rate heterogeneity and missing morphological information. [Bare-branch attraction; ichneumonids; fossils; morphological matrix; phylogeny; RoguePlots.]
Affiliation(s)
- Tamara Spasojevic
  - Abteilung Wirbellose Tiere Invertebrates, Naturhistorisches Museum der Burgergemeinde Bern, Bernastrasse 15, 3005 Bern, Switzerland
  - Institute of Ecology and Evolution, Department of Biology, University of Bern, 3012 Bern, Switzerland
  - Department of Entomology, National Museum of Natural History, Washington, DC 20560, USA
- Gavin R Broad
  - Department of Life Sciences, Natural History Museum, London SW7 5BD, UK
- Masato Ito
  - Graduate School of Agricultural Science, Department of Agrobioscience, Kobe University, 657-8501 Japan
- Stanislav Korenko
  - Department of Agroecology and Crop Production, Faculty of Agrobiology, Food and Natural Resources, Czech University of Life Sciences Prague, 165 21 Prague 6, Suchdol, Czech Republic
- Seraina Klopfstein
  - Abteilung Wirbellose Tiere Invertebrates, Naturhistorisches Museum der Burgergemeinde Bern, Bernastrasse 15, 3005 Bern, Switzerland
  - Institute of Ecology and Evolution, Department of Biology, University of Bern, 3012 Bern, Switzerland
  - Abteilung für Biowissenschaften, Naturhistorisches Museum Basel, 4051 Basel, Switzerland
|
23
|
Unraveling the Global Phylodynamic and Phylogeographic Expansion of Mycoplasma gallisepticum: Understanding the Origin and Expansion of This Pathogen in Ecuador. Pathogens 2020; 9:pathogens9090674. [PMID: 32825097 PMCID: PMC7557814 DOI: 10.3390/pathogens9090674] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 07/04/2020] [Revised: 07/31/2020] [Accepted: 08/18/2020] [Indexed: 12/17/2022] Open
Abstract
Mycoplasma gallisepticum (MG) is among the most significant problems in the poultry industry worldwide, representing a serious threat to international trade. Despite the fact that the mgc2 gene has been widely used for diagnostic and molecular characterization purposes, there is a lack of evidence supporting the reliability of this gene as a marker for molecular epidemiology approaches. Therefore, the current study aimed to assess the accuracy of the mgc2 gene for phylogenetic, phylodynamic, and phylogeographic evaluations. Furthermore, the global phylodynamic expansion of MG is described, and the origin and extension of the outbreak caused by MG in Ecuador were tracked and characterized. The results obtained strongly supported the use of the mgc2 gene as a reliable phylogenetic marker and accurate estimator for the temporal and phylogeographic structure reconstruction of MG. The phylodynamic analysis denoted the failures in the current policies to control MG and highlighted the imperative need to implement more sensitive methodologies of diagnosis and more efficient vaccines. Framed in Ecuador, the present study provides the first piece of evidence of the circulation of virulent field MG strains in Ecuadorian commercial poultry. The findings derived from the current study provide novel and significant insights into the origin, diversification, and evolutionary process of MG globally.
|
24
|
Huang J, Flouri T, Yang Z. A Simulation Study to Examine the Information Content in Phylogenomic Data Sets under the Multispecies Coalescent Model. Mol Biol Evol 2020; 37:3211-3224. [DOI: 10.1093/molbev/msaa166] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 01/05/2023] Open
Abstract
We use computer simulation to examine the information content in multilocus data sets for inference under the multispecies coalescent model. Inference problems considered include estimation of evolutionary parameters (such as species divergence times, population sizes, and cross-species introgression probabilities), species tree estimation, and species delimitation based on Bayesian comparison of delimitation models. We found that the number of loci is the most influential factor for almost all inference problems examined. Although the number of sequences per species does not appear to be important to species tree estimation, it is very influential to species delimitation. Increasing the number of sites and the per-site mutation rate both increase the mutation rate for the whole locus and these have the same effect on estimation of parameters, but the sequence length has a greater effect than the per-site mutation rate for species tree estimation. We discuss the computational costs when the data size increases and provide guidelines concerning the subsampling of genomic data to enable the application of full-likelihood methods of inference.
Affiliation(s)
- Jun Huang
  - Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
  - Department of Mathematics, Beijing Jiaotong University, Beijing, P.R. China
- Tomáš Flouri
  - Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
- Ziheng Yang
  - Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
25. Karin BR, Gamble T, Jackman TR. Optimizing Phylogenomics with Rapidly Evolving Long Exons: Comparison with Anchored Hybrid Enrichment and Ultraconserved Elements. Mol Biol Evol 2020; 37:904-922. [PMID: 31710677 PMCID: PMC7038749 DOI: 10.1093/molbev/msz263] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Indexed: 12/17/2022] Open
Abstract
Marker selection has emerged as an important component of phylogenomic study design due to rising concerns of the effects of gene tree estimation error, model misspecification, and data-type differences. Researchers must balance various trade-offs associated with locus length and evolutionary rate among other factors. The most commonly used reduced representation data sets for phylogenomics are ultraconserved elements (UCEs) and Anchored Hybrid Enrichment (AHE). Here, we introduce Rapidly Evolving Long Exon Capture (RELEC), a new set of loci that targets single exons that are both rapidly evolving (evolutionary rate faster than RAG1) and relatively long in length (>1,500 bp), while at the same time avoiding paralogy issues across amniotes. We compare the RELEC data set to UCEs and AHE in squamate reptiles by aligning and analyzing orthologous sequences from 17 squamate genomes, composed of 10 snakes and 7 lizards. The RELEC data set (179 loci) outperforms AHE and UCEs by maximizing per-locus genetic variation while maintaining presence and orthology across a range of evolutionary scales. RELEC markers show higher phylogenetic informativeness than UCE and AHE loci, and RELEC gene trees show greater similarity to the species tree than AHE or UCE gene trees. Furthermore, with fewer loci, RELEC remains computationally tractable for full Bayesian coalescent species tree analyses. We contrast RELEC to and discuss important aspects of comparable methods, and demonstrate how RELEC may be the most effective set of loci for resolving difficult nodes and rapid radiations. We provide several resources for capturing or extracting RELEC loci from other amniote groups.
Affiliation(s)
- Benjamin R Karin
  - Department of Biology, Villanova University, Villanova, PA
  - Museum of Vertebrate Zoology and Department of Integrative Biology, University of California, Berkeley, CA
- Tony Gamble
  - Department of Biological Sciences, Marquette University, Milwaukee, WI
  - Milwaukee Public Museum, Milwaukee, WI
  - Bell Museum of Natural History, University of Minnesota, St. Paul, MN
- Todd R Jackman
  - Department of Biology, Villanova University, Villanova, PA
26. Bellot S, Mitchell TC, Schaefer H. Phylogenetic informativeness analyses to clarify past diversification processes in Cucurbitaceae. Sci Rep 2020; 10:488. [PMID: 31949198 PMCID: PMC6965171 DOI: 10.1038/s41598-019-57249-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 06/18/2019] [Accepted: 12/20/2019] [Indexed: 01/12/2023] Open
Abstract
Phylogenomic studies have so far mostly relied on genome skimming or target sequence capture, which suffer from representation bias and can fail to resolve relationships even with hundreds of loci. Here, we explored the potential of phylogenetic informativeness and tree confidence analyses to interpret phylogenomic datasets. We studied Cucurbitaceae because their small genome size allows cost-efficient genome skimming, and many relationships in the family remain controversial, preventing inferences on the evolution of characters such as sexual system or floral morphology. Genome skimming and PCR allowed us to retrieve the plastome, 57 single copy nuclear genes, and the nuclear ribosomal ITS from 29 species representing all but one tribe of Cucurbitaceae. Node support analyses revealed few inter-locus conflicts but a pervasive lack of phylogenetic signal among plastid loci, suggesting a fast divergence of Cucurbitaceae tribes. Data filtering based on phylogenetic informativeness and risk of homoplasy clarified tribe-level relationships, which support two independent evolutions of fringed petals in the family. Our study illustrates how formal analysis of phylogenomic data can increase our understanding of past diversification processes. Our data and results will facilitate the design of well-sampled phylogenomic studies in Cucurbitaceae and related families.
Affiliation(s)
- Thomas C Mitchell
  - Plant Biodiversity Research, Department Ecology & Ecosystem Management, Technical University of Munich, Emil-Ramann Strasse 2, 85354, Freising, Germany
- Hanno Schaefer
  - Plant Biodiversity Research, Department Ecology & Ecosystem Management, Technical University of Munich, Emil-Ramann Strasse 2, 85354, Freising, Germany
27. Duchêne DA, Tong KJ, Foster CSP, Duchêne S, Lanfear R, Ho SYW. Linking Branch Lengths across Sets of Loci Provides the Highest Statistical Support for Phylogenetic Inference. Mol Biol Evol 2019; 37:1202-1210. [DOI: 10.1093/molbev/msz291] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Indexed: 12/30/2022] Open
Abstract
Evolution leaves heterogeneous patterns of nucleotide variation across the genome, with different loci subject to varying degrees of mutation, selection, and drift. In phylogenetics, the potential impacts of partitioning sequence data for the assignment of substitution models are well appreciated. In contrast, the treatment of branch lengths has received far less attention. In this study, we examined the effects of linking and unlinking branch-length parameters across loci or subsets of loci. By analyzing a range of empirical data sets, we find consistent support for a model in which branch lengths are proportionate between subsets of loci: gene trees share the same pattern of branch lengths, but form subsets that vary in their overall tree lengths. These models had substantially better statistical support than models that assume identical branch lengths across gene trees, or those in which genes form subsets with distinct branch-length patterns. We show using simulations and empirical data that the complexity of the branch-length model with the highest support depends on the length of the sequence alignment and on the numbers of taxa and loci in the data set. Our findings suggest that models in which branch lengths are proportionate between subsets have the highest statistical support under the conditions that are most commonly seen in practice. The results of our study have implications for model selection, computational efficiency, and experimental design in phylogenomics.
Affiliation(s)
- David A Duchêne
  - Research School of Biology, Australian National University, Canberra, ACT, Australia
  - School of Life and Environmental Sciences, University of Sydney, Sydney, NSW, Australia
- K Jun Tong
  - School of Life and Environmental Sciences, University of Sydney, Sydney, NSW, Australia
- Charles S P Foster
  - School of Life and Environmental Sciences, University of Sydney, Sydney, NSW, Australia
- Sebastián Duchêne
  - Department of Microbiology and Immunology, Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, VIC, Australia
- Robert Lanfear
  - Research School of Biology, Australian National University, Canberra, ACT, Australia
- Simon Y W Ho
  - School of Life and Environmental Sciences, University of Sydney, Sydney, NSW, Australia
28. Lamarca AP, Schrago CG. Fast speciations and slow genes: uncovering the root of living canids. Biol J Linn Soc Lond 2019. [DOI: 10.1093/biolinnean/blz181] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/14/2022]
Abstract
Despite ongoing efforts relying on computationally intensive tree-building methods and large datasets, the deeper phylogenetic relationships between living canid genera remain controversial. We demonstrate that this issue arises fundamentally from the uncertainty of root placement as a consequence of the short length of the branch connecting the major canid clades, which probably resulted from a fast radiation during the early diversification of extant Canidae. Using both nuclear and mitochondrial genes, we investigate the position of the canid root and its consistency by using three rooting methods. We find that mitochondrial genomes consistently retrieve a root node separating the tribe Canini from the remaining canids, whereas nuclear data mostly recover a root that places the Urocyon foxes as the sister lineage of living canids. We demonstrate that, to resolve the canid root, the nuclear segments sequenced so far are significantly less informative than mitochondrial genomes. We also propose that short intervals between speciations obscure the place of the true root, because methods are susceptible to stochastic error in the presence of short internal branches near the root.
Affiliation(s)
- Alessandra P Lamarca
  - Department of Genetics, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
- Carlos G Schrago
  - Department of Genetics, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
29. Identifying genetic markers for a range of phylogenetic utility-From species to family level. PLoS One 2019; 14:e0218995. [PMID: 31369563 PMCID: PMC6675087 DOI: 10.1371/journal.pone.0218995] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 01/15/2019] [Accepted: 06/13/2019] [Indexed: 12/03/2022] Open
Abstract
Resolving the phylogenetic relationships of closely related species using a small set of loci is challenging as sufficient information may not be captured from a limited sample of the genome. Relying on few loci can also be problematic when conflict between gene-trees arises from incomplete lineage sorting and/or ongoing hybridization, problems especially likely in recently diverged lineages. Here, we developed a method using limited genomic resources that allows identification of many low copy candidate loci from across the nuclear and chloroplast genomes, design probes for target capture and sequence the captured loci. To validate our method we present data from Eucalyptus and Melaleuca, two large and phylogenetically problematic genera within the Myrtaceae family. With one annotated genome, one transcriptome and two whole-genome shotgun sequences of one Eucalyptus and four Melaleuca species, respectively, we identified 212 loci representing 263 kbp for targeted sequence capture and sequencing. Of these, 209 were successfully tested from 47 samples across five related genera of Myrtaceae. The average percentage of reads mapped back to the reference was 57.6% with coverage of more than 20 reads per position across 83.5% of the data. The methods developed here should be applicable across a large range of taxa across all kingdoms. The core methods are very flexible, providing a platform for various genomic resource availabilities and are useful from shallow to deep phylogenies.
30. Vasilikopoulos A, Balke M, Beutel RG, Donath A, Podsiadlowski L, Pflug JM, Waterhouse RM, Meusemann K, Peters RS, Escalona HE, Mayer C, Liu S, Hendrich L, Alarie Y, Bilton DT, Jia F, Zhou X, Maddison DR, Niehuis O, Misof B. Phylogenomics of the superfamily Dytiscoidea (Coleoptera: Adephaga) with an evaluation of phylogenetic conflict and systematic error. Mol Phylogenet Evol 2019; 135:270-285. [DOI: 10.1016/j.ympev.2019.02.022] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 10/10/2018] [Revised: 02/22/2019] [Accepted: 02/25/2019] [Indexed: 02/07/2023]
31. Parker E, Dornburg A, Domínguez-Domínguez O, Piller KR. Assessing phylogenetic information to reveal uncertainty in historical data: An example using Goodeinae (Teleostei: Cyprinodontiformes: Goodeidae). Mol Phylogenet Evol 2019; 134:282-290. [DOI: 10.1016/j.ympev.2019.01.025] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 08/02/2018] [Revised: 01/17/2019] [Accepted: 01/30/2019] [Indexed: 01/18/2023]
32. Bravo GA, Antonelli A, Bacon CD, Bartoszek K, Blom MPK, Huynh S, Jones G, Knowles LL, Lamichhaney S, Marcussen T, Morlon H, Nakhleh LK, Oxelman B, Pfeil B, Schliep A, Wahlberg N, Werneck FP, Wiedenhoeft J, Willows-Munro S, Edwards SV. Embracing heterogeneity: coalescing the Tree of Life and the future of phylogenomics. PeerJ 2019; 7:e6399. [PMID: 30783571 PMCID: PMC6378093 DOI: 10.7717/peerj.6399] [Citation(s) in RCA: 76] [Impact Index Per Article: 12.7] [Received: 02/05/2018] [Accepted: 01/07/2019] [Indexed: 12/23/2022] Open
Abstract
Building the Tree of Life (ToL) is a major challenge of modern biology, requiring advances in cyberinfrastructure, data collection, theory, and more. Here, we argue that phylogenomics stands to benefit by embracing the many heterogeneous genomic signals emerging from the first decade of large-scale phylogenetic analysis spawned by high-throughput sequencing (HTS). Such signals include those most commonly encountered in phylogenomic datasets, such as incomplete lineage sorting, but also those reticulate processes emerging with greater frequency, such as recombination and introgression. Here we focus specifically on how phylogenetic methods can accommodate the heterogeneity incurred by such population genetic processes; we do not discuss phylogenetic methods that ignore such processes, such as concatenation or supermatrix approaches or supertrees. We suggest that methods of data acquisition and the types of markers used in phylogenomics will remain restricted until a posteriori methods of marker choice are made possible with routine whole-genome sequencing of taxa of interest. We discuss limitations and potential extensions of a model supporting innovation in phylogenomics today, the multispecies coalescent model (MSC). Macroevolutionary models that use phylogenies, such as character mapping, often ignore the heterogeneity on which building phylogenies increasingly rely and suggest that assimilating such heterogeneity is an important goal moving forward. Finally, we argue that an integrative cyberinfrastructure linking all steps of the process of building the ToL, from specimen acquisition in the field to publication and tracking of phylogenomic data, as well as a culture that values contributors at each step, are essential for progress.
Affiliation(s)
- Gustavo A. Bravo
  - Department of Organismic and Evolutionary Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA, USA
- Alexandre Antonelli
  - Department of Organismic and Evolutionary Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA, USA
  - Gothenburg Global Biodiversity Centre, Göteborg, Sweden
  - Department of Biological and Environmental Sciences, University of Gothenburg, Göteborg, Sweden
  - Gothenburg Botanical Garden, Göteborg, Sweden
- Christine D. Bacon
  - Gothenburg Global Biodiversity Centre, Göteborg, Sweden
  - Department of Biological and Environmental Sciences, University of Gothenburg, Göteborg, Sweden
- Krzysztof Bartoszek
  - Department of Computer and Information Science, Linköping University, Linköping, Sweden
- Mozes P. K. Blom
  - Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, Sweden
- Stella Huynh
  - Institut de Biologie, Université de Neuchâtel, Neuchâtel, Switzerland
- Graham Jones
  - Department of Biological and Environmental Sciences, University of Gothenburg, Göteborg, Sweden
- L. Lacey Knowles
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA
- Sangeet Lamichhaney
  - Department of Organismic and Evolutionary Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA, USA
- Thomas Marcussen
  - Centre for Ecological and Evolutionary Synthesis, University of Oslo, Oslo, Norway
- Hélène Morlon
  - Institut de Biologie, Ecole Normale Supérieure de Paris, Paris, France
- Luay K. Nakhleh
  - Department of Computer Science, Rice University, Houston, TX, USA
- Bengt Oxelman
  - Gothenburg Global Biodiversity Centre, Göteborg, Sweden
  - Department of Biological and Environmental Sciences, University of Gothenburg, Göteborg, Sweden
- Bernard Pfeil
  - Department of Biological and Environmental Sciences, University of Gothenburg, Göteborg, Sweden
- Alexander Schliep
  - Department of Computer Science and Engineering, Chalmers University of Technology and University of Gothenburg, Göteborg, Sweden
- Fernanda P. Werneck
  - Coordenação de Biodiversidade, Programa de Coleções Científicas Biológicas, Instituto Nacional de Pesquisa da Amazônia, Manaus, AM, Brazil
- John Wiedenhoeft
  - Department of Computer Science and Engineering, Chalmers University of Technology and University of Gothenburg, Göteborg, Sweden
  - Department of Computer Science, Rutgers University, Piscataway, NJ, USA
- Sandi Willows-Munro
  - School of Life Sciences, University of KwaZulu-Natal, Pietermaritzburg, South Africa
- Scott V. Edwards
  - Department of Organismic and Evolutionary Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA, USA
  - Gothenburg Centre for Advanced Studies in Science and Technology, Chalmers University of Technology and University of Gothenburg, Göteborg, Sweden
33. Dornburg A, Su Z, Townsend JP. Optimal Rates for Phylogenetic Inference and Experimental Design in the Era of Genome-Scale Data Sets. Syst Biol 2018; 68:145-156. [PMID: 29939341 DOI: 10.1093/sysbio/syy047] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Received: 02/13/2018] [Accepted: 06/13/2018] [Indexed: 02/02/2023] Open
Abstract
With the rise of genome-scale data sets, there has been a call for increased data scrutiny and careful selection of loci that are appropriate to use in an attempt to resolve a phylogenetic problem. Such loci should maximize phylogenetic information content while minimizing the risk of homoplasy. Theory posits the existence of characters that evolve at an optimum rate, and efforts to determine optimal rates of inference have been a cornerstone of phylogenetic experimental design for over two decades. However, both theoretical and empirical investigations of optimal rates have varied dramatically in their conclusions: spanning no relationship to a tight relationship between the rate of change and phylogenetic utility. Herein, we synthesize these apparently contradictory views, demonstrating both empirical and theoretical conditions under which each is correct. We find that optimal rates of characters (not genes) are generally robust to most experimental design decisions. Moreover, consideration of site rate heterogeneity within a given locus is critical to accurate predictions of utility. Factors such as taxon sampling or the targeted number of characters providing support for a topology are additionally critical to the predictions of phylogenetic utility based on the rate of character change. Further, optimality of rates and predictions of phylogenetic utility are not equivalent, demonstrating the need for further development of comprehensive theory of phylogenetic experimental design. [Divergence time; GC bias; homoplasy; incongruence; information content; internode length; optimal rates; phylogenetic informativeness; phylogenetic theory; phylogenetic utility; phylogenomics; signal and noise; subtending branch length; state space; taxon and character sampling.]
Affiliation(s)
- Alex Dornburg
  - North Carolina Museum of Natural Sciences, Raleigh, 1671 Goldstar Drive, NC 27601, USA
- Zhuo Su
  - Department of Ecology and Evolutionary Biology, Yale University, New Haven, 165 Prospect Street, CT 06525, USA
- Jeffrey P Townsend
  - Department of Ecology and Evolutionary Biology, Yale University, New Haven, 165 Prospect Street, CT 06525, USA
  - Department of Biostatistics, Yale University, New Haven, 60 College Street, CT 06510, USA
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, 300 George Street, CT 06511, USA
34. Givnish TJ, Zuluaga A, Spalink D, Soto Gomez M, Lam VKY, Saarela JM, Sass C, Iles WJD, de Sousa DJL, Leebens-Mack J, Chris Pires J, Zomlefer WB, Gandolfo MA, Davis JI, Stevenson DW, dePamphilis C, Specht CD, Graham SW, Barrett CF, Ané C. Monocot plastid phylogenomics, timeline, net rates of species diversification, the power of multi-gene analyses, and a functional model for the origin of monocots. Am J Bot 2018; 105:1888-1910. [PMID: 30368769 DOI: 10.1002/ajb2.1178] [Citation(s) in RCA: 112] [Impact Index Per Article: 16.0] [Received: 05/14/2018] [Accepted: 08/03/2018] [Indexed: 05/03/2023]
Abstract
PREMISE OF THE STUDY: We present the first plastome phylogeny encompassing all 77 monocot families, estimate branch support, and infer monocot-wide divergence times and rates of species diversification.
METHODS: We conducted maximum likelihood analyses of phylogeny and BAMM studies of diversification rates based on 77 plastid genes across 545 monocots and 22 outgroups. We quantified how branch support and ascertainment vary with gene number, branch length, and branch depth.
KEY RESULTS: Phylogenomic analyses shift the placement of 16 families in relation to earlier studies based on four plastid genes, add seven families, date the divergence between monocots and eudicots+Ceratophyllum at 136 Mya, successfully place all mycoheterotrophic taxa examined, and support recognizing Taccaceae and Thismiaceae as separate families and Arecales and Dasypogonales as separate orders. Only 45% of interfamilial divergences occurred after the Cretaceous. Net species diversification underwent four large-scale accelerations in PACMAD-BOP Poaceae, Asparagales sister to Doryanthaceae, Orchidoideae-Epidendroideae, and Araceae sister to Lemnoideae, each associated with specific ecological/morphological shifts. Branch ascertainment and support across monocots increase with gene number and branch length, and decrease with relative branch depth. Analysis of entire plastomes in Zingiberales quantifies the importance of non-coding regions in identifying and supporting short, deep branches.
CONCLUSIONS: We provide the first resolved, well-supported monocot phylogeny and timeline spanning all families, and quantify the significant contribution of plastome-scale data to resolving short, deep branches. We outline a new functional model for the evolution of monocots and their diagnostic morphological traits from submersed aquatic ancestors, supported by convergent evolution of many of these traits in aquatic Hydatellaceae (Nymphaeales).
Affiliation(s)
- Thomas J Givnish
  - Department of Botany, University of Wisconsin-Madison, Madison, Wisconsin, 53706, USA
- Daniel Spalink
  - Department of Ecosystem Science, Texas A&M University, College Station, Texas, 77840, USA
- Marybel Soto Gomez
  - Department of Botany, University of British Columbia, Vancouver, British Columbia, V6T 1Z4, Canada
- Vivienne K Y Lam
  - Department of Botany, University of British Columbia, Vancouver, British Columbia, V6T 1Z4, Canada
- Chodon Sass
  - The University and Jepson Herbarium, University of California-Berkeley, Berkeley, California, 94720, USA
- William J D Iles
  - Department of Earth and Environmental Sciences, University of Michigan, Ann Arbor, Michigan, 48109, USA
- Danilo José Lima de Sousa
  - Departamento de Ciências Biológicas, Universidade Estadual de Feira de Santana, Feira de Santana, Bahia, 44036-900, Brazil
- James Leebens-Mack
  - Department of Plant Biology, University of Georgia, Athens, Georgia, 30602, USA
- J Chris Pires
  - Division of Biological Sciences, University of Missouri-Columbia, Columbia, Missouri, 65211, USA
- Wendy B Zomlefer
  - Department of Plant Biology, University of Georgia, Athens, Georgia, 30602, USA
- Maria A Gandolfo
  - School of Integrative Plant Sciences and L.H. Bailey Hortorium, Cornell University, Ithaca, New York, 14853, USA
- Jerrold I Davis
  - School of Integrative Plant Sciences and L.H. Bailey Hortorium, Cornell University, Ithaca, New York, 14853, USA
- Claude dePamphilis
  - Department of Biology, Pennsylvania State University, University Park, Pennsylvania, 16802, USA
- Chelsea D Specht
  - School of Integrative Plant Sciences and L.H. Bailey Hortorium, Cornell University, Ithaca, New York, 14853, USA
- Sean W Graham
  - Department of Botany, University of British Columbia, Vancouver, British Columbia, V6T 1Z4, Canada
- Craig F Barrett
  - Department of Biology, West Virginia University, Morgantown, West Virginia, 26506, USA
- Cécile Ané
  - Department of Botany, University of Wisconsin-Madison, Madison, Wisconsin, 53706, USA
  - Department of Statistics, University of Wisconsin-Madison, Madison, Wisconsin, 53706, USA
35. Duchêne DA, Duchêne S, Ho SYW. Differences in Performance among Test Statistics for Assessing Phylogenomic Model Adequacy. Genome Biol Evol 2018; 10:1375-1388. [PMID: 29788113 PMCID: PMC6007652 DOI: 10.1093/gbe/evy094] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Accepted: 05/11/2018] [Indexed: 11/12/2022] Open
Abstract
Statistical phylogenetic analyses of genomic data depend on models of nucleotide or amino acid substitution. The adequacy of these substitution models can be assessed using a number of test statistics, allowing the model to be rejected when it is found to provide a poor description of the evolutionary process. A potentially valuable use of model-adequacy test statistics is to identify when data sets are likely to produce unreliable phylogenetic estimates, but their differences in performance are rarely explored. We performed a comprehensive simulation study to identify test statistics that are sensitive to some of the most commonly cited sources of phylogenetic estimation error. Our results show that, for many test statistics, traditional thresholds for assessing model adequacy can fail to reject the model when the phylogenetic inferences are inaccurate and imprecise. This is particularly problematic when analysing loci that have few informative sites. We propose new thresholds for assessing substitution model adequacy and demonstrate their effectiveness in analyses of three phylogenomic data sets. These thresholds lead to frequent rejection of the model for loci that yield topological inferences that are imprecise and are likely to be inaccurate. We also propose the use of a summary statistic that provides a practical assessment of overall model adequacy. Our approach offers a promising means of enhancing model choice in genome-scale data sets, potentially leading to improvements in the reliability of phylogenomic inference.
Affiliation(s)
- David A Duchêne
  - School of Life and Environmental Sciences, University of Sydney, Sydney, NSW, Australia
- Sebastian Duchêne
  - Bio21 Molecular Science and Biotechnology Institute, University of Melbourne, Melbourne, VIC, Australia
- Simon Y W Ho
  - School of Life and Environmental Sciences, University of Sydney, Sydney, NSW, Australia
36. Mclean BS, Bell KC, Allen JM, Helgen KM, Cook JA. Impacts of Inference Method and Data set Filtering on Phylogenomic Resolution in a Rapid Radiation of Ground Squirrels (Xerinae: Marmotini). Syst Biol 2018; 68:298-316. [DOI: 10.1093/sysbio/syy064] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 02/17/2017] [Accepted: 09/12/2018] [Indexed: 12/20/2022] Open
Affiliation(s)
- Bryan S Mclean
  - Department of Biology and Museum of Southwestern Biology, 1 University of New Mexico, MSC03-2020, Albuquerque, NM 87131, USA
  - Florida Museum of Natural History, University of Florida, 1659 Museum Road, Gainesville, FL 32611, USA
- Kayce C Bell
  - Department of Biology and Museum of Southwestern Biology, 1 University of New Mexico, MSC03-2020, Albuquerque, NM 87131, USA
  - Department of Invertebrate Zoology, Smithsonian Institution National Museum of Natural History, P.O. Box 37012, MRC 163, Washington, DC 20013-7012, USA
- Julie M Allen
  - Department of Biology, University of Nevada, 1664 N. Virginia Street, Reno, NV 89557, USA
- Kristofer M Helgen
  - Department of Ecology and Evolutionary Biology, School of Biological Sciences, University of Adelaide, North Terrace, Adelaide SA 5005, Australia
- Joseph A Cook
  - Department of Biology and Museum of Southwestern Biology, 1 University of New Mexico, MSC03-2020, Albuquerque, NM 87131, USA
37. Mongiardino Koch N, Gauthier JA. Noise and biases in genomic data may underlie radically different hypotheses for the position of Iguania within Squamata. PLoS One 2018; 13:e0202729. [PMID: 30133514 PMCID: PMC6105018 DOI: 10.1371/journal.pone.0202729] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 07/03/2018] [Accepted: 08/08/2018] [Indexed: 12/23/2022] Open
Abstract
Squamate reptiles are a major component of vertebrate biodiversity whose crown-clade traces its origin to a narrow window of time in the Mesozoic during which the main subclades diverged in rapid succession. Deciphering phylogenetic relationships among these lineages has proven challenging given the conflicting signals provided by genomic and phenomic data. Most notably, the placement of Iguania has routinely differed between data sources, with morphological evidence supporting a sister relationship to the remaining squamates (Scleroglossa hypothesis) and molecular data favoring a highly nested position alongside snakes and anguimorphs (Toxicofera hypothesis). We provide novel insights by generating an expanded morphological dataset and exploring the presence of phylogenetic signal, noise, and biases in molecular data. Our analyses confirm the presence of strong conflicting signals for the position of Iguania between morphological and molecular datasets. However, we also find that molecular data behave highly erratically when inferring the deepest branches of the squamate tree, a consequence of limited phylogenetic signal to resolve this ancient radiation with confidence. This, in turn, seems to result from a rate of evolution that is too high for historical signals to survive to the present. Finally, we detect significant systematic biases, with iguanians and snakes sharing faster rates of molecular evolution and a similarly biased nucleotide composition. A combination of scant phylogenetic signal, high levels of noise, and the presence of systematic biases could result in the misplacement of Iguania. We regard this explanation to be at least as plausible as the complex scenario of convergence and reversals required for morphological data to be misleading. We further evaluate and discuss the utility of morphological data to resolve ancient radiations, as well as its impact in combined-evidence phylogenomic analyses, with results relevant for the assessment of evidence and conflict across the Tree of Life.
Affiliation(s)
- Nicolás Mongiardino Koch
  - Department of Geology and Geophysics, Yale University, New Haven, Connecticut, United States of America
- Jacques A. Gauthier
  - Department of Geology and Geophysics, Yale University, New Haven, Connecticut, United States of America
  - Yale Peabody Museum of Natural History, New Haven, Connecticut, United States of America
|