1
Feoktistova SG, Ivanova AO, Degtyarev EP, Smirnova DI, Volchkov PY, Deviatkin AA. Phylogenetic Insights into H7Nx Influenza Viruses: Uncovering Reassortment Patterns and Geographic Variability. Viruses 2024; 16:1656. [PMID: 39599771 PMCID: PMC11598867 DOI: 10.3390/v16111656] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2024] [Revised: 10/21/2024] [Accepted: 10/22/2024] [Indexed: 11/29/2024]
Abstract
Influenza A viruses (IAVs), which belong to the Orthomyxoviridae family, are RNA viruses characterized by a segmented genome that allows them to evolve and adapt rapidly. These viruses are mainly transmitted by wild waterfowl. In this study, we investigated the evolutionary processes of H7Nx (H7N1, H7N2, H7N3, H7N4, H7N5, H7N6, H7N7, H7N8, H7N9) viruses, which pose a significant pandemic risk due to the known cases of human infection and their potential for rapid genetic evolution and reassortment. The complete genome sequences of H7Nx influenza viruses (n = 3239) were compared between each other to investigate their phylogenetic relationships and reassortment patterns. For the selected viruses, phylogenetic trees were constructed for eight genome segments (PB2, PB1, PA, HA, NP, NA, M, NS) to assess the genetic diversity and geographic distribution of these viruses. Distinct phylogenetic clades with remarkable geographic patterns were found for the different segments. While the viruses were consistently grouped by subtype based on the NA segment sequences, the phylogeny of the other segment sequences, with the exception of the NS segment, showed distinct grouping patterns based on geographic origin rather than formal subtype assignment. Reassortment events leading to complex phylogenetic relationships were frequently observed. In addition, multiple cases of previously undescribed reassortments between subtypes were detected, emphasizing the fluidity of H7Nx virus populations. These results indicate a high degree of genetic diversity and reassortment within H7Nx influenza viruses. In other words, H7Nx viruses exist as constantly changing combinations of gene pools rather than stable genetic lineages.
Affiliation(s)
- Sofya G. Feoktistova
  - Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, 125315 Moscow, Russia
- Alexandra O. Ivanova
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, RAS (IBCh RAS), 117997 Moscow, Russia
- Egor P. Degtyarev
  - Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, 125315 Moscow, Russia
- Daria I. Smirnova
  - Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, 125315 Moscow, Russia
- Pavel Yu. Volchkov
  - Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, 125315 Moscow, Russia
  - Center for Personalized Medicine, The MCSC Named After A.S. Loginov, 111123 Moscow, Russia
- Andrei A. Deviatkin
  - Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, 125315 Moscow, Russia
2
Everson KM, Donohue ME, Weisrock DW. A Pervasive History of Gene Flow in Madagascar's True Lemurs (Genus Eulemur). Genes (Basel) 2023; 14:1130. [PMID: 37372308 DOI: 10.3390/genes14061130] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2023] [Revised: 05/16/2023] [Accepted: 05/19/2023] [Indexed: 06/29/2023]
Abstract
In recent years, it has become widely accepted that interspecific gene flow is common across the Tree of Life. Questions remain about how species boundaries can be maintained in the face of high levels of gene flow and how phylogeneticists should account for reticulation in their analyses. The true lemurs of Madagascar (genus Eulemur, 12 species) provide a unique opportunity to explore these questions, as they form a recent radiation with at least five active hybrid zones. Here, we present new analyses of a mitochondrial dataset with hundreds of individuals in the genus Eulemur, as well as a nuclear dataset containing hundreds of genetic loci for a small number of individuals. Traditional coalescent-based phylogenetic analyses of both datasets reveal that not all recognized species are monophyletic. Using network-based approaches, we also find that a species tree containing between one and three ancient reticulations is supported by strong evidence. Together, these results suggest that hybridization has been a prominent feature of the genus Eulemur in both the past and present. We also recommend that greater taxonomic attention should be paid to this group so that geographic boundaries and conservation priorities can be better established.
Affiliation(s)
- Kathryn M Everson
  - Department of Integrative Biology, Oregon State University, Corvallis, OR 97331, USA
  - Department of Biology, University of Kentucky, Lexington, KY 40506, USA
- Mariah E Donohue
  - Department of Biology, University of Kentucky, Lexington, KY 40506, USA
- David W Weisrock
  - Department of Biology, University of Kentucky, Lexington, KY 40506, USA
3
Phylogeography of Ramalina farinacea (Lichenized Fungi, Ascomycota) in the Mediterranean Basin, Europe, and Macaronesia. Diversity 2023. [DOI: 10.3390/d15030310] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/25/2023]
Abstract
Ramalina farinacea is an epiphytic lichen-forming fungus with a broad geographic distribution, especially in the Northern Hemisphere. In the eighties of the last century, it was hypothesized that R. farinacea had originated in the Macaronesian–Mediterranean region, with the Canary Islands as its probable southernmost limit, and thereafter it would have increased its distribution area. In order to explore the phylogeography of this emblematic lichen, we analyzed 120 thalli of R. farinacea collected in 38 localities distributed in temperate and boreal Europe, the Western Mediterranean Basin, and several Macaronesian archipelagos in the Atlantic Ocean. Data from two nuclear markers (nrITS and uid70) of the mycobiont were obtained to calculate genetic diversity indices to infer the phylogenies and haplotype networks and to investigate population structure. In addition, dating analysis was conducted to provide a valuable hypothesis of the timing of the origin and diversification of R. farinacea and its close allies. Our results highlight that phylogenetic species circumscription in the “Ramalina farinacea group” is complex and suggests that incomplete lineage sorting is at the base of conflicting phylogenetic signals. The existence of a high number of haplotypes restricted to the Macaronesian region, together with the diversification of R. farinacea in the Pleistocene, suggests that this species and its closest relatives originated during relatively recent geological times and then expanded its range to higher latitudes. However, our data cannot rule out whether the species originated from the Macaronesian archipelagos exclusively or also from the Mediterranean Basin. In conclusion, the present work provides a valuable biogeographical hypothesis for disentangling the evolution of this epiphytic lichen in space and time.
4
Müller NF, Kistler KE, Bedford T. A Bayesian approach to infer recombination patterns in coronaviruses. Nat Commun 2022; 13:4186. [PMID: 35859071 PMCID: PMC9297283 DOI: 10.1038/s41467-022-31749-8] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 05/05/2021] [Accepted: 06/30/2022] [Indexed: 02/06/2023]
Abstract
As shown during the SARS-CoV-2 pandemic, phylogenetic and phylodynamic methods are essential tools to study the spread and evolution of pathogens. One of the central assumptions of these methods is that the shared history of pathogens isolated from different hosts can be described by a branching phylogenetic tree. Recombination breaks this assumption. This makes it problematic to apply phylogenetic methods to study recombining pathogens, including, for example, coronaviruses. Here, we introduce a Markov chain Monte Carlo approach that allows inference of recombination networks from genetic sequence data under a template switching model of recombination. Using this method, we first show that recombination is extremely common in the evolutionary history of SARS-like coronaviruses. We then show how recombination rates across the genome of the human seasonal coronaviruses 229E, OC43 and NL63 vary with rates of adaptation. This suggests that recombination could be beneficial to fitness of human seasonal coronaviruses. Additionally, this work sets the stage for Bayesian phylogenetic tracking of the spread and evolution of SARS-CoV-2 in the future, even as recombinant viruses become prevalent.
Affiliation(s)
- Nicola F Müller
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Kathryn E Kistler
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
  - Molecular and Cellular Biology Program, University of Washington, Seattle, WA, USA
- Trevor Bedford
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
  - Molecular and Cellular Biology Program, University of Washington, Seattle, WA, USA
  - Howard Hughes Medical Institute, Seattle, WA, USA
5
Shikov AE, Malovichko YV, Nizhnikov AA, Antonets KS. Current Methods for Recombination Detection in Bacteria. Int J Mol Sci 2022; 23:ijms23116257. [PMID: 35682936 PMCID: PMC9181119 DOI: 10.3390/ijms23116257] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 03/04/2022] [Revised: 05/30/2022] [Accepted: 05/30/2022] [Indexed: 02/05/2023]
Abstract
The role of genetic exchanges, i.e., homologous recombination (HR) and horizontal gene transfer (HGT), in bacteria cannot be overestimated, for they are pivotal mechanisms leading to bacterial evolution and adaptation; thus, tracking the signs of recombination and HGT events is important for both fundamental and applied science. To date, dozens of bioinformatics tools for revealing recombination signals are available; however, their pros and cons as well as the spectra of solvable tasks have not yet been systematically reviewed. Moreover, there are two major groups of software. One aims to infer evidence of HR, while the other deals only with HGT. However, despite seemingly different goals, all the methods use similar algorithmic approaches, and the two processes are interconnected, influencing each other in the course of genomic evolution. In this review, we propose a classification of novel instruments for both HR and HGT detection based on the genomic consequences of recombination. In this context, we summarize available methodologies, paying particular attention to the type of traceable events for which a certain program has been designed.
Affiliation(s)
- Anton E. Shikov
  - Laboratory for Proteomics of Supra-Organismal Systems, All-Russia Research Institute for Agricultural Microbiology (ARRIAM), 196608 St. Petersburg, Russia
  - Faculty of Biology, St. Petersburg State University (SPbSU), 199034 St. Petersburg, Russia
- Yury V. Malovichko
  - Laboratory for Proteomics of Supra-Organismal Systems, All-Russia Research Institute for Agricultural Microbiology (ARRIAM), 196608 St. Petersburg, Russia
  - Faculty of Biology, St. Petersburg State University (SPbSU), 199034 St. Petersburg, Russia
- Anton A. Nizhnikov
  - Laboratory for Proteomics of Supra-Organismal Systems, All-Russia Research Institute for Agricultural Microbiology (ARRIAM), 196608 St. Petersburg, Russia
  - Faculty of Biology, St. Petersburg State University (SPbSU), 199034 St. Petersburg, Russia
- Kirill S. Antonets
  - Laboratory for Proteomics of Supra-Organismal Systems, All-Russia Research Institute for Agricultural Microbiology (ARRIAM), 196608 St. Petersburg, Russia
  - Faculty of Biology, St. Petersburg State University (SPbSU), 199034 St. Petersburg, Russia
6
Müller NF, Kistler KE, Bedford T. Recombination patterns in coronaviruses. bioRxiv 2022:2021.04.28.441806. [PMID: 33948594 PMCID: PMC8095201 DOI: 10.1101/2021.04.28.441806] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 12/17/2022]
Abstract
As shown during the SARS-CoV-2 pandemic, phylogenetic and phylodynamic methods are essential tools to study the spread and evolution of pathogens. One of the central assumptions of these methods is that the shared history of pathogens isolated from different hosts can be described by a branching phylogenetic tree. Recombination breaks this assumption. This makes it problematic to apply phylogenetic methods to study recombining pathogens, including, for example, coronaviruses. Here, we introduce a Markov chain Monte Carlo approach that allows inference of recombination networks from genetic sequence data under a template switching model of recombination. Using this method, we first show that recombination is extremely common in the evolutionary history of SARS-like coronaviruses. We then show how recombination rates across the genome of the human seasonal coronaviruses 229E, OC43 and NL63 vary with rates of adaptation. This suggests that recombination could be beneficial to fitness of human seasonal coronaviruses. Additionally, this work sets the stage for Bayesian phylogenetic tracking of the spread and evolution of SARS-CoV-2 in the future, even as recombinant viruses become prevalent.
Affiliation(s)
- Nicola F. Müller
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Kathryn E. Kistler
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
  - Molecular and Cellular Biology Program, University of Washington, Seattle, WA, USA
- Trevor Bedford
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
  - Molecular and Cellular Biology Program, University of Washington, Seattle, WA, USA
  - Howard Hughes Medical Institute, Seattle, WA, USA
7
Allman ES, Mitchell JD, Rhodes JA. Gene tree discord, simplex plots, and statistical tests under the coalescent. Syst Biol 2021; 71:929-942. [PMID: 33560348 DOI: 10.1093/sysbio/syab008] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 02/12/2020] [Revised: 01/31/2021] [Accepted: 02/03/2021] [Indexed: 02/06/2023]
Abstract
A simple graphical device, the simplex plot of quartet concordance factors, is introduced to aid in the exploration of a collection of gene trees on a common set of taxa. A single plot summarizes all gene tree discord, and allows for visual comparison to the expected discord from the multispecies coalescent model (MSC) of incomplete lineage sorting on a species tree. A formal statistical procedure is described that can quantify the deviation from expectation for each subset of four taxa, suggesting when the data is not in accord with the MSC, and thus that either gene tree inference error is substantial or a more complex model such as that on a network may be required. If the collection of gene trees is in accord with the MSC, the plots reveal when substantial incomplete lineage sorting is present. Applications to both simulated and empirical multilocus data sets illustrate the insights provided.
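The quantities this abstract refers to can be illustrated with a small sketch. It is not the authors' software: the function names are hypothetical, the gene-tree quartet counts are assumed to be tallied already, and the planar layout of the simplex is one arbitrary choice. It relies only on the standard result that, under the MSC on a four-taxon species tree with internal branch length t (in coalescent units), the concordant topology has probability 1 - (2/3)e^(-t) and each discordant topology has probability (1/3)e^(-t).

```python
import math


def quartet_cf(counts):
    """Turn counts of the three quartet topologies (ab|cd, ac|bd, ad|bc)
    into a concordance factor, i.e. a point on the 2-simplex."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no gene trees were counted for this quartet")
    return tuple(c / total for c in counts)


def expected_cf(t):
    """Expected concordance factor under the MSC on a species tree.

    t is the internal branch length in coalescent units; the first entry
    corresponds to the topology displayed by the species tree.
    """
    minor = math.exp(-t) / 3.0
    return (1.0 - 2.0 * minor, minor, minor)


def simplex_xy(cf):
    """Map a point of the simplex to plane coordinates for plotting.

    Assumed layout (illustrative only): the three pure topologies sit at
    (0, 0), (1, 0) and (0.5, sqrt(3)/2).
    """
    _, p2, p3 = cf
    return (p2 + 0.5 * p3, (math.sqrt(3) / 2.0) * p3)
```

In this sketch, an observed point such as `quartet_cf((60, 25, 15))` can be compared by eye with the segment traced by `expected_cf(t)` as t varies; the formal test described in the abstract is not reproduced here.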
Affiliation(s)
- Elizabeth S Allman
  - Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, AK 99709, USA
- Jonathan D Mitchell
  - Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, AK 99709, USA
  - Unité Bioinformatique Evolutive, C3BI USR 3756, Institut Pasteur & CNRS, Paris, France
- John A Rhodes
  - Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, AK 99709, USA
8
Boskova V, Stadler T. PIQMEE: Bayesian Phylodynamic Method for Analysis of Large Data Sets with Duplicate Sequences. Mol Biol Evol 2020; 37:3061-3075. [PMID: 32492139 PMCID: PMC7530608 DOI: 10.1093/molbev/msaa136] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/25/2022]
Abstract
Next-generation sequencing of pathogen quasispecies within a host yields data sets of tens to hundreds of unique sequences. However, the full data set often contains thousands of sequences, because many of those unique sequences have multiple identical copies. Data sets of this size represent a computational challenge for currently available Bayesian phylogenetic and phylodynamic methods. Through simulations, we explore how large data sets with duplicate sequences affect the speed and accuracy of phylogenetic and phylodynamic analysis within BEAST 2. We show that using unique sequences only leads to biases, and using a random subset of sequences yields imprecise parameter estimates. To overcome these shortcomings, we introduce PIQMEE, a BEAST 2 add-on that produces reliable parameter estimates from full data sets with increased computational efficiency as compared with the currently available methods within BEAST 2. The principle behind PIQMEE is to resolve the tree structure of the unique sequences only, while simultaneously estimating the branching times of the duplicate sequences. Distinguishing between unique and duplicate sequences allows our method to perform well even for very large data sets. Although the classic method converges poorly for data sets of 6,000 sequences when allowed to run for 7 days, our method converges in slightly more than 1 day. In fact, PIQMEE can handle data sets of around 21,000 sequences with 20 unique sequences in 14 days. Finally, we apply the method to a real, within-host HIV sequencing data set with several thousand sequences per patient.
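The distinction between unique and duplicate sequences that this abstract builds on can be sketched as a simple bookkeeping step. This is only an illustration under stated assumptions: `collapse_duplicates` is a hypothetical helper, sequences are assumed to be aligned strings compared by exact identity, and nothing here reflects PIQMEE's or BEAST 2's actual interface.

```python
from collections import Counter


def collapse_duplicates(sequences):
    """Group identical sequences and record how many copies each has.

    Returns a dict mapping each unique sequence to its copy number,
    in order of first appearance.
    """
    return dict(Counter(sequences))


# Toy within-host sample: four reads, two unique sequences.
reads = ["ACGT", "ACGT", "ACGA", "ACGT"]
unique_with_counts = collapse_duplicates(reads)
```

A method in the spirit described above would resolve tree structure for the keys of such a mapping while still using the copy numbers, rather than discarding the duplicates or subsampling them.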
Affiliation(s)
- Veronika Boskova
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Switzerland
  - Center for Integrative Bioinformatics Vienna, Max Perutz Labs, University of Vienna and Medical University of Vienna, Vienna, Austria
- Tanja Stadler
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Switzerland
9
Bayesian inference of reassortment networks reveals fitness benefits of reassortment in human influenza viruses. Proc Natl Acad Sci U S A 2020; 117:17104-17111. [PMID: 32631984 PMCID: PMC7382287 DOI: 10.1073/pnas.1918304117] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Indexed: 12/01/2022]
Abstract
Genetic recombination processes, such as reassortment, make it complex or impossible to use standard phylogenetic and phylodynamic methods. This is due to the fact that the shared evolutionary history of individuals has to be represented by a phylogenetic network instead of a tree. We therefore require novel approaches that allow us to coherently model these processes and that allow us to perform inference in the presence of such processes. Here, we introduce an approach to infer reassortment networks of segmented viruses using a Markov chain Monte Carlo approach. Our approach allows us to study different aspects of the reassortment process and allows us to show fitness benefits of reassortment events in seasonal human influenza viruses. Reassortment is an important source of genetic diversity in segmented viruses and is the main source of novel pathogenic influenza viruses. Despite this, studying the reassortment process has been constrained by the lack of a coherent, model-based inference framework. Here, we introduce a coalescent-based model that allows us to explicitly model the joint coalescent and reassortment process. In order to perform inference under this model, we present an efficient Markov chain Monte Carlo algorithm to sample rooted networks and the embedding of phylogenetic trees within networks. This algorithm provides the means to jointly infer coalescent and reassortment rates with the reassortment network and the embedding of segments in that network from full-genome sequence data. Studying reassortment patterns of different human influenza datasets, we find large differences in reassortment rates across different human influenza viruses. Additionally, we find that reassortment events predominantly occur on selectively fitter parts of reassortment networks showing that on a population level, reassortment positively contributes to the fitness of human influenza viruses.
10
Alkhamis MA, Li C, Torremorell M. Animal Disease Surveillance in the 21st Century: Applications and Robustness of Phylodynamic Methods in Recent U.S. Human-Like H3 Swine Influenza Outbreaks. Front Vet Sci 2020; 7:176. [PMID: 32373634 PMCID: PMC7186338 DOI: 10.3389/fvets.2020.00176] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/01/2019] [Accepted: 03/16/2020] [Indexed: 11/22/2022]
Abstract
Emerging and endemic animal viral diseases continue to impose substantial impacts on animal and human health. Most current and past molecular surveillance studies of animal diseases investigated the spatio-temporal and evolutionary dynamics of the viruses in a disjointed analytical framework, ignoring many uncertainties, and drew joint conclusions from both analytical approaches. Phylodynamic methods offer a uniquely integrated platform capable of inferring complex epidemiological and evolutionary processes from the phylogeny of viruses in populations using a single Bayesian statistical framework. In this study, we reviewed and outlined basic concepts and aspects of phylodynamic methods and attempted to summarize essential components of the methodology in one analytical pipeline to facilitate the proper use of the methods by animal health researchers. Also, we challenged the robustness of the posterior evolutionary parameters, inferred by the commonly used phylodynamic models, using hemagglutinin (HA) and polymerase basic 2 (PB2) segments of the currently circulating human-like H3 swine influenza (SI) viruses isolated in the United States and multiple priors. Subsequently, we compared similarities and differences between the posterior parameters inferred from sequence data using multiple phylodynamic models. Our suggested phylodynamic approach attempts to reduce the impact of its inherent limitations to offer less biased and biologically plausible inferences about the pathogen evolutionary characteristics to properly guide intervention activities. We also pinpointed requirements and challenges for integrating phylodynamic methods in routine animal disease surveillance activities.
Affiliation(s)
- Moh A Alkhamis
  - Department of Epidemiology and Biostatistics, Faculty of Public Health, Health Sciences Center, Kuwait University, Kuwait City, Kuwait
  - Department of Veterinary Population Medicine, College of Veterinary Medicine, University of Minnesota, St. Paul, MN, United States
- Chong Li
  - Department of Veterinary Population Medicine, College of Veterinary Medicine, University of Minnesota, St. Paul, MN, United States
- Montserrat Torremorell
  - Department of Veterinary Population Medicine, College of Veterinary Medicine, University of Minnesota, St. Paul, MN, United States
11
Allman ES, Baños H, Rhodes JA. NANUQ: a method for inferring species networks from gene trees under the coalescent model. Algorithms Mol Biol 2019; 14:24. [PMID: 31827592 PMCID: PMC6896299 DOI: 10.1186/s13015-019-0159-2] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 06/04/2019] [Accepted: 11/07/2019] [Indexed: 01/07/2023]
Abstract
Species networks generalize the notion of species trees to allow for hybridization or other lateral gene transfer. Under the network multispecies coalescent model, individual gene trees arising from a network can have any topology, but arise with frequencies dependent on the network structure and numerical parameters. We propose a new algorithm for statistical inference of a level-1 species network under this model, from data consisting of gene tree topologies, and provide the theoretical justification for it. The algorithm is based on an analysis of quartets displayed on gene trees, combining several statistical hypothesis tests with combinatorial ideas such as a quartet-based intertaxon distance appropriate to networks, the NeighborNet algorithm for circular split systems, and the Circular Network algorithm for constructing a splits graph.
12
Bouckaert R, Vaughan TG, Barido-Sottani J, Duchêne S, Fourment M, Gavryushkina A, Heled J, Jones G, Kühnert D, De Maio N, Matschiner M, Mendes FK, Müller NF, Ogilvie HA, du Plessis L, Popinga A, Rambaut A, Rasmussen D, Siveroni I, Suchard MA, Wu CH, Xie D, Zhang C, Stadler T, Drummond AJ. BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis. PLoS Comput Biol 2019; 15:e1006650. [PMID: 30958812 PMCID: PMC6472827 DOI: 10.1371/journal.pcbi.1006650] [Citation(s) in RCA: 1997] [Impact Index Per Article: 332.8] [Received: 11/14/2018] [Revised: 04/18/2019] [Accepted: 02/04/2019] [Indexed: 11/18/2022]
Abstract
Elaboration of Bayesian phylogenetic inference methods has continued at pace in recent years with major new advances in nearly all aspects of the joint modelling of evolutionary data. It is increasingly appreciated that some evolutionary questions can only be adequately answered by combining evidence from multiple independent sources of data, including genome sequences, sampling dates, phenotypic data, radiocarbon dates, fossil occurrences, and biogeographic range information among others. Including all relevant data into a single joint model is very challenging both conceptually and computationally. Advanced computational software packages that allow robust development of compatible (sub-)models which can be composed into a full model hierarchy have played a key role in these developments. Developing such software frameworks is increasingly a major scientific activity in its own right, and comes with specific challenges, from practical software design, development and engineering challenges to statistical and conceptual modelling challenges. BEAST 2 is one such computational software platform, and was first announced over 4 years ago. Here we describe a series of major new developments in the BEAST 2 core platform and model hierarchy that have occurred since the first release of the software, culminating in the recent 2.5 release.
Affiliation(s)
- Remco Bouckaert
  - Centre of Computational Evolution, University of Auckland, Auckland, New Zealand
  - Max Planck Institute for the Science of Human History, Jena, Germany
- Timothy G. Vaughan
  - ETH Zürich, Department of Biosystems Science and Engineering, 4058 Basel, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Joëlle Barido-Sottani
  - ETH Zürich, Department of Biosystems Science and Engineering, 4058 Basel, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Sebastián Duchêne
  - Department of Biochemistry and Molecular Biology, University of Melbourne, Melbourne, Victoria, Australia
- Mathieu Fourment
  - ithree institute, University of Technology Sydney, Sydney, Australia
- Graham Jones
  - Department of Biological and Environmental Sciences, University of Gothenburg, Box 461, SE 405 30 Göteborg, Sweden
- Denise Kühnert
  - Max Planck Institute for the Science of Human History, Jena, Germany
- Nicola De Maio
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridgeshire, UK
- Michael Matschiner
  - Department of Environmental Sciences, University of Basel, 4051 Basel, Switzerland
- Fábio K. Mendes
  - Centre of Computational Evolution, University of Auckland, Auckland, New Zealand
- Nicola F. Müller
  - ETH Zürich, Department of Biosystems Science and Engineering, 4058 Basel, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Huw A. Ogilvie
  - Department of Computer Science, Rice University, Houston, TX 77005-1892, USA
- Louis du Plessis
  - Department of Zoology, University of Oxford, Oxford, OX1 3PS, UK
- Alex Popinga
  - Centre of Computational Evolution, University of Auckland, Auckland, New Zealand
- Andrew Rambaut
  - Institute of Evolutionary Biology, University of Edinburgh, Ashworth Laboratories, Edinburgh, EH9 3FL UK
- David Rasmussen
  - Department of Entomology and Plant Pathology, North Carolina State University, Raleigh, NC 27695, USA
- Igor Siveroni
  - Department of Infectious Disease Epidemiology, Imperial College London, Norfolk Place, W2 1PG, UK
- Marc A. Suchard
  - Department of Biomathematics, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
- Chieh-Hsi Wu
  - Department of Statistics, University of Oxford, OX1 3LB, UK
- Dong Xie
  - Centre of Computational Evolution, University of Auckland, Auckland, New Zealand
- Chi Zhang
  - Institute of Vertebrate Paleontology and Paleoanthropology, Chinese Academy of Sciences, Beijing, China
- Tanja Stadler
  - ETH Zürich, Department of Biosystems Science and Engineering, 4058 Basel, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Alexei J. Drummond
  - Centre of Computational Evolution, University of Auckland, Auckland, New Zealand
13
Varsani A, Lefeuvre P, Roumagnac P, Martin D. Notes on recombination and reassortment in multipartite/segmented viruses. Curr Opin Virol 2018; 33:156-166. [PMID: 30237098 DOI: 10.1016/j.coviro.2018.08.013] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Received: 06/28/2018] [Revised: 08/07/2018] [Accepted: 08/28/2018] [Indexed: 11/29/2022]
Abstract
Besides evolving through nucleotide substitution, viruses frequently also evolve by genetic recombination which can occur when related viral variants co-infect the same cells. Viruses with segmented or multipartite genomes can additionally evolve via the reassortment of genomic components. Various computational techniques are now available for identifying and characterizing recombination and reassortment. While these techniques have revealed both that all well studied segmented and multipartite virus species show some capacity for reassortment, and that recombination is common in many multipartite species, they have indicated that recombination is either rare or does not occur in species with segmented genomes. Reassortment and recombination can make it very difficult to study segmented/multipartite viruses using metagenomics-based approaches. Notable challenges include, both the accurate identification and assignment of genomic components to individual genomes, and the differentiation between natural 'real' recombination events and artifactual 'fake' recombination events arising from the inaccurate de novo assembly of genome component sequences determined using short read sequencing.
Affiliation(s)
- Arvind Varsani
- The Biodesign Center for Fundamental and Applied Microbiomics, Center for Evolution and Medicine and School of Life Sciences, Arizona State University, Tempe, AZ 85287-5001, USA; Structural Biology Research Unit, Department of Clinical Laboratory Sciences, University of Cape Town, Observatory, 7925, Cape Town, South Africa
- Philippe Roumagnac
- CIRAD, BGPI, Montpellier, France; BGPI, INRA, CIRAD, SupAgro, Univ. Montpellier, Montpellier, France
- Darren Martin
- Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Diseases and Molecular Medicine, University of Cape Town, Observatory, 7925, South Africa
|
14
|
Wen D, Nakhleh L. Coestimating Reticulate Phylogenies and Gene Trees from Multilocus Sequence Data. Syst Biol 2017; 67:439-457. [DOI: 10.1093/sysbio/syx085] [Citation(s) in RCA: 90] [Impact Index Per Article: 11.3] [Received: 04/24/2017] [Accepted: 10/24/2017] [Indexed: 11/13/2022] Open
Affiliation(s)
- Luay Nakhleh
- Department of Computer Science
- Department of BioSciences, Rice University, 6100 Main Street, Houston, TX 77005, USA
|
15
|
Vaughan TG, Welch D, Drummond AJ, Biggs PJ, George T, French NP. Inferring Ancestral Recombination Graphs from Bacterial Genomic Data. Genetics 2017; 205:857-870. [PMID: 28007885 PMCID: PMC5289856 DOI: 10.1534/genetics.116.193425] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 07/08/2016] [Accepted: 12/03/2016] [Indexed: 11/18/2022] Open
Abstract
Homologous recombination is a central feature of bacterial evolution, yet it confounds traditional phylogenetic methods. While a number of methods specific to bacterial evolution have been developed, none of these permit joint inference of a bacterial recombination graph and associated parameters. In this article, we present a new method which addresses this shortcoming. Our method uses a novel Markov chain Monte Carlo algorithm to perform phylogenetic inference under the ClonalOrigin model. We demonstrate the utility of our method by applying it to ribosomal multilocus sequence typing data sequenced from pathogenic and nonpathogenic Escherichia coli serotype O157 and O26 isolates collected in rural New Zealand. The method is implemented as an open source BEAST 2 package, Bacter, which is available via the project web page at http://tgvaughan.github.io/bacter.
Affiliation(s)
- Timothy G Vaughan
- Centre for Computational Evolution, The University of Auckland, 1010, New Zealand
- Department of Computer Science, The University of Auckland, 1010, New Zealand
- David Welch
- Centre for Computational Evolution, The University of Auckland, 1010, New Zealand
- Department of Computer Science, The University of Auckland, 1010, New Zealand
- Alexei J Drummond
- Centre for Computational Evolution, The University of Auckland, 1010, New Zealand
- Department of Computer Science, The University of Auckland, 1010, New Zealand
- Patrick J Biggs
- Molecular Epidemiology and Public Health Laboratory, Infectious Disease Research Centre, Hopkirk Research Institute, Massey University, Palmerston North 4442, New Zealand
- Tessy George
- Molecular Epidemiology and Public Health Laboratory, Infectious Disease Research Centre, Hopkirk Research Institute, Massey University, Palmerston North 4442, New Zealand
- Nigel P French
- Molecular Epidemiology and Public Health Laboratory, Infectious Disease Research Centre, Hopkirk Research Institute, Massey University, Palmerston North 4442, New Zealand
|
16
|
Baele G, Suchard MA, Rambaut A, Lemey P. Emerging Concepts of Data Integration in Pathogen Phylodynamics. Syst Biol 2017; 66:e47-e65. [PMID: 28173504 PMCID: PMC5837209 DOI: 10.1093/sysbio/syw054] [Citation(s) in RCA: 57] [Impact Index Per Article: 7.1] [Received: 11/14/2015] [Accepted: 06/02/2016] [Indexed: 12/24/2022] Open
Abstract
Phylodynamics has become an increasingly popular statistical framework to extract evolutionary and epidemiological information from pathogen genomes. By harnessing such information, epidemiologists aim to shed light on the spatio-temporal patterns of spread and to test hypotheses about the underlying interaction of evolutionary and ecological dynamics in pathogen populations. Although the field has witnessed a rich development of statistical inference tools with increasing levels of sophistication, these tools initially focused on sequences as their sole primary data source. Integrating various sources of information, however, promises to deliver more precise insights in infectious diseases and to increase opportunities for statistical hypothesis testing. Here, we review how the emerging concept of data integration is stimulating new advances in Bayesian evolutionary inference methodology which formalize a marriage of statistical thinking and evolutionary biology. These approaches include connecting sequence to trait evolution, such as for host, phenotypic and geographic sampling information, but also the incorporation of covariates of evolutionary and epidemic processes in the reconstruction procedures. We highlight how a full Bayesian approach to covariate modeling and testing can generate further insights into sequence evolution, trait evolution, and population dynamics in pathogen populations. Specific examples demonstrate how such approaches can be used to test the impact of host on rabies and HIV evolutionary rates, to identify the drivers of influenza dispersal as well as the determinants of rabies cross-species transmissions, and to quantify the evolutionary dynamics of influenza antigenicity. Finally, we briefly discuss how data integration is now also permeating through the inference of transmission dynamics, leading to novel insights into tree-generative processes and detailed reconstructions of transmission trees. 
[Bayesian inference; birth–death models; coalescent models; continuous trait evolution; covariates; data integration; discrete trait evolution; pathogen phylodynamics.]
Affiliation(s)
- Guy Baele
- Department of Microbiology and Immunology, Rega Institute, KU Leuven - University of Leuven, Leuven, Belgium
- Marc A. Suchard
- Department of Biomathematics, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA
- Department of Biostatistics, School of Public Health, University of California, Los Angeles, CA 90095, USA
- Andrew Rambaut
- Institute of Evolutionary Biology, University of Edinburgh, Kings Buildings, Edinburgh EH9 3FL, UK
- Centre for Immunity, Infection and Evolution, University of Edinburgh, Kings Buildings, Edinburgh EH9 3FL, UK
- Philippe Lemey
- Department of Microbiology and Immunology, Rega Institute, KU Leuven - University of Leuven, Leuven, Belgium
|
17
|
Bayesian Inference of Reticulate Phylogenies under the Multispecies Network Coalescent. PLoS Genet 2016; 12:e1006006. [PMID: 27144273 PMCID: PMC4856265 DOI: 10.1371/journal.pgen.1006006] [Citation(s) in RCA: 59] [Impact Index Per Article: 6.6] [Received: 10/11/2015] [Accepted: 04/04/2016] [Indexed: 11/19/2022] Open
Abstract
The multispecies coalescent (MSC) is a statistical framework that models how gene genealogies grow within the branches of a species tree. The field of computational phylogenetics has witnessed an explosion in the development of methods for species tree inference under MSC, owing mainly to the accumulating evidence of incomplete lineage sorting in phylogenomic analyses. However, the evolutionary history of a set of genomes, or species, could be reticulate due to the occurrence of evolutionary processes such as hybridization or horizontal gene transfer. We report on a novel method for Bayesian inference of genome and species phylogenies under the multispecies network coalescent (MSNC). This framework models gene evolution within the branches of a phylogenetic network, thus incorporating reticulate evolutionary processes, such as hybridization, in addition to incomplete lineage sorting. As phylogenetic networks with different numbers of reticulation events correspond to points of different dimensions in the space of models, we devise a reversible-jump Markov chain Monte Carlo (RJMCMC) technique for sampling the posterior distribution of phylogenetic networks under MSNC. We implemented the methods in the publicly available, open-source software package PhyloNet and studied their performance on simulated and biological data. The work extends the reach of Bayesian inference to phylogenetic networks and enables new evolutionary analyses that account for reticulation.
Trees have long formed in biology the basic structure with which to represent and understand evolutionary relationships. Mathematical models, computational methods, and software tools for inferring phylogenetic trees and studying their mathematical properties are currently the norm in biology. The availability of genomic data from closely related species, as well as from multiple individuals within species, has brought the two fields of phylogenetics and population genetics closer than ever. In particular, the last two decades have witnessed a great flourish in the development and implementation of phylogenetic methods based on the multispecies coalescent model to capture the intricate relationship between gene and genome evolution. However, when reticulation processes such as hybridization occur, the phylogenetic history is best represented by a network. In this work, we demonstrate how the multispecies coalescent model can be adapted to reticulate evolutionary histories and report on a Bayesian method for inference of such histories under this extended model. As networks subsume trees, the model and method provide a principled and unified statistical framework for inferring treelike and non-treelike evolutionary relationships.
|
18
|
Hedge J, Wilson DJ. Bacterial phylogenetic reconstruction from whole genomes is robust to recombination but demographic inference is not. mBio 2014; 5:e02158. [PMID: 25425237 PMCID: PMC4251999 DOI: 10.1128/mbio.02158-14] [Citation(s) in RCA: 90] [Impact Index Per Article: 8.2] [Received: 11/03/2014] [Accepted: 11/07/2014] [Indexed: 12/24/2022] Open
Abstract
Phylogenetic inference in bacterial genomics is fundamental to understanding problems such as population history, antimicrobial resistance, and transmission dynamics. The field has been plagued by an apparent state of contradiction since the distorting effects of recombination on phylogeny were discovered more than a decade ago. Researchers persist with detailed phylogenetic analyses while simultaneously acknowledging that recombination seriously misleads inference of population dynamics and selection. Here we resolve this paradox by showing that phylogenetic tree topologies based on whole genomes robustly reconstruct the clonal frame topology but that branch lengths are badly skewed. Surprisingly, removing recombining sites can exacerbate branch length distortion caused by recombination.
IMPORTANCE: Phylogenetic tree reconstruction is a popular approach for understanding the relatedness of bacteria in a population from differences in their genome sequences. However, bacteria frequently exchange regions of their genomes by a process called homologous recombination, which violates a fundamental assumption of phylogenetic methods. Since many researchers continue to use phylogenetics for recombining bacteria, it is important to understand how recombination affects the conclusions drawn from these analyses. We find that whole-genome sequences afford great accuracy in reconstructing evolutionary relationships despite concerns surrounding the presence of recombination, but the branch lengths of the phylogenetic tree are indeed badly distorted. Surprisingly, methods to reduce the impact of recombination on branch lengths can exacerbate the problem.
Affiliation(s)
- Jessica Hedge
- Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Oxford, United Kingdom
|
19
|
Abstract
This article reviews the various models that have been used to describe the relationships between gene trees and species trees. Molecular phylogeny has focused mainly on improving models for the reconstruction of gene trees based on sequence alignments. Yet, most phylogeneticists seek to reveal the history of species. Although the histories of genes and species are tightly linked, they are seldom identical, because genes duplicate, are lost or horizontally transferred, and because alleles can coexist in populations for periods that may span several speciation events. Building models describing the relationship between gene and species trees can thus improve the reconstruction of gene trees when a species tree is known, and vice versa. Several approaches have been proposed to solve the problem in one direction or the other, but in general neither gene trees nor species trees are known. Only a few studies have attempted to jointly infer gene trees and species trees. These models account for gene duplication and loss, transfer or incomplete lineage sorting. Some of them consider several types of events together, but none exists currently that considers the full repertoire of processes that generate gene trees along the species tree. Simulations as well as empirical studies on genomic data show that combining gene tree-species tree models with models of sequence evolution improves gene tree reconstruction. In turn, these better gene trees provide a more reliable basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences. We predict that gene tree-species tree methods that can deal with genomic data sets will be instrumental to advancing our understanding of genomic evolution.
Affiliation(s)
- Gergely J Szöllősi
- ELTE-MTA "Lendület" Biophysics Research Group, Pázmány P. stny. 1A., 1117 Budapest, Hungary; Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France; Université de Lyon, F-69000 Lyon, France; and Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, F-38334 Montbonnot, France
- Eric Tannier
- ELTE-MTA "Lendület" Biophysics Research Group, Pázmány P. stny. 1A., 1117 Budapest, Hungary; Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France; Université de Lyon, F-69000 Lyon, France; and Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, F-38334 Montbonnot, France
- Vincent Daubin
- ELTE-MTA "Lendület" Biophysics Research Group, Pázmány P. stny. 1A., 1117 Budapest, Hungary; Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France; Université de Lyon, F-69000 Lyon, France; and Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, F-38334 Montbonnot, France
- Bastien Boussau
- ELTE-MTA "Lendület" Biophysics Research Group, Pázmány P. stny. 1A., 1117 Budapest, Hungary; Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France; Université de Lyon, F-69000 Lyon, France; and Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, F-38334 Montbonnot, France
|
20
|
Morrison DA. Is the Tree of Life the Best Metaphor, Model, or Heuristic for Phylogenetics? Syst Biol 2014; 63:628-38. [DOI: 10.1093/sysbio/syu026] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.8] [Indexed: 11/14/2022] Open
Affiliation(s)
- David A. Morrison
- Section for Parasitology, Swedish University of Agricultural Sciences, 751 89 Uppsala, Sweden
|
21
|
Fontanez KM, Cavanaugh CM. Evidence for horizontal transmission from multilocus phylogeny of deep-sea mussel (Mytilidae) symbionts. Environ Microbiol 2014; 16:3608-21. [DOI: 10.1111/1462-2920.12379] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 09/04/2013] [Accepted: 12/22/2013] [Indexed: 11/29/2022]
Affiliation(s)
- Kristina M. Fontanez
- Department of Organismic and Evolutionary Biology; Harvard University; Cambridge MA 02138 USA
- Colleen M. Cavanaugh
- Department of Organismic and Evolutionary Biology; Harvard University; Cambridge MA 02138 USA
|
22
|
Sjostrand J, Tofigh A, Daubin V, Arvestad L, Sennblad B, Lagergren J. A Bayesian Method for Analyzing Lateral Gene Transfer. Syst Biol 2014; 63:409-20. [DOI: 10.1093/sysbio/syu007] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.0] [Indexed: 12/18/2022] Open
|
23
|
Yu Y, Barnett RM, Nakhleh L. Parsimonious inference of hybridization in the presence of incomplete lineage sorting. Syst Biol 2013; 62:738-51. [PMID: 23736104 PMCID: PMC3739885 DOI: 10.1093/sysbio/syt037] [Citation(s) in RCA: 86] [Impact Index Per Article: 7.2] [Received: 09/25/2012] [Revised: 11/26/2012] [Accepted: 05/28/2013] [Indexed: 12/31/2022] Open
Abstract
Hybridization plays an important evolutionary role in several groups of organisms. A phylogenetic approach to detect hybridization entails sequencing multiple loci across the genomes of a group of species of interest, reconstructing their gene trees, and taking their differences as indicators of hybridization. However, methods that follow this approach mostly ignore population effects, such as incomplete lineage sorting (ILS). Given that hybridization occurs between closely related organisms, ILS may very well be at play and, hence, must be accounted for in the analysis framework. To address this issue, we present a parsimony criterion for reconciling gene trees within the branches of a phylogenetic network, and a local search heuristic for inferring phylogenetic networks from collections of gene-tree topologies under this criterion. This framework enables phylogenetic analyses while accounting for both hybridization and ILS. Further, we propose two techniques for incorporating information about uncertainty in gene-tree estimates. Our simulation studies demonstrate the good performance of our framework in terms of identifying the location of hybridization events, as well as estimating the proportions of genes that underwent hybridization. Also, our framework shows good performance in terms of efficiency on handling large data sets in our experiments. Further, in analysing a yeast data set, we demonstrate issues that arise when analysing real data sets. Although a probabilistic approach was recently introduced for this problem, and although parsimonious reconciliations have accuracy issues under certain settings, our parsimony framework provides a much more computationally efficient technique for this type of analysis. Our framework now allows for genome-wide scans for hybridization, while also accounting for ILS.
Affiliation(s)
- Yun Yu
- Department of Computer Science, Rice University, 6100 Main Street, Houston, TX 77005, USA
- R. Matthew Barnett
- Department of Computer Science, Rice University, 6100 Main Street, Houston, TX 77005, USA
- Luay Nakhleh
- Department of Computer Science, Rice University, 6100 Main Street, Houston, TX 77005, USA
|
24
|
Testing species delimitations in four Italian sympatric leuciscine fishes in the Tiber River: a combined morphological and molecular approach. PLoS One 2013; 8:e60392. [PMID: 23565240 PMCID: PMC3614999 DOI: 10.1371/journal.pone.0060392] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 06/19/2012] [Accepted: 02/27/2013] [Indexed: 11/19/2022] Open
Abstract
Leuciscine fishes represent an important component of freshwater ichthyofauna endemic to northern Mediterranean areas. This lineage shows high intra-specific morphological variability and exhibits high levels of hybridization, two characteristics that contribute to systematic uncertainties, misclassification of taxa and, potentially, the mismanagement of biodiversity. This study focused on brook chub, Squalius lucumonis, an endemic taxon of Central Italy. The taxonomic status of this species has long been questioned, and a hybrid origin from sympatric leuciscines (S. squalus x Rutilus rubilio, or S. squalus x Telestes muticellus) has been hypothesised. A phenotypic (evaluating shape and meristic counts) and genetic (using mitochondrial and nuclear markers) investigation of these four taxa was conducted to test species delimitation in sympatric areas and to evaluate the taxonomic status of S. lucumonis. One hundred and forty-five individuals of all four taxa were collected within streams of the lowest portion of the Tiber River basin and analysed; this region encompasses a large portion of the S. lucumonis distribution. The different morphological and genetic approaches were individually examined, compared, and then combined in a quantitative model to both investigate the limits of each approach and to identify cases of misclassification. The results obtained confirm the cladogenetic non-hybrid origin of S. lucumonis, highlight the need for immediate conservation actions and emphasise the value of an integrated approach in the study of leuciscine evolution.
|
25
|
de Villiers MJ, Pirie MD, Hughes M, Möller M, Edwards TJ, Bellstedt DU. An approach to identify putative hybrids in the 'coalescent stochasticity zone', as exemplified in the African plant genus Streptocarpus (Gesneriaceae). New Phytol 2013; 198:284-300. [PMID: 23373903 DOI: 10.1111/nph.12133] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 10/25/2012] [Accepted: 11/29/2012] [Indexed: 06/01/2023]
Abstract
The inference of phylogenetic relationships is often complicated by differing evolutionary histories of independently-inherited markers. The causes of the resulting gene tree incongruence can be challenging to identify, often relying on coalescent simulations dependent on unverifiable assumptions. We investigated alternative techniques using the South African rosulate species of Streptocarpus as a study group. Two independent gene trees - from the nuclear ITS region and from three concatenated plastid regions (trnL-F, rpl20-rps12 and trnC-D) - displayed widespread, strongly supported incongruence. We investigated the causes by detecting genetic exchange across morphological borders using morphological optimizations and genetic exchange across species boundaries using the genealogical sorting index. Incongruence between gene trees was associated with ancestral shifts in growth form (in four species) but not in pollination syndrome, suggesting introgression limited by reproductive barriers. Genealogical sorting index calculations showed polyphyly of two additional species, while individuals of all others were significantly associated. In one case the association was stronger according to the internal transcribed spacer data than according to the plastid data, which, given the smaller effective population size of the plastid, may also indicate introgression. These approaches offer alternative ways to identify potential hybridization events where incomplete lineage sorting cannot be rejected using simulations.
Affiliation(s)
- Margaret J de Villiers
- Department of Biochemistry, University of Stellenbosch, Private Bag X1, Matieland, 7602, South Africa
- Michael D Pirie
- Department of Biochemistry, University of Stellenbosch, Private Bag X1, Matieland, 7602, South Africa
- Mark Hughes
- Royal Botanic Garden Edinburgh, 20A Inverleith Row, Edinburgh, EH3 5LR, UK
- Michael Möller
- Royal Botanic Garden Edinburgh, 20A Inverleith Row, Edinburgh, EH3 5LR, UK
- Trevor J Edwards
- Botany Department, La Trobe University, Melbourne, Vic., Australia
- Dirk U Bellstedt
- Department of Biochemistry, University of Stellenbosch, Private Bag X1, Matieland, 7602, South Africa
|
26
|
Dearlove B, Wilson DJ. Coalescent inference for infectious disease: meta-analysis of hepatitis C. Philos Trans R Soc Lond B Biol Sci 2013; 368:20120314. [PMID: 23382432 PMCID: PMC3678333 DOI: 10.1098/rstb.2012.0314] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.2] [Indexed: 12/27/2022] Open
Abstract
Genetic analysis of pathogen genomes is a powerful approach to investigating the population dynamics and epidemic history of infectious diseases. However, the theoretical underpinnings of the most widely used coalescent methods have been questioned, casting doubt on their interpretation. The aim of this study is to develop robust population genetic inference for compartmental models in epidemiology. Using a general approach based on the theory of metapopulations, we derive coalescent models under susceptible–infectious (SI), susceptible–infectious–susceptible (SIS) and susceptible–infectious–recovered (SIR) dynamics. We show that exponential and logistic growth models are equivalent to SI and SIS models, respectively, when co-infection is negligible. Implementing SI, SIS and SIR models in BEAST, we conduct a meta-analysis of hepatitis C epidemics, and show that we can directly estimate the basic reproductive number (R0) and prevalence under SIR dynamics. We find that differences in genetic diversity between epidemics can be explained by differences in underlying epidemiology (age of the epidemic and local population density) and viral subtype. Model comparison reveals SIR dynamics in three globally restricted epidemics, but most are better fit by the simpler SI dynamics. In summary, metapopulation models provide a general and practical framework for integrating epidemiology and population genetics for the purposes of joint inference.
Affiliation(s)
- Bethany Dearlove
- Nuffield Department of Clinical Medicine, Experimental Medicine Division, University of Oxford, Oxford, UK
|
27
|
Ward MJ, Lycett SJ, Kalish ML, Rambaut A, Leigh Brown AJ. Estimating the rate of intersubtype recombination in early HIV-1 group M strains. J Virol 2013; 87:1967-73. [PMID: 23236072 PMCID: PMC3571495 DOI: 10.1128/jvi.02478-12] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 09/12/2012] [Accepted: 12/06/2012] [Indexed: 11/20/2022] Open
Abstract
West Central Africa has been implicated as the epicenter of the HIV-1 epidemic, and almost all group M subtypes can be found there. Previous analysis of early HIV-1 group M sequences from Kinshasa in the Democratic Republic of Congo, formerly Zaire, revealed that isolates from a number of individuals fall in different positions in phylogenetic trees constructed from sequences from opposite ends of the genome as a result of recombination between viruses of different subtypes. Here, we use discrete ancestral trait mapping to develop a procedure for quantifying HIV-1 group M intersubtype recombination across phylogenies, using individuals' gag (p17) and env (gp41) subtypes. The method was applied to previously described HIV-1 group M sequences from samples obtained in Kinshasa early in the global radiation of HIV. Nine different p17 and gp41 intersubtype recombinant combinations were present in the data set. The mean number of excess ancestral subtype transitions (NEST) required to map individuals' p17 subtypes onto the gp41 phylogeny samples, compared to the number required to map them onto the p17 phylogenies, and vice versa, indicated that excess subtype transitions occurred at a rate of approximately 7 × 10⁻³ to 8 × 10⁻³ per lineage per year as a result of intersubtype recombination. Our results imply that intersubtype recombination may have occurred in approximately 20% of lineages evolving over a period of 30 years and confirm intersubtype recombination as a substantial force in generating HIV-1 group M diversity.
Affiliation(s)
- Melissa J. Ward
- University of Edinburgh, Institute of Evolutionary Biology, Ashworth Laboratories, Edinburgh, United Kingdom
- Samantha J. Lycett
- University of Edinburgh, Institute of Evolutionary Biology, Ashworth Laboratories, Edinburgh, United Kingdom
- Marcia L. Kalish
- Vanderbilt University, Vanderbilt Institute for Global Health, Nashville, Tennessee, USA
- Andrew Rambaut
- University of Edinburgh, Institute of Evolutionary Biology, Ashworth Laboratories, Edinburgh, United Kingdom
- Fogarty International Center, National Institutes of Health, Bethesda, Maryland, USA
- Andrew J. Leigh Brown
- University of Edinburgh, Institute of Evolutionary Biology, Ashworth Laboratories, Edinburgh, United Kingdom
|
28
|
Jin Q, He LJ, Zhang AB. A simple 2D non-parametric resampling statistical approach to assess confidence in species identification in DNA barcoding--an alternative to likelihood and bayesian approaches. PLoS One 2012; 7:e50831. [PMID: 23239988 PMCID: PMC3519818 DOI: 10.1371/journal.pone.0050831] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 08/10/2012] [Accepted: 10/24/2012] [Indexed: 11/19/2022] Open
Abstract
In the recent worldwide campaign for the global biodiversity inventory via DNA barcoding, a simple and easily used measure of confidence for assigning sequences to species in DNA barcoding has not been established so far, although the likelihood ratio test and the bayesian approach had been proposed to address this issue from a statistical point of view. The TDR (Two Dimensional non-parametric Resampling) measure newly proposed in this study offers users a simple and easy approach to evaluate the confidence of species membership in DNA barcoding projects. We assessed the validity and robustness of the TDR approach using datasets simulated under coalescent models, and an empirical dataset, and found that TDR measure is very robust in assessing species membership of DNA barcoding. In contrast to the likelihood ratio test and bayesian approach, the TDR method stands out due to simplicity in both concepts and calculations, with little in the way of restrictive population genetic assumptions. To implement this approach we have developed a computer program package (TDR1.0beta) freely available from ftp://202.204.209.200/education/video/TDR1.0beta.rar.
Affiliation(s)
- Qian Jin
  - College of Life Sciences, Capital Normal University, Beijing, P. R. China
- Li-Jun He
  - State Key Laboratory of Estuarine and Coastal Research, East China Normal University, Shanghai, P. R. China
- Ai-Bing Zhang
  - College of Life Sciences, Capital Normal University, Beijing, P. R. China
|
29
|
Blanco-Pastor JL, Vargas P, Pfeil BE. Coalescent simulations reveal hybridization and incomplete lineage sorting in Mediterranean Linaria. PLoS One 2012; 7:e39089. [PMID: 22768061 PMCID: PMC3387178 DOI: 10.1371/journal.pone.0039089] [Citation(s) in RCA: 88] [Impact Index Per Article: 6.8] [Received: 03/09/2012] [Accepted: 05/18/2012] [Indexed: 11/21/2022] Open
Abstract
We examined the phylogenetic history of Linaria with special emphasis on the Mediterranean sect. Supinae (44 species). We revealed extensive highly supported incongruence among two nuclear (ITS, AGT1) and two plastid regions (rpl32-trnLUAG, trnS-trnG). Coalescent simulations, a hybrid detection test and species tree inference in *BEAST revealed that incomplete lineage sorting and hybridization may both be responsible for the incongruent pattern observed. Additionally, we present a multilabelled *BEAST species tree as an alternative approach that allows the possibility of observing multiple placements in the species tree for the same taxa. That permitted the incorporation of processes such as hybridization within the tree while not violating the assumptions of the *BEAST model. This methodology is presented as a functional tool to disclose the evolutionary history of species complexes that have experienced both hybridization and incomplete lineage sorting. The drastic climatic events that have occurred in the Mediterranean since the late Miocene, including the Quaternary-type climatic oscillations, may have made both processes highly recurrent in the Mediterranean flora.
Affiliation(s)
- José Luis Blanco-Pastor
  - Departamento de Biodiversidad y Conservación, Real Jardín Botánico (RJB-CSIC), Madrid, Spain.
|
30
|
Yu Y, Degnan JH, Nakhleh L. The probability of a gene tree topology within a phylogenetic network with applications to hybridization detection. PLoS Genet 2012; 8:e1002660. [PMID: 22536161 PMCID: PMC3330115 DOI: 10.1371/journal.pgen.1002660] [Citation(s) in RCA: 145] [Impact Index Per Article: 11.2] [Received: 10/10/2011] [Accepted: 03/05/2012] [Indexed: 11/29/2022] Open
Abstract
Gene tree topologies have proven a powerful data source for various tasks, including species tree inference and species delimitation. Consequently, methods for computing probabilities of gene trees within species trees have been developed and widely used in probabilistic inference frameworks. All these methods assume an underlying multispecies coalescent model. However, when reticulate evolutionary events such as hybridization occur, these methods are inadequate, as they do not account for such events. Methods that account for both hybridization and deep coalescence in computing the probability of a gene tree topology currently exist for very limited cases. However, no such methods exist for general cases, owing primarily to the fact that it is currently unknown how to compute the probability of a gene tree topology within the branches of a phylogenetic network. Here we present a novel method for computing the probability of gene tree topologies on phylogenetic networks and demonstrate its application to the inference of hybridization in the presence of incomplete lineage sorting. We reanalyze a Saccharomyces species data set for which multiple analyses had converged on a species tree candidate. Using our method, though, we show that an evolutionary hypothesis involving hybridization in this group has better support than one of strict divergence. A similar reanalysis on a group of three Drosophila species shows that the data is consistent with hybridization. Further, using extensive simulation studies, we demonstrate the power of gene tree topologies at obtaining accurate estimates of branch lengths and hybridization probabilities of a given phylogenetic network. Finally, we discuss identifiability issues with detecting hybridization, particularly in cases that involve extinction or incomplete sampling of taxa.
Species trees depict how species split and diverge. Within the branches of a species tree, gene trees, which depict the evolutionary histories of different genomic regions in the species, grow. Evolutionary analyses of the genomes of closely related organisms have highlighted the phenomenon that gene trees may disagree with each other as well as with the species tree that contains them due to deep coalescence. Furthermore, for several groups of organisms, hybridization plays an important role in their evolution and diversification. This evolutionary event also results in gene tree incongruence and gives rise to a species phylogeny that is a network. Thus, inferring the evolutionary histories of groups of organisms where hybridization is known, or suspected, to play an evolutionary role requires dealing simultaneously with hybridization and other sources of gene tree incongruence. Currently, no methods exist for doing this with general scenarios of hybridization. In this paper, we propose the first method for this task and demonstrate its performance. We revisit the analysis of a set of yeast species and another of Drosophila species, and show that evolutionary histories involving hybridization have higher support than the strictly diverging evolutionary histories estimated when not incorporating hybridization in the analysis.
Affiliation(s)
- Yun Yu
  - Department of Computer Science, Rice University, Houston, Texas, United States of America
- James H. Degnan
  - Department of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand
  - National Institute of Mathematical and Biological Synthesis, Knoxville, Tennessee, United States of America
- Luay Nakhleh
  - Department of Computer Science, Rice University, Houston, Texas, United States of America
|
31
|
Drummond AJ, Suchard MA, Xie D, Rambaut A. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol Biol Evol 2012; 29:1969-73. [PMID: 22367748 PMCID: PMC3408070 DOI: 10.1093/molbev/mss075] [Citation(s) in RCA: 6784] [Impact Index Per Article: 521.8] [Indexed: 01/10/2023] Open
Abstract
Computational evolutionary biology, statistical phylogenetics and coalescent-based population genetics are becoming increasingly central to the analysis and understanding of molecular sequence data. We present the Bayesian Evolutionary Analysis by Sampling Trees (BEAST) software package version 1.7, which implements a family of Markov chain Monte Carlo (MCMC) algorithms for Bayesian phylogenetic inference, divergence time dating, coalescent analysis, phylogeography and related molecular evolutionary analyses. This package includes an enhanced graphical user interface program called Bayesian Evolutionary Analysis Utility (BEAUti) that enables access to advanced models for molecular sequence and phenotypic trait evolution that were previously available to developers only. The package also provides new tools for visualizing and summarizing multispecies coalescent and phylogeographic analyses. BEAUti and BEAST 1.7 are open source under the GNU lesser general public license and available at http://beast-mcmc.googlecode.com and http://beast.bio.ed.ac.uk
Affiliation(s)
- Alexei J Drummond
  - Department of Computer Science, University of Auckland, Auckland, New Zealand.
|
32
|
Baum BR, Edwards T, Mamuti M, Johnson DA. Phylogenetic relationships among the polyploid and diploid Aegilops species inferred from the nuclear 5S rDNA sequences (Poaceae: Triticeae). Genome 2012; 55:177-93. [PMID: 22338617 DOI: 10.1139/g2012-006] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Indexed: 11/22/2022]
Abstract
Phylogenetic inferences of the polyploid Aegilops taxa were drawn based upon the analysis of 909 nuclear 5S rDNA sequences obtained from 15 Aegilops polyploid taxa (531 sequences new to this paper) and 378 sequences from our previous study on the diploid taxa. The 531 sequences can be split into two orthologous groups (unit classes), the long AE1 and short AE1 previously identified in the diploid set. An examination of the relationships between unit classes and their associated haplomes suggests that U haplome sequences found in Ae. umbellulata are the closest to the T sequences found in Amblyopyrum muticum and that sequences of the polyploid species expected to be the M type found in Ae. comosa are more similar to the T haplome sequences, except in the three hexaploids Ae. glumiaristata, Ae. juvenalis, and Ae. vavilovii and the tetraploid Ae. crassa where they are found to be similar to the M haplome sequences. These three hexaploid taxa likely originated from the tetraploid Ae. crassa (DM), while the closest taxon to the fourth hexaploid, Ae. recta, is the tetraploid Ae. neglecta (UM). Based upon the distribution of the unit classes, several reticulate phylogenies depicting evolutionary relationships among diploid, tetraploid, and hexaploid taxa were constructed; however, none of these widely used methods could depict the expected reticulate relationship as previously drawn from cytogenetic analyses in this group of allopolyploid species. These results suggest that evolutionary relationships derived from models based upon the assumption of bifurcating species require careful interpretation when these same models are applied to species with reticulate evolution.
Affiliation(s)
- B R Baum
  - Agriculture and Agri-Food Canada, Eastern Cereal and Oilseed Research Centre, Neatby Building, 960 Carling Avenue, Ottawa, ON K1A 0C6, Canada.
|
33
|
Abstract
Large-scale databases are available that contain homologous gene families constructed from hundreds of complete genome sequences from across the three domains of life. Here, we discuss the approaches of increasing complexity aimed at extracting information on the pattern and process of gene family evolution from such datasets. In particular, we consider the models that invoke processes of gene birth (duplication and transfer) and death (loss) to explain the evolution of gene families. First, we review birth-and-death models of family size evolution and their implications in light of the universal features of family size distribution observed across different species and the three domains of life. Subsequently, we proceed to recent developments on models capable of more completely considering information in the sequences of homologous gene families through the probabilistic reconciliation of the phylogenetic histories of individual genes with the phylogenetic history of the genomes in which they have resided. To illustrate the methods and results presented, we use data from the HOGENOM database, demonstrating that the distribution of homologous gene family sizes in the genomes of the eukaryota, archaea, and bacteria exhibits remarkably similar shapes. We show that these distributions are best described by models of gene family size evolution, where for individual genes the death (loss) rate is larger than the birth (duplication and transfer) rate but new families are continually supplied to the genome by a process of origination. Finally, we use probabilistic reconciliation methods to take into consideration additional information from gene phylogenies, and find that, for prokaryotes, the majority of birth events are the result of transfer.
|
34
|
Kühnert D, Wu CH, Drummond AJ. Phylogenetic and epidemic modeling of rapidly evolving infectious diseases. Infect Genet Evol 2011; 11:1825-41. [PMID: 21906695 PMCID: PMC7106223 DOI: 10.1016/j.meegid.2011.08.005] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.6] [Received: 04/01/2011] [Revised: 08/09/2011] [Accepted: 08/09/2011] [Indexed: 12/23/2022]
Abstract
Epidemic modeling of infectious diseases has a long history in both theoretical and empirical research. However, the recent explosion of genetic data has revealed the rapid rate of evolution that many populations of infectious agents undergo and has underscored the need to consider both evolutionary and ecological processes on the same time scale. Mathematical epidemiology has applied dynamical models to study infectious epidemics, but these models have tended not to exploit--or take into account--evolutionary changes and their effect on the ecological processes and population dynamics of the infectious agent. On the other hand, statistical phylogenetics has increasingly been applied to the study of infectious agents. This approach is based on phylogenetics, molecular clocks, genealogy-based population genetics and phylogeography. Bayesian Markov chain Monte Carlo and related computational tools have been the primary source of advances in these statistical phylogenetic approaches. Recently the first tentative steps have been taken to reconcile these two theoretical approaches. We survey the Bayesian phylogenetic approach to epidemic modeling of infectious diseases and describe the contrasts it provides to mathematical epidemiology as well as emphasize the significance of the future unification of these two fields.
|
35
|
Leigh JW, Lapointe FJ, Lopez P, Bapteste E. Evaluating phylogenetic congruence in the post-genomic era. Genome Biol Evol 2011; 3:571-87. [PMID: 21712432 PMCID: PMC3156567 DOI: 10.1093/gbe/evr050] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.9] [Accepted: 05/27/2011] [Indexed: 12/04/2022] Open
Abstract
Congruence is a broadly applied notion in evolutionary biology used to justify multigene phylogeny or phylogenomics, as well as in studies of coevolution, lateral gene transfer, and as evidence for common descent. Existing methods for identifying incongruence or heterogeneity using character data were designed for data sets that are both small and expected to be rarely incongruent. At the same time, methods that assess incongruence using comparison of trees test a null hypothesis of uncorrelated tree structures, which may be inappropriate for phylogenomic studies. As such, they are ill-suited for the growing number of available genome sequences, most of which are from prokaryotes and viruses, either for phylogenomic analysis or for studies of the evolutionary forces and events that have shaped these genomes. Specifically, many existing methods scale poorly with large numbers of genes, cannot accommodate high levels of incongruence, and do not adequately model patterns of missing taxa for different markers. We propose the development of novel incongruence assessment methods suitable for the analysis of the molecular evolution of the vast majority of life and support the investigation of homogeneity of evolutionary process in cases where markers do not share identical tree structures.
Affiliation(s)
- Jessica W Leigh
  - Department of Mathematics and Statistics, University of Otago, Dunedin, New Zealand.
|
36
|
Abstract
Throughout the living world, genetic recombination and nucleotide substitution are the primary processes that create the genetic variation upon which natural selection acts. Just as analyses of substitution patterns can reveal a great deal about evolution, so too can analyses of recombination. Evidence of genetic recombination within the genomes of apparently asexual species can equate with evidence of cryptic sexuality. In sexually reproducing species, nonrandom patterns of sequence exchange can provide direct evidence of population subdivisions that prevent certain individuals from mating. Although an interesting topic in its own right, an important reason for analysing recombination is to account for its potentially disruptive influences on various phylogenetic-based molecular evolution analyses. Specifically, the evolutionary histories of recombinant sequences cannot be accurately described by standard bifurcating phylogenetic trees. Taking recombination into account can therefore be pivotal to the success of selection, molecular clock and various other analyses that require adequate modelling of shared ancestry and draw increased power from accurately inferred phylogenetic trees. Here, we review various computational approaches to studying recombination and provide guidelines both on how to gain insights into this important evolutionary process and on how it can be properly accounted for during molecular evolution studies.
Affiliation(s)
- Darren P Martin
  - Computational Biology Group, Institute of Infectious Diseases and Molecular Medicine, University of Cape Town, Cape Town, South Africa
|
37
|
Yu Y, Than C, Degnan JH, Nakhleh L. Coalescent histories on phylogenetic networks and detection of hybridization despite incomplete lineage sorting. Syst Biol 2011; 60:138-49. [PMID: 21248369 DOI: 10.1093/sysbio/syq084] [Citation(s) in RCA: 132] [Impact Index Per Article: 9.4] [Indexed: 11/12/2022] Open
Abstract
Analyses of the increasingly available genomic data continue to reveal the extent of hybridization and its role in the evolutionary diversification of various groups of species. We show, through extensive coalescent-based simulations of multilocus data sets on phylogenetic networks, how divergence times before and after hybridization events can result in incomplete lineage sorting with gene tree incongruence signatures identical to those exhibited by hybridization. Evolutionary analysis of such data under the assumption of a species tree model can miss all hybridization events, whereas analysis under the assumption of a species network model would grossly overestimate hybridization events. These issues necessitate a paradigm shift in evolutionary analysis under these scenarios, from a model that assumes a priori a single source of gene tree incongruence to one that integrates multiple sources in a unifying framework. We propose a framework of coalescence within the branches of a phylogenetic network and show how this framework can be used to detect hybridization despite incomplete lineage sorting. We apply the model to simulated data and show that the signature of hybridization can be revealed as long as the interval between the divergence times of the species involved in hybridization is not too small. We reanalyze a data set of 106 loci from 7 in-group Saccharomyces species for which a species tree with no hybridization has been reported in the literature. Our analysis supports the hypothesis that hybridization occurred during the evolution of this group, explaining a large amount of the incongruence in the data. Our findings show that an integrative approach to gene tree incongruence and its reconciliation is needed. Our framework will help in systematically analyzing genomic data for the occurrence of hybridization and elucidating its evolutionary role.
Affiliation(s)
- Yun Yu
  - Department of Computer Science, Rice University, 6100 Main Street, Houston, TX 77005, USA
|
38
|
Keck BP, Near TJ. A young clade repeating an old pattern: diversity in Nothonotus darters (Teleostei: Percidae) endemic to the Cumberland River. Mol Ecol 2010; 19:5030-42. [PMID: 20946590 DOI: 10.1111/j.1365-294x.2010.04866.x] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 01/30/2023]
Abstract
Hypotheses of diversification in eastern North American freshwater fishes have focused primarily on allopatric distributions of species between disjunct highland areas and major river systems. However, these hypotheses do not fully explain the rich diversity of species within highland regions and river systems. Relatively old diversification events at small geographic scales have been observed in the Barcheek Darter subclade that occurs in the Cumberland River drainage (CRD) in Kentucky and Tennessee, United States of America, but it is unknown if this pattern is consistent in other darter subclades. We explored phylogeographic diversity in two species of Nothonotus darters, N. microlepidus and N. sanguifluus, endemic to the CRD to compare phylogenetic patterns between Barcheek Darters and species of Nothonotus. We collected sequence data for a mitochondrial gene (cytb) and three nuclear genes (MLL, S7 and RAG1) from 19 N. microlepidus and 35 N. sanguifluus specimens. Gene trees were estimated using maximum likelihood and Bayesian methods, and a 'species tree' was inferred using a Bayesian method. These trees indicate that species diversity in Nothonotus is underestimated. Five distinct lineages were evident, despite retained ancestral polymorphism and unsampled extirpated populations. Comparison of chronograms for Barcheek Darters and Nothonotus revealed that microendemism resulting from species diversification at small geographic scales in the CRD is a consistent pattern in both old and young darter subclades. Our analyses reveal that geographic isolating mechanisms that result in similar phylogeographic patterns in the CRD are persistent through long expanses of evolutionary time.
Affiliation(s)
- Benjamin P Keck
  - Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT 06520, USA.
|
39
|
Matsen FA. constNJ: An Algorithm to Reconstruct Sets of Phylogenetic Trees Satisfying Pairwise Topological Constraints. J Comput Biol 2010; 17:799-818. [DOI: 10.1089/cmb.2009.0201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022] Open
Affiliation(s)
- Frederick A. Matsen
  - Program in Computational Biology, Fred Hutchinson Cancer Research Center 1100, Seattle, Washington, USA
|