1
Williams TA, Davin AA, Szánthó LL, Stamatakis A, Wahl NA, Woodcroft BJ, Soo RM, Eme L, Sheridan PO, Gubry-Rangin C, Spang A, Hugenholtz P, Szöllősi GJ. Phylogenetic reconciliation: making the most of genomes to understand microbial ecology and evolution. THE ISME JOURNAL 2024; 18:wrae129. [PMID: 39001714 PMCID: PMC11293204 DOI: 10.1093/ismejo/wrae129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2024] [Revised: 07/01/2024] [Accepted: 07/12/2024] [Indexed: 07/15/2024]
Abstract
In recent years, phylogenetic reconciliation has emerged as a promising approach for studying microbial ecology and evolution. The core idea is to model how gene trees evolve along a species tree and to explain differences between them via evolutionary events including gene duplications, transfers, and losses. Here, we describe how phylogenetic reconciliation provides a natural framework for studying genome evolution and highlight recent applications including ancestral gene content inference, the rooting of species trees, and the insights into metabolic evolution and ecological transitions they yield. Reconciliation analyses have elucidated the evolution of diverse microbial lineages, from Chlamydiae to Asgard archaea, shedding light on ecological adaptation, host-microbe interactions, and symbiotic relationships. However, there are many opportunities for broader application of the approach in microbiology. Continuing improvements to make reconciliation models more realistic and scalable, and integration of ecological metadata such as habitat, pH, temperature, and oxygen use offer enormous potential for understanding the rich tapestry of microbial life.
Affiliation(s)
- Tom A Williams
  - School of Biological Sciences, University of Bristol, Bristol BS8 1TQ, United Kingdom
- Adrian A Davin
  - Department of Biological Sciences, Graduate School of Science, The University of Tokyo, 113-0033 Tokyo, Japan
- Lénárd L Szánthó
  - MTA-ELTE “Lendület” Evolutionary Genomics Research Group, Eötvös University, 1117 Budapest, Hungary
  - Model-Based Evolutionary Genomics Unit, Okinawa Institute of Science and Technology Graduate University, 904-0495 Okinawa, Japan
- Alexandros Stamatakis
  - Biodiversity Computing Group, Institute of Computer Science, Foundation for Research and Technology Hellas, 70013 Heraklion, Greece
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, 69118 Heidelberg, Germany
  - Institute of Theoretical Informatics, Karlsruhe Institute of Technology, 76131 Karlsruhe, Germany
- Noah A Wahl
  - Biodiversity Computing Group, Institute of Computer Science, Foundation for Research and Technology Hellas, 70013 Heraklion, Greece
- Ben J Woodcroft
  - Centre for Microbiome Research, School of Biomedical Sciences, Queensland University of Technology (QUT), Translational Research Institute, Woolloongabba, QLD 4102, Australia
- Rochelle M Soo
  - Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD 4072, Australia
- Laura Eme
  - Unité d’Ecologie, Systématique et Evolution, Université Paris-Saclay, 91190 Gif-sur-Yvette, France
- Paul O Sheridan
  - School of Biological and Chemical Sciences, University of Galway, Galway H91 TK33, Ireland
- Cecile Gubry-Rangin
  - School of Biological Sciences, University of Aberdeen, Aberdeen AB24 3FX, United Kingdom
- Anja Spang
  - Department of Marine Microbiology and Biogeochemistry, NIOZ, Royal Netherlands Institute for Sea Research, PO Box 59, 1790 AB Den Burg, The Netherlands
  - Department of Evolutionary & Population Biology, Institute for Biodiversity and Ecosystem Dynamics (IBED), University of Amsterdam, Amsterdam, The Netherlands
- Philip Hugenholtz
  - Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD 4072, Australia
- Gergely J Szöllősi
  - MTA-ELTE “Lendület” Evolutionary Genomics Research Group, Eötvös University, 1117 Budapest, Hungary
  - Model-Based Evolutionary Genomics Unit, Okinawa Institute of Science and Technology Graduate University, 904-0495 Okinawa, Japan
  - Institute of Evolution, HUN REN Centre for Ecological Research, 1121 Budapest, Hungary
2
Katriel G, Mahanaymi U, Brezner S, Kezel N, Koutschan C, Zeilberger D, Steel M, Snir S. Gene Transfer-Based Phylogenetics: Analytical Expressions and Additivity via Birth-Death Theory. Syst Biol 2023; 72:1403-1417. [PMID: 37862116 DOI: 10.1093/sysbio/syad060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2022] [Revised: 09/01/2023] [Accepted: 10/05/2023] [Indexed: 10/22/2023]
Abstract
The genomic era has opened up vast opportunities in molecular systematics, one of which is deciphering the evolutionary history in fine detail. Under this mass of data, analyzing the point mutations of standard markers is often too crude and slow for fine-scale phylogenetics. Nevertheless, genome dynamics (GD) events provide alternative, often richer information. The synteny index (SI) between a pair of genomes combines gene order and gene content information, allowing the comparison of genomes of unequal gene content, together with order considerations of their common genes. Recently, genome dynamics has been modeled as a continuous-time Markov process, and gene distance in the genome as a birth-death-immigration process. Nevertheless, due to complexities arising in this setting, no precise and provably consistent estimators could be derived, resulting in heuristic solutions. Here, we extend this modeling approach by using techniques from birth-death theory to derive explicit expressions of the system's probabilistic dynamics in the form of rational functions of the model parameters. This, in turn, allows us to infer analytically accurate distances between organisms based on their SI. Subsequently, we establish additivity of this estimated evolutionary distance (a desirable property yielding phylogenetic consistency). Applying the new measure in simulation studies shows that it provides accurate results in realistic settings and even under model extensions such as gene gain/loss or over a tree structure. In the real-data realm, we applied the new formulation to a unique data structure that we constructed, the ordered orthology DB, based on a new version of the EggNOG database, to construct a tree with more than 4.5K taxa. To the best of our knowledge, this is the largest gene-order-based tree constructed, and it overcomes shortcomings found in previous approaches. Constructing a GD-based tree allows us to confirm and contrast findings based on other phylogenetic approaches, as we show.
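The synteny index described in this abstract can be illustrated with a small sketch. The abstract does not give the formula, so the code below assumes one common formulation: each gene's neighborhood is the set of genes within `k` positions on a circular genome, a gene's score is the fraction of its `2k` neighbors shared between the two genomes, and the genome-level SI is the average over genes present in both. The function names, the parameter `k`, and the assumptions that genes are unique within a genome and that each genome has more than `2k` genes are illustrative and not taken from the paper.

```python
def neighborhood(genome, index, k):
    """Return the genes within k positions of genome[index] on a circular genome."""
    n = len(genome)
    return {genome[(index + offset) % n] for offset in range(-k, k + 1) if offset != 0}


def synteny_index(genome_a, genome_b, k):
    """Average shared-neighborhood fraction over genes common to both genomes.

    Assumes each gene appears once per genome and len(genome) > 2 * k.
    """
    pos_a = {gene: i for i, gene in enumerate(genome_a)}
    pos_b = {gene: i for i, gene in enumerate(genome_b)}
    common = pos_a.keys() & pos_b.keys()
    if not common:
        return 0.0
    total = 0.0
    for gene in common:
        shared = neighborhood(genome_a, pos_a[gene], k) & neighborhood(genome_b, pos_b[gene], k)
        total += len(shared) / (2 * k)
    return total / len(common)


if __name__ == "__main__":
    reference = list("abcdefgh")
    shuffled = list("acbdefgh")  # genes b and c swapped
    print(synteny_index(reference, reference, k=2))
    print(synteny_index(reference, shuffled, k=1))
```

Because only genes present in both genomes are scored, the measure can compare genomes of unequal gene content, which is the property the abstract highlights. Converting SI values into additive distances requires the paper's analytical expressions and is not attempted here.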
Affiliation(s)
- Guy Katriel
  - Department of Mathematics, Braude College of Engineering, Karmiel, Israel
- Udi Mahanaymi
  - Department of Evolutionary and Environmental Biology, University of Haifa, Haifa, Israel
- Shelly Brezner
  - Department of Evolutionary and Environmental Biology, University of Haifa, Haifa, Israel
- Noor Kezel
  - Department of Mathematics, University of Haifa, Haifa, Israel
- Doron Zeilberger
  - Department of Mathematics, Rutgers University, New Brunswick, NJ, USA
- Mike Steel
  - School of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand
- Sagi Snir
  - Department of Evolutionary and Environmental Biology, University of Haifa, Haifa, Israel
3
Ruomeng B, Meihao O, Siru Z, Shichen G, Yixian Z, Junhong C, Ruijie M, Yuan L, Gezhi X, Xingyu C, Shiyi Z, Aihui Z, Fang B. Degradation strategies of pesticide residue: From chemicals to synthetic biology. Synth Syst Biotechnol 2023; 8:302-313. [PMID: 37122957 PMCID: PMC10130697 DOI: 10.1016/j.synbio.2023.03.005] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 09/25/2022] [Revised: 03/16/2023] [Accepted: 03/20/2023] [Indexed: 04/04/2023]
Abstract
The past 50 years have witnessed a massive expansion in the demand for and application of pesticides. However, pesticides are difficult to degrade completely without intervention, so pesticide residues can pose a persistent threat to non-target organisms in many respects. To address the abuse of pesticide products and excessive pesticide residues in the environment, chemical and biological degradation methods have been widely developed, but they remain limited in scale and insufficient to solve such pollution. In recent years, biodegradation tools guided by synthetic biology principles have been studied further and have paved the way for pesticide degradation. By combining a customized design strategy with a standardized assembly mode, engineered bacteria for multi-dimensional degradation have become an effective tool for degrading pesticide residues. This review introduces the mechanisms and hazards of different pesticides, summarizes the methods applied in the degradation of pesticide residues, and discusses the advantages, applications, and prospects of synthetic biology in degrading pesticide residues.
4
Zaman S, Sledzieski S, Berger B, Wu YC, Bansal MS. virDTL: Viral Recombination Analysis Through Phylogenetic Reconciliation and Its Application to Sarbecoviruses and SARS-CoV-2. J Comput Biol 2023; 30:3-20. [PMID: 36125448 PMCID: PMC10081712 DOI: 10.1089/cmb.2021.0507] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2023]
Abstract
An accurate understanding of the evolutionary history of rapidly-evolving viruses like SARS-CoV-2, responsible for the COVID-19 pandemic, is crucial to tracking and preventing the spread of emerging pathogens. However, viruses undergo frequent recombination, which makes it difficult to trace their evolutionary history using traditional phylogenetic methods. In this study, we present a phylogenetic workflow, virDTL, for analyzing viral evolution in the presence of recombination. Our approach leverages reconciliation methods developed for inferring horizontal gene transfer in prokaryotes and, compared to existing tools, is uniquely able to identify ancestral recombinations while accounting for several sources of inference uncertainty, including in the construction of a strain tree, estimation and rooting of gene family trees, and reconciliation itself. We apply this workflow to the Sarbecovirus subgenus and demonstrate how a principled analysis of predicted recombination gives insight into the evolution of SARS-CoV-2. In addition to providing confirming evidence for the horseshoe bat as its zoonotic origin, we identify several ancestral recombination events that merit further study.
Affiliation(s)
- Sumaira Zaman
  - Department of Computer Science and Engineering, University of Connecticut, Storrs, Connecticut, USA
- Samuel Sledzieski
  - Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Bonnie Berger
  - Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
  - Department of Mathematics, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Yi-Chieh Wu
  - Department of Computer Science, Harvey Mudd College, Claremont, California, USA
- Mukul S Bansal
  - Department of Computer Science and Engineering, University of Connecticut, Storrs, Connecticut, USA
  - The Institute for Systems Genomics, University of Connecticut, Storrs, Connecticut, USA
5
Volk A, Lee J. Cyanobacterial blooms: A player in the freshwater environmental resistome with public health relevance? ENVIRONMENTAL RESEARCH 2023; 216:114612. [PMID: 36272588 DOI: 10.1016/j.envres.2022.114612] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 08/30/2022] [Revised: 10/11/2022] [Accepted: 10/16/2022] [Indexed: 06/16/2023]
Abstract
Cyanobacterial harmful algal blooms (cyanoHABs) are an ecological concern because of large ecosystem-disrupting blooms and a global public health concern because of the cyanotoxins produced by certain bloom-forming species. Another threat to global public health is the dissemination of antibiotic resistance (AR) in freshwater environmental reservoirs from anthropogenic sources, such as wastewater discharge and urban and agricultural runoff. In this study, cyanobacteria are now hypothesized to play a role in the environmental resistome. A non-systematic literature review of studies using molecular techniques (such as PCR and metagenomic sequencing) was conducted to explore indirect and direct ways cyanobacteria might contribute to environmental AR. Results show cyanobacteria can host antibiotic resistance genes (ARGs) and might promote the spread of ARGs in bacteria due to the significant contribution of mobile genetic elements (MGEs) located in genera such as Microcystis. However, cyanobacteria may promote or inhibit the spread of ARGs in environmental freshwater bacteria due to other factors as well. The purpose of this review is to 1) consider the role of cyanobacteria as AR hosts, since cyanoHABs are historically considered to be a separate problem from AR, and 2) to identify the knowledge gap in understanding cyanobacteria as ARG reservoirs. Cyanobacterial blooms, as well as other biotic (e.g. interactions with protists or cyanophages) and abiotic factors, should be studied further using advanced methods such as shotgun metagenomic and long read sequencing to clarify the extent of their functional ARGs/MGEs and influences on environmental AR.
Affiliation(s)
- Abigail Volk
  - Environmental Sciences Graduate Program, The Ohio State University, Columbus, OH, United States
- Jiyoung Lee
  - College of Public Health, Division of Environmental Health Sciences, The Ohio State University, Columbus, OH, United States
  - Department of Food Science & Technology, The Ohio State University, Columbus, OH, United States
  - Infectious Diseases Institute, The Ohio State University, Columbus, OH, United States
6
Affiliation(s)
- Hugo Menet
  - Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France
- Vincent Daubin
  - Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France
  - * E-mail: (VD); (ET)
- Eric Tannier
  - Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France
  - Inria, centre de recherche de Lyon, Villeurbanne, France
  - * E-mail: (VD); (ET)
7
Susko E. Complex statistical modelling for phylogenetic inference. CAN J STAT 2022. [DOI: 10.1002/cjs.11741] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Affiliation(s)
- Edward Susko
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada B3H 3J5
8
Immunoglobulin heavy constant gamma gene evolution is modulated by both the divergent and birth-and-death evolutionary models. Primates 2022; 63:611-625. [DOI: 10.1007/s10329-022-01019-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2022] [Accepted: 08/31/2022] [Indexed: 11/27/2022]
9
Harris BJ, Sheridan PO, Davín AA, Gubry-Rangin C, Szöllősi GJ, Williams TA. Rooting Species Trees Using Gene Tree-Species Tree Reconciliation. Methods Mol Biol 2022; 2569:189-211. [PMID: 36083449 DOI: 10.1007/978-1-0716-2691-7_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
Interpreting phylogenetic trees requires a root, which provides the direction of evolution and polarizes ancestor-descendant relationships. But inferring the root using genetic data is difficult, particularly in cases where the closest available outgroup is only distantly related, which is common for microbes. In this chapter, we present a workflow for estimating rooted species trees and the evolutionary history of the gene families that evolve within them using probabilistic gene tree-species tree reconciliation. We illustrate the pipeline using a small dataset of prokaryotic genomes, for which the example scripts can be run using modest computer resources. We describe the rooting method used in this work in the context of other rooting strategies and discuss some of the limitations and opportunities presented by probabilistic gene tree-species tree reconciliation methods.
Affiliation(s)
- Brogan J Harris
  - School of Biological Sciences, University of Bristol, Bristol, UK
- Paul O Sheridan
  - School of Biological Sciences, University of Bristol, Bristol, UK
  - School of Biological Sciences, University of Aberdeen, Aberdeen, UK
- Adrián A Davín
  - Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Gergely J Szöllősi
  - Dept. of Biological Physics, Eötvös Loránd University, Budapest, Hungary
  - MTA-ELTE "Lendület" Evolutionary Genomics Research Group, Budapest, Hungary
  - Institute of Evolution, Centre for Ecological Research, Budapest, Hungary
- Tom A Williams
  - School of Biological Sciences, University of Bristol, Bristol, UK
10
Bansal MS. Deciphering Microbial Gene Family Evolution Using Duplication-Transfer-Loss Reconciliation and RANGER-DTL. Methods Mol Biol 2022; 2569:233-252. [PMID: 36083451 DOI: 10.1007/978-1-0716-2691-7_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
Phylogenetic reconciliation has emerged as a principled, highly effective technique for investigating the origin, spread, and evolutionary history of microbial gene families. Proper application of phylogenetic reconciliation requires a clear understanding of potential pitfalls and sources of error, and knowledge of the most effective reconciliation-based tools and protocols to use to maximize accuracy. In this book chapter, we provide a brief overview of Duplication-Transfer-Loss (DTL) reconciliation, the standard reconciliation model used to study microbial gene families and provide a step-by-step computational protocol to maximize the accuracy of DTL reconciliation and minimize false-positive evolutionary inferences.
Affiliation(s)
- Mukul S Bansal
  - Department of Computer Science & Engineering, University of Connecticut, Storrs, CT, USA
11
Improved Duplication-Transfer-Loss Reconciliation with Extinct and Unsampled Lineages. ALGORITHMS 2021. [DOI: 10.3390/a14080231] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/16/2022]
Abstract
Duplication-Transfer-Loss (DTL) reconciliation is a widely used computational technique for understanding gene family evolution and inferring horizontal gene transfer (transfer for short) in microbes. However, most existing models and implementations of DTL reconciliation cannot account for the effect of unsampled or extinct species lineages on the evolution of gene families, likely affecting their accuracy. Accounting for the presence and possible impact of any unsampled species lineages, including those that are extinct, is especially important for inferring and studying horizontal transfer since many genes in the species lineages represented in the reconciliation analysis are likely to have been acquired through horizontal transfer from unsampled lineages. While models of DTL reconciliation that account for transfer from unsampled lineages have already been proposed, they use a relatively simple framework for transfer from unsampled lineages and cannot explicitly infer the location on the species tree of each unsampled or extinct lineage associated with an identified transfer event. Furthermore, there do not yet exist any systematic studies to assess the impact of accounting for unsampled lineages on the accuracy of DTL reconciliation.
In this work, we address these deficiencies by (i) introducing an extended DTL reconciliation model, called the DTLx reconciliation model, that accounts for unsampled and extinct species lineages in a new, more functional manner compared to existing models, (ii) showing that optimal reconciliations under the new DTLx reconciliation model can be computed just as efficiently as under the fastest DTL reconciliation model, (iii) providing an efficient algorithm for sampling optimal DTLx reconciliations uniformly at random, (iv) performing the first systematic simulation study to assess the impact of accounting for unsampled lineages on the accuracy of DTL reconciliation, and (v) comparing the accuracies of inferring transfers from unsampled lineages under our new model and the only other previously proposed parsimony-based model for this problem.
12
Indirect identification of horizontal gene transfer. J Math Biol 2021; 83:10. [PMID: 34218334 PMCID: PMC8254804 DOI: 10.1007/s00285-021-01631-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/16/2020] [Revised: 04/06/2021] [Accepted: 06/13/2021] [Indexed: 12/04/2022]
Abstract
Several implicit methods to infer horizontal gene transfer (HGT) focus on pairs of genes that have diverged only after the divergence of the two species in which the genes reside. This situation defines the edge set of a graph, the later-divergence-time (LDT) graph, whose vertices correspond to genes colored by their species. We investigate these graphs in the setting of relaxed scenarios, i.e., evolutionary scenarios that encompass all commonly used variants of duplication-transfer-loss scenarios in the literature. We characterize LDT graphs as a subclass of properly vertex-colored cographs, and provide a polynomial-time recognition algorithm as well as an algorithm to construct a relaxed scenario that explains a given LDT. An edge in an LDT graph implies that the two corresponding genes are separated by at least one HGT event. The converse is not true, however. We show that the complete xenology relation is described by an rs-Fitch graph, i.e., a complete multipartite graph satisfying constraints on the vertex coloring. This class of vertex-colored graphs is also recognizable in polynomial time. We finally address the question “how much information about all HGT events is contained in LDT graphs” with the help of simulations of evolutionary scenarios with a wide range of duplication, loss, and HGT events. In particular, we show that a simple greedy graph editing scheme can be used to efficiently detect HGT events that are implicitly contained in LDT graphs.
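The characterization of LDT graphs as a subclass of properly vertex-colored cographs suggests a simple necessary-condition check, sketched below. It relies on the standard fact that a graph is a cograph exactly when it contains no induced path on four vertices (P4). This is a brute-force illustration, not the paper's recognition algorithm, and passing the check does not prove that a graph is an LDT graph, since LDT graphs form a proper subclass. All function names are hypothetical.

```python
from itertools import combinations, permutations


def is_properly_colored(edges, color):
    """True if no edge joins two genes from the same species (same color)."""
    return all(color[u] != color[v] for u, v in edges)


def has_induced_p4(vertices, edges):
    """Brute-force search for an induced path a-b-c-d on four vertices."""
    adjacent = {frozenset(edge) for edge in edges}

    def linked(u, v):
        return frozenset((u, v)) in adjacent

    for quad in combinations(vertices, 4):
        for a, b, c, d in permutations(quad):
            path_edges = linked(a, b) and linked(b, c) and linked(c, d)
            chords = linked(a, c) or linked(a, d) or linked(b, d)
            if path_edges and not chords:
                return True
    return False


def passes_ldt_necessary_conditions(vertices, edges, color):
    """Necessary (not sufficient) conditions: properly colored and P4-free."""
    return is_properly_colored(edges, color) and not has_induced_p4(vertices, edges)


if __name__ == "__main__":
    genes = ["g1", "g2", "g3", "g4"]
    species = {"g1": "A", "g2": "B", "g3": "A", "g4": "B"}
    cycle = [("g1", "g2"), ("g2", "g3"), ("g3", "g4"), ("g4", "g1")]
    path = [("g1", "g2"), ("g2", "g3"), ("g3", "g4")]
    print(passes_ldt_necessary_conditions(genes, cycle, species))
    print(passes_ldt_necessary_conditions(genes, path, species))
```

The four-cycle is a complete bipartite graph and therefore a cograph, while the four-vertex path is itself an induced P4 and is rejected. The exhaustive search is only practical for small graphs.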
13
New Approaches for Inferring Phylogenies in the Presence of Paralogs. Trends Genet 2021; 37:174-187. [DOI: 10.1016/j.tig.2020.08.012] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 07/08/2020] [Revised: 08/13/2020] [Accepted: 08/19/2020] [Indexed: 12/18/2022]
14
Sacko O, Barnes CL, Greene LH, Lee JW. Survivability of Wild-Type and Genetically Engineered Thermosynechococcus elongatus BP1 with Different Temperature Conditions. APPLIED BIOSAFETY 2020; 25:104-117. [PMID: 36035080 PMCID: PMC9387736 DOI: 10.1177/1535676019896640] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 01/25/2025]
Abstract
INTRODUCTION Thermosynechococcus elongatus BP1 is a thermophilic strain of cyanobacteria that has an optimum growth at 57°C, and according to previous analysis by Yamaoka et al, T elongatus BP1 cannot survive at a temperature below 30°C. This suggests that the thermophilic property of this strain may be used as a natural biosafety feature to limit the spread of genetically engineered (GE) organisms in the environment if physical containment fails. OBJECTIVE To further explore the growth and survivability range of T elongatus BP1, we report a growth and survivability assay of wild-type and GE T elongatus BP1 strains under different conditions. METHODS Wild-type and GE T elongatus BP1 cultures were prepared and incubated in laboratory conditions (high temperatures and constant light source) and greenhouse conditions (lower/varied temperatures and sunlight) for 4 weeks. The cell density was monitored weekly by measuring the optical density at 730 nm (OD730). To assess the survivability, a sample of each culture was added to fresh media, placed in laboratory conditions (42.2°C and 30 µE m⁻² s⁻¹) in multi-well plates and observed for growth for up to three weeks. Lastly, the number of viable cells was determined by plating a diluted sample of the culture on solid media and counting colony-forming units (CFU) after 1 day, 2 weeks and 4 weeks of incubation in laboratory or greenhouse conditions. RESULTS Our experimental results demonstrated that growth was hindered but that the cells did not entirely die within 2 to 4 weeks at warm temperatures (31.42°C-36.27°C). The study also showed that 2 weeks of exposure to cool temperature conditions (15.44°C-25.30°C) was enough to cause complete death of GE T elongatus BP1. However, it took 2 to 4 weeks for the wild-type T elongatus BP1 cells to die.
CONCLUSION This study revealed that the thermophilic feature of the T elongatus BP1 may be used as an effective biosafety mechanism at a cool temperature between 15.44°C and 25.30°C but may not be able to serve as a biosafety mechanism at warmer temperatures.
Affiliation(s)
- Oumar Sacko
  - Department of Chemistry and Biochemistry, Old Dominion University, Norfolk, VA, USA
  - Authors Oumar Sacko and Cherrelle L. Barnes contributed equally to this article
- Cherrelle L. Barnes
  - Department of Chemistry and Biochemistry, Old Dominion University, Norfolk, VA, USA
  - Authors Oumar Sacko and Cherrelle L. Barnes contributed equally to this article
- Lesley H. Greene
  - Department of Chemistry and Biochemistry, Old Dominion University, Norfolk, VA, USA
- James W. Lee
  - Department of Chemistry and Biochemistry, Old Dominion University, Norfolk, VA, USA
15
Delabre M, El-Mabrouk N, Huber KT, Lafond M, Moulton V, Noutahi E, Castellanos MS. Evolution through segmental duplications and losses: a Super-Reconciliation approach. Algorithms Mol Biol 2020; 15:12. [PMID: 32508979 PMCID: PMC7249433 DOI: 10.1186/s13015-020-00171-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 08/26/2019] [Accepted: 05/05/2020] [Indexed: 02/02/2023]
Abstract
The classical gene and species tree reconciliation, used to infer the history of gene gain and loss explaining the evolution of gene families, assumes an independent evolution for each family. While this assumption is reasonable for genes that are far apart in the genome, it is not appropriate for genes grouped into syntenic blocks, which are more plausibly the result of a concerted evolution. Here, we introduce the Super-Reconciliation problem, which consists of inferring a history of segmental duplication and loss events (involving a set of neighboring genes) leading to a set of present-day syntenies from a single ancestral one. In other words, we extend the traditional Duplication-Loss reconciliation problem of a single gene tree to a set of trees, accounting for segmental duplications and losses. Existence of a Super-Reconciliation depends on individual gene tree consistency. In addition, ignoring rearrangements implies that existence also depends on gene order consistency. We first show that the problem of reconstructing a most parsimonious Super-Reconciliation, if any, is NP-hard and give an exact exponential-time algorithm to solve it. Alternatively, we show that accounting for rearrangements in the evolutionary model, but still only minimizing segmental duplication and loss events, leads to an exact polynomial-time algorithm. We finally assess time efficiency of the former exponential time algorithm for the Duplication-Loss model on simulated datasets, and give a proof of concept on the opioid receptor genes.
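The classical single-gene-tree reconciliation that this abstract extends can be sketched with the standard LCA mapping: each gene tree node is mapped to the smallest species-tree clade containing all species below it, and a node is called a duplication when it maps to the same clade as one of its children. The sketch below counts duplications only; loss counting, segmental events, and the Super-Reconciliation algorithms are not implemented. The nested-tuple tree encoding (binary trees, leaves as species names) and all function names are assumptions made for illustration.

```python
def leaf_set(tree):
    """Set of species at the leaves of a nested-tuple binary tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    left, right = tree
    return leaf_set(left) | leaf_set(right)


def species_clades(species_tree):
    """All clades (leaf sets of subtrees) of the species tree."""
    clades = [leaf_set(species_tree)]
    if not isinstance(species_tree, str):
        for child in species_tree:
            clades.extend(species_clades(child))
    return clades


def lca_clade(clades, species):
    """Smallest species-tree clade that contains every species in `species`."""
    return min((clade for clade in clades if species <= clade), key=len)


def count_duplications(gene_tree, species_tree):
    """Count gene tree nodes that the LCA mapping labels as duplications."""
    clades = species_clades(species_tree)

    def visit(node):
        # Returns (clade the node maps to, duplications in its subtree).
        if isinstance(node, str):
            return lca_clade(clades, frozenset([node])), 0
        left, right = node
        mapped_left, dups_left = visit(left)
        mapped_right, dups_right = visit(right)
        mapped = lca_clade(clades, mapped_left | mapped_right)
        is_duplication = mapped in (mapped_left, mapped_right)
        return mapped, dups_left + dups_right + int(is_duplication)

    return visit(gene_tree)[1]


if __name__ == "__main__":
    species_tree = (("A", "B"), "C")
    print(count_duplications((("A", "B"), "C"), species_tree))
    print(count_duplications(((("A", "B"), "C"), ("A", "B")), species_tree))
```

Each gene family is reconciled independently here, which is exactly the assumption the abstract argues is inappropriate for genes in syntenic blocks.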
16
Wade T, Rangel LT, Kundu S, Fournier GP, Bansal MS. Assessing the accuracy of phylogenetic rooting methods on prokaryotic gene families. PLoS One 2020; 15:e0232950. [PMID: 32413061 PMCID: PMC7228096 DOI: 10.1371/journal.pone.0232950] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 11/01/2019] [Accepted: 04/24/2020] [Indexed: 12/18/2022]
Abstract
Almost all standard phylogenetic methods for reconstructing gene trees result in unrooted trees; yet, many of the most useful applications of gene trees require that the gene trees be correctly rooted. As a result, several computational methods have been developed for inferring the root of unrooted gene trees. However, the accuracy of such methods has never been systematically evaluated on prokaryotic gene families, where horizontal gene transfer is often one of the dominant evolutionary events driving gene family evolution. In this work, we address this gap by conducting a thorough comparative evaluation of four different rooting methods using large collections of both simulated and empirical prokaryotic gene trees. Our simulation study is based on 6000 true and reconstructed gene trees on 100 species and characterizes the rooting accuracy of the four methods under 36 different evolutionary conditions and 3 levels of gene tree reconstruction error. The empirical study is based on a large, carefully designed data set of 3098 gene trees from 504 bacterial species (406 Alphaproteobacteria and 98 Cyanobacteria) and reveals insights that supplement those gleaned from the simulation study. Overall, this work provides several valuable insights into the accuracy of the considered methods that will help inform the choice of rooting methods to use when studying microbial gene family evolution. Among other findings, this study identifies parsimonious Duplication-Transfer-Loss (DTL) rooting and Minimal Ancestor Deviation (MAD) rooting as two of the most accurate gene tree rooting methods for prokaryotes and specifies the evolutionary conditions under which these methods are most accurate, demonstrates that DTL rooting is highly sensitive to high evolutionary rates and gene tree error, and shows that rooting methods based on branch lengths are generally robust to gene tree reconstruction error.
Affiliation(s)
- Taylor Wade
  - Department of Computer Science & Engineering, University of Connecticut, Storrs, CT, United States of America
- L. Thiberio Rangel
  - Department of Earth, Atmospheric & Planetary Sciences, Massachusetts Institute of Technology, Cambridge, MA, United States of America
- Soumya Kundu
  - Department of Computer Science & Engineering, University of Connecticut, Storrs, CT, United States of America
- Gregory P. Fournier
  - Department of Earth, Atmospheric & Planetary Sciences, Massachusetts Institute of Technology, Cambridge, MA, United States of America
- Mukul S. Bansal
  - Department of Computer Science & Engineering, University of Connecticut, Storrs, CT, United States of America
  - Institute for Systems Genomics, University of Connecticut, Storrs, CT, United States of America
17
Sevillya G, Doerr D, Lerner Y, Stoye J, Steel M, Snir S. Horizontal Gene Transfer Phylogenetics: A Random Walk Approach. Mol Biol Evol 2020; 37:1470-1479. [PMID: 31845962 DOI: 10.1093/molbev/msz302] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/14/2022]
Abstract
The dramatic decrease in time and cost for generating genetic sequence data has opened up vast opportunities in molecular systematics, one of which is the ability to decipher the evolutionary history of strains of a species. Under this fine systematic resolution, the standard markers are too crude to provide a phylogenetic signal. Nevertheless, among prokaryotes, genome dynamics in the form of horizontal gene transfer (HGT) between organisms and gene loss seem to provide far richer information by affecting both gene order and gene content. The "synteny index" (SI) between a pair of genomes combines these latter two factors, allowing comparison of genomes with unequal gene content, together with order considerations of their common genes. Although this approach is useful for classifying close relatives, no rigorous statistical modeling for it has been suggested. Such modeling is valuable, as it allows observed measures to be transformed into estimates of time periods during evolution, yielding the "additivity" of the measure. To the best of our knowledge, there is no other additivity proof for other gene order/content measures under HGT. Here, we provide a first statistical model and analysis for the SI measure. We model the "gene neighborhood" as a "birth-death-immigration" process affected by the HGT activity over the genome, and analytically relate the HGT rate and time to the expected SI. This model is asymptotic and thus provides accurate results, assuming infinite size genomes. Therefore, we also developed a heuristic model following an "exponential decay" function, accounting for biologically realistic values, which performed well in simulations. Applying this model to 1,133 prokaryotes partitioned to 39 clusters by the rank of genus yields that the average number of genome dynamics events per gene in the phylogenetic depth of genus is around half with significant variability between genera. 
This result extends and confirms similar results obtained for individual genera in different manners.
Affiliation(s)
- Gur Sevillya
  - Department of Evolutionary Biology, University of Haifa, Haifa, Israel
- Daniel Doerr
  - Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Yael Lerner
  - Department of Evolutionary Biology, University of Haifa, Haifa, Israel
- Jens Stoye
  - Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Mike Steel
  - School of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand
- Sagi Snir
  - Department of Evolutionary Biology, University of Haifa, Haifa, Israel
18
Cao Y, Trivellone V, Dietrich CH. A timetree for phytoplasmas (Mollicutes) with new insights on patterns of evolution and diversification. Mol Phylogenet Evol 2020; 149:106826. [PMID: 32283136 DOI: 10.1016/j.ympev.2020.106826] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 11/26/2019] [Revised: 02/12/2020] [Accepted: 04/07/2020] [Indexed: 11/16/2022]
Abstract
The first comprehensive timetree is presented for phytoplasmas, a diverse group of obligate intracellular bacteria restricted to phloem sieve elements of vascular plants and tissues of their hemipteran insect vectors. Maximum likelihood-based phylogenetic analysis of DNA sequence data from the 16S rRNA and methionine aminopeptidase (map) genes yielded well resolved estimates of phylogenetic relationships among major phytoplasma lineages, 16Sr groups and known strains of phytoplasmas. Age estimates for divergences among two major lineages of Mollicutes based on a previous comprehensive bacterial timetree were used to calibrate an initial 16S timetree. A separate timetree was estimated based on the more rapidly-evolving map gene, with an internal calibration based on a recent divergence within two related 16Sr phytoplasma subgroups in group 16SrV thought to have been driven by the introduction of the North American leafhopper vector Scaphoideus titanus Ball into Europe during the early part of the 20th century. Combining the resulting divergence time estimates into a final 16S timetree suggests that evolutionary rates have remained relatively constant overall through the evolution of phytoplasmas and that the origin of this lineage, at ~641 million years ago (Ma), preceded the origin of land plants and hemipteran insects. Nevertheless, the crown group of phytoplasmas is estimated to have begun diversifying ~316 Ma, roughly coinciding with the origin of seed plants and Hemiptera. Some phytoplasma groups apparently associated with particular plant families or insect vector lineages generally arose more recently than their respective hosts and vectors, suggesting that vector-mediated host shifts have been an important mechanism in the evolutionary diversification of phytoplasmas. 
Further progress in understanding macroevolutionary patterns in phytoplasmas is hindered by large gaps in knowledge of the identity of competent vectors and lack of data on phytoplasma associations with non-economically important plants.
Affiliation(s)
- Yanghui Cao
  - Illinois Natural History Survey, Prairie Research Institute, University of Illinois, Champaign, IL 61820, USA
- Valeria Trivellone
  - Illinois Natural History Survey, Prairie Research Institute, University of Illinois, Champaign, IL 61820, USA
- Christopher H Dietrich
  - Illinois Natural History Survey, Prairie Research Institute, University of Illinois, Champaign, IL 61820, USA
19
Counting and sampling gene family evolutionary histories in the duplication-loss and duplication-loss-transfer models. J Math Biol 2020; 80:1353-1388. [PMID: 32060618 PMCID: PMC7052048 DOI: 10.1007/s00285-019-01465-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/16/2019] [Revised: 11/18/2019] [Indexed: 10/28/2022]
Abstract
Given a set of species whose evolution is represented by a species tree, a gene family is a group of genes having evolved from a single ancestral gene. A gene family evolves along the branches of a species tree through various mechanisms, including, but not limited to, speciation (S), gene duplication (D), gene loss (L), and horizontal gene transfer (T). The reconstruction of a gene tree representing the evolution of a gene family constrained by a species tree is an important problem in phylogenomics. However, unlike in the multispecies coalescent evolutionary model that considers only speciation and incomplete lineage sorting events, very little is known about the search space for gene family histories accounting for gene duplication, gene loss and horizontal gene transfer (the DLT-model). In this work, we introduce the notion of an evolutionary history, defined as a binary ordered rooted tree describing the evolution of a gene family, constrained by a species tree in the DLT-model. We provide formal grammars describing the set of all evolutionary histories that are compatible with a given species tree, whether it is ranked or unranked. These grammars allow us, using either analytic combinatorics or dynamic programming, to efficiently compute the number of histories of a given size, and also to generate random histories of a given size under the uniform distribution. We apply these tools to obtain exact asymptotics for the number of gene family histories for two species trees, the rooted caterpillar and complete binary tree, as well as estimates of the range of the exponential growth factor of the number of histories for random species trees of size up to 25. Our results show that including horizontal gene transfers induces a dramatic increase in the number of evolutionary histories.
We also show that, within ranked species trees, the number of evolutionary histories in the DLT-model is almost independent of the species tree topology. These results establish firm foundations for the development of ensemble methods for the prediction of reconciliations.
20
TreeSolve: Rapid Error-Correction of Microbial Gene Trees. Algorithms for Computational Biology 2020. [PMCID: PMC7197061 DOI: 10.1007/978-3-030-42266-0_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2022]
Abstract
Gene tree reconstruction is an important problem in phylogenetics. However, gene sequences often lack sufficient information to confidently distinguish between competing gene tree topologies. To overcome this limitation, the best gene tree reconstruction methods use a known species tree topology to guide the reconstruction of the gene tree. While such species-tree-aware gene tree reconstruction methods have been repeatedly shown to result in vastly more accurate gene trees, the most accurate of these methods often have prohibitively high computational costs. In this work, we introduce a highly computationally efficient and robust species-tree-aware method, named TreeSolve, for microbial gene tree reconstruction. TreeSolve works by collapsing weakly supported edges of the input gene tree, resulting in a non-binary gene tree, and then using new algorithms and techniques to optimally resolve the non-binary gene trees with respect to the given species tree in an appropriately and dynamically constrained search space. Using thousands of real and simulated gene trees, we demonstrate that TreeSolve significantly outperforms the best existing species-tree-aware methods for microbes in terms of accuracy, speed, or both. Crucially, TreeSolve also implicitly keeps track of multiple optimal gene tree reconstructions and can compute either a single best estimate of the gene tree or multiple distinct estimates. As we demonstrate, aggregating over multiple gene tree candidates helps distinguish between correct and incorrect parts of an error-corrected gene tree. Thus, TreeSolve not only enables rapid gene tree error-correction for large gene trees without compromising on accuracy, but also enables accounting of inference uncertainty.
21
Mawhorter R, Libeskind-Hadas R. Hierarchical clustering of maximum parsimony reconciliations. BMC Bioinformatics 2019; 20:612. [PMID: 31775628 PMCID: PMC6882150 DOI: 10.1186/s12859-019-3223-5] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 09/03/2019] [Accepted: 11/14/2019] [Indexed: 01/16/2023]
Abstract
BACKGROUND Maximum parsimony reconciliation in the duplication-transfer-loss model is a widely-used method for analyzing the evolutionary histories of pairs of entities such as hosts and parasites, symbiont species, and species and genes. While efficient algorithms are known for finding maximum parsimony reconciliations, the number of such reconciliations can be exponential in the size of the trees. Since these reconciliations can differ substantially from one another, making inferences from any one reconciliation may lead to conclusions that are not supported, or may even be contradicted, by other maximum parsimony reconciliations. Therefore, there is a need to find small sets of best representative reconciliations when the space of solutions is large and diverse. RESULTS We provide a general framework for hierarchically clustering the space of maximum parsimony reconciliations. We demonstrate this framework for two specific linkage criteria, one that seeks to maximize the average support of the events found in the reconciliations in each cluster and the other that seeks to minimize the distance between reconciliations in each cluster. We analyze the asymptotic worst-case running times and provide experimental results that demonstrate the viability and utility of this approach. CONCLUSIONS The hierarchical clustering method proposed here provides a new approach to finding a set of representative reconciliations in the potentially vast and diverse space of maximum parsimony reconciliations.
Affiliation(s)
- Ross Mawhorter
  - Department of Computer Science, Harvey Mudd College, Claremont, California, USA
- Ran Libeskind-Hadas
  - Department of Computer Science, Harvey Mudd College, Claremont, California, USA
22
Duchemin W, Gence G, Arigon Chifolleau AM, Arvestad L, Bansal MS, Berry V, Boussau B, Chevenet F, Comte N, Davín AA, Dessimoz C, Dylus D, Hasic D, Mallo D, Planel R, Posada D, Scornavacca C, Szöllosi G, Zhang L, Tannier É, Daubin V. RecPhyloXML: a format for reconciled gene trees. Bioinformatics 2019; 34:3646-3652. [PMID: 29762653 PMCID: PMC6198865 DOI: 10.1093/bioinformatics/bty389] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 10/20/2017] [Accepted: 05/09/2018] [Indexed: 12/21/2022]
Abstract
Motivation A reconciliation is an annotation of the nodes of a gene tree with evolutionary events—for example, speciation, gene duplication, transfer, loss, etc.—along with a mapping onto a species tree. Many algorithms and software produce or use reconciliations but often using different reconciliation formats, regarding the type of events considered or whether the species tree is dated or not. This complicates the comparison and communication between different programs. Results Here, we gather a consortium of software developers in gene tree species tree reconciliation to propose and endorse a format that aims to promote an integrative—albeit flexible—specification of phylogenetic reconciliations. This format, named recPhyloXML, is accompanied by several tools such as a reconciled tree visualizer and conversion utilities. Availability and implementation http://phylariane.univ-lyon1.fr/recphyloxml/.
Affiliation(s)
- Wandrille Duchemin
  - Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France
  - INRIA Grenoble Rhône-Alpes, F-38334 Montbonnot, France
  - MTA-ELTE Lendület Evolutionary Genomics Research Group, Budapest, Hungary
  - Department of Biological Physics, Eötvös Loránd University, Budapest, Hungary
- Guillaume Gence
  - Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France
- Anne-Muriel Arigon Chifolleau
  - LIRMM, Université de Montpellier, CNRS, Montpellier, France
  - Institut de Biologie Computationnelle (IBC), Montpellier, France
- Lars Arvestad
  - Department of Mathematics, Stockholm University, Stockholm, Sweden
  - Swedish e-Science Research Centre (SeRC), Stockholm, Sweden
- Mukul S Bansal
  - Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA
  - Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Vincent Berry
  - LIRMM, Université de Montpellier, CNRS, Montpellier, France
  - Institut de Biologie Computationnelle (IBC), Montpellier, France
  - ISEM, CNRS, Université de Montpellier, IRD, EPHE, Montpellier, France
- Bastien Boussau
  - Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France
- François Chevenet
  - LIRMM, Université de Montpellier, CNRS, Montpellier, France
  - MIVEGEC, CNRS 5290, IRD 224, Université de Montpellier, Montpellier, France
- Nicolas Comte
  - INRIA Grenoble Rhône-Alpes, F-38334 Montbonnot, France
- Adrián A Davín
  - Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France
  - MTA-ELTE Lendület Evolutionary Genomics Research Group, Budapest, Hungary
  - Department of Biological Physics, Eötvös Loránd University, Budapest, Hungary
- Christophe Dessimoz
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
  - Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
  - Department of Genetics, Evolution and Environment, University College London, London, UK
  - Department of Computer Science, University College London, London, UK
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- David Dylus
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Damir Hasic
  - Department of Mathematics, Faculty of Science, University of Sarajevo, Sarajevo, Bosnia and Herzegovina
- Diego Mallo
  - Virginia G. Piper Center for Personalized Diagnostics, Biodesign Institute, Arizona State University, Tempe, AZ, USA
- Rémi Planel
  - Laboratoire d'Analyse Bio-informatique en Génomique et Métabolisme CNRS-UMR 8030, Commissariat à l'Énergie Atomique (CEA), Institut de Génomique, Genoscope, Evry, France
- David Posada
  - Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain
- Celine Scornavacca
  - Institut de Biologie Computationnelle (IBC), Montpellier, France
  - ISEM, CNRS, Université de Montpellier, IRD, EPHE, Montpellier, France
- Gergely Szöllosi
  - MTA-ELTE Lendület Evolutionary Genomics Research Group, Budapest, Hungary
  - Department of Biological Physics, Eötvös Loránd University, Budapest, Hungary
- Louxin Zhang
  - Department of Mathematics, National University of Singapore, Singapore, Singapore
- Éric Tannier
  - Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France
  - INRIA Grenoble Rhône-Alpes, F-38334 Montbonnot, France
- Vincent Daubin
  - Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France
23
Kundu S, Bansal MS. SaGePhy: an improved phylogenetic simulation framework for gene and subgene evolution. Bioinformatics 2019; 35:3496-3498. [PMID: 30715213 DOI: 10.1093/bioinformatics/btz081] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 10/22/2018] [Revised: 01/21/2019] [Accepted: 01/31/2019] [Indexed: 11/14/2022]
Abstract
SUMMARY SaGePhy is a software package for improved phylogenetic simulation of gene and subgene evolution. SaGePhy can be used to generate species trees, gene trees and subgene or (protein) domain trees using a probabilistic birth-death process that allows for gene and subgene duplication, horizontal gene and subgene transfer and gene and subgene loss. SaGePhy implements a range of important features not found in other phylogenetic simulation frameworks/software. These include (i) simulation of subgene or domain level evolution inside one or more gene trees, (ii) simultaneous simulation of both additive and replacing horizontal gene/subgene transfers and (iii) probabilistic sampling of species tree and gene tree nodes, respectively, for gene- and domain-family birth. SaGePhy is open-source, platform independent and written in Java and Python. AVAILABILITY AND IMPLEMENTATION Executables, source code (open-source under the revised BSD license) and a detailed manual are freely available from http://compbio.engr.uconn.edu/software/sagephy/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Soumya Kundu
  - Department of Computer Science & Engineering, Storrs, CT, USA
- Mukul S Bansal
  - Department of Computer Science & Engineering, Storrs, CT, USA
  - The Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
24
Avino M, Ng GT, He Y, Renaud MS, Jones BR, Poon AFY. Tree shape-based approaches for the comparative study of cophylogeny. Ecol Evol 2019; 9:6756-6771. [PMID: 31312429 PMCID: PMC6618157 DOI: 10.1002/ece3.5185] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 01/23/2019] [Revised: 02/21/2019] [Accepted: 03/29/2019] [Indexed: 12/17/2022]
Abstract
Cophylogeny is the congruence of phylogenetic relationships between two different groups of organisms due to their long-term interaction. We investigated the use of tree shape distance measures to quantify the degree of cophylogeny. We implemented a reverse-time simulation model of pathogen phylogenies within a fixed host tree, given cospeciation probability, host switching, and pathogen speciation rates. We used this model to evaluate 18 distance measures between host and pathogen trees including two kernel distances that we developed for labeled and unlabeled trees, which use branch lengths and accommodate different size trees. Finally, we used these measures to revisit published cophylogenetic studies, where authors described the observed associations as representing a high or low degree of cophylogeny. Our simulations demonstrated that some measures are more informative than others with respect to specific coevolution parameters especially when these did not assume extreme values. For real datasets, trees' associations projection revealed clustering of high concordance studies suggesting that investigators are describing it in a consistent way. Our results support the hypothesis that measures can be useful for quantifying cophylogeny. This motivates their usage in the field of coevolution and supports the development of simulation-based methods, i.e., approximate Bayesian computation, to estimate the underlying coevolutionary parameters.
Affiliation(s)
- Mariano Avino
  - Department of Pathology and Laboratory Medicine, Western University, London, Ontario, Canada
- Garway T Ng
  - Department of Pathology and Laboratory Medicine, Western University, London, Ontario, Canada
- Yiying He
  - Department of Pathology and Laboratory Medicine, Western University, London, Ontario, Canada
- Mathias S Renaud
  - Department of Pathology and Laboratory Medicine, Western University, London, Ontario, Canada
- Bradley R Jones
  - BC Centre for Excellence in HIV/AIDS, Vancouver, British Columbia, Canada
- Art F Y Poon
  - Department of Pathology and Laboratory Medicine, Western University, London, Ontario, Canada
  - Department of Applied Mathematics, Western University, London, Ontario, Canada
25
Zhang C, Ogilvie HA, Drummond AJ, Stadler T. Bayesian Inference of Species Networks from Multilocus Sequence Data. Mol Biol Evol 2019; 35:504-517. [PMID: 29220490 PMCID: PMC5850812 DOI: 10.1093/molbev/msx307] [Citation(s) in RCA: 103] [Impact Index Per Article: 17.2] [Indexed: 01/12/2023]
Abstract
Reticulate species evolution, such as hybridization or introgression, is relatively common in nature. In the presence of reticulation, species relationships can be captured by a rooted phylogenetic network, and orthologous gene evolution can be modeled as bifurcating gene trees embedded in the species network. We present a Bayesian approach to jointly infer species networks and gene trees from multilocus sequence data. A novel birth-hybridization process is used as the prior for the species network, and we assume a multispecies network coalescent prior for the embedded gene trees. We verify the ability of our method to correctly sample from the posterior distribution, and thus to infer a species network, through simulations. To quantify the power of our method, we reanalyze two large data sets of genes from spruces and yeasts. For the three closely related spruces, we verify the previously suggested homoploid hybridization event in this clade; for the yeast data, we find extensive hybridization events. Our method is available within the BEAST 2 add-on SpeciesNetwork, and thus provides an extensible framework for Bayesian inference of reticulate evolution.
Affiliation(s)
- Chi Zhang
  - Department of Biosystems Science and Engineering, Eidgenössische Technische Hochschule Zürich, Basel, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Switzerland
  - Key Laboratory of Vertebrate Evolution and Human Origins of Chinese Academy of Sciences, Institute of Vertebrate Paleontology and Paleoanthropology, Chinese Academy of Sciences, Beijing, China
- Huw A Ogilvie
  - Division of Ecology and Evolution, Research School of Biology, Australian National University, Canberra, Australia
  - Centre for Computational Evolution, University of Auckland, Auckland, New Zealand
- Alexei J Drummond
  - Centre for Computational Evolution, University of Auckland, Auckland, New Zealand
  - Department of Computer Science, University of Auckland, Auckland, New Zealand
- Tanja Stadler
  - Department of Biosystems Science and Engineering, Eidgenössische Technische Hochschule Zürich, Basel, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Switzerland
26
Li L, Bansal MS. An Integrated Reconciliation Framework for Domain, Gene, and Species Level Evolution. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2019; 16:63-76. [PMID: 29994126 DOI: 10.1109/tcbb.2018.2846253] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 06/08/2023]
Abstract
The majority of genes in eukaryotes consists of one or more protein domains that can be independently lost or gained during evolution. This gain and loss of protein domains, through domain duplications, transfers, or losses, has important evolutionary and functional consequences. Yet, even though it is well understood that domains evolve inside genes and genes inside species, there do not exist any computational frameworks to simultaneously model the evolution of domains, genes, and species and account for their inter-dependency. Here, we develop an integrated model of domain evolution that explicitly captures the interdependence of domain-, gene-, and species-level evolution. Our model extends the classical phylogenetic reconciliation framework, which infers gene family evolution by comparing gene trees and species trees, by explicitly considering domain-level evolution and decoupling domain-level events from gene-level events. In this paper, we (i) introduce the new integrated reconciliation framework, (ii) prove that the associated optimization problem is NP-hard, (iii) devise an efficient heuristic solution for the problem, (iv) apply our algorithm to a large biological dataset, and (v) demonstrate the impact of using our new computational framework compared to existing approaches. The implemented software is freely available from http://compbio.engr.uconn.edu/software/seadog/.
27
Bansal MS, Kellis M, Kordi M, Kundu S. RANGER-DTL 2.0: rigorous reconstruction of gene-family evolution by duplication, transfer and loss. Bioinformatics 2018; 34:3214-3216. [PMID: 29688310 PMCID: PMC6137995 DOI: 10.1093/bioinformatics/bty314] [Citation(s) in RCA: 45] [Impact Index Per Article: 6.4] [Received: 05/13/2017] [Revised: 03/27/2018] [Accepted: 04/20/2018] [Indexed: 11/30/2022]
Abstract
Summary RANGER-DTL 2.0 is a software program for inferring gene family evolution using Duplication-Transfer-Loss reconciliation. This new software is highly scalable and easy to use, and offers many new features not currently available in any other reconciliation program. RANGER-DTL 2.0 has a particular focus on reconciliation accuracy and can account for many sources of reconciliation uncertainty including uncertain gene tree rooting, gene tree topological uncertainty, multiple optimal reconciliations and alternative event cost assignments. RANGER-DTL 2.0 is open-source and written in C++ and Python. Availability and implementation Pre-compiled executables, source code (open-source under GNU GPL) and a detailed manual are freely available from http://compbio.engr.uconn.edu/software/RANGER-DTL/. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mukul S Bansal
  - Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA
- Manolis Kellis
  - Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Broad Institute, Cambridge, MA, USA
- Misagh Kordi
  - Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA
- Soumya Kundu
  - Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA
28
Paszek J, Gorecki P. Efficient Algorithms for Genomic Duplication Models. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 15:1515-1524. [PMID: 28541223 DOI: 10.1109/tcbb.2017.2706679] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 06/07/2023]
Abstract
An important issue in evolutionary molecular biology is to discover genomic duplication episodes and their correspondence to the species tree. Existing approaches vary in the two fundamental aspects: the choice of evolutionary scenarios that model allowed locations of duplications in the species tree, and the rules of clustering gene duplications from gene trees into a single multiple duplication event. Here we study the method of clustering called minimum episodes for several models of allowed evolutionary scenarios with a focus on interval models in which every gene duplication has an interval consisting of allowed locations in the species tree. We present mathematical foundations for general genomic duplication problems. Next, we propose the first linear time and space algorithm for minimum episodes clustering jointly for any interval model and the algorithm for the most general model in which every evolutionary scenario is allowed. We also present a comparative study of different models of genomic duplication based on simulated and empirical datasets. We provided algorithms and tools that could be applied to solve efficiently minimum episodes clustering problems. Our comparative study helps to identify which model is the most reasonable choice in inferring genomic duplication events.
|
29
|
Kundu S, Bansal MS. On the impact of uncertain gene tree rooting on duplication-transfer-loss reconciliation. BMC Bioinformatics 2018; 19:290. [PMID: 30367593 PMCID: PMC6101088 DOI: 10.1186/s12859-018-2269-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 01/04/2023] Open
Abstract
Background Duplication-Transfer-Loss (DTL) reconciliation is a powerful and increasingly popular technique for studying the evolution of microbial gene families. DTL reconciliation requires the use of rooted gene trees to perform the reconciliation with the species tree, and the standard technique for rooting gene trees is to assign a root that results in the minimum reconciliation cost across all rootings of that gene tree. However, even though it is well understood that many gene trees have multiple optimal roots, only a single optimal root is randomly chosen to create the rooted gene tree and perform the reconciliation. This remains an important overlooked and unaddressed problem in DTL reconciliation, leading to incorrect evolutionary inferences. In this work, we perform an in-depth analysis of the impact of uncertain gene tree rooting on the computed DTL reconciliation and provide the first computational tools to quantify and negate the impact of gene tree rooting uncertainty on DTL reconciliation. Results Our analysis of a large data set of over 4500 gene families from 100 species shows that a large fraction of gene trees have multiple optimal rootings, that these multiple roots often, but not always, appear closely clustered together in the same region of the gene tree, that many aspects of the reconciliation remain conserved across the multiple rootings, that gene tree error has a profound impact on the prevalence and structure of multiple optimal rootings, and that there are specific interesting patterns in the reconciliation of those gene trees that have multiple optimal roots. Conclusions Our results show that unrooted gene trees can be meaningfully reconciled and high-quality evolutionary information can be obtained from them even after accounting for multiple optimal rootings. 
In addition, the techniques and tools introduced in this paper make it possible to systematically avoid incorrect evolutionary inferences caused by incorrect or uncertain gene tree rooting. These tools have been implemented in the phylogenetic reconciliation software package RANGER-DTL 2.0, freely available from http://compbio.engr.uconn.edu/software/RANGER-DTL/.
Affiliation(s)
- Soumya Kundu
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, 06269, USA
| | - Mukul S Bansal
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, 06269, USA.,Institute for Systems Genomics, University of Connecticut, Storrs, CT, 06269, USA
| |
|
30
|
GATC: a genetic algorithm for gene tree construction under the Duplication-Transfer-Loss model of evolution. BMC Genomics 2018; 19:102. [PMID: 29764363 PMCID: PMC5954287 DOI: 10.1186/s12864-018-4455-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022] Open
|
31
|
Abstract
Biodiversity has always been predominantly microbial, and the scarcity of fossils from bacteria, archaea and microbial eukaryotes has prevented a comprehensive dating of the tree of life. Here, we show that patterns of lateral gene transfer deduced from an analysis of modern genomes encode a novel and abundant source of information about the temporal coexistence of lineages throughout the history of life. We use state-of-the-art species tree-aware phylogenetic methods to reconstruct the history of thousands of gene families and demonstrate that dates implied by gene transfers are consistent with estimates from relaxed molecular clocks in Bacteria, Archaea and Eukarya. We present the order of speciations according to lateral gene transfer data calibrated to geological time for three datasets comprising 40 genomes for Cyanobacteria, 60 genomes for Archaea and 60 genomes for Fungi. An inspection of discrepancies between transfers and clocks and a comparison with mammalian fossils show that gene transfer in microbes is potentially as informative for dating the tree of life as the geological record in macroorganisms.
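The relative-dating idea in this abstract can be illustrated with a small consistency check. The sketch below is an assumption-laden toy, not the authors' method: it uses one necessary condition only, namely that a transfer from a donor branch to a recipient branch requires the donor branch to have begun before the recipient branch ended. The tree, the node ages, and the function name `violated_transfers` are all hypothetical.

```python
# Toy species tree: child -> parent. Branches are named by the node at
# their lower (younger) end. All names and ages here are hypothetical.
parent = {"X": "R", "Y": "R", "A": "X", "B": "X", "C": "Y", "D": "Y"}

# Candidate node ages (time before present; leaves are at 0).
age = {"R": 10.0, "X": 6.0, "Y": 3.0, "A": 0.0, "B": 0.0, "C": 0.0, "D": 0.0}


def violated_transfers(parent, age, transfers):
    """Return the transfers that contradict the given node ages.

    A transfer (donor, recipient) is accepted when the donor branch
    started (age of the donor's parent node) strictly earlier than the
    recipient branch ended (age of the recipient node). Donor branches
    must not be the root branch, which has no parent in this encoding.
    """
    bad = []
    for donor, recipient in transfers:
        donor_start = age[parent[donor]]
        recipient_end = age[recipient]
        if donor_start <= recipient_end:
            bad.append((donor, recipient))
    return bad


# Branch A starts at 6 and branch Y ends at 3, so A -> Y is compatible.
# Branch C starts at 3 but branch X already ended at 6, so C -> X is not.
conflicts = violated_transfers(parent, age, [("A", "Y"), ("C", "X")])
print(conflicts)
```

Each inferred transfer therefore contributes an ordering constraint between two speciation nodes, and a proposed dating can be scored by how many such constraints it satisfies.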
|
32
|
Comparative genomics sheds light on niche differentiation and the evolutionary history of comammox Nitrospira. ISME JOURNAL 2018. [PMID: 29515170 DOI: 10.1038/s41396-018-0083-3] [Citation(s) in RCA: 173] [Impact Index Per Article: 24.7] [Indexed: 01/06/2023]
Abstract
The description of comammox Nitrospira spp., performing complete ammonia-to-nitrate oxidation, and their co-occurrence with canonical β-proteobacterial ammonia oxidizing bacteria (β-AOB) in the environment, calls into question the metabolic potential of comammox Nitrospira and the evolutionary history of their ammonia oxidation pathway. We report four new comammox Nitrospira genomes, constituting two novel species, and the first comparative genomic analysis on comammox Nitrospira. Unlike canonical Nitrospira, comammox Nitrospira genomes lack genes for assimilatory nitrite reduction, suggesting that they have lost the potential to use external nitrite nitrogen sources. By contrast, compared to canonical Nitrospira, comammox Nitrospira harbor a higher diversity of urea transporters and copper homeostasis genes and lack cyanate hydratase genes. Additionally, the two comammox clades differ in their ammonium uptake systems. Contrary to β-AOB, comammox Nitrospira genomes have single copies of the two central ammonia oxidation pathway operons. Similar to ammonia oxidizing archaea and some oligotrophic AOB strains, they lack genes involved in nitric oxide reduction. Furthermore, comammox Nitrospira genomes encode genes that might allow efficient growth at low oxygen concentrations. Regarding the evolutionary history of comammox Nitrospira, our analyses indicate that several genes belonging to the ammonia oxidation pathway could have been laterally transferred from β-AOB to comammox Nitrospira. We postulate that the absence of comammox genes in other sublineage II Nitrospira genomes is the result of subsequent loss.
|
33
|
Jacox E, Weller M, Tannier E, Scornavacca C. Resolution and reconciliation of non-binary gene trees with transfers, duplications and losses. Bioinformatics 2017; 33:980-987. [PMID: 28073758 DOI: 10.1093/bioinformatics/btw778] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 08/29/2016] [Accepted: 12/02/2016] [Indexed: 11/14/2022] Open
Abstract
Summary Gene trees reconstructed from sequence alignments contain poorly supported branches when the phylogenetic signal in the sequences is insufficient to determine them all. When a species tree is available, the signal of gains and losses of genes can be used to correctly resolve the unsupported parts of the gene history. However, finding a most parsimonious binary resolution of a non-binary tree obtained by contracting the unsupported branches is NP-hard if transfer events are considered as possible gene scale events, in addition to gene origination, duplication and loss. We propose an exact, parameterized algorithm to solve this problem in single-exponential time, where the parameter is the number of connected branches of the gene tree that show low support from the sequence alignment or, equivalently, the maximum number of children of any node of the gene tree once the low-support branches have been collapsed. This improves on the best known algorithm by an exponential factor. We propose a way to choose among optimal solutions based on the available information. We show the usability of this principle on several simulated and biological datasets. The results are comparable in quality to several other tested methods having similar goals, but our approach provides a lower running time and a guarantee that the produced solution is optimal. Availability and Implementation Our algorithm has been integrated into the ecceTERA phylogeny package, available at http://mbb.univ-montp2.fr/MBB/download_sources/16__ecceTERA and which can be run online at http://mbb.univ-montp2.fr/MBB/subsection/softExec.php?soft=eccetera. Contact celine.scornavacca@umontpellier.fr. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Edwin Jacox
- ISE-M, Université Montpellier, CNRS, IRD, EPHE, Montpellier, France
| | - Mathias Weller
- Institut de Biologie Computationnelle (IBC), Montpellier, France.,LIRMM, Université Montpellier, CNRS, Montpellier, France
| | - Eric Tannier
- INRIA Rhône-Alpes, LBBE, Université Lyon 1, Lyon, France
| | - Celine Scornavacca
- ISE-M, Université Montpellier, CNRS, IRD, EPHE, Montpellier, France.,Institut de Biologie Computationnelle (IBC), Montpellier, France
| |
|
34
|
Abstract
Most phylogenetic methods are model-based and depend on models of evolution designed to approximate the evolutionary processes. Several methods have been developed to identify suitable models of evolution for phylogenetic analysis of alignments of nucleotide or amino acid sequences and some of these methods are now firmly embedded in the phylogenetic protocol. However, in a disturbingly large number of cases, it appears that these models were used without acknowledgement of their inherent shortcomings. In this chapter, we discuss the problem of model selection and show how some of the inherent shortcomings may be identified and overcome.
Affiliation(s)
| | - Vivek Jayaswal
- School of Biomedical Sciences, Queensland University of Technology, Brisbane, QLD, Australia
| | - Faisal M Ababneh
- Department of Mathematics & Statistics, Al-Hussein Bin Talal University, Ma'an, Jordan
| | - John Robinson
- School of Mathematics & Statistics, University of Sydney, Sydney, NSW, Australia
| |
|
35
|
Garzón-Ospina D, Forero-Rodríguez J, Patarroyo MA. Evidence of functional divergence in MSP7 paralogous proteins: a molecular-evolutionary and phylogenetic analysis. BMC Evol Biol 2016; 16:256. [PMID: 27894257 PMCID: PMC5126858 DOI: 10.1186/s12862-016-0830-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 09/15/2016] [Accepted: 11/17/2016] [Indexed: 11/10/2022] Open
Abstract
Background The merozoite surface protein 7 (MSP7) is a Plasmodium protein which is involved in parasite invasion; the gene encoding it belongs to a multigene family. It has been proposed that MSP7 paralogues seem to be functionally redundant; however, recent experiments have suggested that they could have different roles. Results The msp7 multigene family has been described in newly available Plasmodium genomes; phylogenetic relationships were established in 12 species by using different molecular evolutionary approaches for assessing functional divergence amongst MSP7 members. Gene expansion and contraction rule msp7 family evolution; however, some members could have had concerted evolution. Molecular evolutionary analysis showed that relaxed and/or intensified selection modulated Plasmodium msp7 paralogous evolution. Furthermore, episodic diversifying selection and changes in evolutionary rates suggested that some paralogous proteins have diverged functionally. Conclusions Even though msp7 has mainly evolved in line with a birth-and-death evolutionary model, gene conversion has taken place between some paralogous genes allowing them to maintain their functional redundancy. On the other hand, the evolutionary rate of some MSP7 paralogs has become altered, as well as undergoing relaxed or intensified (positive) selection, suggesting functional divergence. This could mean that some MSP7s can form different parasite protein complexes and/or recognise different host receptors during parasite invasion. These results highlight the importance of this gene family in the Plasmodium genus. Electronic supplementary material The online version of this article (doi:10.1186/s12862-016-0830-x) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Diego Garzón-Ospina
- Molecular Biology and Immunology Department, Fundación Instituto de Inmunología de Colombia (FIDIC), Carrera 50#26-20, Bogotá, DC, Colombia.,PhD Programme in Biomedical and Biological Sciences, Universidad del Rosario, Carrera 24#63C-69, Bogotá, DC, Colombia
| | - Johanna Forero-Rodríguez
- Molecular Biology and Immunology Department, Fundación Instituto de Inmunología de Colombia (FIDIC), Carrera 50#26-20, Bogotá, DC, Colombia
| | - Manuel A Patarroyo
- Molecular Biology and Immunology Department, Fundación Instituto de Inmunología de Colombia (FIDIC), Carrera 50#26-20, Bogotá, DC, Colombia.,School of Medicine and Health Sciences, Universidad del Rosario, Carrera 24#63C-69, Bogotá, DC, Colombia
| |
|
36
|
Khan MA, Mahmudi O, Ullah I, Arvestad L, Lagergren J. Probabilistic inference of lateral gene transfer events. BMC Bioinformatics 2016; 17:431. [PMID: 28185583 PMCID: PMC5123345 DOI: 10.1186/s12859-016-1268-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 12/20/2022] Open
Abstract
Background Lateral gene transfer (LGT) is an evolutionary process that has an important role in biology. It challenges the traditional binary tree-like evolution of species and is attracting increasing attention from molecular biologists due to its involvement in antibiotic resistance. A number of attempts have been made to model LGT in the presence of gene duplication and loss, but reliably placing LGT events in the species tree has remained a challenge. Results In this paper, we propose a probabilistic method that samples reconciliations of the gene tree with a dated species tree and computes maximum a posteriori probabilities. The MCMC-based method uses the probabilistic model DLTRS, which integrates LGT, gene duplication, gene loss, and sequence evolution under a relaxed molecular clock for substitution rates. We can estimate posterior distributions on gene trees and, in contrast to previous work, the actual placement of potential LGT, which can be used to, e.g., identify “highways” of LGT. Conclusions Based on a simulation study, we conclude that the method is able to infer the true LGT events on the gene tree and reconcile them to the correct edges on the species tree in most cases. Applied to two biological datasets, containing gene families from Cyanobacteria and Mollicutes, we find potential LGT highways that corroborate other studies as well as previously undetected examples. Electronic supplementary material The online version of this article (doi:10.1186/s12859-016-1268-2) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Mehmood Alam Khan
- KTH Royal Institute of Technology, School of Computer Science and Communication, Box 1031, Solna, 171 21, Sweden.,Science for Life Laboratory, Box 1031, Solna, 171 21, Sweden
| | - Owais Mahmudi
- KTH Royal Institute of Technology, School of Computer Science and Communication, Box 1031, Solna, 171 21, Sweden.,Science for Life Laboratory, Box 1031, Solna, 171 21, Sweden
| | - Ikram Ullah
- KTH Royal Institute of Technology, School of Computer Science and Communication, Box 1031, Solna, 171 21, Sweden.,Science for Life Laboratory, Box 1031, Solna, 171 21, Sweden
| | - Lars Arvestad
- Science for Life Laboratory, Box 1031, Solna, 171 21, Sweden.,Stockholm University, Dept. of Numerical Analysis and Computer Science, Box 1031, Solna, 171 21, Sweden.,Swedish e-Science Research Centre, Solna, Sweden
| | - Jens Lagergren
- KTH Royal Institute of Technology, School of Computer Science and Communication, Box 1031, Solna, 171 21, Sweden.,Science for Life Laboratory, Box 1031, Solna, 171 21, Sweden
| |
|
37
|
Lu B, Leong HW. Computational methods for predicting genomic islands in microbial genomes. Comput Struct Biotechnol J 2016; 14:200-6. [PMID: 27293536 PMCID: PMC4887561 DOI: 10.1016/j.csbj.2016.05.001] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Received: 01/31/2016] [Revised: 05/01/2016] [Accepted: 05/03/2016] [Indexed: 11/02/2022] Open
Abstract
Clusters of genes acquired by lateral gene transfer in microbial genomes are broadly referred to as genomic islands (GIs). GIs often carry genes important for genome evolution and adaptation to niches, such as genes involved in pathogenesis and antibiotic resistance. Therefore, GI prediction has gradually become an important part of microbial genome analysis. Despite inherent difficulties in identifying GIs, many computational methods have been developed and show good performance. In this mini-review, we first summarize the general challenges in predicting GIs. Then we group existing GI detection methods by their input, briefly describe representative methods in each group, and discuss their advantages as well as limitations. Finally, we look into the potential improvements for better GI prediction.
Affiliation(s)
- Bingxin Lu
- Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore 117417, Republic of Singapore
| | - Hon Wai Leong
- Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore 117417, Republic of Singapore
| |
|
38
|
Daubin V, Szöllősi GJ. Horizontal Gene Transfer and the History of Life. Cold Spring Harb Perspect Biol 2016; 8:a018036. [PMID: 26801681 DOI: 10.1101/cshperspect.a018036] [Citation(s) in RCA: 58] [Impact Index Per Article: 6.4] [Indexed: 12/20/2022]
Abstract
Microbes acquire DNA from a variety of sources. The last decades, which have seen the development of genome sequencing, have revealed that horizontal gene transfer has been a major evolutionary force that has constantly reshaped genomes throughout evolution. However, because the history of life must ultimately be deduced from gene phylogenies, the lack of methods to account for horizontal gene transfer has thrown into confusion the very concept of the tree of life. As a result, many questions remain open, but emerging methodological developments promise to use information conveyed by horizontal gene transfer that remains unexploited today.
Affiliation(s)
- Vincent Daubin
- Laboratoire de Biométrie et Biologie Evolutive, Université de Lyon, 69000 Lyon, France.,Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, 69622 Villeurbanne, France
| | | |
|
39
|
Abstract
BACKGROUND Discovering the location of gene duplications and multiple gene duplication episodes is a fundamental issue in evolutionary molecular biology. The problem introduced by Guigó et al. in 1996 is to map gene duplication events from a collection of rooted, binary gene family trees onto their corresponding rooted binary species tree in such a way that the total number of multiple gene duplication episodes is minimized. There are several models in the literature that specify how gene duplications from gene families can be interpreted as one duplication episode. However, in all duplication episode problems gene trees are rooted. This restriction limits the applicability, since unrooted gene family trees are frequently inferred by phylogenetic methods. RESULTS In this article we show the first solution to the open problem of episode clustering where the input gene family trees are unrooted. In particular, by using theoretical properties of unrooted reconciliation, we show an efficient algorithm that reduces this problem into the episode clustering problems defined for rooted trees. We show theoretical properties of the reduction algorithm and evaluation of empirical datasets. CONCLUSIONS We provided algorithms and tools that were successfully applied to several empirical datasets. In particular, our comparative study shows that we can improve known results on genomic duplication inference from real datasets.
Affiliation(s)
- Jarosław Paszek
- University of Warsaw, Institute of Informatics, Banacha 2, Warsaw, 02-097, Poland.
| | - Paweł Górecki
- University of Warsaw, Institute of Informatics, Banacha 2, Warsaw, 02-097, Poland.
| |
|
40
|
Semeria M, Tannier E, Guéguen L. Probabilistic modeling of the evolution of gene synteny within reconciled phylogenies. BMC Bioinformatics 2015; 16 Suppl 14:S5. [PMID: 26452018 PMCID: PMC4603630 DOI: 10.1186/1471-2105-16-s14-s5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022] Open
Abstract
BACKGROUND Most models of genome evolution concern either genetic sequences, gene content or gene order. They sometimes integrate two of the three levels, but rarely the three of them. Probabilistic models of gene order evolution usually have to assume constant gene content or adopt a presence/absence coding of gene neighborhoods which is blind to complex events modifying gene content. RESULTS We propose a probabilistic evolutionary model for gene neighborhoods, allowing genes to be inserted, duplicated or lost. It uses reconciled phylogenies, which integrate sequence and gene content evolution. We are then able to optimize parameters such as phylogeny branch lengths, or probabilistic laws depicting the diversity of susceptibility of syntenic regions to rearrangements. We reconstruct a structure for ancestral genomes by optimizing a likelihood, keeping track of all evolutionary events at the level of gene content and gene synteny. Ancestral syntenies are associated with a probability of presence.
Affiliation(s)
- Magali Semeria
- Laboratoire de Biométrie et Biologie Évolutive UMR CNRS 5558, Université Claude Bernard Lyon 1, 43 boulevard du 11 novembre 1918, 69622 Villeurbanne, France
| | - Eric Tannier
- Laboratoire de Biométrie et Biologie Évolutive UMR CNRS 5558, Université Claude Bernard Lyon 1, 43 boulevard du 11 novembre 1918, 69622 Villeurbanne, France
- INRIA Grenoble Rhône-Alpes, 655 avenue de l'Europe, 38330 Montbonnot, France
| | - Laurent Guéguen
- Laboratoire de Biométrie et Biologie Évolutive UMR CNRS 5558, Université Claude Bernard Lyon 1, 43 boulevard du 11 novembre 1918, 69622 Villeurbanne, France
| |
|
41
|
Lartillot N. Probabilistic models of eukaryotic evolution: time for integration. Philos Trans R Soc Lond B Biol Sci 2015; 370:20140338. [PMID: 26323768 PMCID: PMC4571576 DOI: 10.1098/rstb.2014.0338] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Accepted: 06/03/2015] [Indexed: 11/12/2022] Open
Abstract
In spite of substantial work and recent progress, a global and fully resolved picture of the macroevolutionary history of eukaryotes is still under construction. This concerns not only the phylogenetic relations among major groups, but also the general characteristics of the underlying macroevolutionary processes, including the patterns of gene family evolution associated with endosymbioses, as well as their impact on the sequence evolutionary process. All these questions raise formidable methodological challenges, calling for a more powerful statistical paradigm. In this direction, model-based probabilistic approaches have played an increasingly important role. In particular, improved models of sequence evolution accounting for heterogeneities across sites and across lineages have led to significant, although insufficient, improvement in phylogenetic accuracy. More recently, one main trend has been to move away from simple parametric models and stepwise approaches, towards integrative models explicitly considering the intricate interplay between multiple levels of macroevolutionary processes. Such integrative models are in their infancy, and their application to the phylogeny of eukaryotes still requires substantial improvement of the underlying models, as well as additional computational developments.
Affiliation(s)
- Nicolas Lartillot
- Laboratoire de Biométrie et Biologie Evolutive, UMR CNRS 5558, Université Claude Bernard Lyon 1, F-69622 Villeurbanne Cedex, France
| |
|
42
|
Inferring gene duplications, transfers and losses can be done in a discrete framework. J Math Biol 2015; 72:1811-44. [PMID: 26337177 DOI: 10.1007/s00285-015-0930-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/23/2015] [Revised: 05/20/2015] [Indexed: 10/23/2022]
Abstract
In the field of phylogenetics, the evolutionary history of a set of organisms is commonly depicted by a species tree (whose internal nodes represent speciation events), while the evolutionary history of a gene family is depicted by a gene tree (whose internal nodes can also represent macro-evolutionary events such as gene duplications and transfers). As speciation events are only part of the events shaping a gene history, the topology of a gene tree can show incongruences with that of the corresponding species tree. These incongruences can be used to infer the macro-evolutionary events undergone by the gene family. This is done by embedding the gene tree inside the species tree and hence providing a reconciliation of those trees. In the past decade, several parsimony-based methods have been developed to infer such reconciliations, accounting for gene duplications (D), transfers (T) and losses (L). The main contribution of this paper is to formally prove an important assumption implicitly made by previous works on these reconciliations, namely that solving the (maximum) parsimony DTL reconciliation problem in the discrete framework is equivalent to finding a most parsimonious DTL scenario in the continuous framework. In the process, we also prove several intermediate results that are useful on their own and constitute a theoretical toolbox that will likely facilitate future theoretical contributions in the field.
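As a rough illustration of what embedding a gene tree inside a species tree means, the sketch below implements the classic lowest-common-ancestor (LCA) mapping. This is a deliberate simplification and not the model studied in the paper: it only detects duplications and ignores transfers and losses. The toy trees and the names `parent`, `leaf_species`, and `reconcile` are assumptions made for the example.

```python
# Species tree ((A,B)AB,C)ABC encoded as child -> parent (toy example).
parent = {"A": "AB", "B": "AB", "AB": "ABC", "C": "ABC"}

# Which species each gene copy was sampled from (hypothetical genes).
leaf_species = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}


def ancestors(node, parent):
    """Return the path from node up to the root, node included."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lca(a, b, parent):
    """Lowest common ancestor of two species-tree nodes."""
    seen = set(ancestors(a, parent))
    for node in ancestors(b, parent):
        if node in seen:
            return node
    raise ValueError("nodes are not in the same tree")


def reconcile(gene_tree, leaf_species, parent):
    """Map a binary gene tree (nested 2-tuples of leaf names) into the
    species tree. Returns (species node of the gene-tree root, list of
    species nodes at which a duplication is inferred)."""
    if isinstance(gene_tree, str):
        return leaf_species[gene_tree], []
    left, right = gene_tree
    map_left, dups_left = reconcile(left, leaf_species, parent)
    map_right, dups_right = reconcile(right, leaf_species, parent)
    here = lca(map_left, map_right, parent)
    dups = dups_left + dups_right
    # LCA rule: a gene node mapped to the same species node as one of
    # its children is explained as a duplication, otherwise a speciation.
    if here in (map_left, map_right):
        dups.append(here)
    return here, dups


congruent = reconcile((("a1", "b1"), "c1"), leaf_species, parent)
incongruent = reconcile((("a1", "b1"), ("a2", "c1")), leaf_species, parent)
print(congruent, incongruent)
```

The first gene tree matches the species tree and needs no duplication, while the second is only explained by a duplication at the root species node. DTL methods generalise this by also allowing a gene-tree branch to jump between species-tree branches.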
|
43
|
Davis CC, Xi Z. Horizontal gene transfer in parasitic plants. CURRENT OPINION IN PLANT BIOLOGY 2015; 26:14-19. [PMID: 26051213 DOI: 10.1016/j.pbi.2015.05.008] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.5] [Received: 03/03/2015] [Revised: 05/08/2015] [Accepted: 05/12/2015] [Indexed: 06/04/2023]
Abstract
Horizontal gene transfer (HGT) between species has been a major focus of plant evolutionary research during the past decade. Parasitic plants, which establish a direct connection with their hosts, have provided excellent examples of how these transfers are facilitated via the intimacy of this symbiosis. In particular, phylogenetic studies from diverse clades indicate that parasitic plants represent a rich system for studying this phenomenon. Here, HGT has been shown to be astonishingly high in the mitochondrial genome, and appreciable in the nuclear genome. Although explicit tests remain to be performed, some transgenes have been hypothesized to be functional in their recipient species, thus providing a new perspective on the evolution of novelty in parasitic plants.
Affiliation(s)
- Charles C Davis
- Department of Organismic and Evolutionary Biology, Harvard University, 22 Divinity Avenue, Cambridge, MA 02138, USA.
| | - Zhenxiang Xi
- Department of Organismic and Evolutionary Biology, Harvard University, 22 Divinity Avenue, Cambridge, MA 02138, USA
| |
|
44
|
Garushyants SK, Kazanov MD, Gelfand MS. Horizontal gene transfer and genome evolution in Methanosarcina. BMC Evol Biol 2015; 15:102. [PMID: 26044078 PMCID: PMC4455057 DOI: 10.1186/s12862-015-0393-2] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 03/28/2014] [Accepted: 05/29/2015] [Indexed: 12/29/2022] Open
Abstract
Background Genomes of Methanosarcina spp. are among the largest archaeal genomes. One suggested reason for that is massive horizontal gene transfer (HGT) from bacteria. Genes of bacterial origin may be involved in the central metabolism and solute transport, in particular sugar synthesis, sulfur metabolism, phosphate metabolism, DNA repair, transport of small molecules etc. Horizontally transferred (HT) genes are considered to play the key role in the ability of Methanosarcina spp. to inhabit diverse environments. At the moment, genomes of three Methanosarcina spp. have been sequenced, and while these genomes vary in length and number of protein-coding genes, they all have been shown to accumulate HT genes. However, previous estimates had been made when fewer archaeal genomes were known. Moreover, several Methanosarcinaceae genomes from other genera have been sequenced recently. Here, we revise the census of genes of bacterial origin in Methanosarcinaceae. Results About 5 % of Methanosarcina genes have been shown to be horizontally transferred from various bacterial groups to the last common ancestor either of Methanosarcinaceae, or Methanosarcina, or later in the evolution. Simulation of the composition of the NCBI protein non-redundant database for different years demonstrates that the estimates of the HGT rate have decreased drastically since 2002, the year of publication of the first Methanosarcina genome. The phylogenetic distribution of HT gene donors is non-uniform. Most HT genes were transferred from Firmicutes and Proteobacteria, while no HGT events from Actinobacteria to the common ancestor of Methanosarcinaceae were found. About 50 % of HT genes are involved in metabolism. Horizontal transfer of transcription factors is not common, while 46 % of horizontally transferred genes have demonstrated differential expression in a variety of conditions. HGT of complete operons is relatively infrequent and half of HT genes do not belong to operons. 
Conclusions While genes of bacterial origin are still more frequent in Methanosarcinaceae than in other Archaea, most HGT events described earlier as Methanosarcina-specific seem to have occurred before the divergence of Methanosarcinaceae. Genes horizontally transferred from bacteria to archaea neither tend to be transferred with their regulators, nor in long operons. Electronic supplementary material The online version of this article (doi:10.1186/s12862-015-0393-2) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Sofya K Garushyants
- A.A. Kharkevich Institute for Information Transmission Problems, RAS, Bolshoi Karetny per. 19, build.1, Moscow, 127051, Russia.
- Marat D Kazanov
- A.A. Kharkevich Institute for Information Transmission Problems, RAS, Bolshoi Karetny per. 19, build.1, Moscow, 127051, Russia.
- Mikhail S Gelfand
- A.A. Kharkevich Institute for Information Transmission Problems, RAS, Bolshoi Karetny per. 19, build.1, Moscow, 127051, Russia; Faculty of Bioengineering and Bioinformatics, M.V. Lomonosov Moscow State University, Vorobievy Gory 1-73, Moscow, 119991, Russia.
45
Abstract
Horizontal or Lateral Gene Transfer (HGT or LGT) is the transmission of portions of genomic DNA between organisms through a process decoupled from vertical inheritance. In the presence of HGT events, different fragments of the genome are the result of different evolutionary histories. This can therefore complicate the investigations of evolutionary relatedness of lineages and species. Also, as HGT can bring into genomes radically different genotypes from distant lineages, or even new genes bearing new functions, it is a major source of phenotypic innovation and a mechanism of niche adaptation. For example, of particular relevance to human health is the lateral transfer of antibiotic resistance and pathogenicity determinants, leading to the emergence of pathogenic lineages. Computational identification of HGT events relies upon the investigation of sequence composition or evolutionary history of genes. Sequence composition-based ("parametric") methods search for deviations from the genomic average, whereas evolutionary history-based ("phylogenetic") approaches identify genes whose evolutionary history significantly differs from that of the host species. The evaluation and benchmarking of HGT inference methods typically rely upon simulated genomes, for which the true history is known. On real data, different methods tend to infer different HGT events, and as a result it can be difficult to ascertain all but simple and clear-cut HGT events.
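The parametric idea described above (searching for genes whose composition deviates from the genomic average) can be illustrated with a minimal Python sketch. This is not a method from the cited work: the use of GC content as the only composition signal, the z-score cutoff of 2.0, the use of the mean per-gene GC content as a stand-in for the genomic average, and the function names `gc_content` and `flag_atypical_genes` are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence (0.0 if empty)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def flag_atypical_genes(genes: dict[str, str], z_cutoff: float = 2.0) -> list[str]:
    """Return names of genes whose GC content deviates strongly from the rest.

    Hypothetical helper: the mean per-gene GC content approximates the
    genomic average, and the z-score cutoff is an arbitrary choice.
    """
    gc = {name: gc_content(seq) for name, seq in genes.items()}
    values = list(gc.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All genes have identical composition, so nothing stands out.
        return []
    return [name for name, v in gc.items() if abs(v - mu) / sigma > z_cutoff]
```

A flagged gene is only a candidate: atypical composition can have causes other than transfer, and transferred genes from compositionally similar donors would go undetected by a sketch like this.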
Affiliation(s)
- Nives Škunca
- ETH Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Zurich, Switzerland
- Christophe Dessimoz
- University College London, London, United Kingdom
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
46
Yan S, Wu G. Large-scale evolutionary analyses on SecB subunits of bacterial sec system. PLoS One 2015; 10:e0120417. [PMID: 25775430 PMCID: PMC4361572 DOI: 10.1371/journal.pone.0120417] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 07/21/2014] [Accepted: 01/21/2015] [Indexed: 01/10/2023]
Abstract
Protein secretion systems are extremely important in bacteria because they are involved in many fundamental cellular processes. Of the various secretion systems, the Sec system is composed of seven different subunits in bacteria, and subunit SecB brings secreted preproteins to subunit SecA, which with SecYEG and SecDF forms a complex for the translocation of secreted preproteins through the inner membrane. Because the Sec system is widespread across bacteria, eukaryotes, and archaea, each subunit of the Sec system has a complicated evolutionary relationship. To date, 5,162 SecB sequences have been documented in UniProtKB; however, no phylogenetic study has been conducted on a large sample of SecBs from the bacterial Sec secretion system, and no statistical study has been conducted on a SecB set of this size to exhaustively investigate the variances of pairwise p-distance along the taxonomic lineage from kingdom to phylum, class, order, family, genus, and organism. To fill these knowledge gaps, 3,813 bacterial SecB sequences with full taxonomic lineage from kingdom to organism, covering 4 phyla, 11 classes, 41 orders, 82 families, 269 genera, and 3,744 organisms, were studied. Phylogenetic analysis revealed how the SecBs evolved without compromising their function, with examples of 3-D structure comparison of two SecBs from Proteobacteria, and possible factors that affected SecB evolution were considered. The average pairwise p-distances showed that the variance varied greatly in each taxonomic group. Finally, the variance was further partitioned into inter- and intra-clan variances, which could correspond to vertical and horizontal gene transfers, with relevance for Achromobacter, Brevundimonas, Ochrobactrum, and Pseudoxanthomonas.
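For readers unfamiliar with the statistic, the pairwise p-distance mentioned above is the proportion of aligned sites at which two sequences differ. The following Python sketch shows one common way to compute it and to average it over a group. It is an illustration only, not the authors' pipeline: skipping gapped columns (`-`) and the function names `p_distance` and `mean_pairwise_p_distance` are assumptions.

```python
from itertools import combinations
from statistics import mean


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Assumption: columns with a gap ('-') in either sequence are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable (ungapped) sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def mean_pairwise_p_distance(seqs: list[str]) -> float:
    """Average p-distance over all unordered pairs (needs at least two sequences)."""
    return mean(p_distance(a, b) for a, b in combinations(seqs, 2))
```

Applying `mean_pairwise_p_distance` separately to the sequences of each taxonomic group would give per-group averages of the kind the abstract compares across ranks.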
Affiliation(s)
- Shaomin Yan
- State Key Laboratory of Non-food Biomass Enzyme Technology, National Engineering Research Center for Non-food Biorefinery, Guangxi Biomass Industrialization Engineering Institute, Guangxi Key Laboratory of Biorefinery, Guangxi Academy of Sciences, 98 Daling Road, Nanning, Guangxi, 530007, China
- Guang Wu
- State Key Laboratory of Non-food Biomass Enzyme Technology, National Engineering Research Center for Non-food Biorefinery, Guangxi Biomass Industrialization Engineering Institute, Guangxi Key Laboratory of Biorefinery, Guangxi Academy of Sciences, 98 Daling Road, Nanning, Guangxi, 530007, China
47
A role for Tn6029 in the evolution of the complex antibiotic resistance gene loci in genomic island 3 in enteroaggregative hemorrhagic Escherichia coli O104:H4. PLoS One 2015; 10:e0115781. [PMID: 25675217 PMCID: PMC4326458 DOI: 10.1371/journal.pone.0115781] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 04/22/2014] [Accepted: 12/01/2014] [Indexed: 12/25/2022]
Abstract
In enteroaggregative hemorrhagic Escherichia coli (EAHEC) O104, the complex antibiotic resistance gene loci (CRL) found in the region of divergence 1 (RD1) within E. coli genomic island 3 (GI3) contains blaTEM-1, strAB, sul2, tet(A)A, and dfrA7 genes encoding resistance to ampicillin, streptomycin, sulfamethoxazole, tetracycline, and trimethoprim, respectively. The precise arrangement of antibiotic resistance genes and the role of mobile elements that drove the evolutionary events and created the CRL have not been investigated. We used a combination of bioinformatics and iterative BLASTn searches to determine the micro-evolutionary events that likely led to the formation of the CRL in GI3, using the closed genome sequences of EAHEC O104:H4 strains 2011C-3493 and 2009EL-2050 and high-quality draft genomes of EAHEC E. coli O104:H4 isolates from sporadic cases not associated with the initial outbreak. Our analyses indicate that the CRL in GI3 evolved from a progenitor structure that contained an In2-derived class 1 integron in a Tn21/Tn1721 hybrid backbone. Within the hybrid backbone, a Tn6029-family transposon, identified here as Tn6029C, abuts the sul1 gene in the 3´-Conserved Segment (3´-CS) of a class 1 integron, generating a unique molecular signature that has previously been observed only in pASL01a, a small plasmid found in commensal E. coli in West Africa. From this common progenitor, independent IS26-mediated events created two novel transposons, identified here as Tn6029D and Tn6222, in 2011C-3493 and 2009EL-2050, respectively. Analysis of RD1 within GI3 reveals that IS26 has played a crucial role in the assembly of regions within the CRL.
48
De Baets K, Littlewood DTJ. The Importance of Fossils in Understanding the Evolution of Parasites and Their Vectors. ADVANCES IN PARASITOLOGY 2015; 90:1-51. [PMID: 26597064 DOI: 10.1016/bs.apar.2015.07.001] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.5] [Indexed: 11/26/2022]
Abstract
Knowledge concerning the diversity of parasitism and its reach across our current understanding of the tree of life has benefitted considerably from novel molecular phylogenetic methods. However, the timing of events and the resolution of the nature of the intimate relationships between parasites and their hosts in deep time remain problematic. Despite its vagaries, the fossil record provides the only direct evidence of parasites and parasitism in extant and extinct lineages. Here, we demonstrate the potential of the fossil record and other lines of geological evidence to calibrate the origin and evolution of parasitism by combining different kinds of dating evidence with novel molecular clock methodologies. Other novel methods promise to provide additional evidence for the presence or the life habit of pathogens and their vectors, including the discovery and analysis of ancient DNA and other biomolecules, as well as computed tomographic methods.
49
Abstract
Motivation: Traditionally, gene phylogenies have been reconstructed solely on the basis of molecular sequences; this, however, often does not provide enough information to distinguish between statistically equivalent relationships. To address this problem, several recent methods have incorporated information on the species phylogeny in gene tree reconstruction, leading to dramatic improvements in accuracy. Probabilistic methods are able to estimate all model parameters but are computationally expensive, whereas parsimony methods, which are generally more efficient computationally, require a prior estimate of parameters and of the statistical support. Results: Here, we present the Tree Estimation using Reconciliation (TERA) algorithm, a parsimony-based, species-tree-aware method for gene tree reconstruction based on a scoring scheme combining duplication, transfer and loss costs with an estimate of the sequence likelihood. TERA explores all reconciled gene trees that can be amalgamated from a sample of gene trees. Using a large-scale simulated dataset, we demonstrate that TERA achieves the same accuracy as the corresponding probabilistic method while being faster, and outperforms other parsimony-based methods in both accuracy and speed. Running TERA on a set of 1099 homologous gene families from complete cyanobacterial genomes, we find that incorporating knowledge of the species tree results in a two-thirds reduction in the number of apparent transfer events. Availability and implementation: The algorithm is implemented in our program TERA, which is freely available from http://mbb.univ-montp2.fr/MBB/download_sources/16__TERA. Contact: celine.scornavacca@univ-montp2.fr, ssolo@angel.elte.hu Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Celine Scornavacca
- ISEM, UM2-CNRS-IRD, Place Eugène Bataillon 34095 Montpellier, France, Institut de Biologie Computationnelle (IBC), 95 rue de la Galéra, 34095 Montpellier, France and ELTE-MTA 'Lendület' Biophysics Research Group 1117 Bp., Pázmány P. stny. 1A., Budapest, Hungary
- Edwin Jacox
- ISEM, UM2-CNRS-IRD, Place Eugène Bataillon 34095 Montpellier, France, Institut de Biologie Computationnelle (IBC), 95 rue de la Galéra, 34095 Montpellier, France and ELTE-MTA 'Lendület' Biophysics Research Group 1117 Bp., Pázmány P. stny. 1A., Budapest, Hungary
- Gergely J Szöllősi
- ISEM, UM2-CNRS-IRD, Place Eugène Bataillon 34095 Montpellier, France, Institut de Biologie Computationnelle (IBC), 95 rue de la Galéra, 34095 Montpellier, France and ELTE-MTA 'Lendület' Biophysics Research Group 1117 Bp., Pázmány P. stny. 1A., Budapest, Hungary