1
Exploration of space to achieve scientific breakthroughs. Biotechnol Adv 2020; 43:107572. [PMID: 32540473 DOI: 10.1016/j.biotechadv.2020.107572] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 01/27/2020] [Revised: 05/05/2020] [Accepted: 05/29/2020] [Indexed: 12/13/2022]
Abstract
Living organisms adapt to changing environments using their amazing flexibility to remodel themselves by a process called evolution. Environmental stress causes selective pressure and is associated with genetic and phenotypic shifts for better modifications, maintenance, and functioning of organismal systems. The natural evolution process can be used to complement rational strain engineering for the development of desired traits or phenotypes as well as for the production of novel biomaterials through the imposition of one or more selective pressures. Space provides a unique environment of stressors (e.g., weightlessness and high radiation) that organisms have never experienced on Earth. Cells in outer space reorganize and develop or activate a range of molecular responses that lead to changes in cellular properties. Exposure of cells to outer space will lead to the development of novel variants more efficiently than on Earth. For instance, natural crop varieties can be generated with higher nutritional value, yield, and improved features, such as resistance against high and low temperatures, salt stress, and microbial and pest attacks. The review summarizes the literature on the parameters of outer space that affect the growth and behavior of cells and organisms as well as complex colloidal systems. We illustrate an understanding of gravity-related basic biological mechanisms and highlight the possibility of exploring the outer space environment for application-oriented aspects. This will stimulate biological research in the pursuit of innovative approaches for the future of agriculture and health on Earth.
2
Avram O, Rapoport D, Portugez S, Pupko T. M1CR0B1AL1Z3R: a user-friendly web server for the analysis of large-scale microbial genomics data. Nucleic Acids Res 2019; 47:W88-W92. [PMID: 31114912 PMCID: PMC6602433 DOI: 10.1093/nar/gkz423] [Citation(s) in RCA: 84] [Impact Index Per Article: 14.0] [Received: 03/12/2019] [Revised: 04/29/2019] [Accepted: 05/06/2019] [Indexed: 11/21/2022]
Abstract
Large-scale mining and analysis of bacterial datasets contribute to the comprehensive characterization of complex microbial dynamics within a microbiome and among different bacterial strains, e.g., during disease outbreaks. The study of large-scale bacterial evolutionary dynamics poses many challenges. These include data-mining steps, such as gene annotation, ortholog detection, sequence alignment and phylogeny reconstruction. These steps require the use of multiple bioinformatics tools and ad-hoc programming scripts, making the entire process cumbersome, tedious and error-prone due to manual handling. This motivated us to develop the M1CR0B1AL1Z3R web server, a 'one-stop shop' for conducting microbial genomics data analyses via a simple graphical user interface. Some of the features implemented in M1CR0B1AL1Z3R are: (i) extracting putative open reading frames and comparative genomics analysis of gene content; (ii) extracting orthologous sets and analyzing their size distribution; (iii) analyzing gene presence-absence patterns; (iv) reconstructing a phylogenetic tree based on the extracted orthologous set; (v) inferring GC-content variation among lineages. M1CR0B1AL1Z3R facilitates the mining and analysis of dozens of bacterial genomes using advanced techniques, with the click of a button. M1CR0B1AL1Z3R is freely available at https://microbializer.tau.ac.il/.
Affiliation(s)
- Oren Avram, Dana Rapoport, Shir Portugez, Tal Pupko: The School of Molecular Cell Biology & Biotechnology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
3
Shiers N, Zwiernik P, Aston JAD, Smith JQ. The correlation space of Gaussian latent tree models and model selection without fitting. Biometrika 2016. [DOI: 10.1093/biomet/asw032] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 11/14/2022]
4
Acosta S, Carela M, Garcia-Gonzalez A, Gines M, Vicens L, Cruet R, Massey SE. DNA Repair Is Associated with Information Content in Bacteria, Archaea, and DNA Viruses. J Hered 2015; 106:644-59. [PMID: 26320243 DOI: 10.1093/jhered/esv055] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 02/23/2015] [Accepted: 07/07/2015] [Indexed: 11/13/2022]
Abstract
The concept of a "proteomic constraint" proposes that DNA repair capacity is positively correlated with the information content of a genome, which can be approximated to the size of the proteome (P). This in turn implies that DNA repair genes are more likely to be present in genomes with larger values of P. This stands in contrast to the common assumption that informational genes have a core function and so are evenly distributed across organisms. We examined the presence/absence of 18 DNA repair genes in bacterial genomes. A positive relationship between gene presence and P was observed for 17 genes in the total dataset, and 16 genes when only nonintracellular bacteria were examined. A marked reduction of DNA repair genes was observed in intracellular bacteria, consistent with their reduced value of P. We also examined archaeal and DNA virus genomes, and show that the presence of DNA repair genes is likewise related to a larger value of P. In addition, the products of the bacterial genes mutY, vsr, and ndk, involved in the correction of GC/AT mutations, are strongly associated with reduced genome GC content. We therefore propose that a reduction in information content leads to a loss of DNA repair genes and indirectly to a reduction in genome GC content in bacteria by exposure to the underlying AT mutation bias. The reduction in P may also indirectly lead to the increase in substitution rates observed in intracellular bacteria via loss of DNA repair genes.
Affiliation(s)
- Sharlene Acosta, Miguelina Carela, Aurian Garcia-Gonzalez, Mariela Gines, Luis Vicens, Ricardo Cruet, Steven E Massey: Department of Biology, University of Puerto Rico-Rio Piedras, PO Box 23360, San Juan 00931, Puerto Rico
5
Abstract
Horizontal or Lateral Gene Transfer (HGT or LGT) is the transmission of portions of genomic DNA between organisms through a process decoupled from vertical inheritance. In the presence of HGT events, different fragments of the genome are the result of different evolutionary histories. This can therefore complicate the investigations of evolutionary relatedness of lineages and species. Also, as HGT can bring into genomes radically different genotypes from distant lineages, or even new genes bearing new functions, it is a major source of phenotypic innovation and a mechanism of niche adaptation. For example, of particular relevance to human health is the lateral transfer of antibiotic resistance and pathogenicity determinants, leading to the emergence of pathogenic lineages. Computational identification of HGT events relies upon the investigation of sequence composition or evolutionary history of genes. Sequence composition-based ("parametric") methods search for deviations from the genomic average, whereas evolutionary history-based ("phylogenetic") approaches identify genes whose evolutionary history significantly differs from that of the host species. The evaluation and benchmarking of HGT inference methods typically rely upon simulated genomes, for which the true history is known. On real data, different methods tend to infer different HGT events, and as a result it can be difficult to ascertain all but simple and clear-cut HGT events.
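The parametric idea in the abstract above (searching for genes whose composition deviates from the genomic average) can be illustrated with a minimal sketch. It is not a method taken from the cited work: it assumes GC content as the only compositional signature and uses a z-score cutoff, and the names `gc_content`, `flag_atypical_genes` and `z_cutoff` are hypothetical. Real parametric tools typically combine several signatures, such as codon usage or oligonucleotide frequencies.

```python
from statistics import mean, pstdev


def gc_content(seq: str) -> float:
    """Fraction of G or C bases in a DNA sequence (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def flag_atypical_genes(genes: dict[str, str], z_cutoff: float = 2.0) -> list[str]:
    """Return names of genes whose GC content lies more than z_cutoff
    standard deviations away from the mean GC content of all genes.

    Assumption: the mean over genes stands in for the "genomic average".
    """
    gc = {name: gc_content(seq) for name, seq in genes.items()}
    mu = mean(gc.values())
    sigma = pstdev(gc.values())
    if sigma == 0:
        # All genes share the same composition, so nothing deviates.
        return []
    return [name for name, value in gc.items() if abs(value - mu) / sigma > z_cutoff]
```

A flagged gene is only a candidate: compositional deviation can also arise without transfer, which is one reason different methods tend to infer different events on real data.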
Affiliation(s)
- Nives Škunca: ETH Zurich, Zurich, Switzerland; Swiss Institute of Bioinformatics, Zurich, Switzerland
- Christophe Dessimoz: University College London, London, United Kingdom; European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
6
Kim T, Hao W. DiscML: an R package for estimating evolutionary rates of discrete characters using maximum likelihood. BMC Bioinformatics 2014; 15:320. [PMID: 25260628 PMCID: PMC4261585 DOI: 10.1186/1471-2105-15-320] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 07/26/2014] [Accepted: 09/25/2014] [Indexed: 11/17/2022]
Abstract
Background: The study of discrete characters is crucial for the understanding of evolutionary processes. Even though great advances have been made in the analysis of nucleotide sequences, computer programs for non-DNA discrete characters are often dedicated to specific analyses and lack flexibility. Discrete characters often have different transition rate matrices and variable rates among sites, and sometimes contain unobservable states. To estimate rates accurately for a variety of discrete characters, programs with sophisticated methodologies and flexible settings are desired. Results: DiscML performs maximum likelihood estimation of evolutionary rates of discrete characters on a provided phylogeny, with options that correct for unobservable data, rate variation, and unknown prior root probabilities from the empirical data. It gives users options to customize the instantaneous transition rate matrices, or to choose pre-determined matrices from models such as birth-and-death (BD), birth-death-and-innovation (BDI), equal rates (ER), symmetric (SYM), general time-reversible (GTR) and all rates different (ARD). Moreover, we show application examples of DiscML on gene family data and on intron presence/absence data. Conclusion: DiscML was developed as a unified R program for estimating evolutionary rates of discrete characters with no restriction on the number of character states, and with flexibility to use different transition models. DiscML is ideal for the analyses of binary (1s/0s) patterns, multi-gene families, and multistate discrete morphological characteristics.
Affiliation(s)
- Weilong Hao: Department of Biological Sciences, Wayne State University, 48202 Detroit, USA
7
Lang JM, Darling AE, Eisen JA. Phylogeny of bacterial and archaeal genomes using conserved genes: supertrees and supermatrices. PLoS One 2013; 8:e62510. [PMID: 23638103 PMCID: PMC3636077 DOI: 10.1371/journal.pone.0062510] [Citation(s) in RCA: 92] [Impact Index Per Article: 7.7] [Received: 03/09/2012] [Accepted: 03/26/2013] [Indexed: 11/29/2022]
Abstract
Over 3000 microbial (bacterial and archaeal) genomes have been made publicly available to date, providing an unprecedented opportunity to examine evolutionary genomic trends and offering valuable reference data for a variety of other studies such as metagenomics. The utility of these genome sequences is greatly enhanced when we have an understanding of how they are phylogenetically related to each other. Therefore, we here describe our efforts to reconstruct the phylogeny of all available bacterial and archaeal genomes. We identified 24 single-copy, ubiquitous genes suitable for this phylogenetic analysis. We used two approaches to combine the data for the 24 genes. First, we concatenated alignments of all genes into a single alignment from which a Maximum Likelihood (ML) tree was inferred using RAxML. Second, we used a relatively new approach to combining gene data, Bayesian Concordance Analysis (BCA), as implemented in the BUCKy software, in which the results of 24 single-gene phylogenetic analyses are used to generate a "primary concordance" tree. A comparison of the concatenated ML tree and the primary concordance (BUCKy) tree reveals that the two approaches give similar results, relative to a phylogenetic tree inferred from the 16S rRNA gene. After comparing the results and the methods used, we conclude that the current best approach for generating a single phylogenetic tree, suitable for use as a reference phylogeny for comparative analyses, is to perform a maximum likelihood analysis of a concatenated alignment of conserved, single-copy genes.
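The concatenation step mentioned in the abstract above (joining per-gene alignments into a single alignment, or supermatrix) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: each alignment is assumed to be a dictionary mapping taxon name to an aligned sequence, taxa missing from a gene are padded with gap characters, and the function name `concatenate_alignments` is hypothetical. Tree inference on the result would be done separately with an ML program.

```python
def concatenate_alignments(alignments: list[dict[str, str]]) -> dict[str, str]:
    """Join per-gene alignments column-wise into one supermatrix.

    Each alignment maps taxon name -> aligned sequence. All sequences in
    one alignment must have equal length. A taxon absent from a gene is
    padded with '-' so every row of the result has the same width.
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    rows: dict[str, list[str]] = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            rows[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}
```

For genes that are single-copy and ubiquitous, as in the study, the gap padding would rarely be needed; it is included so the sketch does not fail on incomplete inputs.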
Affiliation(s)
- Jenna Morgan Lang: Department of Medical Microbiology and Immunology and Department of Evolution and Ecology, University of California Davis, Davis, California, United States of America; Department of Energy Joint Genome Institute, Walnut Creek, California, United States of America
- Aaron E. Darling: Department of Medical Microbiology and Immunology and Department of Evolution and Ecology, University of California Davis, Davis, California, United States of America
- Jonathan A. Eisen: Department of Medical Microbiology and Immunology and Department of Evolution and Ecology, University of California Davis, Davis, California, United States of America; Department of Energy Joint Genome Institute, Walnut Creek, California, United States of America
8
Eveleigh RJ, Meehan CJ, Archibald JM, Beiko RG. Being Aquifex aeolicus: Untangling a hyperthermophile's checkered past. Genome Biol Evol 2013; 5:2478-97. [PMID: 24281050 PMCID: PMC3879981 DOI: 10.1093/gbe/evt195] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Accepted: 11/22/2013] [Indexed: 12/20/2022]
Abstract
Lateral gene transfer (LGT) is an important factor contributing to the evolution of prokaryotic genomes. The Aquificae are a hyperthermophilic bacterial group whose genes show affiliations to many other lineages, including the hyperthermophilic Thermotogae, the Proteobacteria, and the Archaea. Previous phylogenomic analyses focused on Aquifex aeolicus identified Thermotogae and Aquificae either as successive early branches or sisters in a rooted bacterial phylogeny, but many phylogenies and cellular traits have suggested a stronger affiliation with the Epsilonproteobacteria. Different scenarios for the evolution of the Aquificae yield different phylogenetic predictions. Here, we outline these scenarios and consider the fit of the available data, including three sequenced Aquificae genomes, to different sets of predictions. Evidence from phylogenetic profiles and trees suggests that the Epsilonproteobacteria have the strongest affinities with the three Aquificae analyzed. However, this pattern is shown by only a minority of encoded proteins, and the Archaea, many lineages of thermophilic bacteria, and members of genus Clostridium and class Deltaproteobacteria also show strong connections to the Aquificae. The phylogenetic affiliations of different functional subsystems showed strong biases: Most but not all genes implicated in the core translational apparatus tended to group Aquificae with Thermotogae, whereas a wide range of metabolic and cellular processes strongly supported the link between Aquificae and Epsilonproteobacteria. Depending on which sets of genes are privileged, either Thermotogae or Epsilonproteobacteria is the most plausible adjacent lineage to the Aquificae. Both scenarios require massive sharing of genes to explain the history of this enigmatic group, whose history is further complicated by specific affinities of different members of Aquificae to different partner lineages.
Affiliation(s)
- Robert J.M. Eveleigh: Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada; Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada
- Conor J. Meehan: Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada; Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada
- John M. Archibald: Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada
- Robert G. Beiko: Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada
9
Merhej V, Raoult D. Rhizome of life, catastrophes, sequence exchanges, gene creations, and giant viruses: how microbial genomics challenges Darwin. Front Cell Infect Microbiol 2012; 2:113. [PMID: 22973559 PMCID: PMC3428605 DOI: 10.3389/fcimb.2012.00113] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 05/23/2012] [Accepted: 08/06/2012] [Indexed: 11/29/2022]
Abstract
Darwin's theory about the evolution of species has been the object of considerable dispute. In this review, we describe seven key principles in Darwin's book The Origin of Species and present how genomics challenges each of these concepts and improves our knowledge about evolution. Darwin believed that species evolution consists of positive directional selection ensuring the “survival of the fittest.” The most developed state of the species is characterized by increasing complexity. Darwin proposed the theory of “descent with modification” according to which all species evolve from a single common ancestor through a gradual process of small modifications of their vertical inheritance. Finally, the process of evolution can be depicted in the form of a tree. However, microbial genomics showed that evolution is better described as “biological changes over time.” The mode of change is not unidirectional and does not necessarily favor advantageous mutations to increase fitness; rather, it is subject to random selection as a result of catastrophic stochastic processes. Complexity is not necessarily the completion of development: several complex organisms have gone extinct, and many microbes, including bacteria with an intracellular lifestyle, have streamlined, highly effective genomes. Genomes evolve through large events of gene deletions, duplications, insertions, and genome rearrangements rather than a gradual adaptive process. Genomes are dynamic and chimeric entities with gene repertoires that result from vertical and horizontal acquisitions as well as de novo gene creation. The chimeric character of microbial genomes excludes the possibility of finding a single common ancestor for all the genes recorded currently. Genomes are collections of genes with different evolutionary histories that cannot be represented by a single tree of life (TOL). A forest, a network or a rhizome of life may be more accurate to represent evolutionary relationships among species.
Affiliation(s)
- Vicky Merhej: URMITE, UM63, CNRS 7278, IRD 198, INSERM U1095, Aix Marseille Université, Marseille, France
10
Meinel T, Krause A. Meta-analysis of general bacterial subclades in whole-genome phylogenies using tree topology profiling. Evol Bioinform Online 2012; 8:489-525. [PMID: 22915837 PMCID: PMC3422217 DOI: 10.4137/ebo.s9642] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/04/2022]
Abstract
In the last two decades, a large number of whole-genome phylogenies have been inferred to reconstruct the Tree of Life (ToL). Underlying data models range from gene or functionality content in species to phylogenetic gene family trees and multiple sequence alignments of concatenated protein sequences. Diversity in data models, together with the use of different tree reconstruction techniques, disruptive biological effects and the steadily increasing number of genomes, has led to a huge diversity in published phylogenies. Comparing these phylogenies and, moreover, identifying the impact of inference properties (underlying data model, inference technique) on particular reconstructions is almost impossible. In this work, we introduce tree topology profiling as a method to compare already published whole-genome phylogenies. This method requires visual determination of the particular topology in a drawn whole-genome phylogeny for a set of particular bacterial clans. For each clan, neighborhoods to other bacteria are collected into a catalogue of generalized alternative topologies. Particular topology alternatives found for an ordered list of bacterial clans reveal a topology profile that represents the analyzed phylogeny. To simulate the inhomogeneity of published gene content phylogenies, we generate a set of seven phylogenies using different inference techniques and the SYSTERS-PhyloMatrix data model. After tree topology profiling on a total of 54 selected published and newly inferred phylogenies, we separate artefactual from biologically meaningful phylogenies and associate particular inference results (phylogenies) with inference background (inference techniques as well as data models). Topological relationships of particular bacterial species groups are presented. With this work we introduce tree topology profiling into the scientific field of comparative phylogenomics.
Affiliation(s)
- Thomas Meinel: Charité-University Medicine Berlin, Institute for Physiology, Structural Bioinformatics Group, Thielallee 71, 14195 Berlin, Germany
11
Georgiades K, Raoult D. How microbiology helps define the rhizome of life. Front Cell Infect Microbiol 2012; 2:60. [PMID: 22919651 PMCID: PMC3417629 DOI: 10.3389/fcimb.2012.00060] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 01/18/2012] [Accepted: 04/16/2012] [Indexed: 01/24/2023]
Abstract
In contrast to the tree of life (TOL) theory, species are mosaics of gene sequences with different origins. Observations of the extensive lateral sequence transfers in all organisms have demonstrated that the genomes of all life forms are collections of genes with different evolutionary histories that cannot be represented by a single TOL. Moreover, genes themselves commonly have several origins due to recombination. The human genome is not free from recombination events, so it is a mosaic like other organisms' genomes. Recent studies have demonstrated evidence for the integration of parasitic DNA into the human genome. Lateral transfer events have been accepted as major contributors to genome evolution in free-living bacteria. Furthermore, the accumulation of genomic sequence data provides evidence for extended genetic exchanges in intracellular bacteria and suggests that such events constitute an agent that promotes and maintains all bacterial species. Archaea and viruses also form chimeras containing primarily bacterial but also eukaryotic sequences. In addition to lateral transfers, orphan genes are indicative of the fact that gene creation is a permanent and unsettled phenomenon. Currently, a rhizome may more adequately represent the multiplicity and de novo creation of a genome. We wanted to confirm that the term “rhizome” in evolutionary biology applies to the entire cellular life history. This view of evolution should resemble a clump of roots representing the multiple origins of the repertoires of the genes of each species.
Affiliation(s)
- Kalliopi Georgiades: Faculté de Médecine La Timone, Unité de Recherche en Maladies Infectieuses Tropical Emergentes (URMITE), CNRS-IRD UMR 6236-198, Université de la Méditerranée, Marseille, France
12
Cohen O, Pupko T. Inference of gain and loss events from phyletic patterns using stochastic mapping and maximum parsimony--a simulation study. Genome Biol Evol 2011; 3:1265-75. [PMID: 21971516 PMCID: PMC3215202 DOI: 10.1093/gbe/evr101] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.6] [Accepted: 09/27/2011] [Indexed: 12/26/2022]
Abstract
Bacterial evolution is characterized by frequent gain and loss events of gene families. These events can be inferred from phyletic pattern data, a compact representation of gene family repertoire across multiple genomes. The maximum parsimony paradigm is a classical and prevalent approach for the detection of gene family gains and losses mapped on specific branches. We and others have previously developed probabilistic models that aim to account for the gain and loss stochastic dynamics. These models are a critical component of a methodology termed stochastic mapping, in which probabilities and expectations of gain and loss events are estimated for each branch of an underlying phylogenetic tree. In this work, we present a phyletic pattern simulator in which the gain and loss dynamics are assumed to follow a continuous-time Markov chain along the tree. Various models and options are implemented to make the simulation software useful for a large number of studies in which binary (presence/absence) data are analyzed. Using this simulation software, we compared the ability of the maximum parsimony and the stochastic mapping approaches to accurately detect gain and loss events along the tree. Our simulations cover a large array of evolutionary scenarios in terms of the propensities for gene family gains and losses and the variability of these propensities among gene families. Although in all simulation schemes both methods obtain relatively low false positive rates, stochastic mapping outperforms maximum parsimony in terms of true positive rates. We further studied the factors that influence the performance of both methods. We find, for example, that the accuracy of maximum parsimony inference is substantially reduced when the goal is to map gain and loss events along internal branches of the phylogenetic tree. Furthermore, the accuracy of stochastic mapping is reduced with smaller data sets (limited number of gene families) due to unreliable estimation of branch lengths. Our simulator and simulation results are additionally relevant for the analysis of other types of binary-coded data, such as the existence of homologous restriction sites, gaps, and introns, to name a few. Both the simulation software and the inference methodology are freely available at a user-friendly server: http://gloome.tau.ac.il/.
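The continuous-time Markov chain assumption in the abstract above can be made concrete with a minimal sketch. It does not reproduce the authors' simulator: it assumes the simplest two-state model, with one constant gain rate and one constant loss rate and no rate variation among gene families, and the names `transition_matrix`, `evolve_state`, `gain` and `loss` are hypothetical. For such a chain the transition probabilities along a branch of length t have a closed form, which the code evaluates directly.

```python
import math
import random


def transition_matrix(gain: float, loss: float, t: float) -> list[list[float]]:
    """Transition probabilities of a two-state gain/loss Markov chain.

    State 0 = gene family absent, state 1 = present. Entry [i][j] is the
    probability of being in state j after time t, starting in state i.
    Assumes gain + loss > 0.
    """
    total = gain + loss
    decay = math.exp(-total * t)
    pi1 = gain / total  # stationary probability of presence
    pi0 = loss / total  # stationary probability of absence
    return [
        [pi0 + pi1 * decay, pi1 * (1.0 - decay)],  # starting absent
        [pi0 * (1.0 - decay), pi1 + pi0 * decay],  # starting present
    ]


def evolve_state(state: int, gain: float, loss: float, t: float,
                 rng: random.Random) -> int:
    """Sample the state at the end of a branch of length t."""
    p_present = transition_matrix(gain, loss, t)[state][1]
    return 1 if rng.random() < p_present else 0
```

Applying `evolve_state` from the root to each child along every branch of a tree yields one simulated column of a phyletic pattern; repeating this per gene family gives a presence/absence matrix.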
Affiliation(s)
- Ofir Cohen: Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Tal Pupko: Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel; National Evolutionary Synthesis Center, Durham, North Carolina
13
Cohen O, Gophna U, Pupko T. The Complexity Hypothesis Revisited: Connectivity Rather Than Function Constitutes a Barrier to Horizontal Gene Transfer. Mol Biol Evol 2010; 28:1481-9. [DOI: 10.1093/molbev/msq333] [Citation(s) in RCA: 146] [Impact Index Per Article: 9.7] [Indexed: 11/13/2022]
14
Sangaralingam A, Susko E, Bryant D, Spencer M. On the artefactual parasitic eubacteria clan in conditioned logdet phylogenies: heterotachy and ortholog identification artefacts as explanations. BMC Evol Biol 2010; 10:343. [PMID: 21062453 PMCID: PMC2992526 DOI: 10.1186/1471-2148-10-343] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 03/07/2010] [Accepted: 11/09/2010] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Phylogenetic reconstruction methods based on gene content often place all the parasitic and endosymbiotic eubacteria (parasites for short) together in a clan. Many other lines of evidence point to this parasites clan being an artefact. This artefact could be a consequence of the methods used to construct ortholog databases (due to some unknown bias), the methods used to estimate the phylogeny, or both. We test the idea that the parasites clan is an ortholog identification artefact by analyzing three different ortholog databases (COG, TRIBES, and OFAM), which were constructed using different methods, and are thus unlikely to share the same biases. In each case, we estimate a phylogeny using an improved version of the conditioned logdet distance method. If the parasites clan appears in trees from all three databases, it is unlikely to be an ortholog identification artefact. Accelerated loss of a subset of gene families in parasites (a form of heterotachy) may contribute to the difficulty of estimating a phylogeny from gene content data. We test the idea that heterotachy is the underlying reason for the estimation of an artefactual parasites clan by applying two different mixture models (phylogenetic and non-phylogenetic), in combination with conditioned logdet. In these models, there are two categories of gene families, one of which has accelerated loss in parasites. Distances are estimated separately from each category by conditioned logdet. This should reduce the tendency for tree estimation methods to group the parasites together, if heterotachy is the underlying reason for estimation of the parasites clan. RESULTS: The parasites clan appears in conditioned logdet trees estimated from all three databases. This makes it less likely to be an artefact of database construction. The non-phylogenetic mixture model gives trees without a parasites clan. However, the phylogenetic mixture model still results in a tree with a parasites clan. Thus, it is not entirely clear whether heterotachy is the underlying reason for the estimation of a parasites clan. Simulation studies suggest that the phylogenetic mixture model approach may be unsuccessful because the model of gene family gain and loss it uses does not adequately describe the real data. CONCLUSIONS: The most successful methods for estimating a reliable phylogenetic tree for parasitic and endosymbiotic eubacteria from gene content data are still ad-hoc approaches such as the SHOT distance method. However, the improved conditioned logdet method we developed here may be useful for non-parasites and can be accessed at http://www.liv.ac.uk/~cgrbios/cond_logdet.html.
Affiliation(s)
- Ajanthah Sangaralingam
- Centre of Haemato-Oncology, Institute of Cancer, Bart's and the London School of Medicine (QMUL), Charterhouse Square, London EC1M 6BQ, UK
|
15
|
Coscollá M, Comas I, González-Candelas F. Quantifying Nonvertical Inheritance in the Evolution of Legionella pneumophila. Mol Biol Evol 2010; 28:985-1001. [PMID: 20961962 DOI: 10.1093/molbev/msq278] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.7] [Indexed: 01/14/2023]
Affiliation(s)
- Mireia Coscollá
- Unidad Mixta de Investigación Genómica y Salud CSISP-UV/Institut Cavanilles de Biodiversitat i Biologia Evolutiva, Valencia, Spain
|
16
|
Abstract
Bacterial gene content variation during the course of evolution has been widely acknowledged and its pattern has been actively modeled in recent years. Gene truncation or gene pseudogenization also plays an important role in shaping bacterial genome content. Truncated genes could also arise from small-scale lateral gene transfer events. Unfortunately, no existing mathematical model of gene content variation takes truncated genes into account. In this study, we developed a model to incorporate truncated genes. Maximum-likelihood estimates (MLEs) of the new model reveal fast rates of gene insertions/deletions on recent branches, suggesting a fast turnover of many recently transferred genes. The estimates also suggest that many truncated genes are in the process of being eliminated from the genome. Furthermore, we demonstrate that ignoring truncated genes in the estimation does not lead to a systematic bias but rather has a more complicated effect. Analysis using the new model not only provides more accurate estimates of gene gains/losses (or insertions/deletions), but also reduces any concern of a systematic bias from applying simplified models to bacterial genome evolution. Although not a primary purpose, the model incorporating truncated genes could potentially be used for phylogeny reconstruction using gene family content.
|
17
|
Abstract
The contribution of horizontal gene transfer to evolution has been controversial since it was suggested to be a force driving evolution in the microbial world. In this paper, I review the current standpoint on horizontal gene transfer in evolutionary thinking and discuss how important horizontal gene transfer is in evolution in the broad sense, and particularly in prokaryotic evolution. I review recent literature, asking, first, which processes are involved in the evolutionary success of transferred genes and, secondly, about the extent of horizontal gene transfer towards different evolutionary times. Moreover, I discuss the feasibility of reconstructing ancient phylogenetic relationships in the face of horizontal gene transfer. Finally, I discuss how horizontal gene transfer fits in the current neo-Darwinian evolutionary paradigm and conclude there is a need for a new evolutionary paradigm that includes horizontal gene transfer as well as other mechanisms in the explanation of evolution.
Affiliation(s)
- Luis Boto
- Departamento Biodiversidad y Biología Evolutiva, Museo Nacional Ciencias Naturales, CSIC, C/José Gutierrez Abascal 2, 28006 Madrid, Spain.
|
18
|
Cohen O, Pupko T. Inference and characterization of horizontally transferred gene families using stochastic mapping. Mol Biol Evol 2009; 27:703-13. [PMID: 19808865 PMCID: PMC2822287 DOI: 10.1093/molbev/msp240] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.1] [Indexed: 11/23/2022]
Abstract
Macrogenomic events, in which genes are gained and lost, play a pivotal evolutionary role in microbial evolution. Nevertheless, probabilistic-evolutionary models describing such events and methods for their robust inference are considerably less developed than existing methodologies for analyzing site-specific sequence evolution. Here, we present a novel method for the inference of gains and losses of gene families. First, we develop probabilistic-evolutionary models describing the dynamics of gene-family content, which are more biologically realistic than previously suggested models. In our likelihood-based models, gains and losses are represented by transitions between presence and absence, given an underlying phylogeny. We employ a mixture-model approach in which we allow both the gain rate and the loss rate to vary among gene families. Second, we use these models together with the analytic implementation of stochastic mapping to infer branch-specific events. Our novel methodology allows us to infer and quantify horizontal gene transfer (HGT) events. This enables us to rank various gene families and lineages according to their propensity to undergo gains and losses. Applying our methodology to 4,873 gene families shows that: 1) the novel mixture models describe the observed variability in gene-family content among microbes significantly better than previous models; 2) the stochastic mapping approach enables accurate inference of gain and loss events based on simulations; 3) at least 34% of the gene families analyzed are inferred to have experienced HGT at least once during their evolution; and 4) gene families that were inferred to experience HGT are both enriched and depleted with respect to specific functional categories.
Affiliation(s)
- Ofir Cohen
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
|
19
|
Abstract
The notion that all prokaryotes belong to genomically and phenomically cohesive clusters that we might legitimately call "species" is a contentious one. At issue are (1) whether such clusters actually exist; (2) what species definition might most reliably identify them, if they do; and (3) what species concept -- by which is meant a genetic and ecological theory of speciation -- might best explain species existence and rationalize a species definition, if we could agree on one. We review existing theories and some relevant data. We conclude that microbiologists now understand in some detail the various genetic, population, and ecological processes that effect the evolution of prokaryotes. There will be on occasion circumstances under which these, working together, will form groups of related organisms sufficiently like each other that we might all agree to call them "species," but there is no reason that this must always be so. Thus, there is no principled way in which questions about prokaryotic species, such as how many there are, how large their populations are, or how globally they are distributed, can be answered. These questions can, however, be reformulated so that metagenomic methods and thinking will meaningfully address the biological patterns and processes whose understanding is our ultimate target.
|
20
|
Isambert H, Stein RR. On the need for widespread horizontal gene transfers under genome size constraint. Biol Direct 2009; 4:28. [PMID: 19703318 PMCID: PMC2740843 DOI: 10.1186/1745-6150-4-28] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 08/19/2009] [Accepted: 08/25/2009] [Indexed: 11/20/2022]
Abstract
Background: While eukaryotes primarily evolve by duplication-divergence expansion (and reduction) of their own gene repertoire with only rare horizontal gene transfers, prokaryotes appear to evolve under both gene duplications and widespread horizontal gene transfers over long evolutionary time scales. But the evolutionary origin of this striking difference in the importance of horizontal gene transfers remains by and large a mystery. Hypothesis: We propose that the abundance of horizontal gene transfers in free-living prokaryotes is a simple but necessary consequence of two opposite effects: i) their apparent genome size constraint compared to typical eukaryote genomes and ii) their underlying genome expansion dynamics through gene duplication-divergence evolution, as demonstrated by the presence of many tandem and block repeated genes. In principle, this combination of genome size constraint and underlying duplication expansion should lead to a coalescent-like process with extensive turnover of functional genes. This would, however, imply the unlikely, systematic reinvention of functions from discarded genes within independent phylogenetic lineages. Instead, we propose that the long-term evolutionary adaptation of free-living prokaryotes must have resulted in the emergence of efficient non-phylogenetic pathways to circumvent gene loss. Implications: This need for widespread horizontal gene transfers due to genome size constraint implies, in particular, that prokaryotes must remain under strong selection pressure in order to maintain the long-term evolutionary adaptation of their "mutualized" gene pool, beyond the inevitable turnover of individual prokaryote species. By contrast, the absence of genome size constraint for typical eukaryotes has presumably relaxed their need for widespread horizontal gene transfers and strong selection pressure. Yet, the resulting loss of genetic functions, due to weak selection pressure and inefficient gene recovery mechanisms, must have ultimately favored the emergence of more complex life styles and ecological integration of many eukaryotes. Reviewers: This article was reviewed by Pierre Pontarotti, Eugene V Koonin and Sergei Maslov.
Affiliation(s)
- Hervé Isambert
- Institut Curie, CNRS UMR168, 11 rue P. & M. Curie, 75005 Paris, France.
|
21
|
Abstract
Lateral gene transfer (LGT) and gene rearrangement are essential for shaping bacterial genomes during evolution. Separate attention has been focused on understanding the process of lateral gene transfer and the process of gene translocation. However, little is known about how gene translocation affects laterally transferred genes. Here we have examined gene translocations and lateral gene transfers in closely related genome pairs. The results reveal that translocated genes undergo elevated rates of evolution and gene translocation tends to take place preferentially in recently acquired genes. Translocated genes have a high probability to be truncated, suggesting that translocation followed by truncation/deletion might play an important role in the fast turnover of laterally transferred genes. Furthermore, more recently acquired genes have a higher proportion of genes on the leading strand, suggesting a strong strand bias of lateral gene transfer.
|
22
|
Spencer M, Sangaralingam A. A phylogenetic mixture model for gene family loss in parasitic bacteria. Mol Biol Evol 2009; 26:1901-8. [PMID: 19435739 DOI: 10.1093/molbev/msp102] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 11/13/2022]
Abstract
Gene families are frequently gained and lost from prokaryotic genomes. It is widely believed that the rate of loss was accelerated for some but not all gene families in lineages that became parasites or endosymbionts. This leads to a form of heterotachy that may be responsible for the poor performance of phylogeny estimation based on gene content. We describe a mixture model that accounts for this heterotachy. We show that this model fits data on the distribution of gene families across bacteria from the COG database much better than previous models. However, it still favors an artifactual tree topology in which parasites form a clade over the more plausible 16S topology. In contrast to a previous model of genome dynamics, our model suggests that the ancestral bacterium had a small genome. We suggest that models of gene family gain and loss are likely to be more useful for understanding genome dynamics than for estimating phylogenetic trees.
Affiliation(s)
- Matthew Spencer
- School of Biological Sciences, University of Liverpool, Liverpool, UK.
|
23
|
Didelot X, Darling A, Falush D. Inferring genomic flux in bacteria. Genome Res 2009; 19:306-17. [PMID: 19015321 PMCID: PMC2652212 DOI: 10.1101/gr.082263.108] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.9] [Received: 06/18/2008] [Accepted: 10/29/2008] [Indexed: 11/24/2022]
Abstract
Acquisition and loss of genetic material are essential forces in bacterial microevolution. They have been repeatedly linked with adaptation of lineages to new lifestyles, and in particular, pathogenicity. Comparative genomics has the potential to elucidate this genetic flux, but there are many methodological challenges involved in inferring evolutionary events from collections of genome sequences. Here we describe a model-based method for using whole-genome sequences to infer the patterns of genome content evolution. A fundamental property of our model is that it allows the rates at which genetic elements are gained or lost to vary in time and from one lineage to another. Our approach is purely sequence based, and does not rely on gene identification. We show how inference can be performed under our model and illustrate its use on three datasets from Francisella tularensis, Streptococcus pyogenes, and Escherichia coli. In all three examples, we found interesting variations in the rates of genetic material gain and loss, which strongly correlate with their lifestyle. The algorithms we describe are implemented in a software package named GenoPlast.
Affiliation(s)
- Xavier Didelot
- Department of Statistics, University of Warwick, Coventry CV4 7AL, United Kingdom.
|
24
|
Abstract
Bacteria experience a continual influx of novel genetic material from a wide range of sources and yet their genomes remain relatively small. This aspect of bacterial evolution indicates that most newly arriving sequences are rapidly eliminated; however, numerous new genes persist, as evident from the presence of unique genes in almost all bacterial genomes. This review summarizes the methods for identifying new genes in bacterial genomes and examines the features that promote the retention and elimination of these evolutionary novelties.
Affiliation(s)
- Chih-Horng Kuo
- Department of Ecology & Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
|