1
Aouad M, Flandrois JP, Jauffrit F, Gouy M, Gribaldo S, Brochier-Armanet C. A divide-and-conquer phylogenomic approach based on character supermatrices resolves early steps in the evolution of the Archaea. BMC Ecol Evol 2022; 22:1. [PMID: 34986784 PMCID: PMC8734073 DOI: 10.1186/s12862-021-01952-0] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 06/21/2021] [Accepted: 11/22/2021] [Indexed: 11/28/2022]
Abstract
Background: The recent rise in cultivation-independent genome sequencing has provided key material to explore uncharted branches of the Tree of Life. This has been particularly spectacular for the Archaea, bringing them to center stage as highly relevant to understanding the early stages of evolution, the emergence of fundamental metabolisms, and the origin of eukaryotes. Yet resolving deep divergences remains challenging because of well-known tree-reconstruction artefacts and biases in extracting robust ancient phylogenetic signal, notably when analyzing data sets that include the three Domains of Life. Among the strategies aimed at mitigating these problems, divide-and-conquer approaches remain poorly explored and have been based primarily on reconciliation among single-gene trees, which notoriously lack ancient phylogenetic signal. Results: We analyzed subsets of full supermatrices covering the whole Tree of Life, with specific taxonomic sampling, to robustly resolve different parts of the archaeal phylogeny in light of their current diversity. Our results strongly support the existence and early emergence of two main clades, Cluster I and Cluster II, which we name Ouranosarchaea and Gaiarchaea, and we clarify the placement of important novel archaeal lineages within these two clades. However, the monophyly and branching of the fast-evolving, nanosized DPANN members remain unclear and worthy of further study. Conclusions: We inferred a well-resolved rooted phylogeny of the Archaea that includes all recently described phyla of high taxonomic rank. This phylogeny is a valuable reference for studying the evolutionary events associated with the early steps of the diversification of the archaeal domain. Beyond the specifics of archaeal phylogeny, our results demonstrate the power of divide-and-conquer approaches to resolve deep phylogenetic relationships, which should be applied to progressively resolve the entire Tree of Life.
Supplementary Information The online version contains supplementary material available at 10.1186/s12862-021-01952-0.
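As an illustration of the supermatrix idea mentioned in this abstract, the following is a minimal sketch, not the authors' pipeline. It assumes each gene alignment is a plain mapping from taxon name to aligned sequence; the function names `build_supermatrix` and `subset_supermatrix` are hypothetical. It concatenates per-gene alignments, pads missing taxa with gaps, and then restricts the matrix to a chosen taxonomic sample.

```python
def build_supermatrix(gene_alignments, missing="-"):
    """Concatenate per-gene alignments (taxon -> aligned sequence) into one matrix.

    Taxa absent from a gene are padded with the `missing` character so that
    every row of the supermatrix has the same length.
    """
    taxa = sorted({taxon for aln in gene_alignments for taxon in aln})
    rows = {taxon: [] for taxon in taxa}
    for aln in gene_alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            rows[taxon].append(aln.get(taxon, missing * width))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}


def subset_supermatrix(supermatrix, keep_taxa, missing="-"):
    """Keep only the requested taxa and drop columns that are entirely missing."""
    names = [t for t in keep_taxa if t in supermatrix]
    columns = [col for col in zip(*(supermatrix[t] for t in names))
               if any(ch != missing for ch in col)]
    return {t: "".join(col[i] for col in columns) for i, t in enumerate(names)}
```

Each taxon subset produced this way could then be passed to a tree-inference program of choice; that step is outside the scope of this sketch.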
Affiliation(s)
- Monique Aouad
  - Université de Lyon, Université Lyon 1, CNRS, UMR5558, Laboratoire de Biométrie et Biologie Évolutive, 43 bd du 11 novembre 1918, 69622, Villeurbanne, France
  - École Supérieure de Biologie-Biochimie-Biotechnologies, Université Catholique de Lyon, 10 place des archives, 69002, Lyon, France
- Jean-Pierre Flandrois
  - Université de Lyon, Université Lyon 1, CNRS, UMR5558, Laboratoire de Biométrie et Biologie Évolutive, 43 bd du 11 novembre 1918, 69622, Villeurbanne, France
- Frédéric Jauffrit
  - Université de Lyon, Université Lyon 1, CNRS, UMR5558, Laboratoire de Biométrie et Biologie Évolutive, 43 bd du 11 novembre 1918, 69622, Villeurbanne, France
  - Technology Research Department, Innovation Unit, bioMérieux SA, Marcy l'Étoile, France
- Manolo Gouy
  - Université de Lyon, Université Lyon 1, CNRS, UMR5558, Laboratoire de Biométrie et Biologie Évolutive, 43 bd du 11 novembre 1918, 69622, Villeurbanne, France
- Simonetta Gribaldo
  - Department of Microbiology, Unit "Evolutionary Biology of the Microbial Cell", UMR2001, Institut Pasteur, Paris, France
- Céline Brochier-Armanet
  - Université de Lyon, Université Lyon 1, CNRS, UMR5558, Laboratoire de Biométrie et Biologie Évolutive, 43 bd du 11 novembre 1918, 69622, Villeurbanne, France

2
Abstract
Previous reports have shown that environmental temperature impacts proteome evolution in Bacteria and Archaea. However, it is unknown whether thermoadaptation mainly occurs via the sequential accumulation of substitutions, massive horizontal gene transfers, or both. Measuring the real contribution of amino acid substitution to thermoadaptation is challenging because of confounding environmental and genetic factors (e.g., pH, salinity, genomic G + C content) that also affect proteome evolution. Here, using Methanococcales, a major archaeal lineage, as a study model, we show that optimal growth temperature is the major factor affecting variations in amino acid frequencies of proteomes. By combining phylogenomic and ancestral sequence reconstruction approaches, we disclose a sequential substitutional scheme in which lysine plays a central role by fine-tuning the pool of arginine, serine, threonine, glutamine, and asparagine, whose frequencies are strongly correlated with optimal growth temperature. Finally, we show that the colonization of new thermal niches is not associated with high amounts of horizontal gene transfer. Altogether, although the acquisition of a few key proteins through horizontal gene transfer may have favored thermoadaptation in Methanococcales, our findings support sequential amino acid substitutions as the main factor driving thermoadaptation.
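The relationship between proteome amino acid frequencies and optimal growth temperature (OGT) described above can be illustrated with a short sketch. This is an assumption-laden illustration rather than the authors' analysis: it uses a plain Pearson correlation, and the function names and input layout (species name mapped to a list of protein sequences, and species name mapped to an OGT value) are hypothetical.

```python
from collections import Counter
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_frequencies(proteins):
    """Return the relative frequency of each standard amino acid in a proteome."""
    counts = Counter(aa for seq in proteins for aa in seq.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def pearson(xs, ys):
    """Pearson correlation coefficient; NaN when either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / sqrt(vx * vy)


def correlate_with_ogt(proteomes, ogt):
    """Correlate each amino acid's frequency with OGT across species.

    proteomes: species -> list of protein sequences (hypothetical layout)
    ogt: species -> optimal growth temperature
    """
    species = sorted(proteomes)
    freqs = {sp: amino_acid_frequencies(proteomes[sp]) for sp in species}
    temps = [ogt[sp] for sp in species]
    return {aa: pearson([freqs[sp][aa] for sp in species], temps) for aa in AMINO_ACIDS}
```

A raw correlation of this kind ignores phylogenetic non-independence and the confounding factors named in the abstract (pH, salinity, G + C content), so it should be read only as a first descriptive step.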
Affiliation(s)
- Michel Lecocq
  - Laboratoire de Biométrie et Biologie Évolutive, Université de Lyon, Université Lyon 1, CNRS, UMR5558, Villeurbanne, France
- Mathieu Groussin
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Manolo Gouy
  - Laboratoire de Biométrie et Biologie Évolutive, Université de Lyon, Université Lyon 1, CNRS, UMR5558, Villeurbanne, France
- Céline Brochier-Armanet
  - Laboratoire de Biométrie et Biologie Évolutive, Université de Lyon, Université Lyon 1, CNRS, UMR5558, Villeurbanne, France

3
Blanquart S, Groussin M, Le Roy A, Szöllősi GJ, Girard E, Franzetti B, Gouy M, Madern D. Resurrection of Ancestral Malate Dehydrogenases Reveals the Evolutionary History of Halobacterial Proteins: Deciphering Gene Trajectories and Changes in Biochemical Properties. Mol Biol Evol 2021; 38:3754-3774. [PMID: 33974066 PMCID: PMC8382911 DOI: 10.1093/molbev/msab146] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/25/2022]
Abstract
Extreme halophilic Archaea thrive in high salt, where, through proteomic adaptation, they cope with the strong osmolarity and extreme ionic conditions of their environment. In spite of wide fundamental interest, however, studies providing insights into this adaptation are scarce because of practical difficulties inherent to the purification and characterization of halophilic enzymes. In this work, we describe the evolutionary history of malate dehydrogenases (MalDH) within Halobacteria (a class of the Euryarchaeota phylum). We resurrected nine ancestors along the inferred halobacterial MalDH phylogeny, including the Last Common Ancestral MalDH of Halobacteria (LCAHa), and compared their biochemical properties with those of five modern halobacterial MalDHs. We monitored the stability of these various MalDHs, their oligomeric states and enzymatic properties, as a function of concentration for different salts in the solvent. We found that a variety of evolutionary processes, such as amino acid replacement, gene duplication, loss of the MalDH gene and replacement owing to horizontal transfer, resulted in significant differences in solubility, stability and catalytic properties between these enzymes in the three orders Halobacteriales, Haloferacales and Natrialbales since the LCAHa MalDH. We also showed how a stability trade-off might favor the emergence of new properties during adaptation to diverse environmental conditions. Altogether, our results suggest a new view of halophilic protein adaptation in Archaea.
Affiliation(s)
- Mathieu Groussin
  - Université Lyon 1, CNRS, UMR5558, Laboratoire de Biométrie et Biologie Évolutive, 43 bd du 11 novembre 1918, Villeurbanne, F-69622, France
  - Center for Microbiome Informatics and Therapeutics, Massachusetts Institute of Technology, Cambridge, Massachusetts, 02139, USA
- Aline Le Roy
  - Univ Grenoble Alpes, CNRS, CEA, IBS, Grenoble, F-38000, France
- Gergely J Szöllősi
  - Université Lyon 1, CNRS, UMR5558, Laboratoire de Biométrie et Biologie Évolutive, 43 bd du 11 novembre 1918, Villeurbanne, F-69622, France
  - MTA-ELTE "Lendület" Evolutionary Genomics Research Group, Budapest, H-1117, Hungary
- Eric Girard
  - Univ Grenoble Alpes, CNRS, CEA, IBS, Grenoble, F-38000, France
- Bruno Franzetti
  - Univ Grenoble Alpes, CNRS, CEA, IBS, Grenoble, F-38000, France
- Manolo Gouy
  - Université Lyon 1, CNRS, UMR5558, Laboratoire de Biométrie et Biologie Évolutive, 43 bd du 11 novembre 1918, Villeurbanne, F-69622, France

4
Comte N, Morel B, Hasić D, Guéguen L, Boussau B, Daubin V, Penel S, Scornavacca C, Gouy M, Stamatakis A, Tannier E, Parsons DP. Treerecs: an integrated phylogenetic tool, from sequences to reconciliations. Bioinformatics 2021; 36:4822-4824. [PMID: 33085745 DOI: 10.1093/bioinformatics/btaa615] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 11/25/2019] [Revised: 06/22/2020] [Accepted: 07/09/2020] [Indexed: 11/15/2022]
Abstract
MOTIVATION: Gene and species tree reconciliation methods are used to interpret gene trees, root them and correct uncertainties that are due to scarcity of signal in multiple sequence alignments. So far, reconciliation tools have not been integrated into standard phylogenetic software, and they lack either performance on certain functions or usability for biologists. RESULTS: We present Treerecs, a phylogenetic software package based on duplication-loss reconciliation. Treerecs is simple to install and to use. It is fast and versatile, has a graphic output, and can be used along with methods for phylogenetic inference on multiple alignments like PLL and Seaview. AVAILABILITY AND IMPLEMENTATION: Treerecs is open-source. Its source code (C++, AGPLv3) and manuals are available from https://project.inria.fr/treerecs/.
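To make the notion of duplication-loss reconciliation concrete, here is a minimal sketch of the classical last-common-ancestor (LCA) mapping, which counts the duplications implied by a rooted gene tree given a rooted species tree. It is not Treerecs' implementation and does not use its interface; trees are assumed to be nested Python tuples with string leaves, and `clusters`, `count_duplications` and `species_of` are hypothetical names.

```python
def clusters(tree):
    """Return the set of leaf-name frozensets, one per node of a nested-tuple tree."""
    found = set()

    def visit(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(visit(child) for child in node))
        found.add(leaves)
        return leaves

    visit(tree)
    return found


def count_duplications(gene_tree, species_tree, species_of):
    """Count duplication nodes under LCA reconciliation.

    species_of maps each gene-tree leaf name to its species name. A gene-tree
    node is a duplication when it maps to the same species-tree node as at
    least one of its children.
    """
    # Species-tree clusters are nested, so the smallest one containing a set
    # of species is their last common ancestor.
    species_clusters = sorted(clusters(species_tree), key=len)

    def lca(species_set):
        return next(c for c in species_clusters if species_set <= c)

    duplications = 0

    def visit(node):
        nonlocal duplications
        if isinstance(node, str):
            return lca(frozenset([species_of[node]]))
        child_maps = [visit(child) for child in node]
        here = lca(frozenset().union(*child_maps))
        if here in child_maps:
            duplications += 1
        return here

    visit(gene_tree)
    return duplications
```

This sketch only counts duplications for a fixed rooted gene tree; inferring losses, choosing a root, or rearranging weakly supported branches, as described in the abstract, would require additional logic.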
Affiliation(s)
- Nicolas Comte
  - Inria Grenoble Rhône-Alpes, 38334 Montbonnot, France
- Benoit Morel
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Damir Hasić
  - Department of Mathematics, University of Sarajevo, Sarajevo, Bosnia and Herzegovina
- Laurent Guéguen
  - Université de Lyon, Laboratoire de Biométrie et Biologie Évolutive, CNRS UMR5558, F-69622 Villeurbanne, France
- Bastien Boussau
  - Université de Lyon, Laboratoire de Biométrie et Biologie Évolutive, CNRS UMR5558, F-69622 Villeurbanne, France
- Vincent Daubin
  - Université de Lyon, Laboratoire de Biométrie et Biologie Évolutive, CNRS UMR5558, F-69622 Villeurbanne, France
- Simon Penel
  - Université de Lyon, Laboratoire de Biométrie et Biologie Évolutive, CNRS UMR5558, F-69622 Villeurbanne, France
- Celine Scornavacca
  - ISEM, CNRS, Université de Montpellier, IRD, EPHE, Montpellier 34000, France
- Manolo Gouy
  - Université de Lyon, Laboratoire de Biométrie et Biologie Évolutive, CNRS UMR5558, F-69622 Villeurbanne, France
- Alexandros Stamatakis
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
  - Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Eric Tannier
  - Inria Grenoble Rhône-Alpes, 38334 Montbonnot, France
  - Université de Lyon, Laboratoire de Biométrie et Biologie Évolutive, CNRS UMR5558, F-69622 Villeurbanne, France

5
Gouy M, Tannier E, Comte N, Parsons DP. Seaview Version 5: A Multiplatform Software for Multiple Sequence Alignment, Molecular Phylogenetic Analyses, and Tree Reconciliation. Methods Mol Biol 2021; 2231:241-260. [PMID: 33289897 DOI: 10.1007/978-1-0716-1036-7_15] [Citation(s) in RCA: 49] [Impact Index Per Article: 16.3] [Indexed: 05/13/2023]
Abstract
We present Seaview version 5, a multiplatform program to perform multiple alignment and phylogenetic tree building from molecular sequence data. Seaview provides network access to sequence databases, alignment with arbitrary algorithms, parsimony, distance and maximum likelihood tree building with PhyML, and display, printing, and copy-to-clipboard or export to SVG files of rooted or unrooted, binary or multifurcating phylogenetic trees. While Seaview is primarily a program providing a graphical user interface that guides the user through the desired analyses, it also has a command-line mode suitable for user-provided scripts. Seaview version 5 introduces the ability to reconcile a gene tree with a reference species tree and to use this reconciliation to root and rearrange the gene tree. Seaview is freely available at http://doua.prabi.fr/software/seaview.
Affiliation(s)
- Manolo Gouy
  - Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR 5558, Villeurbanne, France
- Eric Tannier
  - Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR 5558, Villeurbanne, France
  - INRIA Grenoble-Rhône-Alpes, Montbonnot, France

6
Aouad M, Taib N, Oudart A, Lecocq M, Gouy M, Brochier-Armanet C. Extreme halophilic archaea derive from two distinct methanogen Class II lineages. Mol Phylogenet Evol 2018; 127:46-54. [DOI: 10.1016/j.ympev.2018.04.011] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 10/17/2017] [Revised: 03/12/2018] [Accepted: 04/09/2018] [Indexed: 10/17/2022]

7

8
Jauffrit F, Penel S, Delmotte S, Rey C, de Vienne DM, Gouy M, Charrier JP, Flandrois JP, Brochier-Armanet C. RiboDB Database: A Comprehensive Resource for Prokaryotic Systematics. Mol Biol Evol 2016; 33:2170-2. [PMID: 27189556 DOI: 10.1093/molbev/msw088] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Indexed: 11/12/2022]
Abstract
Ribosomal proteins (r-proteins) are increasingly used as an alternative to ribosomal RNA (rRNA) for prokaryotic systematics. However, their routine use is difficult because r-proteins are often missing or wrongly annotated in complete genome sequences, and there is currently no dedicated exhaustive database of r-proteins. RiboDB aims to fill this gap. This comprehensive database, updated weekly, allows the fast and easy retrieval of r-protein sequences from publicly available complete prokaryotic genome sequences. The current version of RiboDB contains 90 r-proteins from 3,750 complete prokaryotic genomes encompassing 38 phyla/major classes and 1,759 different species. RiboDB is accessible at http://ribodb.univ-lyon1.fr and through ACNUC interfaces.
Affiliation(s)
- Frédéric Jauffrit
  - Univ Lyon, Université Lyon 1, CNRS, UMR5558, Laboratoire de Biométrie et Biologie Évolutive, 43 bd du 11 novembre 1918, F-69622, Villeurbanne, France
  - Technology Research Department, Innovation Unit, bioMérieux SA, Marcy L'Etoile, France
- Simon Penel
  - Univ Lyon, Université Lyon 1, CNRS, UMR5558, Laboratoire de Biométrie et Biologie Évolutive, 43 bd du 11 novembre 1918, F-69622, Villeurbanne, France
- Stéphane Delmotte
  - Univ Lyon, Université Lyon 1, CNRS, UMR5558, Laboratoire de Biométrie et Biologie Évolutive, 43 bd du 11 novembre 1918, F-69622, Villeurbanne, France
- Carine Rey
  - Univ Lyon, Université Lyon 1, CNRS, UMR5558, Laboratoire de Biométrie et Biologie Évolutive, 43 bd du 11 novembre 1918, F-69622, Villeurbanne, France
  - Laboratoire de Biologie et de Modélisation de la Cellule, École Normale Supérieure De Lyon, CNRS UMR 5239, UCBL1, IFR128, Lyon, France
  - Master BioSciences, Département de Biologie, École Normale Supérieure de Lyon, Université de Lyon, UCB Lyon1, Lyon, France
- Damien M de Vienne
  - Univ Lyon, Université Lyon 1, CNRS, UMR5558, Laboratoire de Biométrie et Biologie Évolutive, 43 bd du 11 novembre 1918, F-69622, Villeurbanne, France
- Manolo Gouy
  - Univ Lyon, Université Lyon 1, CNRS, UMR5558, Laboratoire de Biométrie et Biologie Évolutive, 43 bd du 11 novembre 1918, F-69622, Villeurbanne, France
- Jean-Pierre Flandrois
  - Univ Lyon, Université Lyon 1, CNRS, UMR5558, Laboratoire de Biométrie et Biologie Évolutive, 43 bd du 11 novembre 1918, F-69622, Villeurbanne, France
- Céline Brochier-Armanet
  - Univ Lyon, Université Lyon 1, CNRS, UMR5558, Laboratoire de Biométrie et Biologie Évolutive, 43 bd du 11 novembre 1918, F-69622, Villeurbanne, France

9
Groussin M, Boussau B, Szöllősi G, Eme L, Gouy M, Brochier-Armanet C, Daubin V. Gene Acquisitions from Bacteria at the Origins of Major Archaeal Clades Are Vastly Overestimated. Mol Biol Evol 2015; 33:305-10. [PMID: 26541173 PMCID: PMC4866543 DOI: 10.1093/molbev/msv249] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Indexed: 11/21/2022]
Abstract
In a recent article, Nelson-Sathi et al. (NS) report that the origins of major archaeal lineages (MAL) correspond to massive group-specific gene acquisitions via HGT from bacteria (Nelson-Sathi et al. 2015. Origins of major archaeal clades correspond to gene acquisitions from bacteria. Nature 517(7532):77-80.). If correct, this would have fundamental implications for the process of diversification in microbes. However, a reexamination of these data and results shows that the methodology used by NS systematically inflates the number of genes acquired at the root of each MAL, and incorrectly assumes bacterial origins for these genes. A reanalysis of their data with appropriate phylogenetic models accounting for the dynamics of gene gain and loss between lineages supports the continuous acquisition of genes over long periods in the evolution of Archaea.
Affiliation(s)
- Mathieu Groussin
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA
- Bastien Boussau
  - Université de Lyon, Lyon, France
  - Université Lyon 1, Villeurbanne, France
  - CNRS, UMR 5558, Laboratoire de Biométrie et Biologie Evolutive, Villeurbanne, France
- Gergely Szöllősi
  - ELTE-MTA "Lendület" Biophysics Research Group, Budapest, Hungary
- Laura Eme
  - Centre for Comparative Genomics and Evolutionary Bioinformatics, Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Canada
- Manolo Gouy
  - Université de Lyon, Lyon, France
  - Université Lyon 1, Villeurbanne, France
  - CNRS, UMR 5558, Laboratoire de Biométrie et Biologie Evolutive, Villeurbanne, France
- Céline Brochier-Armanet
  - Université de Lyon, Lyon, France
  - Université Lyon 1, Villeurbanne, France
  - CNRS, UMR 5558, Laboratoire de Biométrie et Biologie Evolutive, Villeurbanne, France
- Vincent Daubin
  - Université de Lyon, Lyon, France
  - Université Lyon 1, Villeurbanne, France
  - CNRS, UMR 5558, Laboratoire de Biométrie et Biologie Evolutive, Villeurbanne, France

10
Flandrois JP, Perrière G, Gouy M. leBIBIQBPP: a set of databases and a webtool for automatic phylogenetic analysis of prokaryotic sequences. BMC Bioinformatics 2015; 16:251. [PMID: 26264559 PMCID: PMC4531848 DOI: 10.1186/s12859-015-0692-z] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Received: 01/13/2015] [Accepted: 07/31/2015] [Indexed: 01/08/2023]
Abstract
BACKGROUND: Estimating the phylogenetic position of bacterial and archaeal organisms by genetic sequence comparison is considered the gold standard in taxonomy. It is also a way to identify the species of origin of a sequence. The quality of the reference database used in such analyses is crucial: the database must reflect up-to-date bacterial nomenclature and accurately indicate the species of origin of its sequences. DESCRIPTION: leBIBI(QBPP) is a web tool that takes as input a series of nucleotide sequences belonging to one of a set of reference markers (e.g., SSU rRNA, rpoB, groEL2), automatically retrieves closely related sequences, aligns them, and performs phylogenetic reconstruction using an approximate maximum likelihood approach. The system returns a set of quality parameters and, if possible, a suggested taxonomic assignment for the input sequences. The reference databases are extracted from GenBank and offer four degrees of stringency, from the "superstringent" degree (one type strain per species) to the loosely parsed degree ("lax" database). A set of one hundred to more than a thousand sequences may be analyzed at a time. The speed of the process has been optimized through careful hardware selection and database design. CONCLUSION: leBIBI(QBPP) is a powerful tool that helps biologists position bacterial or archaeal sequences of commonly used markers in a phylogeny. It is a diagnostic tool for clinical, industrial and environmental microbiology laboratories, as well as an exploratory tool for more specialized laboratories. Its main advantages relative to comparable systems are: i) the use of a broad set of databases covering diverse markers with various degrees of stringency; ii) the use of an approximate maximum likelihood approach for phylogenetic reconstruction; iii) a speed compatible with online usage; and iv) fully documented results that help the user in decision making.
Affiliation(s)
- Jean-Pierre Flandrois
  - Laboratoire de Biométrie et Biologie Evolutive, UMR CNRS 5558, Université Claude Bernard - Lyon 1, 43 bd. du 11 Novembre 1918, Villeurbanne, 69622, France
- Guy Perrière
  - Laboratoire de Biométrie et Biologie Evolutive, UMR CNRS 5558, Université Claude Bernard - Lyon 1, 43 bd. du 11 Novembre 1918, Villeurbanne, 69622, France
- Manolo Gouy
  - Laboratoire de Biométrie et Biologie Evolutive, UMR CNRS 5558, Université Claude Bernard - Lyon 1, 43 bd. du 11 Novembre 1918, Villeurbanne, 69622, France

11
Groussin M, Hobbs JK, Szöllősi GJ, Gribaldo S, Arcus VL, Gouy M. Toward more accurate ancestral protein genotype-phenotype reconstructions with the use of species tree-aware gene trees. Mol Biol Evol 2014; 32:13-22. [PMID: 25371435 PMCID: PMC4271536 DOI: 10.1093/molbev/msu305] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Indexed: 12/17/2022]
Abstract
The resurrection of ancestral proteins provides direct insight into how natural selection has shaped proteins found in nature. By tracing substitutions along a gene phylogeny, ancestral proteins can be reconstructed in silico and subsequently synthesized in vitro. This elegant strategy reveals the complex mechanisms responsible for the evolution of protein functions and structures. However, to date, all protein resurrection studies have used simplistic approaches for ancestral sequence reconstruction (ASR), including the assumption that a single sequence alignment alone is sufficient to accurately reconstruct the history of the gene family. The impact of such shortcuts on conclusions about ancestral functions has not been investigated. Here, we show with simulations that utilizing information on species history using a model that accounts for the duplication, horizontal transfer, and loss (DTL) of genes statistically increases ASR accuracy. This underscores the importance of the tree topology in the inference of putative ancestors. We validate our in silico predictions using in vitro resurrection of the LeuB enzyme for the ancestor of the Firmicutes, a major and ancient bacterial phylum. With this particular protein, our experimental results demonstrate that information on the species phylogeny results in a biochemically more realistic and kinetically more stable ancestral protein. Additional resurrection experiments with different proteins are necessary to statistically quantify the impact of using species tree-aware gene trees on ancestral protein phenotypes. Nonetheless, our results suggest the need for incorporating both sequence and DTL information in future studies of protein resurrections to accurately define the genotype-phenotype space in which proteins diversify.
Affiliation(s)
- Mathieu Groussin
  - Laboratoire de Biométrie et Biologie Evolutive, Université de Lyon, Université Lyon 1, CNRS, UMR5558, Villeurbanne, France
- Joanne K Hobbs
  - Department of Biological Sciences, University of Waikato, Hamilton, New Zealand
- Gergely J Szöllősi
  - Laboratoire de Biométrie et Biologie Evolutive, Université de Lyon, Université Lyon 1, CNRS, UMR5558, Villeurbanne, France
  - ELTE-MTA "Lendület" Biophysics Research Group, Pázmány, Budapest, Hungary
- Simonetta Gribaldo
  - Unité de Biologie Moléculaire du Gène chez les Extrêmophiles, Département de Microbiologie, Institut Pasteur, Paris cedex, France
- Vickery L Arcus
  - Department of Biological Sciences, University of Waikato, Hamilton, New Zealand
- Manolo Gouy
  - Laboratoire de Biométrie et Biologie Evolutive, Université de Lyon, Université Lyon 1, CNRS, UMR5558, Villeurbanne, France

12
Abstract
The evolutionary origin of eukaryotes is a question of great interest for which many different hypotheses have been proposed. These hypotheses predict distinct patterns of evolutionary relationships for individual genes of the ancestral eukaryotic genome. The availability of numerous completely sequenced genomes covering the three domains of life makes it possible to contrast these predictions with empirical data. We performed a systematic analysis of the phylogenetic relationships of ancestral eukaryotic genes with archaeal and bacterial genes. In contrast with previous studies, we emphasize the critical importance of methods accounting for statistical support, horizontal gene transfer, and gene loss, and we disentangle the processes underlying the phylogenomic pattern we observe. We first recover a clear signal indicating that a fraction of the bacteria-like eukaryotic genes are of alphaproteobacterial origin. Then, we show that the majority of bacteria-related eukaryotic genes actually do not point to a relationship with a specific bacterial taxonomic group. We also provide evidence that eukaryotes branch close to the last archaeal common ancestor. Our results demonstrate that there is no phylogenetic support for hypotheses involving a fusion with a bacterium other than the ancestor of mitochondria. Overall, they leave only two possible interpretations, respectively, based on the early-mitochondria hypotheses, which suppose an early endosymbiosis of an alphaproteobacterium in an archaeal host and on the slow-drip autogenous hypothesis, in which early eukaryotic ancestors were particularly prone to horizontal gene transfers.
Affiliation(s)
- Nicolas C Rochette
  - Laboratoire de Biométrie et Biologie Évolutive, CNRS UMR5558, Université de Lyon, Université Claude Bernard Lyon 1, Villeurbanne, France

13
Abstract
Several lines of evidence such as the basal location of thermophilic lineages in large-scale phylogenetic trees and the ancestral sequence reconstruction of single enzymes or large protein concatenations support the conclusion that the ancestors of the bacterial and archaeal domains were thermophilic organisms which were adapted to hot environments during the early stages of the Earth. A parsimonious reasoning would therefore suggest that the last universal common ancestor (LUCA) was also thermophilic. Various authors have used branch-wise non-homogeneous evolutionary models that better capture the variation of molecular compositions among lineages to accurately reconstruct the ancestral G + C contents of ribosomal RNAs and the ancestral amino acid composition of highly conserved proteins. They confirmed the thermophilic nature of the ancestors of Bacteria and Archaea but concluded that LUCA, their last common ancestor, was a mesophilic organism having a moderate optimal growth temperature. In this letter, we investigate the unknown nature of the phylogenetic signal that informs ancestral sequence reconstruction to support this non-parsimonious scenario. We find that rate variation across sites of molecular sequences provides information at different time scales by recording the oldest adaptation to temperature in slow-evolving regions and subsequent adaptations in fast-evolving ones.
Affiliation(s)
- Mathieu Groussin
  - Laboratoire de Biométrie et Biologie Evolutive, Université de Lyon, Université Lyon 1, CNRS, UMR5558, Villeurbanne, France

14
Gouy M, Rousselle Y, Bastianelli D, Lecomte P, Bonnal L, Roques D, Efile JC, Rocher S, Daugrois J, Toubi L, Nabeneza S, Hervouet C, Telismart H, Denis M, Thong-Chane A, Glaszmann JC, Hoarau JY, Nibouche S, Costet L. Experimental assessment of the accuracy of genomic selection in sugarcane. Theor Appl Genet 2013; 126:2575-86. [PMID: 23907359 DOI: 10.1007/s00122-013-2156-z] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.4] [Received: 02/18/2013] [Accepted: 07/12/2013] [Indexed: 05/09/2023]
Abstract
Sugarcane cultivars are interspecific hybrids with an aneuploid, highly heterozygous polyploid genome. The complexity of the sugarcane genome is the main obstacle to the use of marker-assisted selection in sugarcane breeding. Given the promising results of recent studies of plant genomic selection, we explored the feasibility of genomic selection in this complex polyploid crop. Genetic values were predicted in two independent panels, each composed of 167 accessions representing sugarcane genetic diversity worldwide. Accessions were genotyped with 1,499 DArT markers. One panel was phenotyped in Reunion Island and the other in Guadeloupe. Ten traits concerning sugar and bagasse contents, digestibility and composition of the bagasse, plant morphology, and disease resistance were used. We used four statistical predictive models: Bayesian LASSO, ridge regression, reproducing kernel Hilbert space, and partial least squares regression. The accuracy of the predictions was assessed through the correlation between observed and predicted genetic values by cross validation within each panel and between the two panels. We observed equivalent accuracy among the four predictive models for a given trait, whereas marked differences were observed among traits. Depending on the trait concerned, within-panel cross validation yielded median correlations ranging from 0.29 to 0.62 in the Reunion Island panel and from 0.11 to 0.5 in the Guadeloupe panel. Cross validation between panels yielded correlations ranging from 0.13 for smut resistance to 0.55 for brix. This level of correlation is promising for future implementations. Our results provide the first validation of genomic selection in sugarcane.
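The accuracy measure described above, the correlation between observed and cross-validated predicted values, can be sketched for one of the four model families named in the abstract, ridge regression. This is a generic pure-Python illustration, not the authors' code or settings: the fold assignment, the penalty value `lam`, and all function names are assumptions, and the primal solver used here is only practical for a small number of markers.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ridge_fit(markers, values, lam):
    """Fit ridge regression on centered data: beta = (X'X + lam I)^-1 X'y."""
    n, p = len(markers), len(markers[0])
    col_means = [sum(row[j] for row in markers) / n for j in range(p)]
    y_mean = sum(values) / n
    centered = [[row[j] - col_means[j] for j in range(p)] for row in markers]
    xtx = [[sum(r[j] * r[k] for r in centered) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * (y - y_mean) for r, y in zip(centered, values)) for j in range(p)]
    return col_means, y_mean, solve(xtx, xty)


def ridge_predict(model, row):
    """Predict the value of one accession from its marker scores."""
    col_means, y_mean, beta = model
    return y_mean + sum(b * (x - m) for b, x, m in zip(beta, row, col_means))


def cross_validated_accuracy(markers, values, folds=4, lam=1.0):
    """Correlation between observed values and out-of-fold ridge predictions."""
    observed, predicted = [], []
    for fold in range(folds):
        train = [i for i in range(len(values)) if i % folds != fold]
        test = [i for i in range(len(values)) if i % folds == fold]
        model = ridge_fit([markers[i] for i in train], [values[i] for i in train], lam)
        for i in test:
            observed.append(values[i])
            predicted.append(ridge_predict(model, markers[i]))
    return pearson(observed, predicted)
```

The between-panel validation mentioned in the abstract would correspond to fitting on all accessions of one panel and predicting the other, which this sketch does not cover.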
Affiliation(s)
- M Gouy
- eRcane, 97494, Sainte-Clotilde, La Réunion, France
15
Guéguen L, Gaillard S, Boussau B, Gouy M, Groussin M, Rochette NC, Bigot T, Fournier D, Pouyet F, Cahais V, Bernard A, Scornavacca C, Nabholz B, Haudry A, Dachary L, Galtier N, Belkhir K, Dutheil JY. Bio++: Efficient Extensible Libraries and Tools for Computational Molecular Evolution. Mol Biol Evol 2013; 30:1745-50. [DOI: 10.1093/molbev/mst097] [Citation(s) in RCA: 132] [Impact Index Per Article: 12.0] [Indexed: 01/07/2023] Open
16
Abstract
Most models of nucleotide or amino acid substitution used in phylogenetic studies assume that the evolutionary process has been homogeneous across lineages and that composition of nucleotides or amino acids has remained the same throughout the tree. These oversimplified assumptions are refuted by the observation that compositional variability characterizes extant biological sequences. Branch-heterogeneous models of protein evolution that account for compositional variability have been developed, but are not yet in common use because of the large number of parameters required, leading to high computational costs and potential overparameterization. Here, we present a new branch-nonhomogeneous and nonstationary model of protein evolution that captures more accurately the high complexity of sequence evolution. This model, henceforth called Correspondence and likelihood analysis (COaLA), makes use of a correspondence analysis to reduce the number of parameters to be optimized through maximum likelihood, focusing on most of the compositional variation observed in the data. The model was thoroughly tested on both simulated and biological data sets to show its high performance in terms of data fitting and CPU time. COaLA efficiently estimates ancestral amino acid frequencies and sequences, making it relevant for studies aiming at reconstructing and resurrecting ancestral amino acid sequences. Finally, we applied COaLA on a concatenate of universal amino acid sequences to confirm previous results obtained with a nonhomogeneous Bayesian model regarding the early pattern of adaptation to optimal growth temperature, supporting the mesophilic nature of the Last Universal Common Ancestor.
Affiliation(s)
- M Groussin
- Laboratoire de Biométrie et Biologie Evolutive, Université de Lyon, Université Lyon 1, CNRS, UMR5558, Villeurbanne, France.
17
Lebeau A, Gouy M, Daunay MC, Wicker E, Chiroleu F, Prior P, Frary A, Dintinger J. Genetic mapping of a major dominant gene for resistance to Ralstonia solanacearum in eggplant. Theor Appl Genet 2013; 126:143-58. [PMID: 22930132 DOI: 10.1007/s00122-012-1969-5] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Received: 04/06/2012] [Accepted: 08/16/2012] [Indexed: 05/24/2023]
Abstract
Resistance of eggplant against Ralstonia solanacearum phylotype I strains was assessed in an F(6) population of recombinant inbred lines (RILs) derived from an intra-specific cross between S. melongena MM738 (susceptible) and AG91-25 (resistant). Resistance traits were determined as disease score, percentage of wilted plants, and stem-based bacterial colonization index, as assessed in greenhouse experiments conducted in Réunion Island, France. The AG91-25 resistance was highly efficient toward strains CMR134, PSS366 and GMI1000, but only partial toward the highly virulent strain PSS4. The partial resistance found against PSS4 was overcome under high inoculation pressure, with heritability estimates from 0.28 to 0.53, depending on the traits and season. A genetic map was built with 119 AFLP, SSR and SRAP markers positioned on 18 linkage groups (LG), for a total length of 884 cM, and used for quantitative trait loci (QTL) analysis. A major dominant gene, named ERs1, controlled the resistance to strains CMR134, PSS366, and GMI1000. Against strain PSS4, this gene was not detected, but a significant QTL involved in delay of disease progress was detected on another LG. The possible use of the major resistance gene ERs1 in marker-assisted selection and the prospects offered for academic studies of a possible gene-for-gene system controlling resistance to bacterial wilt in solanaceous plants are discussed.
Affiliation(s)
- A Lebeau
- CIRAD, UMR Peuplements végétaux et Bioagresseurs en Milieu Tropical (PVBMT), 7 chemin de l'IRAT, 97410 Saint Pierre, La Réunion, France
18
Abstract
Comparisons of gene trees and species trees are key to understanding major processes of genome evolution such as gene duplication and loss. Because current methods to reconstruct phylogenies fail to model the two-way dependency between gene trees and the species tree, they often misrepresent gene and species histories. We present a new probabilistic model to jointly infer rooted species and gene trees for dozens of genomes and thousands of gene families. We use simulations to show that this method accurately infers the species tree and gene trees, is robust to misspecification of the models of sequence and gene family evolution, and provides a precise historic record of gene duplications and losses throughout genome evolution. We simultaneously reconstruct the history of mammalian species and their genes based on 36 completely sequenced genomes, and use the reconstructed gene trees to infer the gene content and organization of ancestral mammalian genomes. We show that our method yields a more accurate picture of ancestral genomes than the trees available in the authoritative database Ensembl.
Affiliation(s)
- Bastien Boussau
- Laboratoire de Biométrie et Biologie Evolutive, Université de Lyon, Université Lyon 1, CNRS, UMR 5558, Villeurbanne F-69622, France.
19
Groussin M, Gouy M. Adaptation to Environmental Temperature Is a Major Determinant of Molecular Evolutionary Rates in Archaea. Mol Biol Evol 2011; 28:2661-74. [DOI: 10.1093/molbev/msr098] [Citation(s) in RCA: 84] [Impact Index Per Article: 6.5] [Indexed: 12/26/2022] Open
20
Gouy M, Guindon S, Gascuel O. SeaView version 4: A multiplatform graphical user interface for sequence alignment and phylogenetic tree building. Mol Biol Evol 2010; 27:221-224. [PMID: 19854763 DOI: 10.1093/molbev] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 05/22/2023] Open
Abstract
We present SeaView version 4, a multiplatform program designed to facilitate multiple alignment and phylogenetic tree building from molecular sequence data through the use of a graphical user interface. SeaView version 4 combines all the functions of the widely used programs SeaView (in its previous versions) and Phylo_win, and expands them by adding network access to sequence databases, alignment with an arbitrary algorithm, maximum-likelihood tree building with PhyML, and display, printing, and copy-to-clipboard of rooted or unrooted, binary or multifurcating phylogenetic trees. Given the wide range of tools and algorithms currently available for phylogenetic analyses, SeaView is especially useful for teaching and for occasional users of such software. SeaView is freely available at http://pbil.univ-lyon1.fr/software/seaview.
21
Arigon AM, Perriere G, Gouy M. Proposals for classification methods dedicated to biological data. IJBET 2010. [DOI: 10.1504/ijbet.2010.029649] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
22
Gouy M, Guindon S, Gascuel O. SeaView Version 4: A Multiplatform Graphical User Interface for Sequence Alignment and Phylogenetic Tree Building. Mol Biol Evol 2009; 27:221-4. [PMID: 19854763 DOI: 10.1093/molbev/msp259] [Citation(s) in RCA: 3794] [Impact Index Per Article: 252.9] [Indexed: 11/15/2022] Open
23
Boussau B, Guéguen L, Gouy M. A mixture model and a hidden Markov model to simultaneously detect recombination breakpoints and reconstruct phylogenies. Evol Bioinform Online 2009; 5:67-79. [PMID: 19812727 PMCID: PMC2747125 DOI: 10.4137/ebo.s2242] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Indexed: 11/17/2022] Open
Abstract
Homologous recombination is a pervasive biological process that affects sequences in all living organisms and viruses. In the presence of recombination, the evolutionary history of an alignment of homologous sequences cannot be properly depicted by a single bifurcating tree: some sites have evolved along a specific phylogenetic tree, others have followed another path. Methods available to analyse recombination in sequences usually involve an analysis of the alignment through sliding-windows, or are particularly demanding in computational resources, and are often limited to nucleotide sequences. In this article, we propose and implement a Mixture Model on trees and a phylogenetic Hidden Markov Model to reveal recombination breakpoints while searching for the various evolutionary histories that are present in an alignment known to have undergone homologous recombination. These models are sufficiently efficient to be applied to dozens of sequences on a single desktop computer, and can handle equivalently nucleotide or protein sequences. We estimate their accuracy on simulated sequences and test them on real data.
Affiliation(s)
- Bastien Boussau
- Université de Lyon, université Lyon 1, CNRS, UMR 5558, Laboratoire de Biométrie et Biologie Evolutive, 43 boulevard du 11 novembre 1918, Villeurbanne F-69622, France.
24
Penel S, Arigon AM, Dufayard JF, Sertier AS, Daubin V, Duret L, Gouy M, Perrière G. Databases of homologous gene families for comparative genomics. BMC Bioinformatics 2009; 10 Suppl 6:S3. [PMID: 19534752 PMCID: PMC2697650 DOI: 10.1186/1471-2105-10-s6-s3] [Citation(s) in RCA: 102] [Impact Index Per Article: 6.8] [Indexed: 12/22/2022] Open
Abstract
Background: Comparative genomics is a central step in many sequence analysis studies, from gene annotation and the identification of new functional regions in genomes, to the study of evolutionary processes at the molecular level (speciation, single gene or whole genome duplications, etc.) and phylogenetics. In that context, databases providing users with high-quality homologous families and sequence alignments as well as phylogenetic trees based on state-of-the-art algorithms are becoming indispensable. Methods: We developed an automated procedure allowing massive all-against-all similarity searches, gene clustering, multiple alignment computation, and phylogenetic tree construction and reconciliation. The application of this procedure to a very large set of sequences is possible through parallel computing on a large computer cluster. Results: Three databases were developed using this procedure: HOVERGEN, HOGENOM and HOMOLENS. These databases share the same architecture but differ in their content. HOVERGEN contains sequences from vertebrates, HOGENOM is mainly devoted to completely sequenced microbial organisms, and HOMOLENS is devoted to metazoan genomes from Ensembl. Access to the databases is provided through Web query forms, a general retrieval system and a client-server graphical interface. The latter can be used to perform tree-pattern based searches, which allow, among other uses, the retrieval of sets of orthologous genes. The three databases, as well as the software required to build and query them, can be used or downloaded from the PBIL (Pôle Bioinformatique Lyonnais) site.
Affiliation(s)
- Simon Penel
- Laboratoire de Biométrie et Biologie Evolutive, CNRS, Université Claude Bernard - Lyon 1, 43 bd, du 11 Novembre 1918, 69622 Villeurbanne Cedex, France.
25
Boussau B, Blanquart S, Necsulea A, Lartillot N, Gouy M. Parallel adaptations to high temperatures in the Archaean eon. Nature 2008; 456:942-5. [DOI: 10.1038/nature07393] [Citation(s) in RCA: 173] [Impact Index Per Article: 10.8] [Received: 03/05/2008] [Accepted: 09/01/2008] [Indexed: 11/09/2022]
26
Boussau B, Guéguen L, Gouy M. Accounting for horizontal gene transfers explains conflicting hypotheses regarding the position of Aquificales in the phylogeny of Bacteria. BMC Evol Biol 2008; 8:272. [PMID: 18834516 PMCID: PMC2584045 DOI: 10.1186/1471-2148-8-272] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.5] [Received: 05/14/2008] [Accepted: 10/03/2008] [Indexed: 01/09/2023] Open
Abstract
Background: Despite a large agreement between ribosomal RNA and concatenated protein phylogenies, the phylogenetic tree of the bacterial domain remains uncertain in its deepest nodes. For instance, the position of the hyperthermophilic Aquificales is debated, as their commonly observed position close to Thermotogales may proceed from horizontal gene transfers, long branch attraction or compositional biases, and may not represent vertical descent. Indeed, another view, based on the analysis of rare genomic changes, places Aquificales close to epsilon-Proteobacteria. Results: To get a whole genome view of Aquifex relationships, all trees containing sequences from Aquifex in the HOGENOM database were surveyed. This study revealed that Aquifex is most often found as a neighbour to Thermotogales. Moreover, informational genes, which appeared to be less often transferred to the Aquifex lineage than non-informational genes, most often placed Aquificales close to Thermotogales. To ensure these results did not come from long branch attraction or compositional artefacts, a subset of carefully chosen proteins from a wide range of bacterial species was selected for further scrutiny. Among these genes, two phylogenetic hypotheses were found to be significantly more likely than the others: the most likely hypothesis placed Aquificales as a neighbour to Thermotogales, and the second one with epsilon-Proteobacteria. We characterized the genes that supported each of these two hypotheses, and found that differences in rates of evolution or in amino-acid compositions could not explain the presence of two incongruent phylogenetic signals in the alignment. Instead, evidence for a large Horizontal Gene Transfer between Aquificales and epsilon-Proteobacteria was found.
Conclusion: Methods based on concatenated informational proteins and methods based on character cladistics led to different conclusions regarding the position of Aquificales because this lineage has undergone many horizontal gene transfers. However, if a tree of vertical descent can be reconstructed for Bacteria, our results suggest Aquificales should be placed close to Thermotogales.
Affiliation(s)
- Bastien Boussau
- Université de Lyon; Université Lyon 1; CNRS; INRIA; Laboratoire de Biométrie et Biologie Evolutive, 43 boulevard du 11 novembre 1918, Villeurbanne F-69622, France.
27
Dumitrescu O, Tristan A, Meugnier H, Bes M, Gouy M, Etienne J, Lina G, Vandenesch F. Polymorphism of the Staphylococcus aureus Panton-Valentine Leukocidin Genes and Its Possible Link with the Fitness of Community-Associated Methicillin-Resistant S. aureus. J Infect Dis 2008; 198:792-4. [DOI: 10.1086/590914] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 11/04/2022] Open
28
29
Arigon AM, Perrière G, Gouy M. Automatic identification of large collections of protein-coding or rRNA sequences. Biochimie 2007; 90:609-14. [PMID: 17920750 DOI: 10.1016/j.biochi.2007.08.006] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 07/03/2007] [Accepted: 08/24/2007] [Indexed: 11/30/2022]
Abstract
The number of available genomic sequences is growing very fast, due to the development of massive sequencing techniques. Sequence identification is needed and contributes to the assessment of gene and species evolutionary relationships. Automated bioinformatics tools are thus necessary to carry out these identification operations accurately and quickly. We developed HoSeqI (Homologous Sequence Identification), a software environment allowing this kind of automated sequence identification using homologous gene family databases. HoSeqI is accessible through a Web interface (http://pbil.univ-lyon1.fr/software/HoSeqI/) that allows users to identify one or several sequences and to visualize the resulting alignments and phylogenetic trees. We also implemented another application, MultiHoSeqI, to quickly add a large set of sequences to a family database in order to identify them, to update the database, or to help automatic genome annotation. More recently, we developed an application, ChiSeqI (Chimeric Sequence Identification), to automate the identification of bacterial 16S ribosomal RNA sequences and the detection of chimeric sequences.
30
Abstract
The ACNUC biological sequence database system provides powerful and fast query and extraction capabilities to a variety of nucleotide and protein sequence databases. The collection of ACNUC databases served by the Pôle Bio-Informatique Lyonnais includes the EMBL, GenBank, RefSeq and UniProt nucleotide and protein sequence databases and a series of other sequence databases that support comparative genomics analyses: HOVERGEN and HOGENOM containing families of homologous protein-coding genes from vertebrate and prokaryotic genomes, respectively; Ensembl and Genome Reviews for analyses of prokaryotic and of selected eukaryotic genomes. This report describes the main features of the ACNUC system and the access to ACNUC databases from any internet-connected computer. Such access was made possible by the definition of a remote ACNUC access protocol and the implementation of Application Programming Interfaces between the C, Python and R languages and this communication protocol. Two retrieval programs for ACNUC databases, Query_win, with a graphical user interface and raa_query, with a command line interface, are also described. Altogether, these bioinformatics tools provide users with either ready-to-use means of querying remote sequence databases through a variety of selection criteria, or a simple way to endow application programs with an extensive access to these databases. Remote access to ACNUC databases is open to all and fully documented (http://pbil.univ-lyon1.fr/databases/acnuc/acnuc.html).
Affiliation(s)
- Manolo Gouy
- Laboratoire de Biométrie et Biologie Evolutive, Université de Lyon, 69622 Villeurbanne Cedex, France.
31
Abstract
Recent advances in heuristics have made maximum likelihood phylogenetic tree estimation tractable for hundreds of sequences. Notably, these algorithms are currently limited to reversible models of evolution, in which Felsenstein's pulley principle applies. In this paper we show that by reorganizing the way likelihood is computed, one can efficiently compute the likelihood of a tree from any of its nodes with a nonreversible model of DNA sequence evolution, and hence benefit from cutting-edge heuristics. This computational trick can be used with reversible models of evolution without any extra cost. We then introduce nhPhyML, the adaptation of the nonhomogeneous nonstationary model of Galtier and Gouy (1998; Mol. Biol. Evol. 15:871-879) to the structure of PhyML, as well as an approximation of the model in which the set of equilibrium frequencies is limited. This new version shows good results both in terms of exploration of the space of tree topologies and ancestral G+C content estimation. Finally, we apply it to slowly evolving sites of rRNA sequences and conclude that the model and a wider taxonomic sampling still do not support a hyperthermophilic last universal common ancestor.
Affiliation(s)
- Bastien Boussau
- Laboratoire de Biométrie et Biologie Evolutive (UMR 5558); CNRS, Université Lyon 1, Villeurbanne Cedex, France.
32
Abstract
We present a web service that automatically assigns sequences to homologous gene families from a set of databases. After the gene family most similar to the query sequence has been identified, the sequence is added to the family alignment and the phylogenetic tree of the family is rebuilt. Thus, the phylogenetic position of the query sequence in its gene family can be easily identified. Availability: http://pbil.univ-lyon1.fr/software/HoSeqI/.
Affiliation(s)
- Anne-Muriel Arigon
- Laboratoire de Biométrie et Biologie Evolutive, UMR CNRS 5558, Université Claude-Bernard, Lyon 1, 43 boulevard du 11 Novembre 1918, 69622 Villeurbanne Cedex, France.
33
Lefébure T, Douady CJ, Gouy M, Gibert J. Relationship between morphological taxonomy and molecular divergence within Crustacea: proposal of a molecular threshold to help species delimitation. Mol Phylogenet Evol 2006; 40:435-47. [PMID: 16647275 DOI: 10.1016/j.ympev.2006.03.014] [Citation(s) in RCA: 252] [Impact Index Per Article: 14.0] [Received: 09/21/2005] [Revised: 03/07/2006] [Accepted: 03/08/2006] [Indexed: 11/29/2022]
Abstract
With today's technology for producing molecular sequences, DNA taxonomy and barcoding have arisen as new tools for evolutionary biology and ecology. However, their validity still needs to be empirically evaluated. Most important are the strength of the correlation between morphological taxonomy and molecular divergence and the possibility of defining molecular thresholds. Here, we report measurements of this correlation for two mitochondrial genes (COI and 16S rRNA) within the sub-phylum Crustacea. Perl scripts were developed to ensure objectivity, reproducibility, and exhaustiveness of our tests. Our analysis reveals a general correlation between molecular divergence and taxonomy. This correlation is particularly high for shallow taxonomic levels, allowing us to propose a COI universal crustacean threshold to help species delimitation. At higher taxonomic levels this correlation decreases, particularly when comparing different families. These results argue for the use of DNA in taxonomy and suggest an operational method to help crustacean species delimitation that is linked to the phylogenetic species definition. This pragmatic tool is expected to fine-tune the present classification, and not, as some would have believed, to tear it apart.
Affiliation(s)
- T Lefébure
- Laboratoire d'Ecologie des Hydrosystèmes Fluviaux, UMR-CNRS 5023, Université Claude Bernard Lyon 1, F-69622 Villeurbanne Cedex, France.
34
Lefébure T, Douady CJ, Gouy M, Trontelj P, Briolay J, Gibert J. Phylogeography of a subterranean amphipod reveals cryptic diversity and dynamic evolution in extreme environments. Mol Ecol 2006; 15:1797-806. [PMID: 16689899 DOI: 10.1111/j.1365-294x.2006.02888.x] [Citation(s) in RCA: 122] [Impact Index Per Article: 6.8] [Indexed: 11/27/2022]
Abstract
Extreme conditions in the subsurface are suspected to be responsible for morphological convergences, and so to bias biodiversity assessment. Subterranean organisms are also considered to have poor dispersal abilities, which in turn generate a large number of endemic species when habitat is fragmented. Here we test these general hypotheses using the subterranean amphipod Niphargus virei. All our phylogenetic analyses (Bayesian, maximum likelihood and distance), based on two independent genes (28S and COI), revealed the same tripartite structure. N. virei populations from Benelux, the Jura region and the rest of France appeared as independent evolutionary units. Molecular rates estimated via global or Bayesian relaxed clocks suggest that this split is at least 13 million years old and support the cryptic diversity hypothesis. Moreover, the geographical distribution of these lineages showed some evidence of recent dispersal through an apparent vicariant barrier. Consequently, we argue that future analyses of evolution and biogeography in the subsurface, or more generally in extreme environments, should consider dispersal ability as an evolving trait and morphology as a potentially biased marker.
Affiliation(s)
- T Lefébure
- Laboratoire d'Ecologie des Hydrosystèmes Fluviaux, UMR-CNRS 5023, Université Claude Bernard Lyon I. F. 69622 Villeurbanne Cedex, France.
35
Grassot J, Gouy M, Perrière G, Mouchiroud G. Origin and Molecular Evolution of Receptor Tyrosine Kinases with Immunoglobulin-Like Domains. Mol Biol Evol 2006; 23:1232-41. [PMID: 16551648 DOI: 10.1093/molbev/msk007] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.1] [Indexed: 12/11/2022] Open
Abstract
Receptor tyrosine kinases (RTKs) are involved in the control of fundamental cellular processes in metazoans. In vertebrates, RTKs can be grouped into distinct classes based on the nature of their cognate ligand and the modular composition of their extracellular domain. RTK with immunoglobulin-like domains (IG-like RTK) encompass several RTK classes and have been found in early metazoans, including sponges. Evolution of IG-like RTK is characterized by extended molecular and functional diversification, which prompted us to study their evolutionary history. For that purpose, a nonredundant data set including annotated protein sequences of IG-like RTK (n = 85) was built, representing 19 species ranging from sponges to humans. Phylogenetic trees were generated from alignment of conserved regions using a maximum likelihood approach. Molecular phylogeny strongly suggests that IG-like RTK diversification occurred according to a complex scenario. In particular, we propose that specific cis duplications of a common ancestor to both platelet-derived growth factor receptor (class III) and vascular endothelial growth factor receptor (class V) families preceded two trans duplications. In contrast, other IG-like RTK genes, like Musk and PTK7, apparently did not evolve by duplications, whereas fibroblast growth factor receptors (class IV) evolved through two rounds of trans duplications. The proposed model of IG-like RTK evolution is supported by high bootstrap values and by the clustering of genes encoding class III and class V RTKs at specific chromosomal locations in mouse and human genomes.
Affiliation(s)
- Julien Grassot
- Centre de Génétique Moléculaire et Cellulaire, UMR Centre National de la Recherche Scientifique 5534, Université Claude Bernard-Lyon 1, Villeurbanne, France
36
Aouacheria A, Navratil V, Wen W, Jiang M, Mouchiroud D, Gautier C, Gouy M, Zhang M. In silico whole-genome scanning of cancer-associated nonsynonymous SNPs and molecular characterization of a dynein light chain tumour variant. Oncogene 2005; 24:6133-42. [PMID: 15897869 DOI: 10.1038/sj.onc.1208745] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Indexed: 12/31/2022]
Abstract
The last decade has led to the accumulation of large amounts of data on cancer genetics, opening unprecedented access to the mapping of cancer genes in the human genome. Single-nucleotide polymorphisms (SNPs), the most common form of DNA variation in humans, have emerged as an invaluable tool for cancer association studies. These genotypic markers can be used to assay how alleles of candidate genes correlate with the malignant phenotype, and may provide new clues into the genetic modifications that characterize cancer onset. In this cancer-oriented study, we detail an SNP mining strategy based on the analysis of expressed sequence tags among publicly available databases. Our whole-genome approach provides a comprehensive and unbiased description of nonsynonymous SNPs (nsSNPs) in tumoral versus normal tissues. To gain further insights into the possible relationships between genetic variation and altered phenotype, locations of a subset of nsSNPs were mapped onto protein domains known to be critical for protein function. Computational methods were also used to predict the potential impact of these cancer-associated nsSNPs on protein structure and function. We illustrate our approach through the detailed biochemical and structural characterization of a previously unknown cancer-associated mutation (G79C) affecting the 8 kDa dynein light chain (DNCL1).
Affiliation(s)
- Abdel Aouacheria
- Laboratoire de Biométrie et Biologie Evolutive, CNRS UMR 5558, Université Claude Bernard Lyon 1, F-69622 Villeurbanne Cedex, France.
37
Calteau A, Gouy M, Perrière G. Horizontal transfer of two operons coding for hydrogenases between bacteria and archaea. J Mol Evol 2005; 60:557-65. [PMID: 15983865 DOI: 10.1007/s00239-004-0094-8] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Received: 03/24/2004] [Accepted: 11/19/2004] [Indexed: 11/27/2022]
Abstract
Using a phylogenetic approach, we discovered three putative horizontal transfers between bacterial and archaeal species involving large clusters of genes. One transfer involves an operon of 13 genes, called mbx, which probably was transferred into the genome of Thermotoga maritima from a species belonging to or close to the Pyrococcus genus. The other two involved an operon of six genes, called ech, transferred independently to the genomes of Thermoanaerobacter tengcongensis and Desulfovibrio gigas, from a species belonging to or close to the Methanosarcina genus. All these transfers affected operons coding for multisubunit membrane-bound (NiFe) hydrogenases involved in the energy metabolism of the donor genomes. The functionality of the transferred operons has not been experimentally demonstrated for T. maritima, whereas in D. gigas and T. tengcongensis the encoded multisubunit hydrogenase could have a role in energy conservation. This report adds several cases to the horizontal gene transfers among hydrogenases already described.
Affiliation(s)
- Alexandra Calteau
- Laboratoire de Biométrie et Biologie Evolutive, UMR CNRS 5558, Université Claude Bernard - Lyon 1, Villeurbanne, France
|
38
|
Aouacheria A, Brunet F, Gouy M. Phylogenomics of Life-Or-Death Switches in Multicellular Animals: Bcl-2, BH3-Only, and BNip Families of Apoptotic Regulators. Mol Biol Evol 2005; 22:2395-416. [PMID: 16093567 DOI: 10.1093/molbev/msi234] [Citation(s) in RCA: 94] [Impact Index Per Article: 4.9] [Indexed: 12/25/2022] Open
Abstract
In this report, we conducted a comprehensive survey of Bcl-2 family members, a divergent group of proteins that regulate programmed cell death by an evolutionarily conserved mechanism. Using comparative sequence analysis, we found novel sequences in mammals, nonmammalian vertebrates, and in a number of invertebrates. We then asked what conclusions could be drawn from phyletic distribution, intron/exon structures, sequence/structure relationships, and phylogenetic analyses within the updated Bcl-2 family. First, multidomain members having a sequence pattern consistent with the conservation of the Bcl-X(L)/Bax/Bid topology appear to be restricted to multicellular animals and may share a common ancestry. Next, BNip proteins, which were originally identified based on their ability to bind to E1B 19K/Bcl-2 proteins, form three independent monophyletic branches with different evolutionary history. Lastly, a set of Bcl-2 homology 3-only proteins with unrelated secondary structures seems to have evolved after the origin of Metazoa and exhibits diverse expansion after speciation during vertebrate evolution.
Affiliation(s)
- Abdel Aouacheria
- Laboratoire de Biométrie et Biologie Evolutive, Université Claude Bernard Lyon 1, 69622 Villeurbanne Cedex, France.
|
39
|
Thomarat F, Vivarès CP, Gouy M. Phylogenetic analysis of the complete genome sequence of Encephalitozoon cuniculi supports the fungal origin of microsporidia and reveals a high frequency of fast-evolving genes. J Mol Evol 2005; 59:780-91. [PMID: 15599510 DOI: 10.1007/s00239-004-2673-0] [Citation(s) in RCA: 86] [Impact Index Per Article: 4.5] [Received: 12/02/2003] [Accepted: 06/29/2004] [Indexed: 10/26/2022]
Abstract
Microsporidia are unicellular eukaryotes living as obligate intracellular parasites. Lacking mitochondria, they were initially considered as having diverged before the endosymbiosis at the origin of mitochondria. That microsporidia were primitively amitochondriate was first questioned by the discovery of microsporidial sequences homologous to genes encoding mitochondrial proteins and then refuted by the identification of remnants of mitochondria in their cytoplasm. Various molecular phylogenies also cast doubt on the early divergence of microsporidia, these organisms forming a monophyletic group with or within the fungi. The 2001 proteins putatively encoded by the complete genome of Encephalitozoon cuniculi provided powerful data to test this hypothesis. Phylogenetic analysis of 99 proteins selected as adequate phylogenetic markers indicated that the E. cuniculi sequences having the lowest evolutionary rates preferentially clustered with fungal sequences or, more rarely, with both animal and fungal sequences. Because sequences with low evolutionary rates are less sensitive to the long-branch attraction artifact, we concluded that microsporidia are evolutionarily related to fungi. This analysis also allowed comparing the accuracy of several phylogenetic algorithms for a fast-evolving lineage with real rather than simulated sequences.
Affiliation(s)
- Fabienne Thomarat
- Laboratoire de Biométrie et Biologie Evolutive, UMR CNRS 5558, Université Claude Bernard Lyon I, 43 boulevard du 11 Novembre 1918, 69622 Villeurbanne, France
|
40
|
Aurell H, Farge P, Meugnier H, Gouy M, Forey F, Lina G, Vandenesch F, Etienne J, Jarraud S. Clinical and environmental isolates of Legionella pneumophila serogroup 1 cannot be distinguished by sequence analysis of two surface protein genes and three housekeeping genes. Appl Environ Microbiol 2005; 71:282-9. [PMID: 15640199 PMCID: PMC544207 DOI: 10.1128/aem.71.1.282-289.2005] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Indexed: 11/20/2022] Open
Abstract
We used gene sequencing to determine whether clinical (sporadic, epidemic, and endemic) and environmental isolates of Legionella pneumophila serogroup (sg) 1 belong to specific lineages. A total of 178 clinical and environmental L. pneumophila sg 1 isolates, defined by pulsed-field gel electrophoresis and epidemiological data as sporadic, epidemic, or endemic, were analyzed for polymorphisms in five gene fragments. The fragments belonged to three housekeeping genes (coding for aconitase [acn], aspartate-beta-semialdehyde dehydrogenase [asd], and RNA polymerase beta subunit [rpoB]) and two surface protein genes (coding for the macrophage infectivity potentiator [mip] and the major outer membrane protein [mompS]). The phylogenetic tree inferred from sequence polymorphisms of the five genes identified two large clusters, one consisting of 133 poorly differentiated strains and containing two smaller clusters (10 and 2 strains) unrelated to each other and the other consisting of 42 strains. Clinical and environmental isolates could not be distinguished on this basis, and no link between genetic background and epidemiological type was found, suggesting that other factors are responsible for differences in pathogenicity.
Affiliation(s)
- Helena Aurell
- Centre National de Référence des Legionella, INSERM E-0230, Laboratoire de Bactériologie, Faculté de Médecine Laennec IFR 62, 7 rue Guillaume Paradin, 69372 Lyon cedex 08, France
|
41
|
Dufayard JF, Duret L, Penel S, Gouy M, Rechenmann F, Perrière G. Tree pattern matching in phylogenetic trees: automatic search for orthologs or paralogs in homologous gene sequence databases. Bioinformatics 2005; 21:2596-603. [PMID: 15713731 DOI: 10.1093/bioinformatics/bti325] [Citation(s) in RCA: 132] [Impact Index Per Article: 6.9] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Comparative sequence analysis is widely used to study genome function and evolution. This approach first requires the identification of homologous genes and then the interpretation of their homology relationships (orthology or paralogy). To help with this complex task, we developed three databases of homologous genes containing sequences, multiple alignments and phylogenetic trees: HOBACGEN, HOVERGEN and HOGENOM. In this paper, we present two new tools for automating the search for orthologs or paralogs in these databases. RESULTS First, we have developed and implemented an algorithm to infer speciation and duplication events by comparison of gene and species trees (tree reconciliation). Second, we have developed a general method to search our databases for the gene families whose tree topology matches a particular tree pattern. This algorithm of unordered tree pattern matching has been implemented in the FamFetch graphical interface. With the help of a graphical editor, the user can specify the topology of the tree pattern, and set constraints on its nodes and leaves. Then, this pattern is compared with all the phylogenetic trees of the database, to retrieve the families in which one or several occurrences of this pattern are found. By specifying ad hoc patterns, it is therefore possible to identify orthologs in our databases.
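The abstract does not say how the reconciliation step is carried out. As an illustration only, one standard way to label gene-tree nodes is the LCA mapping: each gene-tree node is mapped to the last common ancestor, in the species tree, of the species found below it, and a node is called a duplication when it maps to the same species-tree node as one of its children. The sketch below assumes binary gene trees written as nested tuples whose leaves are species names; the function names are hypothetical and are not taken from FamFetch or the paper.

```python
def parent_map(tree, parent=None, out=None):
    """Map every node of a nested-tuple species tree to its parent (root -> None)."""
    if out is None:
        out = {}
    out[tree] = parent
    if isinstance(tree, tuple):
        for child in tree:
            parent_map(child, tree, out)
    return out


def ancestors(node, parents):
    """Return the path from a species-tree node up to the root, node included."""
    chain = []
    while node is not None:
        chain.append(node)
        node = parents[node]
    return chain


def lca(a, b, parents):
    """Last common ancestor of two species-tree nodes."""
    seen = set(ancestors(a, parents))
    for node in ancestors(b, parents):
        if node in seen:
            return node
    raise ValueError("nodes are not in the same species tree")


def reconcile(gene, parents, events=None):
    """Label internal gene-tree nodes; returns (mapped species node, events).

    events is a post-order list of (gene node, "duplication" | "speciation").
    """
    if events is None:
        events = []
    if not isinstance(gene, tuple):
        return gene, events  # a leaf maps to its own species
    left, _ = reconcile(gene[0], parents, events)
    right, _ = reconcile(gene[1], parents, events)
    mapped = lca(left, right, parents)
    # Duplication: the node maps no higher than one of its children.
    label = "duplication" if mapped in (left, right) else "speciation"
    events.append((gene, label))
    return mapped, events


species = (("human", "mouse"), "chicken")
# Two human/mouse gene pairs that arose by a duplication before the split.
gene = ((("human", "mouse"), ("human", "mouse")), "chicken")
root, events = reconcile(gene, parent_map(species))
for node, label in events:
    print(label, node)
```

Under this labelling, leaves related through a speciation node are orthologs and leaves related through a duplication node are paralogs, which is the distinction the search tools described above are meant to automate.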
|
42
|
Huguet V, Gouy M, Normand P, Zimpfer JF, Fernandez MP. Molecular phylogeny of Myricaceae: a reexamination of host-symbiont specificity. Mol Phylogenet Evol 2005; 34:557-68. [PMID: 15683929 DOI: 10.1016/j.ympev.2004.11.018] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.3] [Received: 02/17/2004] [Revised: 08/06/2004] [Accepted: 11/16/2004] [Indexed: 11/18/2022]
Abstract
The phylogeny of 13 species of Myricaceae, the most ancient actinorhizal family involved in a nitrogen-fixing symbiosis with the actinomycete Frankia, was established by the analysis of their rbcL gene and 18S-26S ITS. The phylogenetic position of those species was then compared to their specificity of association with Frankia in their natural habitat and to their nodulation potential determined on greenhouse-grown seedlings. The results showed that the genus Myrica, including M. gale and M. hartwegii, and the genus Comptonia, including C. peregrina, belong to a phylogenetic cluster distinct from the other Myrica species, which have been transferred to a new genus, Morella. This grouping parallels the natural specificity of each cluster, with Comptonia-Myrica and Morella being nodulated by two phylogenetically divergent clusters of Frankia strains, the Alnus-infective and Elaeagnaceae-infective strain clusters, respectively. Under laboratory conditions, Comptonia and Morella had a nodulation potential larger than under natural conditions. From this study it appears that the Myricaceae are split into two different specificity groups. It can be hypothesized that the early divergence of the genera led to the selection of genetically diverse Frankia strains, which contradicts the earlier proposal that evolution has proceeded toward narrower promiscuity within the family.
Affiliation(s)
- Valérie Huguet
- Ecologie Microbienne, UMR CNRS 5557, Université Claude Bernard Lyon 1, 69622 Villeurbanne Cedex, France
|
43
|
Aouacheria A, Cluzel C, Lethias C, Gouy M, Garrone R, Exposito JY. Invertebrate Data Predict an Early Emergence of Vertebrate Fibrillar Collagen Clades and an Anti-incest Model. J Biol Chem 2004; 279:47711-9. [PMID: 15358765 DOI: 10.1074/jbc.m408950200] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.5] [Indexed: 11/06/2022] Open
Abstract
Fibrillar collagens are involved in the formation of striated fibrils and are present from the first multicellular animals, sponges, to humans. Recently, a new evolutionary model for fibrillar collagens has been suggested (Boot-Handford, R. P., Tuckwell, D. S., Plumb, D. A., Farrington Rock, C., and Poulsom, R. (2003) J. Biol. Chem. 278, 31067-31077). In this model, a rare genomic event leads to the formation of the founder vertebrate fibrillar collagen gene prior to the early vertebrate genome duplications and the radiation of the vertebrate fibrillar collagen clades (A, B, and C). Here, we present the modular structure of the fibrillar collagen chains present in different invertebrates from the protostome Anopheles gambiae to the chordate Ciona intestinalis. From their modular structure and the use of a triple helix instead of C-propeptide sequences in phylogenetic analyses, we were able to show that the divergence of A and B clades arose early during evolution because alpha chains related to these clades are present in protostomes. Moreover, the event leading to the divergence of B and C clades from a founder gene arose before the appearance of vertebrates; altogether these data contradict the Boot-Handford model. Moreover, they indicate that all the key steps required for the formation of fibrils of variable structure and functionality arose step by step during invertebrate evolution.
Affiliation(s)
- Abdel Aouacheria
- Institut de Biologie et Chimie des Protéines, CNRS, Unité Mixte de Recherche 5086, Institut Fédératif de Recherche 128 BioSciences Lyon-Gerland, Université Claude Bernard-Lyon 1, 7 Passage du Vercors, 69367 Lyon Cedex 07, France
|
44
|
Le Roux F, Gay M, Lambert C, Nicolas JL, Gouy M, Berthe F. Phylogenetic study and identification of Vibrio splendidus-related strains based on gyrB gene sequences. Dis Aquat Organ 2004; 58:143-150. [PMID: 15109135 DOI: 10.3354/dao058143] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.5] [Indexed: 05/24/2023]
Abstract
Different strains related to Vibrio splendidus have been associated with infection of aquatic animals. An epidemiological study of V. splendidus strains associated with Crassostrea gigas mortalities demonstrated genetic diversity within this group and suggested its polyphyletic nature. Recently 4 species, V. lentus, V. chagasii, V. pomeroyi and V. kanaloae, phenotypically related to V. splendidus, have been described, although biochemical methods do not clearly discriminate species within this group. Here, we propose a polyphasic approach to investigate their taxonomic relationships. Phylogenetic analysis of V. splendidus-related strains was carried out using the nucleotide sequences of 16S ribosomal DNA (16S rDNA) and gyrase B subunit (gyrB) genes. Species delineation based on 16S rDNA-sequencing is limited because of divergence between cistrons, roughly equivalent to divergence between strains. Despite a high level of sequence similarity, strains were separated into 2 clades. In the phylogenetic tree constructed on the basis of gyrB gene sequences, strains were separated into 5 independent clusters containing V. splendidus, V. lentus, V. chagasii-type strains and a putative new genomic species. This phylogenetic grouping was almost congruent with that based on DNA-DNA hybridisation analysis. V. pomeroyi, V. kanaloae and V. tasmaniensis-type strains clustered together in a fifth clade. The gyrB gene-sequencing approach is discussed as an alternative for investigating the taxonomy of Vibrio species.
Affiliation(s)
- Frédérique Le Roux
- Laboratoire de Génétique et Pathologie, Institut français de recherche pour l'exploitation de la mer, 17390 La Tremblade, France.
|
45
|
Perrière G, Combet C, Penel S, Blanchet C, Thioulouse J, Geourjon C, Grassot J, Charavay C, Gouy M, Duret L, Deléage G. Integrated databanks access and sequence/structure analysis services at the PBIL. Nucleic Acids Res 2003; 31:3393-9. [PMID: 12824334 PMCID: PMC168937 DOI: 10.1093/nar/gkg530] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.2] [Indexed: 11/12/2022] Open
Abstract
The World Wide Web server of the PBIL (Pôle Bioinformatique Lyonnais) provides on-line access to sequence databanks and to many tools for nucleic acid and protein sequence analysis. The server can query nucleotide sequence banks in the EMBL and GenBank formats and protein sequence banks in the SWISS-PROT and PIR formats. The query engine on which our data bank access is based is the ACNUC system. It makes it possible to build complex queries to access functional zones of biological interest and to retrieve large sequence sets. Of special interest are the unique features provided by this system to query the data banks of gene families developed at the PBIL. The server also provides access to a wide range of sequence analysis methods: similarity search programs, multiple alignments, protein structure prediction and multivariate statistics. A distinctive feature of this server is the integration of these two aspects: sequence retrieval and sequence analysis. Indeed, thanks to the introduction of re-usable lists, it is possible to process large sets of data. The PBIL server can be reached at: http://pbil.univ-lyon1.fr.
Affiliation(s)
- Guy Perrière
- Laboratoire de Biométrie et Biologie Evolutive, UMR CNRS no. 5558, Université Claude Bernard, Lyon 1, 43 bd du 11 Novembre 1918, 69622 Villeurbanne Cedex, France.
|
46
|
Abstract
The DNA sequences of the 11 linear chromosomes of the approximately 2.9 Mbp genome of Encephalitozoon cuniculi, an obligate intracellular parasite of mammals, include approximately 2000 putative protein-coding genes. The compactness of this genome is associated with the length reduction of various genes. Essential functions are dependent on a minimal set of genes. Phylogenetic analysis supports the hypotheses that microsporidia are related to fungi and have retained a mitochondrion-derived organelle, the mitosome.
Affiliation(s)
- Christian P Vivarès
- Parasitologie Moléculaire et Cellulaire, LBP - UMR CNRS 6023, Université Blaise Pascal, 63117 Aubiere Cedex, France
|
47
|
Daubin V, Gouy M, Perrière G. Bacterial molecular phylogeny using supertree approach. Genome Inform 2002; 12:155-64. [PMID: 11791234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2023]
Abstract
It has been claimed that complete genome sequences would clarify phylogenetic relationships between organisms but, up to now, no satisfying approach has been proposed to use these data efficiently. For instance, although coding the presence or absence of genes in complete genomes gives interesting results, it does not take into account the phylogenetic information contained in sequences and ignores hidden paralogy by using a similarity-based definition of orthology. Also, concatenation of sequences of different genes hardly takes into consideration the specific evolutionary rate of each gene. Finally, building a consensus tree is strongly limited by the low number of genes shared among all organisms. Here, we use a new method based on supertree construction, which makes it possible to combine in one supertree the information and statistical support of hundreds of trees from orthologous gene families, and to build the phylogeny of 33 prokaryotes and four eukaryotes with completely sequenced genomes. This approach gives a robust supertree, which demonstrates that a phylogeny of prokaryotic species is conceivable and challenges the hypothesis of a thermophilic origin of bacteria and present-day life. The results are compatible with the hypothesis of a core of genes for which lateral transfers are rare, but they raise doubts about the widely accepted "complexity hypothesis", which predicts that this core is mainly involved in informational processes.
Affiliation(s)
- V Daubin
- Laboratoire de Biometrie et Biologie Evolutive, UMR CNRS 5558, Universite Claude Bernard - Lyon 1, 43 bd. du 11 Novembre 1918, 69622 Villeurbanne Cedex, France.
|
48
|
Abstract
This paper presents an algorithm, DCFold, that automatically predicts the common secondary structure of a set of aligned homologous RNA sequences. It is based on the comparative approach. Helices are searched for in one of the sequences, called the 'target sequence', and compared to the helices in the other sequences, called the 'test sequences'. Our algorithm searches the target sequence for palindromes that have a high probability of defining helices that are conserved in the test sequences. This selection of significant palindromes is based on criteria that take into account their length and their mutation rate. A recursive search for helices, starting from these likely ones, is implemented using the 'divide and conquer' approach. Indeed, as DCFold does not search for pseudo-knots, a selected palindrome (p, p') makes it possible to divide the initial sequence into two sequences: the internal one, and the one resulting from the concatenation of the two external ones. New palindromes can then be searched for independently in these subsequences. This algorithm was run on ribosomal RNA sequences and recovered their common secondary structures very efficiently.
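The divide step described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not DCFold itself: it works on a single sequence and simply picks the longest stack of complementary bases, whereas the abstract describes selection criteria based on palindrome length and mutation rate across aligned test sequences. The function names, the `min_len` and `min_loop` parameters, and the decision to allow G-U pairs are all hypothetical choices made for the example.

```python
# Base pairs accepted in this sketch (Watson-Crick plus G-U wobble).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def longest_helix(seq, min_len=3, min_loop=3):
    """Return (i, k, n): bases i..i+n-1 pair with bases k..k-n+1, or None."""
    best = None
    for i in range(len(seq)):
        for k in range(i + min_loop + 1, len(seq)):
            n = 0
            # Extend inward while bases pair and at least min_loop bases
            # remain unpaired between the two strands.
            while (i + n) + min_loop < (k - n) and (seq[i + n], seq[k - n]) in PAIRS:
                n += 1
            if n >= min_len and (best is None or n > best[2]):
                best = (i, k, n)
    return best


def fold(seq, pos=None, min_len=3):
    """Recursively collect base pairs as (5' index, 3' index) in the original sequence."""
    if pos is None:
        pos = list(range(len(seq)))
    hit = longest_helix(seq, min_len)
    if hit is None:
        return []
    i, k, n = hit
    pairs = [(pos[i + t], pos[k - t]) for t in range(n)]
    # Divide: the region enclosed by the helix, and the two flanks joined together.
    inner_seq, inner_pos = seq[i + n:k - n + 1], pos[i + n:k - n + 1]
    outer_seq, outer_pos = seq[:i] + seq[k + 1:], pos[:i] + pos[k + 1:]
    # Conquer: the two subsequences are searched independently, so no
    # pseudo-knot (crossing pair) can ever be produced.
    return pairs + fold(inner_seq, inner_pos, min_len) + fold(outer_seq, outer_pos, min_len)


print(fold("GGGAAAACCC"))
```

Because every recursive call sees either the inside or the outside of an accepted helix, never both, the result is always a nested structure. That is the property that makes the split legitimate once pseudo-knots are excluded.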
Affiliation(s)
- Fariza Tahi
- Laboratoire La.M.I.-UMR 8042, CNRS/Université Val-d'Essonne, Genopole, Evry, France.
|
49
|
Abstract
It has been claimed that complete genome sequences would clarify phylogenetic relationships between organisms, but up to now, no satisfying approach has been proposed to use these data efficiently. For instance, although coding the presence or absence of genes in complete genomes gives interesting results, it does not take into account the phylogenetic information contained in sequences and ignores hidden paralogies by using a BLAST reciprocal best hit definition of orthology. In addition, concatenation of sequences of different genes, as well as the building of consensus trees, considers only the few genes that are shared among all organisms. Here we present an attempt to use a supertree method to build the phylogenetic tree of 45 organisms, with special focus on bacterial phylogeny. This led us to perform a phylogenetic study of the congruence of tree topologies, which allows the identification of a core of genes supporting similar species phylogenies. We then used this core of genes to infer a tree. This phylogeny differs from the rRNA phylogeny in several respects, notably in the position of hyperthermophilic bacteria.
Affiliation(s)
- Vincent Daubin
- Laboratoire de Biométrie et Biologie Evolutive, Unité Mixte de Recherche Centre National de la Recherche Scientifique, Université Claude Bernard - Lyon 1, 69622 Villeurbanne Cedex, France
|
50
|
Abstract
We analyzed the distribution of 54 families of transposable elements (TEs; transposons, LTR retrotransposons, and non-LTR retrotransposons) in the chromosomes of Drosophila melanogaster, using data from the sequenced genome. The density of LTR and non-LTR retrotransposons (RNA-based elements) was high in regions with low recombination rates, but there was no clear tendency to parallel the recombination rate. However, the density of transposons (DNA-based elements) was significantly negatively correlated with recombination rate. The accumulation of TEs in regions of reduced recombination rate is compatible with selection acting against TEs, as selection is expected to be weaker in regions with lower recombination. The differences in the relationship between recombination rate and TE density that exist between chromosome arms suggest that TE distribution depends on specific characteristics of the chromosomes (chromatin structure, distribution of other sequences), the TEs themselves (transposition mechanism), and the species (reproductive system, effective population size, etc.), that have differing influences on the effect of natural selection acting against the TE insertions.
Affiliation(s)
- Carène Rizzon
- Laboratoire de Biométrie et Biologie Evolutive, Unité Mixte de Recherche Centre National de la Recherche Scientifique 5558, Université Lyon 1, Cedex, France
|