1
Lemane T, Lezzoche N, Lecubin J, Pelletier E, Lescot M, Chikhi R, Peterlongo P. Indexing and real-time user-friendly queries in terabyte-sized complex genomic datasets with kmindex and ORA. Nat Comput Sci 2024; 4:104-109. [PMID: 38413777 DOI: 10.1038/s43588-024-00596-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2023] [Accepted: 01/16/2024] [Indexed: 02/29/2024]
Abstract
Public sequencing databases contain vast amounts of biological information, yet they are largely underutilized as it is challenging to efficiently search them for any sequence(s) of interest. We present kmindex, an approach that can index thousands of metagenomes and perform sequence searches in a fraction of a second. The index construction is an order of magnitude faster than previous methods, while search times are two orders of magnitude faster. With negligible false positive rates below 0.01%, kmindex outperforms the precision of existing approaches by four orders of magnitude. Here we demonstrate the scalability of kmindex by successfully indexing 1,393 marine seawater metagenome samples from the Tara Oceans project. Additionally, we introduce the publicly accessible web server Ocean Read Atlas, which enables real-time queries on the Tara Oceans dataset.
Affiliation(s)
- Téo Lemane
  - Univ. Rennes, Inria, CNRS, IRISA - UMR 6074, Rennes, France
  - Génomique Métabolique, Genoscope, Institut de Biologie François Jacob, CEA, CNRS, Univ. Evry, Université Paris-Saclay, Evry, France
- Nolan Lezzoche
  - Aix-Marseille Université, Université de Toulon, IRD, CNRS, Mediterranean Institute of Oceanography (MIO), UM 110, Marseille, France
- Eric Pelletier
  - Génomique Métabolique, Genoscope, Institut de Biologie François Jacob, CEA, CNRS, Univ. Evry, Université Paris-Saclay, Evry, France
  - Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara Oceans GO-SEE, CNRS, Paris, France
- Magali Lescot
  - Aix-Marseille Université, Université de Toulon, IRD, CNRS, Mediterranean Institute of Oceanography (MIO), UM 110, Marseille, France
  - Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara Oceans GO-SEE, CNRS, Paris, France
- Rayan Chikhi
  - Institut Pasteur, Université Paris Cité, G5 Sequence Bioinformatics, Paris, France
2
Vernette C, Lecubin J, Sánchez P, Sunagawa S, Delmont TO, Acinas SG, Pelletier E, Hingamp P, Lescot M. The Ocean Gene Atlas v2.0: online exploration of the biogeography and phylogeny of plankton genes. Nucleic Acids Res 2022; 50:W516-W526. [PMID: 35687095 PMCID: PMC9252727 DOI: 10.1093/nar/gkac420] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/11/2022] [Revised: 04/27/2022] [Accepted: 05/11/2022] [Indexed: 11/13/2022] Open
Abstract
Testing hypotheses about the biogeography of genes using large data resources such as Tara Oceans marine metagenomes and metatranscriptomes requires significant hardware resources and programming skills. The new release of the ‘Ocean Gene Atlas’ (OGA2) is a freely available intuitive online service to mine large and complex marine environmental genomic databases. OGA2 datasets available have been extended and now include, from the Tara Oceans portfolio: (i) eukaryotic Metagenome-Assembled-Genomes (MAGs) and Single-cell Assembled Genomes (SAGs) (10.2E+6 coding genes), (ii) version 2 of Ocean Microbial Reference Gene Catalogue (46.8E+6 non-redundant genes), (iii) 924 MetaGenomic Transcriptomes (7E+6 unigenes), (iv) 530 MAGs from an Arctic MAG catalogue (1E+6 genes) and (v) 1888 Bacterial and Archaeal Genomes (4.5E+6 genes), and an additional dataset from the Malaspina 2010 global circumnavigation: (vi) 317 Malaspina Deep Metagenome Assembled Genomes (0.9E+6 genes). Novel analyses enabled by OGA2 include phylogenetic tree inference to visualize user queries within their context of sequence homologues from both the marine environmental dataset and the RefSeq database. An Application Programming Interface (API) now allows users to query OGA2 using command-line tools, hence providing local workflow integration. Finally, gene abundance can be interactively filtered directly on map displays using any of the available environmental variables. Ocean Gene Atlas v2.0 is freely available at: https://tara-oceans.mio.osupytheas.fr/ocean-gene-atlas/.
Affiliation(s)
- Caroline Vernette
  - Aix-Marseille Université, Université de Toulon, IRD, CNRS, Mediterranean Institute of Oceanography (MIO) UM 110, Marseille, France
  - Research Federation for the study of Global Ocean systems ecology and evolution, FR2022/Tara Oceans-GOSEE, Paris, France
- Pablo Sánchez
  - Department of Marine Biology and Oceanography, Institute of Marine Sciences (ICM), CSIC, Barcelona, Spain
- Shinichi Sunagawa
  - Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zurich, Zurich, Switzerland
- Tom O Delmont
  - Research Federation for the study of Global Ocean systems ecology and evolution, FR2022/Tara Oceans-GOSEE, Paris, France
  - Génomique Métabolique, Genoscope, Institut de Biologie François-Jacob, CEA, CNRS, Univ Evry, Univ Paris-Saclay, 91057 Evry, France
- Silvia G Acinas
  - Department of Marine Biology and Oceanography, Institute of Marine Sciences (ICM), CSIC, Barcelona, Spain
- Eric Pelletier
  - Research Federation for the study of Global Ocean systems ecology and evolution, FR2022/Tara Oceans-GOSEE, Paris, France
  - Génomique Métabolique, Genoscope, Institut de Biologie François-Jacob, CEA, CNRS, Univ Evry, Univ Paris-Saclay, 91057 Evry, France
- Pascal Hingamp
  - Aix-Marseille Université, Université de Toulon, IRD, CNRS, Mediterranean Institute of Oceanography (MIO) UM 110, Marseille, France
- Magali Lescot
  - Aix-Marseille Université, Université de Toulon, IRD, CNRS, Mediterranean Institute of Oceanography (MIO) UM 110, Marseille, France
  - Research Federation for the study of Global Ocean systems ecology and evolution, FR2022/Tara Oceans-GOSEE, Paris, France
3
Delmont TO, Gaia M, Hinsinger DD, Frémont P, Vanni C, Fernandez-Guerra A, Eren AM, Kourlaiev A, d'Agata L, Clayssen Q, Villar E, Labadie K, Cruaud C, Poulain J, Da Silva C, Wessner M, Noel B, Aury JM, de Vargas C, Bowler C, Karsenti E, Pelletier E, Wincker P, Jaillon O, Acinas SG, Bork P, Karsenti E, Bowler C, Sardet C, Stemmann L, de Vargas C, Wincker P, Lescot M, Babin M, Gorsky G, Grimsley N, Guidi L, Hingamp P, Jaillon O, Kandels S, Iudicone D, Ogata H, Pesant S, Sullivan MB, Not F, Lee KB, Boss E, Cochrane G, Follows M, Poulton N, Raes J, Sieracki M, Speich S. Functional repertoire convergence of distantly related eukaryotic plankton lineages abundant in the sunlit ocean. Cell Genom 2022; 2:100123. [PMID: 36778897 PMCID: PMC9903769 DOI: 10.1016/j.xgen.2022.100123] [Citation(s) in RCA: 38] [Impact Index Per Article: 19.0] [Received: 02/16/2021] [Revised: 12/10/2021] [Accepted: 04/04/2022] [Indexed: 12/20/2022]
Abstract
Marine planktonic eukaryotes play critical roles in global biogeochemical cycles and climate. However, their poor representation in culture collections limits our understanding of the evolutionary history and genomic underpinnings of planktonic ecosystems. Here, we used 280 billion Tara Oceans metagenomic reads from polar, temperate, and tropical sunlit oceans to reconstruct and manually curate more than 700 abundant and widespread eukaryotic environmental genomes ranging from 10 Mbp to 1.3 Gbp. This genomic resource covers a wide range of poorly characterized eukaryotic lineages that complement long-standing contributions from culture collections while better representing plankton in the upper layer of the oceans. We performed the first, to our knowledge, comprehensive genome-wide functional classification of abundant unicellular eukaryotic plankton, revealing four major groups connecting distantly related lineages. Neither trophic modes of plankton nor its vertical evolutionary history could completely explain the functional repertoire convergence of major eukaryotic lineages that coexisted within oceanic currents for millions of years.
Affiliation(s)
- Tom O. Delmont (corresponding author)
  - Génomique Métabolique, Genoscope, Institut François-Jacob, CEA, CNRS, Université d'Evry, Université Paris-Saclay, 91057 Evry, France
  - Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 75016 Paris, France
- Morgan Gaia
  - Génomique Métabolique, Genoscope, Institut François-Jacob, CEA, CNRS, Université d'Evry, Université Paris-Saclay, 91057 Evry, France
  - Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 75016 Paris, France
- Damien D. Hinsinger
  - Génomique Métabolique, Genoscope, Institut François-Jacob, CEA, CNRS, Université d'Evry, Université Paris-Saclay, 91057 Evry, France
  - Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 75016 Paris, France
- Paul Frémont
  - Génomique Métabolique, Genoscope, Institut François-Jacob, CEA, CNRS, Université d'Evry, Université Paris-Saclay, 91057 Evry, France
  - Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 75016 Paris, France
- Chiara Vanni
  - Microbial Genomics and Bioinformatics Research Group, Max Planck Institute for Marine Microbiology, Bremen, Germany
- Antonio Fernandez-Guerra
  - Lundbeck Foundation GeoGenetics Centre, GLOBE Institute, University of Copenhagen, Copenhagen, Denmark
- A. Murat Eren
  - Helmholtz Institute for Functional Marine Biodiversity at Oldenburg, Germany
- Artem Kourlaiev
  - Génomique Métabolique, Genoscope, Institut François-Jacob, CEA, CNRS, Université d'Evry, Université Paris-Saclay, 91057 Evry, France
  - Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 75016 Paris, France
- Leo d'Agata
  - Génomique Métabolique, Genoscope, Institut François-Jacob, CEA, CNRS, Université d'Evry, Université Paris-Saclay, 91057 Evry, France
  - Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 75016 Paris, France
- Quentin Clayssen
  - Génomique Métabolique, Genoscope, Institut François-Jacob, CEA, CNRS, Université d'Evry, Université Paris-Saclay, 91057 Evry, France
  - Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 75016 Paris, France
- Emilie Villar
  - Génomique Métabolique, Genoscope, Institut François-Jacob, CEA, CNRS, Université d'Evry, Université Paris-Saclay, 91057 Evry, France
- Karine Labadie
  - Génomique Métabolique, Genoscope, Institut François-Jacob, CEA, CNRS, Université d'Evry, Université Paris-Saclay, 91057 Evry, France
  - Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 75016 Paris, France
- Corinne Cruaud
  - Génomique Métabolique, Genoscope, Institut François-Jacob, CEA, CNRS, Université d'Evry, Université Paris-Saclay, 91057 Evry, France
  - Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 75016 Paris, France
- Julie Poulain
  - Génomique Métabolique, Genoscope, Institut François-Jacob, CEA, CNRS, Université d'Evry, Université Paris-Saclay, 91057 Evry, France
  - Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 75016 Paris, France
- Corinne Da Silva
  - Génomique Métabolique, Genoscope, Institut François-Jacob, CEA, CNRS, Université d'Evry, Université Paris-Saclay, 91057 Evry, France
  - Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 75016 Paris, France
- Marc Wessner
  - Génomique Métabolique, Genoscope, Institut François-Jacob, CEA, CNRS, Université d'Evry, Université Paris-Saclay, 91057 Evry, France
  - Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 75016 Paris, France
- Benjamin Noel
  - Génomique Métabolique, Genoscope, Institut François-Jacob, CEA, CNRS, Université d'Evry, Université Paris-Saclay, 91057 Evry, France
  - Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 75016 Paris, France
- Jean-Marc Aury
  - Génomique Métabolique, Genoscope, Institut François-Jacob, CEA, CNRS, Université d'Evry, Université Paris-Saclay, 91057 Evry, France
  - Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 75016 Paris, France
- Tara Oceans Coordinators
  - Members (affiliation numbers in brackets): Shinichi Sunagawa [12]; Silvia G. Acinas [13]; Peer Bork [14, 15, 16]; Eric Karsenti [17, 18, 19]; Chris Bowler [17, 18]; Christian Sardet [17, 20]; Lars Stemmann [17, 20]; Colomban de Vargas [17, 21]; Patrick Wincker [17, 22]; Magali Lescot [17, 23]; Marcel Babin [17, 24]; Gabriel Gorsky [17, 20]; Nigel Grimsley [17, 25, 26]; Lionel Guidi [17, 20]; Pascal Hingamp [17, 23]; Olivier Jaillon [17, 22]; Stefanie Kandels [14, 17]; Daniele Iudicone [27]; Hiroyuki Ogata [28]; Stéphane Pesant [29, 30]; Matthew B. Sullivan [31, 32, 33]; Fabrice Not [21]; Lee Karp-Boss [34]; Emmanuel Boss [34]; Guy Cochrane [35]; Michael Follows [36]; Nicole Poulton [37]; Jeroen Raes [38, 39, 40]; Mike Sieracki [37]; Sabrina Speich [41, 42]
  - [12] Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, Zürich, Switzerland
  - [13] Department of Marine Biology and Oceanography, Institute of Marine Sciences–CSIC, Barcelona, Spain
  - [14] Structural and Computational Biology, European Molecular Biology Laboratory, Heidelberg, Germany
  - [15] Max Delbrück Center for Molecular Medicine, Berlin, Germany
  - [16] Department of Bioinformatics, Biocenter, University of Würzburg, Würzburg, Germany
  - [17] Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, Paris, France
  - [18] Institut de Biologie de l’ENS, Département de Biologie, École Normale Supérieure, CNRS, INSERM, Université PSL, Paris, France
  - [19] Directors’ Research, European Molecular Biology Laboratory, Heidelberg, Germany
  - [20] Sorbonne Université, CNRS, Laboratoire d’Océanographie de Villefranche, Villefranche-sur-Mer, France
  - [21] Sorbonne Université and CNRS, UMR 7144 (AD2M), ECOMAP, Station Biologique de Roscoff, Roscoff, France
  - [22] Génomique Métabolique, Genoscope, Institut de Biologie François Jacob, Commissariat à l’Énergie Atomique, CNRS, Université Evry, Université Paris-Saclay, Evry, France
  - [23] Aix Marseille Université, Université de Toulon, CNRS, IRD, MIO UM 110, Marseille, France
  - [24] Département de Biologie, Québec Océan and Takuvik Joint International Laboratory (UMI 3376), Université Laval (Canada)–CNRS (France), Université Laval, Quebec, QC, Canada
  - [25] CNRS UMR 7232, Biologie Intégrative des Organismes Marins, Banyuls-sur-Mer, France
  - [26] Sorbonne Universités Paris 06, OOB UPMC, Banyuls-sur-Mer, France
  - [27] Stazione Zoologica Anton Dohrn, Naples, Italy
  - [28] Institute for Chemical Research, Kyoto University, Kyoto, Japan
  - [29] PANGAEA, University of Bremen, Bremen, Germany
  - [30] MARUM, Center for Marine Environmental Sciences, University of Bremen, Bremen, Germany
  - [31] Department of Microbiology, The Ohio State University, Columbus, OH, USA
  - [32] Department of Civil, Environmental and Geodetic Engineering, The Ohio State University, Columbus, OH, USA
  - [33] Center for RNA Biology, The Ohio State University, Columbus, OH, USA
  - [34] School of Marine Sciences, University of Maine, Orono, ME, USA
  - [35] European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
  - [36] Department of Earth, Atmospheric, and Planetary Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
  - [37] Bigelow Laboratory for Ocean Sciences, East Boothbay, ME, USA
  - [38] Department of Microbiology and Immunology, Rega Institute, KU Leuven, Leuven, Belgium
  - [39] Center for the Biology of Disease, VIB KU Leuven, Leuven, Belgium
  - [40] Department of Applied Biological Sciences, Vrije Universiteit Brussel, Brussels, Belgium
  - [41] Department of Geosciences, Laboratoire de Météorologie Dynamique, École Normale Supérieure, Paris, France
  - [42] Ocean Physics Laboratory, University of Western Brittany, Brest, France
- Colomban de Vargas
  - Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 75016 Paris, France
  - Sorbonne Université and CNRS, UMR 7144 (AD2M), ECOMAP, Station Biologique de Roscoff, Roscoff, France
- Chris Bowler
  - Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 75016 Paris, France
  - Institut de Biologie de l’ENS, Département de Biologie, École Normale Supérieure, CNRS, INSERM, Université PSL, Paris, France
- Eric Karsenti
  - Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 75016 Paris, France
  - Sorbonne Université and CNRS, UMR 7144 (AD2M), ECOMAP, Station Biologique de Roscoff, Roscoff, France
  - Directors’ Research, European Molecular Biology Laboratory, Heidelberg, Germany
- Eric Pelletier
  - Génomique Métabolique, Genoscope, Institut François-Jacob, CEA, CNRS, Université d'Evry, Université Paris-Saclay, 91057 Evry, France
  - Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 75016 Paris, France
- Patrick Wincker
  - Génomique Métabolique, Genoscope, Institut François-Jacob, CEA, CNRS, Université d'Evry, Université Paris-Saclay, 91057 Evry, France
  - Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 75016 Paris, France
- Olivier Jaillon
  - Génomique Métabolique, Genoscope, Institut François-Jacob, CEA, CNRS, Université d'Evry, Université Paris-Saclay, 91057 Evry, France
  - Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 75016 Paris, France
4
Vernette C, Henry N, Lecubin J, de Vargas C, Hingamp P, Lescot M. The Ocean barcode atlas: A web service to explore the biodiversity and biogeography of marine organisms. Mol Ecol Resour 2021; 21:1347-1358. [PMID: 33434383 DOI: 10.1111/1755-0998.13322] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 09/21/2020] [Revised: 12/09/2020] [Accepted: 01/05/2021] [Indexed: 01/04/2023]
Abstract
The Ocean Barcode Atlas (OBA) is a user-friendly web service designed for biologists who wish to explore the biodiversity and biogeography of marine organisms locked in otherwise difficult to mine planetary scale DNA metabarcode data sets. Using just a web browser, a comprehensive picture of the diversity of a taxon or a barcode sequence is visualized graphically on world maps and interactive charts. Interactive results panels allow dynamic threshold adjustments and the display of diversity results in their environmental context measured at the time of sampling (temperature, oxygen, latitude, etc). Ecological analyses such as alpha and beta-diversity plots are produced via publication quality vector graphics representations. Currently, the Ocean Barcode Atlas is deployed online with the (i) Tara Oceans eukaryotic 18S-V9 rDNA metabarcodes; (ii) Tara Oceans 16S/18S rRNA miTags; and (iii) 16S-V4 V5 metabarcodes collected during the Malaspina-2010 expedition. Additional prokaryotic or eukaryotic plankton barcode data sets will be added upon availability, given they provide the required complement of barcodes (including raw reads to compute barcode abundance) associated with their contextual environmental variables. Ocean Barcode Atlas is a freely available web service at: http://oba.mio.osupytheas.fr/ocean-atlas/.
Affiliation(s)
- Caroline Vernette
  - Aix Marseille Université, Université de Toulon, IRD, CNRS, Mediterranean Institute of Oceanography (MIO) UM 110, Marseille, France
  - Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara Oceans GOSEE, Paris, France
- Nicolas Henry
  - Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara Oceans GOSEE, Paris, France
  - Sorbonne Université, CNRS, Station Biologique de Roscoff, AD2M ECOMAP, UMR 7144, Roscoff, France
- Colomban de Vargas
  - Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara Oceans GOSEE, Paris, France
  - Sorbonne Université, CNRS, Station Biologique de Roscoff, AD2M ECOMAP, UMR 7144, Roscoff, France
- Pascal Hingamp
  - Aix Marseille Université, Université de Toulon, IRD, CNRS, Mediterranean Institute of Oceanography (MIO) UM 110, Marseille, France
  - Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara Oceans GOSEE, Paris, France
- Magali Lescot
  - Aix Marseille Université, Université de Toulon, IRD, CNRS, Mediterranean Institute of Oceanography (MIO) UM 110, Marseille, France
  - Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara Oceans GOSEE, Paris, France
5
Vannier T, Hingamp P, Turrel F, Tanet L, Lescot M, Timsit Y. Diversity and evolution of bacterial bioluminescence genes in the global ocean. NAR Genom Bioinform 2020; 2:lqaa018. [PMID: 33575578 PMCID: PMC7671414 DOI: 10.1093/nargab/lqaa018] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/21/2019] [Revised: 02/14/2020] [Accepted: 03/06/2020] [Indexed: 12/19/2022] Open
Abstract
Although bioluminescent bacteria are the most abundant and widely distributed of all light-emitting organisms, the biological role and evolutionary history of bacterial luminescence are still shrouded in mystery. Bioluminescence has so far been observed in the genomes of three families of Gammaproteobacteria in the form of canonical lux operons that adopt the CDAB(F)E(G) gene order. LuxA and luxB encode the two subunits of bacterial luciferase responsible for light-emission. Our deep exploration of public marine environmental databases considerably expands this view by providing a catalog of new lux homolog sequences, including 401 previously unknown luciferase-related genes. It also reveals a broader diversity of the lux operon organization, which we observed in previously undescribed configurations such as CEDA, CAED and AxxCE. This expanded operon diversity provides clues for deciphering lux operon evolution and propagation within the bacterial domain. Leveraging quantitative tracking of marine bacterial genes afforded by planetary scale metagenomic sampling, our study also reveals that the novel lux genes and operons described herein are more abundant in the global ocean than the canonical CDAB(F)E(G) operon.
Affiliation(s)
- Thomas Vannier
  - Aix Marseille Univ, Université de Toulon, CNRS, IRD, MIO UM110, 13288 Marseille, France
  - Research Federation for the study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 3 rue Michel-Ange, 75016 Paris, France
- Pascal Hingamp
  - Aix Marseille Univ, Université de Toulon, CNRS, IRD, MIO UM110, 13288 Marseille, France
  - Research Federation for the study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 3 rue Michel-Ange, 75016 Paris, France
- Floriane Turrel
  - Aix Marseille Univ, Université de Toulon, CNRS, IRD, MIO UM110, 13288 Marseille, France
- Lisa Tanet
  - Aix Marseille Univ, Université de Toulon, CNRS, IRD, MIO UM110, 13288 Marseille, France
- Magali Lescot
  - Aix Marseille Univ, Université de Toulon, CNRS, IRD, MIO UM110, 13288 Marseille, France
  - Research Federation for the study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 3 rue Michel-Ange, 75016 Paris, France
- Youri Timsit
  - Aix Marseille Univ, Université de Toulon, CNRS, IRD, MIO UM110, 13288 Marseille, France
  - Research Federation for the study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 3 rue Michel-Ange, 75016 Paris, France
6
Villar E, Vannier T, Vernette C, Lescot M, Cuenca M, Alexandre A, Bachelerie P, Rosnet T, Pelletier E, Sunagawa S, Hingamp P. The Ocean Gene Atlas: exploring the biogeography of plankton genes online. Nucleic Acids Res 2018; 46:W289-W295. [PMID: 29788376 PMCID: PMC6030836 DOI: 10.1093/nar/gky376] [Citation(s) in RCA: 80] [Impact Index Per Article: 16.0] [Received: 01/31/2018] [Accepted: 05/02/2018] [Indexed: 12/27/2022] Open
Abstract
The Ocean Gene Atlas is a web service to explore the biogeography of genes from marine planktonic organisms. It allows users to query protein or nucleotide sequences against global ocean reference gene catalogs. With just one click, the abundance and location of target sequences are visualized on world maps as well as their taxonomic distribution. Interactive results panels allow for adjusting cutoffs for alignment quality and displaying the abundances of genes in the context of environmental features (temperature, nutrients, etc.) measured at the time of sampling. The ease of use enables non-bioinformaticians to explore quantitative and contextualized information on genes of interest in the global ocean ecosystem. Currently the Ocean Gene Atlas is deployed with (i) the Ocean Microbial Reference Gene Catalog (OM-RGC) comprising 40 million non-redundant mostly prokaryotic gene sequences associated with both Tara Oceans and Global Ocean Sampling (GOS) gene abundances and (ii) the Marine Atlas of Tara Ocean Unigenes (MATOU) composed of >116 million eukaryote unigenes. Additional datasets will be added upon availability of further marine environmental datasets that provide the required complement of sequence assemblies, raw reads and contextual environmental parameters. Ocean Gene Atlas is a freely-available web service at: http://tara-oceans.mio.osupytheas.fr/ocean-gene-atlas/.
Affiliation(s)
- Emilie Villar
  - Sorbonne Universités, UPMC Université Paris 06, CNRS, Laboratoire Adaptation et Diversité en Milieu Marin UMR7144, Station Biologique de Roscoff, Roscoff, France
  - Aix Marseille Univ, Université de Toulon, CNRS, IRD, MIO UM 110, 13288, Marseille, France
- Thomas Vannier
  - Aix Marseille Univ, Université de Toulon, CNRS, IRD, MIO UM 110, 13288, Marseille, France
- Caroline Vernette
  - Aix Marseille Univ, Université de Toulon, CNRS, IRD, MIO UM 110, 13288, Marseille, France
- Magali Lescot
  - Aix Marseille Univ, Université de Toulon, CNRS, IRD, MIO UM 110, 13288, Marseille, France
- Miguelangel Cuenca
  - Department of Biology, Institute of Microbiology, ETH Zurich, Zurich, Switzerland
- Aurélien Alexandre
  - Aix Marseille Univ, Université de Toulon, CNRS, IRD, MIO UM 110, 13288, Marseille, France
- Paul Bachelerie
  - Aix Marseille Univ, Université de Toulon, CNRS, IRD, MIO UM 110, 13288, Marseille, France
- Thomas Rosnet
  - Aix Marseille Univ, Université de Toulon, CNRS, IRD, MIO UM 110, 13288, Marseille, France
- Eric Pelletier
  - Génomique Métabolique, Genoscope, Institut de Biologie François-Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91000 Evry, France
- Shinichi Sunagawa
  - Department of Biology, Institute of Microbiology, ETH Zurich, Zurich, Switzerland
- Pascal Hingamp
  - Aix Marseille Univ, Université de Toulon, CNRS, IRD, MIO UM 110, 13288, Marseille, France
7
von Dassow P, John U, Ogata H, Probert I, Bendif EM, Kegel JU, Audic S, Wincker P, Da Silva C, Claverie JM, Doney S, Glover DM, Flores DM, Herrera Y, Lescot M, Garet-Delmas MJ, de Vargas C. Life-cycle modification in open oceans accounts for genome variability in a cosmopolitan phytoplankton. ISME J 2014; 9:1365-77. [PMID: 25461969 PMCID: PMC4438323 DOI: 10.1038/ismej.2014.221] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.3] [Received: 06/30/2014] [Revised: 10/08/2014] [Accepted: 10/17/2014] [Indexed: 11/30/2022]
Abstract
Emiliania huxleyi is the most abundant calcifying plankton in modern oceans with substantial intraspecific genome variability and a biphasic life cycle involving sexual alternation between calcified 2N and flagellated 1N cells. We show that high genome content variability in Emiliania relates to erosion of 1N-specific genes and loss of the ability to form flagellated cells. Analysis of 185 E. huxleyi strains isolated from world oceans suggests that loss of flagella occurred independently in lineages inhabiting oligotrophic open oceans over short evolutionary timescales. This environmentally linked physiogenomic change suggests life cycling is not advantageous in very large/diluted populations experiencing low biotic pressure and low ecological variability. Gene loss did not appear to reflect pressure for genome streamlining in oligotrophic oceans as previously observed in picoplankton. Life-cycle modifications might be common in plankton and cause major functional variability to be hidden from traditional taxonomic or molecular markers.
Affiliation(s)
- Peter von Dassow
- [1] Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile [2] UMI 3614, Evolutionary Biology and Ecology of Algae, CNRS, UPMC Sorbonne Universités, PUCCh, UACH, Station Biologique de Roscoff, Roscoff, France [3] Instituto Milenio de Oceanografía, Concepción, Chile [4] CNRS UMR 7144 and UMPC, Evolution of Pelagic Ecosystems and Protists (EPEP), CNRS, UPMC, Station Biologique de Roscoff, Roscoff, France
- Uwe John
- Alfred Wegener Institute Helmholtz Centre for Polar and Marine Research, Bremerhaven, Germany
- Hiroyuki Ogata
- [1] Institute for Chemical Research, Kyoto University, Kyoto, Japan [2] CNRS, Aix-Marseille Université, Laboratoire Information Génomique et Structurale (UMR 7256), Mediterranean Institute of Microbiology (FR 3479), Marseille, France
- Ian Probert
- CNRS-UMPC, FR2424, Roscoff Culture Collection, Station Biologique de Roscoff, Roscoff, France
- El Mahdi Bendif
- Marine Biological Association of the UK, The Laboratory, Citadel Hill, Plymouth, UK
- Jessica U Kegel
- Alfred Wegener Institute Helmholtz Centre for Polar and Marine Research, Bremerhaven, Germany
- Stéphane Audic
- CNRS UMR 7144 and UMPC, Evolution of Pelagic Ecosystems and Protists (EPEP), CNRS, UPMC, Station Biologique de Roscoff, Roscoff, France
- Jean-Michel Claverie
- CNRS, Aix-Marseille Université, Laboratoire Information Génomique et Structurale (UMR 7256), Mediterranean Institute of Microbiology (FR 3479), Marseille, France
- Scott Doney
- Marine Chemistry and Geochemistry Department, Woods Hole Oceanographic Institution, Woods Hole, MA, USA
- David M Glover
- Marine Chemistry and Geochemistry Department, Woods Hole Oceanographic Institution, Woods Hole, MA, USA
- Daniella Mella Flores
- [1] Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile [2] UMI 3614, Evolutionary Biology and Ecology of Algae, CNRS, UPMC Sorbonne Universités, PUCCh, UACH, Station Biologique de Roscoff, Roscoff, France
- Yeritza Herrera
- Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile
- Magali Lescot
- CNRS, Aix-Marseille Université, Laboratoire Information Génomique et Structurale (UMR 7256), Mediterranean Institute of Microbiology (FR 3479), Marseille, France
- Marie-José Garet-Delmas
- CNRS UMR 7144 and UMPC, Evolution of Pelagic Ecosystems and Protists (EPEP), CNRS, UPMC, Station Biologique de Roscoff, Roscoff, France
- Colomban de Vargas
- CNRS UMR 7144 and UMPC, Evolution of Pelagic Ecosystems and Protists (EPEP), CNRS, UPMC, Station Biologique de Roscoff, Roscoff, France
|
8
|
Philippe N, Legendre M, Doutre G, Couté Y, Poirot O, Lescot M, Arslan D, Seltzer V, Bertaux L, Bruley C, Garin J, Claverie JM, Abergel C. Pandoraviruses: amoeba viruses with genomes up to 2.5 Mb reaching that of parasitic eukaryotes. Science 2013; 341:281-6. [PMID: 23869018 DOI: 10.1126/science.1239181] [Citation(s) in RCA: 379] [Impact Index Per Article: 34.5] [Indexed: 12/12/2022]
Abstract
Ten years ago, the discovery of Mimivirus, a virus infecting Acanthamoeba, initiated a reappraisal of the upper limits of the viral world, both in terms of particle size (>0.7 micrometers) and genome complexity (>1000 genes), dimensions typical of parasitic bacteria. The diversity of these giant viruses (the Megaviridae) was assessed by sampling a variety of aquatic environments and their associated sediments worldwide. We report the isolation of two giant viruses, one off the coast of central Chile, the other from a freshwater pond near Melbourne (Australia), without morphological or genomic resemblance to any previously defined virus families. Their micrometer-sized ovoid particles contain DNA genomes of at least 2.5 and 1.9 megabases, respectively. These viruses are the first members of the proposed "Pandoravirus" genus, a term reflecting their lack of similarity with previously described microorganisms and the surprises expected from their future study.
Affiliation(s)
- Nadège Philippe
- Structural and Genomic Information Laboratory, UMR 7256 CNRS Aix-Marseille Université, 163 Avenue de Luminy, Case 934, 13288 Marseille cedex 9, France
|
9
|
Audic S, Lescot M, Claverie JM, Cloeckaert A, Zygmunt MS. The genome sequence of Brucella pinnipedialis B2/94 sheds light on the evolutionary history of the genus Brucella. BMC Evol Biol 2011; 11:200. [PMID: 21745361 PMCID: PMC3146883 DOI: 10.1186/1471-2148-11-200] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Received: 04/01/2011] [Accepted: 07/11/2011] [Indexed: 11/25/2022] Open
Abstract
Background: Since the discovery of the Malta fever agent, Brucella melitensis, in the 19th century, six terrestrial mammal-associated Brucella species were recognized over the next century. More recently the number of novel Brucella species has increased and among them, isolation of species B. pinnipedialis and B. ceti from marine mammals raised many questions about their origin as well as on the evolutionary history of the whole genus. Results: We report here on the first complete genome sequence of a Brucella strain isolated from marine mammals, Brucella pinnipedialis strain B2/94. A whole gene-based phylogenetic analysis shows that five main groups of host-associated Brucella species rapidly diverged from a likely free-living ancestor close to the recently isolated B. microti. However, this tree lacks the resolution required to resolve the order of divergence of those groups. Comparative analyses focusing on a) genome segments unshared between B. microti and B. pinnipedialis, b) gene deletion/fusion events and c) positions and numbers of Brucella specific IS711 elements in the available Brucella genomes provided enough information to propose a branching order for those five groups. Conclusions: In this study, it appears that the closest relatives of marine mammal Brucella sp. are B. ovis and Brucella sp. NVSL 07-0026 isolated from a baboon, followed by B. melitensis and B. abortus strains, and finally the group consisting of B. suis strains, including B. canis and the group consisting of the single B. neotomae species. We were not able, however, to resolve the order of divergence of the two latter groups.
Affiliation(s)
- Stéphane Audic
- Laboratoire Information Génomique et Structurale, CNRS-UPR2589, Aix-Marseille University, Institut de Microbiologie de la Méditerranée, IFR-88, Parc Scientifique de Luminy-163 Avenue de Luminy-Case 934-FR-13288, Marseille cedex 09, France.
|
10
|
Legendre M, Audic S, Poirot O, Hingamp P, Seltzer V, Byrne D, Lartigue A, Lescot M, Bernadac A, Poulain J, Abergel C, Claverie JM. mRNA deep sequencing reveals 75 new genes and a complex transcriptional landscape in Mimivirus. Genome Res 2010; 20:664-74. [PMID: 20360389 DOI: 10.1101/gr.102582.109] [Citation(s) in RCA: 100] [Impact Index Per Article: 7.1] [Indexed: 11/24/2022]
Abstract
Mimivirus, a virus infecting Acanthamoeba, is the prototype of the Mimiviridae, the latest addition to the nucleocytoplasmic large DNA viruses. The Mimivirus genome encodes close to 1000 proteins, many of them never before encountered in a virus, such as four amino-acyl tRNA synthetases. To explore the physiology of this exceptional virus and identify the genes involved in the building of its characteristic intracytoplasmic "virion factory," we coupled electron microscopy observations with the massively parallel pyrosequencing of the polyadenylated RNA fractions of Acanthamoeba castellanii cells at various times post-infection. We generated 633,346 reads, of which 322,904 correspond to Mimivirus transcripts. This first application of deep mRNA sequencing (454 Life Sciences [Roche] FLX) to a large DNA virus allowed the precise delineation of the 5' and 3' extremities of Mimivirus mRNAs and revealed 75 new transcripts including several noncoding RNAs. Mimivirus genes are expressed across a wide dynamic range, in a finely regulated manner broadly described by three main temporal classes: early, intermediate, and late. This RNA-seq study confirmed the AAAATTGA sequence as an early promoter element, as well as the presence of palindromes at most of the polyadenylation sites. It also revealed a new promoter element correlating with late gene expression, which is also prominent in Sputnik, the recently described Mimivirus "virophage." These results, validated genome-wide by the hybridization of total RNA extracted from infected Acanthamoeba cells on a tiling array (Agilent), will constitute the foundation on which to build subsequent functional studies of the Mimivirus/Acanthamoeba system.
Affiliation(s)
- Matthieu Legendre
- Structural & Genomic Information Laboratory, Centre National de la Recherche Scientifique, UPR2589, Mediterranean Institute of Microbiology IFR88, Aix-Marseille University, Parc Scientifique de Luminy, FR-13288 Marseille, France
|
11
|
Lescot M, Audic S, Robert C, Nguyen TT, Blanc G, Cutler SJ, Wincker P, Couloux A, Claverie JM, Raoult D, Drancourt M. The genome of Borrelia recurrentis, the agent of deadly louse-borne relapsing fever, is a degraded subset of tick-borne Borrelia duttonii. PLoS Genet 2008; 4:e1000185. [PMID: 18787695 PMCID: PMC2525819 DOI: 10.1371/journal.pgen.1000185] [Citation(s) in RCA: 120] [Impact Index Per Article: 7.5] [Received: 04/15/2008] [Accepted: 07/31/2008] [Indexed: 01/22/2023] Open
Abstract
In an effort to understand how a tick-borne pathogen adapts to the body louse, we sequenced and compared the genomes of the recurrent fever agents Borrelia recurrentis and B. duttonii. The 1,242,163–1,574,910-bp fragmented genomes of B. recurrentis and B. duttonii contain a unique 23-kb linear plasmid. This linear plasmid exhibits a large polyT track within the promoter region of an intact variable large protein gene and a telomere resolvase that is unique to Borrelia. The genome content is characterized by several repeat families, including antigenic lipoproteins. B. recurrentis exhibited a 20.4% genome size reduction and appeared to be a strain of B. duttonii, with a decaying genome, possibly due to the accumulation of genomic errors induced by the loss of recA and mutS. Accompanying this were increases in the number of impaired genes and a reduction in coding capacity, including surface-exposed lipoproteins and putative virulence factors. Analysis of the reconstructed ancestral sequence compared to B. duttonii and B. recurrentis was consistent with the accelerated evolution observed in B. recurrentis. Vector specialization of louse-borne pathogens responsible for major epidemics was associated with rapid genome reduction. The correlation between gene loss and increased virulence of B. recurrentis parallels that of Rickettsia prowazekii, with both species being genomic subsets of less-virulent strains.
Borreliae are vector-borne spirochetes that are responsible for Lyme disease and recurrent fevers. We completed the genome sequences of the tick-borne Borrelia duttonii and the louse-borne B. recurrentis. The former of these is responsible for emerging infections that mimic malaria in Africa and in travellers, and the latter is responsible for severe recurrent fever in poor African populations. Diagnostic tools for these pathogens remain poor with regard to sensitivity and specificity due, in part, to the lack of genomic sequences. In this study, we show that the genomic content of B. recurrentis is a subset of that of B. duttonii, the genes of which are undergoing a decay process. These phenomena are common to all louse-borne pathogens compared to their tick-borne counterparts. In B. recurrentis, this process may be due to the inactivation of genes encoding DNA repair mechanisms, implying the accumulation of errors in the genome. The increased virulence of B. recurrentis could not be traced back to specific virulence factors, illustrating the lack of correlation between the virulence of a pathogen and so-called virulence genes. Knowledge of these genomes will allow for the development of new molecular tools that provide a more-accurate, sensitive, and specific diagnosis of these emerging infections.
Affiliation(s)
- Magali Lescot
- Structural and Genomic Information Laboratory, CNRS UPR2589, IFR88, Parc Scientifique de Luminy, Marseille, France
- Stéphane Audic
- Structural and Genomic Information Laboratory, CNRS UPR2589, IFR88, Parc Scientifique de Luminy, Marseille, France
- Catherine Robert
- Unité des Rickettsies, UMR CNRS-IRD 6236, IFR48, Faculté de Médecine, Université de la Méditerranée, Marseille, France
- Thi Tien Nguyen
- Unité des Rickettsies, UMR CNRS-IRD 6236, IFR48, Faculté de Médecine, Université de la Méditerranée, Marseille, France
- Guillaume Blanc
- Structural and Genomic Information Laboratory, CNRS UPR2589, IFR88, Parc Scientifique de Luminy, Marseille, France
- Sally J. Cutler
- School of Health and Bioscience, University of East London, Stratford, London, United Kingdom
- Jean-Michel Claverie
- Structural and Genomic Information Laboratory, CNRS UPR2589, IFR88, Parc Scientifique de Luminy, Marseille, France
- Didier Raoult
- Unité des Rickettsies, UMR CNRS-IRD 6236, IFR48, Faculté de Médecine, Université de la Méditerranée, Marseille, France
- Michel Drancourt
- Unité des Rickettsies, UMR CNRS-IRD 6236, IFR48, Faculté de Médecine, Université de la Méditerranée, Marseille, France
|
12
|
Gayral P, Noa-Carrazana JC, Lescot M, Lheureux F, Lockhart BEL, Matsumoto T, Piffanelli P, Iskra-Caruana ML. A single Banana streak virus integration event in the banana genome as the origin of infectious endogenous pararetrovirus. J Virol 2008; 82:6697-710. [PMID: 18417582 PMCID: PMC2447048 DOI: 10.1128/jvi.00212-08] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.3] [Received: 01/30/2008] [Accepted: 04/07/2008] [Indexed: 12/15/2022] Open
Abstract
Sequencing of plant nuclear genomes reveals the widespread presence of integrated viral sequences known as endogenous pararetroviruses (EPRVs). Banana is one of the three plant species known to harbor infectious EPRVs. Musa balbisiana carries integrated copies of Banana streak virus (BSV), which are infectious by releasing virions in interspecific hybrids. Here, we analyze the organization of the EPRV of BSV Goldfinger (BSGfV) present in the wild diploid M. balbisiana cv. Pisang Klutuk Wulung (PKW) revealed by the study of Musa bacterial artificial chromosome resources and interspecific genetic cross. cv. PKW contains two similar EPRVs of BSGfV. Genotyping of these integrants and studies of their segregation pattern show an allelic insertion. Despite the fact that integrated BSGfV has undergone extensive rearrangement, both EPRVs contain the full-length viral genome. The high degree of sequence conservation between the integrated and episomal form of the virus indicates a recent integration event; however, only one allele is infectious. Analysis of BSGfV EPRV segregation among an F1 population from an interspecific genetic cross revealed that these EPRV sequences correspond to two alleles originating from a single integration event. We describe here for the first time the full genomic and genetic organization of the two EPRVs of BSGfV present in cv. PKW in response to the challenge facing both scientists and breeders to identify and generate genetic resources free from BSV. We discuss the consequences of this unique host-pathogen interaction in terms of genetic and genomic plant defenses versus strategies of infectious BSGfV EPRVs.
Affiliation(s)
- Philippe Gayral
- CIRAD BIOS, UMR BGPI, Campus International de Baillarguet, TA A-54/K, 34398 Montpellier Cedex 5, France
|
13
|
Dereeper A, Guignon V, Blanc G, Audic S, Buffet S, Chevenet F, Dufayard JF, Guindon S, Lefort V, Lescot M, Claverie JM, Gascuel O. Phylogeny.fr: robust phylogenetic analysis for the non-specialist. Nucleic Acids Res 2008; 36:W465-9. [PMID: 18424797 PMCID: PMC2447785 DOI: 10.1093/nar/gkn180] [Citation(s) in RCA: 3290] [Impact Index Per Article: 205.6] [Indexed: 02/06/2023] Open
Abstract
Phylogenetic analyses are central to many research areas in biology and typically involve the identification of homologous sequences, their multiple alignment, the phylogenetic reconstruction and the graphical representation of the inferred tree. The Phylogeny.fr platform transparently chains programs to automatically perform these tasks. It is primarily designed for biologists with no experience in phylogeny, but can also meet the needs of specialists; the first ones will find up-to-date tools chained in a phylogeny pipeline to analyze their data in a simple and robust way, while the specialists will be able to easily build and run sophisticated analyses. Phylogeny.fr offers three main modes. The 'One Click' mode targets non-specialists and provides a ready-to-use pipeline chaining programs with recognized accuracy and speed: MUSCLE for multiple alignment, PhyML for tree building, and TreeDyn for tree rendering. All parameters are set up to suit most studies, and users only have to provide their input sequences to obtain a ready-to-print tree. The 'Advanced' mode uses the same pipeline but allows the parameters of each program to be customized by users. The 'A la Carte' mode offers more flexibility and sophistication, as users can build their own pipeline by selecting and setting up the required steps from a large choice of tools to suit their specific needs. Prior to phylogenetic analysis, users can also collect neighbors of a query sequence by running BLAST on general or specialized databases. A guide tree then helps to select neighbor sequences to be used as input for the phylogeny pipeline. Phylogeny.fr is available at: http://www.phylogeny.fr/
Affiliation(s)
- A Dereeper
- Information Génomique et Structurale (IGS), CNRS-UPR2589, IFR-88, Marseille, France
|
14
|
Lescot M, Piffanelli P, Ciampi AY, Ruiz M, Blanc G, Leebens-Mack J, da Silva FR, Santos CMR, D'Hont A, Garsmeur O, Vilarinhos AD, Kanamori H, Matsumoto T, Ronning CM, Cheung F, Haas BJ, Althoff R, Arbogast T, Hine E, Pappas GJ, Sasaki T, Souza MT, Miller RNG, Glaszmann JC, Town CD. Insights into the Musa genome: syntenic relationships to rice and between Musa species. BMC Genomics 2008; 9:58. [PMID: 18234080 PMCID: PMC2270835 DOI: 10.1186/1471-2164-9-58] [Citation(s) in RCA: 70] [Impact Index Per Article: 4.4] [Received: 06/11/2007] [Accepted: 01/30/2008] [Indexed: 01/10/2023] Open
Abstract
Background: Musa species (Zingiberaceae, Zingiberales) including bananas and plantains are collectively the fourth most important crop in developing countries. Knowledge concerning Musa genome structure and the origin of distinct cultivars has greatly increased over the last few years. Until now, however, no large-scale analyses of Musa genomic sequence have been conducted. This study compares genomic sequence in two Musa species with orthologous regions in the rice genome. Results: We produced 1.4 Mb of Musa sequence from 13 BAC clones, annotated and analyzed them along with 4 previously sequenced BACs. The 443 predicted genes revealed that Zingiberales genes share GC content and distribution characteristics with eudicot and Poaceae genomes. Comparison with rice revealed microsynteny regions that have persisted since the divergence of the Commelinid orders Poales and Zingiberales at least 117 Mya. The previously hypothesized large-scale duplication event in the common ancestor of major cereal lineages within the Poaceae was verified. The divergence time distributions for Musa-Zingiber (Zingiberaceae, Zingiberales) orthologs and paralogs provide strong evidence for a large-scale duplication event in the Musa lineage after its divergence from the Zingiberaceae approximately 61 Mya. Comparisons of genomic regions from M. acuminata and M. balbisiana revealed highly conserved genome structure, and indicated that these genomes diverged circa 4.6 Mya. Conclusion: These results point to the utility of comparative analyses between distantly-related monocot species such as rice and Musa for improving our understanding of monocot genome evolution. Sequencing the genome of M. acuminata would provide a strong foundation for comparative genomics in the monocots. In addition a genome sequence would aid genomic and genetic analyses of cultivated Musa polyploid genotypes in research aimed at localizing and cloning genes controlling important agronomic traits for breeding purposes.
Affiliation(s)
- Magali Lescot
- French Agricultural Research Center for International Development, UMR 1096, Avenue Agropolis, TA40/03, FR-34398, Montpellier, Cedex 5, France.
|
15
|
Lescot M, Rombauts S, Zhang J, Aubourg S, Mathé C, Jansson S, Rouzé P, Boerjan W. Annotation of a 95-kb Populus deltoides genomic sequence reveals a disease resistance gene cluster and novel class I and class II transposable elements. Theor Appl Genet 2004; 109:10-22. [PMID: 15085260 DOI: 10.1007/s00122-004-1621-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Received: 04/15/2003] [Accepted: 01/29/2004] [Indexed: 05/24/2023]
Abstract
Poplar has become a model system for functional genomics in woody plants. Here, we report the sequencing and annotation of the first large contiguous stretch of genomic sequence (95 kb) of poplar, corresponding to a bacterial artificial chromosome clone mapped 0.6 centiMorgan from the Melampsora larici-populina resistance locus. The annotation revealed 15 putative genetic objects, of which five were classified as hypothetical genes that were similar only with expressed sequence tags from poplar. Ten putative objects showed similarity with known genes, of which one was similar to a kinase. Three other objects corresponded to the toll/interleukin-1 receptor/nucleotide-binding site/leucine-rich repeat class of plant disease resistance genes, of which two were predicted to encode an amino terminal nuclear localization signal. Four objects were homologous to the Ty1/copia family of class I transposable elements, one of which was designated Retropop and interrupted one of the disease resistance genes. Two other objects constituted a novel Spm-like class II transposable element, which we designated Magali.
Affiliation(s)
- M Lescot
- Department of Plant Systems Biology, Flanders Interuniversity Institute for Biotechnology, Ghent University, Technologiepark 927, 9052 Gent, Belgium
|
16
|
Rombauts S, Florquin K, Lescot M, Marchal K, Rouzé P, van de Peer Y. Computational approaches to identify promoters and cis-regulatory elements in plant genomes. Plant Physiol 2003; 132:1162-76. [PMID: 12857799 PMCID: PMC167057 DOI: 10.1104/pp.102.017715] [Citation(s) in RCA: 77] [Impact Index Per Article: 3.7] [Received: 11/14/2002] [Revised: 01/10/2003] [Accepted: 03/17/2003] [Indexed: 05/19/2023]
Abstract
The identification of promoters and their regulatory elements is one of the major challenges in bioinformatics and integrates comparative, structural, and functional genomics. Many different approaches have been developed to detect conserved motifs in a set of genes that are either coregulated or orthologous. However, although recent approaches seem promising, in general, unambiguous identification of regulatory elements is not straightforward. The delineation of promoters is even harder, due to its complex nature, and in silico promoter prediction is still in its infancy. Here, we review the different approaches that have been developed for identifying promoters and their regulatory elements. We discuss the detection of cis-acting regulatory elements using word-counting or probabilistic methods (so-called "search by signal" methods) and the delineation of promoters by considering both sequence content and structural features ("search by content" methods). As an example of search by content, we explored in greater detail the association of promoters with CpG islands. However, due to differences in sequence content, the parameters used to detect CpG islands in humans and other vertebrates cannot be used for plants. Therefore, a preliminary attempt was made to define parameters that could possibly define CpG and CpNpG islands in Arabidopsis, by exploring the compositional landscape around the transcriptional start site. To this end, a data set of more than 5,000 gene sequences was built, including the promoter region, the 5'-untranslated region, and the first introns and coding exons. Preliminary analysis shows that promoter location based on the detection of potential CpG/CpNpG islands in the Arabidopsis genome is not straightforward. Nevertheless, because the landscape of CpG/CpNpG islands differs considerably between promoters and introns on the one side and exons (whether coding or not) on the other, more sophisticated approaches can probably be developed for the successful detection of "putative" CpG and CpNpG islands in plants.
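The kind of "search by content" parameters the abstract refers to can be illustrated with a short sketch. It is not taken from the paper: the function names are hypothetical, and the default thresholds (200-bp window, GC fraction of at least 0.5, observed/expected CpG ratio of at least 0.6) are the values commonly quoted for vertebrate CpG islands, assumed here only as a starting point. The abstract itself notes that such vertebrate parameters do not carry over to plants.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for one DNA window."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    observed = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count if C and G occurred independently: (#C * #G) / length
    expected = c * g / n if n else 0.0
    ratio = observed / expected if expected > 0 else 0.0
    return gc_fraction, ratio


def island_windows(seq, window=200, step=1, min_gc=0.5, min_ratio=0.6):
    """Start positions of windows that pass both (assumed) thresholds."""
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        gc_fraction, ratio = cpg_stats(seq[start:start + window])
        if gc_fraction >= min_gc and ratio >= min_ratio:
            hits.append(start)
    return hits
```

Tuning `min_gc` and `min_ratio` on windows around annotated transcription start sites is one way to explore a compositional landscape in the spirit of the abstract. A CpNpG variant would need a different dinucleotide count and is not covered by this sketch.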
Affiliation(s)
- Stephane Rombauts
- Department of Plant Systems Biology, Flanders Interuniversity Institute for Biotechnology, Ghent University, B-9000 Gent, Belgium
|
17
|
Thijs G, Marchal K, Lescot M, Rombauts S, De Moor B, Rouzé P, Moreau Y. A Gibbs sampling method to detect overrepresented motifs in the upstream regions of coexpressed genes. J Comput Biol 2002; 9:447-64. [PMID: 12015892 DOI: 10.1089/10665270252935566] [Citation(s) in RCA: 260] [Impact Index Per Article: 11.8] [Indexed: 11/12/2022] Open
Abstract
Microarray experiments can reveal important information about transcriptional regulation. In our case, we look for potential promoter regulatory elements in the upstream region of coexpressed genes. Here we present two modifications of the original Gibbs sampling algorithm for motif finding (Lawrence et al., 1993). First, we introduce the use of a probability distribution to estimate the number of copies of the motif in a sequence. Second, we describe the technical aspects of the incorporation of a higher-order background model whose application we discussed in Thijs et al. (2001). Our implementation is referred to as the Motif Sampler. We successfully validate our algorithm on several data sets. First, we show results for three sets of upstream sequences containing known motifs: 1) the G-box light-response element in plants, 2) elements involved in methionine response in Saccharomyces cerevisiae, and 3) the FNR O(2)-responsive element in bacteria. We use these data sets to explain the influence of the parameters on the performance of our algorithm. Second, we show results for upstream sequences from four clusters of coexpressed genes identified in a microarray experiment on wounding in Arabidopsis thaliana. Several motifs could be matched to regulatory elements from plant defence pathways in our database of plant cis-acting regulatory elements (PlantCARE). Some other strong motifs do not have corresponding motifs in PlantCARE but are promising candidates for further analysis.
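For readers unfamiliar with the basic algorithm that this work modifies, the following is a minimal sketch of a Gibbs site sampler in the general style of Lawrence et al. (1993). It is not the Motif Sampler implementation. It assumes exactly one motif occurrence per sequence and a uniform background, so neither of the two modifications described in the abstract (a distribution over the number of motif copies, a higher-order background model) is included. All function names are hypothetical, and input sequences are assumed to be uppercase A/C/G/T only.

```python
import random

ALPHABET = "ACGT"


def build_pwm(sites, pseudocount=1.0):
    """Column-wise nucleotide frequencies, smoothed with pseudocounts."""
    pwm = []
    for col in range(len(sites[0])):
        counts = {b: pseudocount for b in ALPHABET}
        for site in sites:
            counts[site[col]] += 1
        total = sum(counts.values())
        pwm.append({b: counts[b] / total for b in ALPHABET})
    return pwm


def site_weight(segment, pwm, background=0.25):
    """Likelihood ratio of a segment under the motif vs. a uniform background."""
    weight = 1.0
    for col, base in enumerate(segment):
        weight *= pwm[col][base] / background
    return weight


def gibbs_site_sampler(seqs, width, iterations=200, seed=0):
    """Sample one motif start per sequence (needs at least two sequences)."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        for i, seq in enumerate(seqs):
            # Build the motif model from every sequence except the current one.
            others = [s[p:p + width]
                      for j, (s, p) in enumerate(zip(seqs, starts)) if j != i]
            pwm = build_pwm(others)
            # Weight each candidate start in the held-out sequence, then resample.
            weights = [site_weight(seq[p:p + width], pwm)
                       for p in range(len(seq) - width + 1)]
            starts[i] = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    return starts
```

Because the sampler is stochastic, a typical use would run it from several seeds and keep the alignment with the best score. The scoring step is left out here to keep the sketch short.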
Affiliation(s)
- Gert Thijs
- ESAT-SCD, KULeuven, Kasteelpark Arenberg 10, 3001 Leuven, Belgium.
|
18
|
Guiderdoni E, Cordero MJ, Vignols F, Garcia-Garrido JM, Lescot M, Tharreau D, Meynard D, Ferrière N, Notteghem JL, Delseny M. Inducibility by pathogen attack and developmental regulation of the rice Ltp1 gene. Plant Mol Biol 2002; 49:683-99. [PMID: 12081375 DOI: 10.1023/a:1015595100145] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.1] [Indexed: 05/20/2023]
Abstract
Using a genomic clone encoding a rice lipid transfer protein, LTP1, we analysed the activity of the 5' region of the Ltp1 gene in transgenic rice (Oryza sativa L.) during plant development and under pathogen attack. The -1176/+13, -556/+13 and -284/+13 regions of the promoter were fused upstream from the uidA reporter gene and nos 3' polyadenylation signal, resulting in the pdelta1176Gus, pdelta556Gus and pdelta284Gus constructs which were transferred to rice by microprojectile bombardment. Histochemical and fluorometric GUS assays and in situ detection of uidA transcripts in transgenic homozygous lines harbouring the pdelta1176Gus construct demonstrated that the Ltp1 promoter is preferentially active in aerial vegetative and reproductive organs and that both specificity and level of expression are regulated during organ development. In leaf sheath, GUS activity which is initially strictly localized in the epidermis of growing tissue, becomes restricted to the vascular system in mature tissues. In expanded leaf blade, expression of the uidA gene was restricted to the cutting level suggesting inducibility by wounding. Strong activity was detected in lemma and palea, sterile glumes, and immature anther walls and microspores but not in female reproductive organs. No GUS activity was detected during seed embryo maturation whereas the uidA gene was strongly expressed at early stages of somatic embryogenesis in scutellum tissue. The Ltp1 transcripts were found to strongly accumulate in response to inoculation with the fungal agent of the blast disease, Magnaporthe grisea, in two rice cultivars exhibiting compatible or incompatible host-pathogen interactions. Analysis of pdelta1176Gus leaf samples inoculated with the blast fungus demonstrated that the Ltp1 promoter is induced in all cell types of tissues surrounding the lesion and notably in stomata guard cells. In plants harbouring the Ltp1 promoter deletion construct pdelta556Gus, activity was solely detected in the vascular system of mature leaves whereas no uidA gene expression was observed in pdelta284Gus plants. These observations are consistent with the proposed role of LTP1 in strengthening of structural barriers and organ protection against mechanical disruption and pathogen attack.
Affiliation(s)
- Emmanuel Guiderdoni
- BIOTROP and CALIM programmes, Cirad, Centre International de Recherches Agronomiques en coopération pour le Développement, Montpellier, France.
|
19
|
Thijs G, Moreau Y, De Smet F, Mathys J, Lescot M, Rombauts S, Rouze P, De Moor B, Marchal K. INCLUSive: integrated clustering, upstream sequence retrieval and motif sampling. Bioinformatics 2002; 18:331-2. [PMID: 11847086 DOI: 10.1093/bioinformatics/18.2.331] [Citation(s) in RCA: 52] [Impact Index Per Article: 2.4] [Indexed: 11/13/2022] Open
Abstract
INCLUSive allows automatic multistep analysis of microarray data (clustering and motif finding). The clustering algorithm (adaptive quality-based clustering) groups together genes with highly similar expression profiles. The upstream sequences of the genes belonging to a cluster are automatically retrieved from GenBank and can be fed directly into Motif Sampler, a Gibbs sampling algorithm that retrieves statistically over-represented motifs in sets of sequences, in this case upstream regions of co-expressed genes.
Affiliation(s)
- Gert Thijs
- ESAT-SISTA/COSIC, KULeuven, Kasteelpark Arenberg 10, 3001 Leuven-Heverlee, Belgium.
|
20
|
Lescot M, Déhais P, Thijs G, Marchal K, Moreau Y, Van de Peer Y, Rouzé P, Rombauts S. PlantCARE, a database of plant cis-acting regulatory elements and a portal to tools for in silico analysis of promoter sequences. Nucleic Acids Res 2002; 30:325-7. [PMID: 11752327 PMCID: PMC99092 DOI: 10.1093/nar/30.1.325] [Citation(s) in RCA: 3503] [Impact Index Per Article: 159.2] [Indexed: 11/14/2022] Open
Abstract
PlantCARE is a database of plant cis-acting regulatory elements, enhancers and repressors. Regulatory elements are represented by positional matrices, consensus sequences and individual sites on particular promoter sequences. Links to the EMBL, TRANSFAC and MEDLINE databases are provided when available. Data about the transcription sites are extracted mainly from the literature, supplemented with an increasing number of in silico predicted data. Apart from a general description for specific transcription factor sites, levels of confidence for the experimental evidence, functional information and the position on the promoter are given as well. New features have been implemented to search for plant cis-acting regulatory elements in a query sequence. Furthermore, links are now provided to a new clustering and motif search method to investigate clusters of co-expressed genes. New regulatory elements can be sent automatically and will be added to the database after curation. The PlantCARE relational database is available via the World Wide Web at http://sphinx.rug.ac.be:8080/PlantCARE/.
Affiliation(s)
- Magali Lescot
- Vakgroep Moleculaire Genetica, Departement Plantengenetica, Vlaams Interuniversitair Instituut voor Biotechnologie, Universiteit Gent, K. L. Ledeganckstraat 35, B-9000 Gent, Belgium
|
21
|
Thijs G, Lescot M, Marchal K, Rombauts S, De Moor B, Rouzé P, Moreau Y. A higher-order background model improves the detection of promoter regulatory elements by Gibbs sampling. Bioinformatics 2001; 17:1113-22. [PMID: 11751219 DOI: 10.1093/bioinformatics/17.12.1113] [Citation(s) in RCA: 286] [Impact Index Per Article: 12.4] [Indexed: 11/12/2022]
Abstract
MOTIVATION: Transcriptome analysis allows detection and clustering of genes that are coexpressed under various biological circumstances. Under the assumption that coregulated genes share cis-acting regulatory elements, it is important to investigate the upstream sequences controlling the transcription of these genes. To improve the robustness of the Gibbs sampling algorithm to noisy data sets, we propose an extension of this algorithm for motif finding with a higher-order background model. RESULTS: Simulated data and real biological data sets with well-described regulatory elements are used to test the influence of the different background models on the performance of the motif detection algorithm. We show that the use of a higher-order model considerably enhances the performance of our motif finding algorithm in the presence of noisy data. For Arabidopsis thaliana, a reliable background model based on a set of carefully selected intergenic sequences was constructed. AVAILABILITY: Our implementation of the Gibbs sampler, called the Motif Sampler, can be used through a web interface: http://www.esat.kuleuven.ac.be/~thijs/Work/MotifSampler.html. CONTACT: gert.thijs@esat.kuleuven.ac.be; yves.moreau@esat.kuleuven.ac.be
Affiliation(s)
- G Thijs
- ESAT-SISTA/COSIC, KULeuven, Kasteelpark Arenberg 10, 3001 Leuven-Heverlee, Belgium.
|
22
|
|