1
Eloe-Fadrosh EA, Mungall CJ, Miller MA, Smith M, Patil SS, Kelliher JM, Johnson LYD, Rodriguez FE, Chain PSG, Hu B, Thornton MB, McCue LA, McHardy AC, Harris NL, Reddy TBK, Mukherjee S, Hunter CI, Walls R, Schriml LM. A Practical Approach to Using the Genomic Standards Consortium MIxS Reporting Standard for Comparative Genomics and Metagenomics. Methods Mol Biol 2024; 2802:587-609. [PMID: 38819573 DOI: 10.1007/978-1-0716-3838-5_20] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Comparative analysis of (meta)genomes necessitates aggregation, integration, and synthesis of well-annotated data using standards. The Genomic Standards Consortium (GSC) collaborates with the research community to develop and maintain the Minimum Information about any (x) Sequence (MIxS) reporting standard for genomic data. To facilitate the use of the GSC's MIxS reporting standard, we provide a description of the structure and terminology, how to navigate ontologies for required terms in MIxS, and demonstrate practical usage through a soil metagenome example.
Affiliation(s)
- Emiley A Eloe-Fadrosh
  - Environmental Genomics and System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Christopher J Mungall
  - Environmental Genomics and System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Mark Andrew Miller
  - Environmental Genomics and System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Montana Smith
  - Pacific Northwest National Laboratory, Richland, WA, USA
- Sujay Sanjeev Patil
  - Environmental Genomics and System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Julia M Kelliher
  - Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Leah Y D Johnson
  - Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Patrick S G Chain
  - Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Bin Hu
  - Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Michael B Thornton
  - Environmental Genomics and System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Lee Ann McCue
  - Pacific Northwest National Laboratory, Richland, WA, USA
- Alice Carolyn McHardy
  - Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Nomi L Harris
  - Environmental Genomics and System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- T B K Reddy
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Supratim Mukherjee
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Christopher I Hunter
  - GigaScience Press, Hong Kong Science Park, Pak Shek Kok, New Territories, Hong Kong
- Lynn M Schriml
  - University of Maryland School of Medicine, Institute for Genome Sciences, Baltimore, MD, USA

2
Baltoumas FA, Karatzas E, Paez-Espino D, Venetsianou NK, Aplakidou E, Oulas A, Finn RD, Ovchinnikov S, Pafilis E, Kyrpides NC, Pavlopoulos GA. Exploring microbial functional biodiversity at the protein family level-From metagenomic sequence reads to annotated protein clusters. FRONTIERS IN BIOINFORMATICS 2023; 3:1157956. [PMID: 36959975 PMCID: PMC10029925 DOI: 10.3389/fbinf.2023.1157956] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Accepted: 02/21/2023] [Indexed: 03/06/2023] Open
Abstract
Metagenomics has enabled accessing the genetic repertoire of natural microbial communities. Metagenome shotgun sequencing has become the method of choice for studying and classifying microorganisms from various environments. To this end, several methods have been developed to process and analyze the sequence data from raw reads to end-products such as predicted protein sequences or families. In this article, we provide a thorough review to simplify such processes and discuss the alternative methodologies that can be followed in order to explore biodiversity at the protein family level. We provide details for analysis tools and we comment on their scalability as well as their advantages and disadvantages. Finally, we report the available data repositories and recommend various approaches for protein family annotation related to phylogenetic distribution, structure prediction and metadata enrichment.
Affiliation(s)
- Fotis A. Baltoumas
  - Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
  - *Correspondence: Fotis A. Baltoumas; Nikos C. Kyrpides; Georgios A. Pavlopoulos
- Evangelos Karatzas
  - Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- David Paez-Espino
  - Lawrence Berkeley National Laboratory, DOE Joint Genome Institute, Berkeley, CA, United States
- Nefeli K. Venetsianou
  - Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- Eleni Aplakidou
  - Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- Anastasis Oulas
  - The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
- Robert D. Finn
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, United Kingdom
- Sergey Ovchinnikov
  - John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA, United States
- Evangelos Pafilis
  - Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Heraklion, Greece
- Nikos C. Kyrpides
  - Lawrence Berkeley National Laboratory, DOE Joint Genome Institute, Berkeley, CA, United States
- Georgios A. Pavlopoulos
  - Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
  - Center of New Biotechnologies and Precision Medicine, Department of Medicine, School of Health Sciences, National and Kapodistrian University of Athens, Athens, Greece
  - Hellenic Army Academy, Vari, Greece

3
Zafeiropoulos H, Beracochea M, Ninidakis S, Exter K, Potirakis A, De Moro G, Richardson L, Corre E, Machado J, Pafilis E, Kotoulas G, Santi I, Finn RD, Cox CJ, Pavloudi C. metaGOflow: a workflow for the analysis of marine Genomic Observatories shotgun metagenomics data. Gigascience 2022; 12:giad078. [PMID: 37850871 PMCID: PMC10583283 DOI: 10.1093/gigascience/giad078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Revised: 06/30/2023] [Accepted: 09/11/2023] [Indexed: 10/19/2023] Open
Abstract
BACKGROUND Genomic Observatories (GOs) are sites of long-term scientific study that undertake regular assessments of the genomic biodiversity. The European Marine Omics Biodiversity Observation Network (EMO BON) is a network of GOs that conduct regular biological community samplings to generate environmental and metagenomic data of microbial communities from designated marine stations around Europe. The development of an effective workflow is essential for the analysis of the EMO BON metagenomic data in a timely and reproducible manner. FINDINGS Based on the established MGnify resource, we developed metaGOflow. metaGOflow supports the fast inference of taxonomic profiles from GO-derived data based on ribosomal RNA genes and their functional annotation using the raw reads. Thanks to the Research Object Crate packaging, relevant metadata about the sample under study, and the details of the bioinformatics analysis it has been subjected to, are inherited to the data product while its modular implementation allows running the workflow partially. The analysis of 2 EMO BON samples and 1 Tara Oceans sample was performed as a use case. CONCLUSIONS metaGOflow is an efficient and robust workflow that scales to the needs of projects producing big metagenomic data such as EMO BON. It highlights how containerization technologies along with modern workflow languages and metadata package approaches can support the needs of researchers when dealing with ever-increasing volumes of biological data. Despite being initially oriented to address the needs of EMO BON, metaGOflow is a flexible and easy-to-use workflow that can be broadly used for one-sample-at-a-time analysis of shotgun metagenomics data.
Affiliation(s)
- Haris Zafeiropoulos
  - Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Former U.S. Base of Gournes, 71003 Heraklion, Crete, Greece
  - KU Leuven, Department of Microbiology, Immunology and Transplantation, Rega Institute for Medical Research, Laboratory of Molecular Bacteriology, 3000 Leuven, Belgium
- Martin Beracochea
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Stelios Ninidakis
  - Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Former U.S. Base of Gournes, 71003 Heraklion, Crete, Greece
- Katrina Exter
  - Flanders Marine Institute (VLIZ), 8400 Oostende, Belgium
- Antonis Potirakis
  - Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Former U.S. Base of Gournes, 71003 Heraklion, Crete, Greece
- Gianluca De Moro
  - Centro de Ciências do Mar (CCMAR), Universidade do Algarve, Campus de Gambelas, 8005-139 Faro, Portugal
- Lorna Richardson
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Erwan Corre
  - CNRS, FR 2424, ABiMS Platform, Station Biologique de Roscoff (SBR), 29680 Roscoff, France
- João Machado
  - Centro de Ciências do Mar (CCMAR), Universidade do Algarve, Campus de Gambelas, 8005-139 Faro, Portugal
- Evangelos Pafilis
  - Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Former U.S. Base of Gournes, 71003 Heraklion, Crete, Greece
- Georgios Kotoulas
  - Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Former U.S. Base of Gournes, 71003 Heraklion, Crete, Greece
- Ioulia Santi
  - Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Former U.S. Base of Gournes, 71003 Heraklion, Crete, Greece
  - European Marine Biological Resource Centre (EMBRC-ERIC), 75005 Paris, France
- Robert D Finn
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Cymon J Cox
  - Centro de Ciências do Mar (CCMAR), Universidade do Algarve, Campus de Gambelas, 8005-139 Faro, Portugal
- Christina Pavloudi
  - Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Former U.S. Base of Gournes, 71003 Heraklion, Crete, Greece
  - Department of Biological Sciences, The George Washington University, 20052 Washington, DC, USA

4
Poulsen CS, Kaas RS, Aarestrup FM, Pamp SJ. Standard Sample Storage Conditions Have an Impact on Inferred Microbiome Composition and Antimicrobial Resistance Patterns. Microbiol Spectr 2021; 9:e0138721. [PMID: 34612701 DOI: 10.1101/2021.05.24.445395] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 05/22/2023] Open
Abstract
Storage of biological specimens is crucial in the life and medical sciences. Storage conditions for samples can be different for a number of reasons, and it is unclear what effect this can have on the inferred microbiome composition in metagenomics analyses. Here, we assess the effect of common storage temperatures (deep freezer, -80°C; freezer, -20°C; refrigerator, 5°C; room temperature, 22°C) and storage times (immediate sample processing, 0 h; next day, 16 h; over weekend, 64 h; longer term, 4, 8, and 12 months) as well as repeated sample freezing and thawing (2 to 4 freeze-thaw cycles). We examined two different pig feces and sewage samples, unspiked and spiked with a mock community, in triplicate, respectively, amounting to a total of 438 samples (777 Gbp; 5.1 billion reads). Storage conditions had a significant and systematic effect on the taxonomic and functional composition of microbiomes. Distinct microbial taxa and antimicrobial resistance classes were, in some situations, similarly affected across samples, while others were not, suggesting an impact of individual inherent sample characteristics. With an increasing number of freeze-thaw cycles, an increasing abundance of Firmicutes, Actinobacteria, and eukaryotic microorganisms was observed. We provide recommendations for sample storage and strongly suggest including more detailed information in the metadata together with the DNA sequencing data in public repositories to better facilitate meta-analyses and reproducibility of findings. IMPORTANCE Previous research has reported effects of DNA isolation, library preparation, and sequencing technology on metagenomics-based microbiome composition; however, the effect of biospecimen storage conditions has not been thoroughly assessed. We examined the effect of common sample storage conditions on metagenomics-based microbiome composition and found significant and, in part, systematic effects. 
Repeated freeze-thaw cycles could be used to improve the detection of microorganisms with more rigid cell walls, including parasites. We provide a data set that could also be used for benchmarking algorithms to identify and correct for unwanted batch effects. Overall, the findings suggest that all samples of a microbiome study should be stored in the same way. Furthermore, there is a need to mandate more detailed information about sample storage and processing be published together with DNA sequencing data at the International Nucleotide Sequence Database Collaboration (ENA/EBI, NCBI, DDBJ) or other repositories.
Affiliation(s)
- Casper Sahl Poulsen
  - Research Group for Genomic Epidemiology, National Food Institute, Technical University of Denmark, Kongens Lyngby, Denmark
- Rolf Sommer Kaas
  - Research Group for Genomic Epidemiology, National Food Institute, Technical University of Denmark, Kongens Lyngby, Denmark
- Frank M Aarestrup
  - Research Group for Genomic Epidemiology, National Food Institute, Technical University of Denmark, Kongens Lyngby, Denmark
- Sünje Johanna Pamp
  - Research Group for Genomic Epidemiology, National Food Institute, Technical University of Denmark, Kongens Lyngby, Denmark

5
Poulsen CS, Kaas RS, Aarestrup FM, Pamp SJ. Standard Sample Storage Conditions Have an Impact on Inferred Microbiome Composition and Antimicrobial Resistance Patterns. Microbiol Spectr 2021; 9:e0138721. [PMID: 34612701 PMCID: PMC8510183 DOI: 10.1128/spectrum.01387-21] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 08/31/2021] [Accepted: 09/02/2021] [Indexed: 12/11/2022] Open
Abstract
Storage of biological specimens is crucial in the life and medical sciences. Storage conditions for samples can be different for a number of reasons, and it is unclear what effect this can have on the inferred microbiome composition in metagenomics analyses. Here, we assess the effect of common storage temperatures (deep freezer, -80°C; freezer, -20°C; refrigerator, 5°C; room temperature, 22°C) and storage times (immediate sample processing, 0 h; next day, 16 h; over weekend, 64 h; longer term, 4, 8, and 12 months) as well as repeated sample freezing and thawing (2 to 4 freeze-thaw cycles). We examined two different pig feces and sewage samples, unspiked and spiked with a mock community, in triplicate, respectively, amounting to a total of 438 samples (777 Gbp; 5.1 billion reads). Storage conditions had a significant and systematic effect on the taxonomic and functional composition of microbiomes. Distinct microbial taxa and antimicrobial resistance classes were, in some situations, similarly affected across samples, while others were not, suggesting an impact of individual inherent sample characteristics. With an increasing number of freeze-thaw cycles, an increasing abundance of Firmicutes, Actinobacteria, and eukaryotic microorganisms was observed. We provide recommendations for sample storage and strongly suggest including more detailed information in the metadata together with the DNA sequencing data in public repositories to better facilitate meta-analyses and reproducibility of findings. IMPORTANCE Previous research has reported effects of DNA isolation, library preparation, and sequencing technology on metagenomics-based microbiome composition; however, the effect of biospecimen storage conditions has not been thoroughly assessed. We examined the effect of common sample storage conditions on metagenomics-based microbiome composition and found significant and, in part, systematic effects. 
Repeated freeze-thaw cycles could be used to improve the detection of microorganisms with more rigid cell walls, including parasites. We provide a data set that could also be used for benchmarking algorithms to identify and correct for unwanted batch effects. Overall, the findings suggest that all samples of a microbiome study should be stored in the same way. Furthermore, there is a need to mandate more detailed information about sample storage and processing be published together with DNA sequencing data at the International Nucleotide Sequence Database Collaboration (ENA/EBI, NCBI, DDBJ) or other repositories.
Affiliation(s)
- Casper Sahl Poulsen
  - Research Group for Genomic Epidemiology, National Food Institute, Technical University of Denmark, Kongens Lyngby, Denmark
- Rolf Sommer Kaas
  - Research Group for Genomic Epidemiology, National Food Institute, Technical University of Denmark, Kongens Lyngby, Denmark
- Frank M. Aarestrup
  - Research Group for Genomic Epidemiology, National Food Institute, Technical University of Denmark, Kongens Lyngby, Denmark
- Sünje Johanna Pamp
  - Research Group for Genomic Epidemiology, National Food Institute, Technical University of Denmark, Kongens Lyngby, Denmark

6
Marks PC, Bigler M, Alsop EB, Vigneron A, Lomans BP, De Paula R, Geissler B, Tsesmetzis N. MetaHCR: a web-enabled metagenome data management system for hydrocarbon resources. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2018; 2018:1-10. [PMID: 30212909 PMCID: PMC6146120 DOI: 10.1093/database/bay087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2018] [Accepted: 07/24/2018] [Indexed: 11/16/2022]
Abstract
The ever-increasing metagenomic data necessitate appropriate cataloguing in a way that facilitates the comparison and better contextualization of the underlying investigations. To this extent, information associated with the sequencing data as well as the original sample and the environment where it was obtained from is crucial. To date, there are not any publicly available repositories able to capture environmental metadata pertaining to hydrocarbon-rich environments. As such, contextualization and comparative analysis among sequencing datasets derived from these environments is to a certain degree hindered or cannot be fully evaluated. The metagenomics data management system for hydrocarbon resources (MetaHCR) enables the capturing of marker gene and whole metagenome sequencing data as well as over 300 contextual attributes associated with samples, organisms, environments and geological properties, among others. Moreover, MetaHCR implements the Minimum Information about any Sequence–hydrocarbon resource specification from the Genomic Standards Consortium; it integrates a user-friendly web interface and relational database model, and it enables the generation of complex custom searches. MetaHCR has been tested with 36 publicly available metagenomic studies, and its modular architecture can be easily customized for other types of environmental and metagenomics studies.
Affiliation(s)
- Peter C Marks
  - Shell International Exploration and Production Inc., Houston, USA
- Marc Bigler
  - Ecole Supérieure de Biotechnologie Strasbourg, Illkirch-Graffenstaden, France
- Eric B Alsop
  - Shell International Exploration and Production Inc., Houston, USA
  - DOE Joint Genome Institute, Walnut Creek, California, USA
- Adrien Vigneron
  - Shell International Exploration and Production Inc., Houston, USA
  - School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, UK
- Bart P Lomans
  - Shell Global Solutions International B.V., HW Amsterdam, Netherlands
- Brett Geissler
  - RD & E Microbiology Group, NALCO Champion, Sugar Land, USA
- Nicolas Tsesmetzis
  - Shell International Exploration and Production Inc., Houston, USA
  - School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, UK

7
Tsesmetzis N, Yilmaz P, Marks PC, Kyrpides NC, Head IM, Lomans BP. MIxS-HCR: a MIxS extension defining a minimal information standard for sequence data from environments pertaining to hydrocarbon resources. Stand Genomic Sci 2016; 11:78. [PMID: 27777648 PMCID: PMC5059931 DOI: 10.1186/s40793-016-0203-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/28/2016] [Accepted: 10/05/2016] [Indexed: 11/21/2022] Open
Abstract
Here we introduce a MIxS extension to facilitate the recording and cataloguing of metadata from samples related to hydrocarbon resources. The proposed MIxS-HCR package incorporates the core features of the MIxS standard for marker gene (MIMARKS) and metagenomic (MIMS) sequences along with a hydrocarbon resources customized environmental package. Adoption of the MIxS-HCR standard will enable the comparison and better contextualization of investigations related to hydrocarbon rich environments. The insights from such standardized way of reporting could be highly beneficial for the successful development and optimization of hydrocarbon recovery processes and management of microbiological issues in petroleum production systems.
Affiliation(s)
- Nicolas Tsesmetzis
  - Shell International Exploration and Production Inc., 3333 HW6S, Houston, 77082 TX USA
- Pelin Yilmaz
  - Microbial Genomics and Bioinformatics Research Group, Max Planck Institute for Marine Microbiology, Bremen, Germany
- Peter C Marks
  - Shell International Exploration and Production Inc., 3333 HW6S, Houston, 77082 TX USA
- Ian M Head
  - School of Civil Engineering and Geosciences, Newcastle University, Newcastle upon Tyne, NE1 7RU UK
- Bart P Lomans
  - Shell Global Solutions International B.V., Rijswijk, Netherlands

8
Droege G, Barker K, Seberg O, Coddington J, Benson E, Berendsohn WG, Bunk B, Butler C, Cawsey EM, Deck J, Döring M, Flemons P, Gemeinholzer B, Güntsch A, Hollowell T, Kelbert P, Kostadinov I, Kottmann R, Lawlor RT, Lyal C, Mackenzie-Dodds J, Meyer C, Mulcahy D, Nussbeck SY, O'Tuama É, Orrell T, Petersen G, Robertson T, Söhngen C, Whitacre J, Wieczorek J, Yilmaz P, Zetzsche H, Zhang Y, Zhou X. The Global Genome Biodiversity Network (GGBN) Data Standard specification. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2016; 2016:baw125. [PMID: 27694206 PMCID: PMC5045859 DOI: 10.1093/database/baw125] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Received: 03/04/2016] [Accepted: 08/09/2016] [Indexed: 11/24/2022]
Abstract
Genomic samples of non-model organisms are becoming increasingly important in a broad range of studies from developmental biology, biodiversity analyses, to conservation. Genomic sample definition, description, quality, voucher information and metadata all need to be digitized and disseminated across scientific communities. This information needs to be concise and consistent in today’s ever-increasing bioinformatic era, for complementary data aggregators to easily map databases to one another. In order to facilitate exchange of information on genomic samples and their derived data, the Global Genome Biodiversity Network (GGBN) Data Standard is intended to provide a platform based on a documented agreement to promote the efficient sharing and usage of genomic sample material and associated specimen information in a consistent way. The new data standard presented here builds upon existing standards commonly used within the community, extending them with the capability to exchange data on tissue, environmental and DNA samples as well as sequences. The GGBN Data Standard will reveal and democratize the hidden contents of biodiversity biobanks, for the convenience of everyone in the wider biobanking community. Technical tools exist for data providers to easily map their databases to the standard. Database URL: http://terms.tdwg.org/wiki/GGBN_Data_Standard
Affiliation(s)
- G Droege
  - Botanic Garden and Botanical Museum Berlin-Dahlem, Freie Universität Berlin, Königin-Luise-Str. 6-8, Berlin 14195, Germany
- K Barker
  - National Museum of Natural History, Smithsonian Institution, Washington, DC 20560, USA
- O Seberg
  - Natural History Museum of Denmark, University of Copenhagen, Sølvgade 83, opg. S, Copenhagen DK-1307, Denmark
- J Coddington
  - National Museum of Natural History, Smithsonian Institution, Washington, DC 20560, USA
- E Benson
  - Damar Research Scientists, Damar, Drum Road, Cuparmuir, Fife KY15 5RJ, UK
- W G Berendsohn
  - Botanic Garden and Botanical Museum Berlin-Dahlem, Freie Universität Berlin, Königin-Luise-Str. 6-8, Berlin 14195, Germany
- B Bunk
  - Leibniz Institute DSMZ - German Collection of Microorganisms and Cell Cultures, Inhoffenstr. 7B, Braunschweig 38124, Germany
- C Butler
  - National Museum of Natural History, Smithsonian Institution, Washington, DC 20560, USA
- E M Cawsey
  - Australian National Wildlife Collection, CSIRO National Research Collections Australia, Canberra, Australia
- J Deck
  - Berkeley Natural History Museums, University of California at Berkeley, Berkeley, CA 94720, USA
- M Döring
  - Global Biodiversity Information Facility Secretariat, Universitetsparken 15, Copenhagen DK-2100, Denmark
- P Flemons
  - Australian Museum, Sydney 2010, NSW, Australia
- B Gemeinholzer
  - Systematic Botany, Justus Liebig University, Giessen 35392, Germany
- A Güntsch
  - Botanic Garden and Botanical Museum Berlin-Dahlem, Freie Universität Berlin, Königin-Luise-Str. 6-8, Berlin 14195, Germany
- T Hollowell
  - National Museum of Natural History, Smithsonian Institution, Washington, DC 20560, USA
- P Kelbert
  - Botanic Garden and Botanical Museum Berlin-Dahlem, Freie Universität Berlin, Königin-Luise-Str. 6-8, Berlin 14195, Germany
- I Kostadinov
  - Department of Life Sciences & Chemistry, Jacobs University Bremen gGmbH, Campus Ring 1, Bremen 28759, Germany
- R Kottmann
  - Microbial Genomics and Bioinformatics Research Group, Max Planck Institute for Marine Microbiology, Celsiusstrasse 1, Bremen 28359, Germany
- R T Lawlor
  - ARC-Net Applied Research on Cancer Centre, Department of Pathology and Diagnostics, University of Verona, Verona 37134, Italy
- C Lyal
  - Natural History Museum, Cromwell Road, London SW7 5BD, UK
- C Meyer
  - National Museum of Natural History, Smithsonian Institution, Washington, DC 20560, USA
- D Mulcahy
  - National Museum of Natural History, Smithsonian Institution, Washington, DC 20560, USA
- S Y Nussbeck
  - Department of Medical Informatics and UMG Biobank, University Medical Center Göttingen, Robert-Koch-Str. 40, Göttingen 37075, Germany
- É O'Tuama
  - Global Biodiversity Information Facility Secretariat, Universitetsparken 15, Copenhagen DK-2100, Denmark
- T Orrell
  - National Museum of Natural History, Smithsonian Institution, Washington, DC 20560, USA
- G Petersen
  - Natural History Museum of Denmark, University of Copenhagen, Sølvgade 83, opg. S, Copenhagen DK-1307, Denmark
- T Robertson
  - Global Biodiversity Information Facility Secretariat, Universitetsparken 15, Copenhagen DK-2100, Denmark
- C Söhngen
  - Leibniz Institute DSMZ - German Collection of Microorganisms and Cell Cultures, Inhoffenstr. 7B, Braunschweig 38124, Germany
- J Whitacre
  - National Museum of Natural History, Smithsonian Institution, Washington, DC 20560, USA
- J Wieczorek
  - Museum of Vertebrate Zoology, University of California at Berkeley, Berkeley, CA 94720, USA
- P Yilmaz
  - Microbial Genomics and Bioinformatics Research Group, Max Planck Institute for Marine Microbiology, Celsiusstrasse 1, Bremen 28359, Germany
- H Zetzsche
  - Julius Kuehn-Institute (JKI), Federal Research Centre for Cultivated Plants, Institute for Resistance Research and Stress Tolerance, Erwin-Baur-Str. 27, Quedlinburg 06484, Germany
- Y Zhang
  - China National GeneBank, BGI-Shenzhen, Shenzhen, Guangdong 518083, China
- X Zhou
  - China National GeneBank, BGI-Shenzhen, Shenzhen, Guangdong 518083, China

9
Li X, Song L, Wang G, Ren L, Yu D, Chen G, Wang X, Yu J, Liu G, Du Z. Complete genome sequence of a deeply branched marine Bacteroidia bacterium Draconibacterium orientale type strain FH5(T). Mar Genomics 2016; 26:13-6. [PMID: 26796622 DOI: 10.1016/j.margen.2016.01.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 11/27/2015] [Revised: 01/05/2016] [Accepted: 01/05/2016] [Indexed: 10/22/2022]
Abstract
Draconibacterium orientale strain FH5(T), isolated from a marine sediment sample from the coast of Weihai, China, was a new species within the proposed new genus Draconibacterium in class Bacteroidia. Here, we present the genome sequence of D. orientale FH5(T), which contains 5,132,075 bp with a G+C content of 41.31%. The genome sequence will contribute to a better understanding of the physiology of this species.
Collapse
Affiliation(s)
- Xiaoli Li
- College of Marine Science, Shandong University at Weihai, Weihai 264209, China
| | - Lai Song
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China
| | - Guoliang Wang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China; Graduate School of the Chinese Academy of Sciences, Beijing 100049, China
| | - Lufeng Ren
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China
| | - Dan Yu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China
| | - Guanjun Chen
- College of Marine Science, Shandong University at Weihai, Weihai 264209, China
| | - Xumin Wang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China
| | - Jun Yu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China
| | - Guiming Liu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China.
| | - Zongjun Du
- College of Marine Science, Shandong University at Weihai, Weihai 264209, China.
| |
|
10
|
Land M, Hauser L, Jun SR, Nookaew I, Leuze MR, Ahn TH, Karpinets T, Lund O, Kora G, Wassenaar T, Poudel S, Ussery DW. Insights from 20 years of bacterial genome sequencing. Funct Integr Genomics 2015; 15:141-61. [PMID: 25722247 PMCID: PMC4361730 DOI: 10.1007/s10142-015-0433-4] [Citation(s) in RCA: 391] [Impact Index Per Article: 43.4] [Received: 01/19/2015] [Revised: 02/11/2015] [Accepted: 02/12/2015] [Indexed: 12/18/2022]
Abstract
Since the first two complete bacterial genome sequences were published in 1995, the science of bacteria has dramatically changed. Using third-generation DNA sequencing, it is possible to completely sequence a bacterial genome in a few hours and identify some types of methylation sites along the genome as well. Sequencing of bacterial genome sequences is now a standard procedure, and the information from tens of thousands of bacterial genomes has had a major impact on our views of the bacterial world. In this review, we explore a series of questions to highlight some insights that comparative genomics has produced. To date, there are genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. However, the distribution is quite skewed towards a few phyla that contain model organisms. But the breadth is continuing to improve, with projects dedicated to filling in less characterized taxonomic groups. The clustered regularly interspaced short palindromic repeats (CRISPR)-Cas system provides bacteria with immunity against viruses, which outnumber bacteria by tenfold. How fast can we go? Second-generation sequencing has produced a large number of draft genomes (close to 90% of bacterial genomes in GenBank are currently not complete); third-generation sequencing can potentially produce a finished genome in a few hours, and at the same time provide methylation sites along the entire chromosome. The diversity of bacterial communities is extensive as is evident from the genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. Genome sequencing can help in classifying an organism, and in the case where multiple genomes of the same species are available, it is possible to calculate the pan- and core genomes; comparison of more than 2000 Escherichia coli genomes finds an E. coli core genome of about 3100 gene families and a total of about 89,000 different gene families.
Why do we care about bacterial genome sequencing? There are many practical applications, such as genome-scale metabolic modeling, biosurveillance, bioforensics, and infectious disease epidemiology. In the near future, high-throughput sequencing of patient metagenomic samples could revolutionize medicine in terms of speed and accuracy of finding pathogens and knowing how to treat them.
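The pan- and core-genome calculation mentioned in this abstract can be illustrated with set operations. The following is a minimal sketch, not the authors' method: it assumes each genome has already been reduced to a set of gene-family identifiers, and the genome names and family labels are hypothetical toy data.

```python
def pan_and_core(genomes):
    """Return (pan, core) gene-family sets for a dict of genome -> families.

    pan  = families found in at least one genome (set union)
    core = families found in every genome (set intersection)
    """
    families = [set(f) for f in genomes.values()]
    if not families:
        raise ValueError("at least one genome is required")
    pan = set().union(*families)
    core = set.intersection(*families)
    return pan, core


# Hypothetical toy input: three genomes, five gene families in total.
genomes = {
    "genome_1": {"fam_a", "fam_b", "fam_c"},
    "genome_2": {"fam_a", "fam_b", "fam_d"},
    "genome_3": {"fam_a", "fam_e"},
}
pan, core = pan_and_core(genomes)
print(len(pan), len(core))
```

In practice the hard part is the upstream clustering of genes into families; the sketch only covers the final counting step.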
Affiliation(s)
- Miriam Land
- Comparative Genomics Group, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
| | - Loren Hauser
- Comparative Genomics Group, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
- Joint Institute for Biological Sciences, University of Tennessee, Knoxville, TN 37996 USA
- Department of Microbiology, University of Tennessee, Knoxville, TN 37996 USA
| | - Se-Ran Jun
- Comparative Genomics Group, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
| | - Intawat Nookaew
- Comparative Genomics Group, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
| | - Michael R. Leuze
- Computer Science and Mathematics Division, Computer Science Research Group, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
| | - Tae-Hyuk Ahn
- Comparative Genomics Group, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
- Computer Science and Mathematics Division, Computer Science Research Group, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
| | - Tatiana Karpinets
- Comparative Genomics Group, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
| | - Ole Lund
- Center for Biological Sequence Analysis, Department of Systems Biology, The Technical University of Denmark, Kgs. Lyngby, 2800 Denmark
| | - Guruprased Kora
- Computer Science and Mathematics Division, Computer Science Research Group, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
| | - Trudy Wassenaar
- Molecular Microbiology and Genomics Consultants, Tannenstr 7, 55576 Zotzenheim, Germany
| | - Suresh Poudel
- Comparative Genomics Group, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
- Genome Science and Technology, University of Tennessee, Knoxville, TN 37996 USA
| | - David W. Ussery
- Comparative Genomics Group, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
- Joint Institute for Biological Sciences, University of Tennessee, Knoxville, TN 37996 USA
- Center for Biological Sequence Analysis, Department of Systems Biology, The Technical University of Denmark, Kgs. Lyngby, 2800 Denmark
- Genome Science and Technology, University of Tennessee, Knoxville, TN 37996 USA
| |
|
11
|
Bischof J, Harrison T, Paczian T, Glass E, Wilke A, Meyer F. Metazen - metadata capture for metagenomes. Stand Genomic Sci 2014; 9:18. [PMID: 25780508 PMCID: PMC4334943 DOI: 10.1186/1944-3277-9-18] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 06/11/2014] [Accepted: 11/03/2014] [Indexed: 11/30/2022] Open
Abstract
Background: As the impact and prevalence of large-scale metagenomic surveys grow, so does the acute need for more complete and standards compliant metadata. Metadata (data describing data) provides an essential complement to experimental data, helping to answer questions about its source, mode of collection, and reliability. Metadata collection and interpretation have become vital to the genomics and metagenomics communities, but considerable challenges remain, including exchange, curation, and distribution. Currently, tools are available for capturing basic field metadata during sampling, and for storing, updating and viewing it. Unfortunately, these tools are not specifically designed for metagenomic surveys; in particular, they lack the appropriate metadata collection templates, a centralized storage repository, and a unique ID linking system that can be used to easily port complete and compatible metagenomic metadata into widely used assembly and sequence analysis tools. Results: Metazen was developed as a comprehensive framework designed to enable metadata capture for metagenomic sequencing projects. Specifically, Metazen provides a rapid, easy-to-use portal to encourage early deposition of project and sample metadata. Conclusions: Metazen is an interactive tool that aids users in recording their metadata in a complete and valid format. A defined set of mandatory fields captures vital information, while the option to add fields provides flexibility.
Affiliation(s)
- Jared Bischof
- Computation Institute, University of Chicago, 5735 S Ellis Ave, Chicago, IL 60637, USA; Mathematics and Computer Science Division, Argonne National Laboratory, 9700 S. Cass Ave, Argonne, IL 60439, USA
| | - Travis Harrison
- Computation Institute, University of Chicago, 5735 S Ellis Ave, Chicago, IL 60637, USA; Mathematics and Computer Science Division, Argonne National Laboratory, 9700 S. Cass Ave, Argonne, IL 60439, USA
| | - Tobias Paczian
- Computation Institute, University of Chicago, 5735 S Ellis Ave, Chicago, IL 60637, USA; Mathematics and Computer Science Division, Argonne National Laboratory, 9700 S. Cass Ave, Argonne, IL 60439, USA
| | - Elizabeth Glass
- Computation Institute, University of Chicago, 5735 S Ellis Ave, Chicago, IL 60637, USA; Mathematics and Computer Science Division, Argonne National Laboratory, 9700 S. Cass Ave, Argonne, IL 60439, USA; Biological Sciences Division, Argonne National Laboratory, 9700 S. Cass Ave, Argonne, IL 60439, USA
| | - Andreas Wilke
- Computation Institute, University of Chicago, 5735 S Ellis Ave, Chicago, IL 60637, USA; Mathematics and Computer Science Division, Argonne National Laboratory, 9700 S. Cass Ave, Argonne, IL 60439, USA; Biological Sciences Division, Argonne National Laboratory, 9700 S. Cass Ave, Argonne, IL 60439, USA
| | - Folker Meyer
- Computation Institute, University of Chicago, 5735 S Ellis Ave, Chicago, IL 60637, USA; Mathematics and Computer Science Division, Argonne National Laboratory, 9700 S. Cass Ave, Argonne, IL 60439, USA; Biological Sciences Division, Argonne National Laboratory, 9700 S. Cass Ave, Argonne, IL 60439, USA
| |
|
12
|
Walls RL, Deck J, Guralnick R, Baskauf S, Beaman R, Blum S, Bowers S, Buttigieg PL, Davies N, Endresen D, Gandolfo MA, Hanner R, Janning A, Krishtalka L, Matsunaga A, Midford P, Morrison N, Tuama ÉÓ, Schildhauer M, Smith B, Stucky BJ, Thomer A, Wieczorek J, Whitacre J, Wooley J. Semantics in support of biodiversity knowledge discovery: an introduction to the biological collections ontology and related ontologies. PLoS One 2014; 9:e89606. [PMID: 24595056 PMCID: PMC3940615 DOI: 10.1371/journal.pone.0089606] [Citation(s) in RCA: 64] [Impact Index Per Article: 6.4] [Received: 06/08/2013] [Accepted: 01/24/2014] [Indexed: 11/19/2022] Open
Abstract
The study of biodiversity spans many disciplines and includes data pertaining to species distributions and abundances, genetic sequences, trait measurements, and ecological niches, complemented by information on collection and measurement protocols. A review of the current landscape of metadata standards and ontologies in biodiversity science suggests that existing standards such as the Darwin Core terminology are inadequate for describing biodiversity data in a semantically meaningful and computationally useful way. Existing ontologies, such as the Gene Ontology and others in the Open Biological and Biomedical Ontologies (OBO) Foundry library, provide a semantic structure but lack many of the necessary terms to describe biodiversity data in all its dimensions. In this paper, we describe the motivation for and ongoing development of a new Biological Collections Ontology, the Environment Ontology, and the Population and Community Ontology. These ontologies share the aim of improving data aggregation and integration across the biodiversity domain and can be used to describe physical samples and sampling processes (for example, collection, extraction, and preservation techniques), as well as biodiversity observations that involve no physical sampling. Together they encompass studies of: 1) individual organisms, including voucher specimens from ecological studies and museum specimens, 2) bulk or environmental samples (e.g., gut contents, soil, water) that include DNA, other molecules, and potentially many organisms, especially microbes, and 3) survey-based ecological observations. We discuss how these ontologies can be applied to biodiversity use cases that span genetic, organismal, and ecosystem levels of organization. We argue that if adopted as a standard and rigorously applied and enriched by the biodiversity community, these ontologies would significantly reduce barriers to data discovery, integration, and exchange among biodiversity resources and researchers.
Affiliation(s)
- Ramona L. Walls
- The iPlant Collaborative, University of Arizona, Tucson, Arizona, United States of America
| | - John Deck
- University of California, Berkeley, Berkeley, California, United States of America
| | - Robert Guralnick
- Department of Ecology and Evolutionary Biology and the CU Museum of Natural History, University of Colorado at Boulder, Boulder, Colorado, United States of America
| | - Steve Baskauf
- Department of Biological Sciences, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Reed Beaman
- University of Florida, Florida Museum of Natural History, Gainesville, Florida, United States of America
| | - Stanley Blum
- Research Informatics, California Academy of Sciences, San Francisco, California, United States of America
| | - Shawn Bowers
- Gonzaga University, Computer Science, Spokane, Washington, United States of America
| | - Pier Luigi Buttigieg
- Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research, Bremerhaven, Germany
| | - Neil Davies
- University of California, Berkeley, Gump South Pacific Research Station, Moorea, French Polynesia
| | - Dag Endresen
- GBIF Norway, Natural History Museum, University in Oslo, Oslo, Norway
| | - Maria Alejandra Gandolfo
- LH Bailey Hortorium, Department of Plant Biology, Cornell University, Ithaca, New York, United States of America
| | - Robert Hanner
- Biodiversity Institute of Ontario, University of Guelph, Guelph, ON, Canada
| | - Alyssa Janning
- School of Information Resources and Library Science, University of Arizona, Tucson, Arizona, United States of America
| | - Leonard Krishtalka
- Biodiversity Institute and Ecology & Evolutionary Biology, The University of Kansas, Lawrence, Kansas, United States of America
| | - Andréa Matsunaga
- University of Florida, Gainesville, Florida, United States of America
| | - Peter Midford
- Ecology and Evolutionary Biology, University of Kansas, Lawrence, Kansas, United States of America
| | - Norman Morrison
- The BioVeL Project, School of Computer Science, The University of Manchester, Manchester, United Kingdom
| | | | - Mark Schildhauer
- National Center for Ecological Analysis and Synthesis, Santa Barbara, California, United States of America
| | - Barry Smith
- Department of Philosophy, University at Buffalo, Buffalo, New York, United States of America
| | - Brian J. Stucky
- Department of Ecology and Evolutionary Biology, University of Colorado, Boulder, Colorado, United States of America
| | - Andrea Thomer
- Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign, Urbana-Champaign, Illinois, United States of America
| | - John Wieczorek
- 3101 VLSB, Museum of Vertebrate Zoology, University of California, Berkeley, Berkeley, California, United States of America
| | - Jamie Whitacre
- Informatics Branch, Information Technology Office, National Museum of Natural History, Smithsonian Institution, Washington, DC, United States of America
| | - John Wooley
- University of California San Diego, La Jolla, California, United States of America
| |
|
13
|
Ison J, Kalas M, Jonassen I, Bolser D, Uludag M, McWilliam H, Malone J, Lopez R, Pettifer S, Rice P. EDAM: an ontology of bioinformatics operations, types of data and identifiers, topics and formats. Bioinformatics 2013; 29:1325-32. [PMID: 23479348 PMCID: PMC3654706 DOI: 10.1093/bioinformatics/btt113] [Citation(s) in RCA: 126] [Impact Index Per Article: 11.5] [Received: 07/16/2012] [Revised: 02/28/2013] [Accepted: 03/01/2013] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: Advancing the search, publication and integration of bioinformatics tools and resources demands consistent machine-understandable descriptions. A comprehensive ontology allowing such descriptions is therefore required. RESULTS: EDAM is an ontology of bioinformatics operations (tool or workflow functions), types of data and identifiers, application domains and data formats. EDAM supports semantic annotation of diverse entities such as Web services, databases, programmatic libraries, standalone tools, interactive applications, data schemas, datasets and publications within bioinformatics. EDAM applies to organizing and finding suitable tools and data and to automating their integration into complex applications or workflows. It includes over 2200 defined concepts and has successfully been used for annotations and implementations. AVAILABILITY: The latest stable version of EDAM is available in OWL format from http://edamontology.org/EDAM.owl and in OBO format from http://edamontology.org/EDAM.obo. It can be viewed online at the NCBO BioPortal and the EBI Ontology Lookup Service. For documentation and license please refer to http://edamontology.org. This article describes version 1.2 available at http://edamontology.org/EDAM_1.2.owl. CONTACT: jison@ebi.ac.uk.
Affiliation(s)
- Jon Ison
- EMBL European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, UK.
| |
|
14
|
Radom M, Rybarczyk A, Kottmann R, Formanowicz P, Szachniuk M, Glöckner FO, Rebholz-Schuhmann D, Błażewicz J. Poseidon: An information retrieval and extraction system for metagenomic marine science. Ecol Inform 2012. [DOI: 10.1016/j.ecoinf.2012.07.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 10/28/2022]
|
15
|
Teeling H, Glöckner FO. Current opportunities and challenges in microbial metagenome analysis--a bioinformatic perspective. Brief Bioinform 2012; 13:728-42. [PMID: 22966151 PMCID: PMC3504927 DOI: 10.1093/bib/bbs039] [Citation(s) in RCA: 148] [Impact Index Per Article: 12.3] [Received: 03/30/2012] [Accepted: 06/09/2012] [Indexed: 12/21/2022] Open
Abstract
Metagenomics has become an indispensable tool for studying the diversity and metabolic potential of environmental microbes, whose bulk is as yet non-cultivable. Continual progress in next-generation sequencing allows for generating increasingly large metagenomes and studying multiple metagenomes over time or space. Recently, a new type of holistic ecosystem study has emerged that seeks to combine metagenomics with biodiversity, meta-expression and contextual data. Such 'ecosystems biology' approaches bear the potential to not only advance our understanding of environmental microbes to a new level but also impose challenges due to increasing data complexities, in particular with respect to bioinformatic post-processing. This mini review aims to address selected opportunities and challenges of modern metagenomics from a bioinformatics perspective and hopefully will serve as a useful resource for microbial ecologists and bioinformaticians alike.
|
16
|
Kalyana-Sundaram S, Shanmugam A, Chinnaiyan AM. Gene Fusion Markup Language: a prototype for exchanging gene fusion data. BMC Bioinformatics 2012; 13:269. [PMID: 23072312 PMCID: PMC3607969 DOI: 10.1186/1471-2105-13-269] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/21/2011] [Accepted: 10/11/2012] [Indexed: 12/26/2022] Open
Abstract
Background: An avalanche of next generation sequencing (NGS) studies has generated an unprecedented amount of genomic structural variation data. These studies have also identified many novel gene fusion candidates with more detailed resolution than previously achieved. However, in the excitement and necessity of publishing the observations from this recently developed cutting-edge technology, no community standardization approach has arisen to organize and represent the data with the essential attributes in an interchangeable manner. As transcriptome studies have been widely used for gene fusion discoveries, the current non-standard mode of data representation could potentially impede data accessibility, critical analyses, and further discoveries in the near future. Results: Here we propose a prototype, Gene Fusion Markup Language (GFML) as an initiative to provide a standard format for organizing and representing the significant features of gene fusion data. GFML will offer the advantage of representing the data in a machine-readable format to enable data exchange, automated analysis interpretation, and independent verification. As this database-independent exchange initiative evolves it will further facilitate the formation of related databases, repositories, and analysis tools. The GFML prototype is made available at http://code.google.com/p/gfml-prototype/. Conclusion: The Gene Fusion Markup Language (GFML) presented here could facilitate the development of a standard format for organizing, integrating and representing the significant features of gene fusion data in an inter-operable and query-able fashion that will enable biologically intuitive access to gene fusion findings and expedite functional characterization. A similar model is envisaged for other NGS data analyses.
Affiliation(s)
- Shanker Kalyana-Sundaram
- Michigan Center for Translational Pathology, Department of Pathology, University of Michigan Medical School, Ann Arbor, MI 48109, USA
| |
|
17
|
Robbins RJ, Beach J, Blum S, Dawyndt P, Deck J, Kottmann R, Morrison N, Tuama EÓ, San Gil I, Vieglas D, Wieczorek J, Wooley J. RCN4GSC Meeting Report: Initiating a Testbed for Managing Data at the Interface of Biodiversity and Genomics/Metagenomics, May 2011. Stand Genomic Sci 2012; 7:171-4. [PMID: 23409219 PMCID: PMC3558955 DOI: 10.4056/sigs.3176515] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022] Open
Abstract
Following up on efforts from two earlier workshops, a meeting was convened in San Diego to (a) establish working connections between experts in the use of the Darwin Core and the GSC MIxS standards, (b) conduct mutual briefings to promote knowledge exchange and to increase the understanding of the two communities’ approaches, constraints, community goals, subtleties, etc., (c) perform an element-by-element comparison of the two standards, assessing the compatibility and complementarity of the two approaches, (d) propose and consider possible use cases and test beds in which a joint annotation approach might be tried, to useful scientific effect, and (e) propose additional action items necessary to continue the development of this joint effort. Several focused working teams were identified to continue the work after the meeting ended.
|
18
|
Logares R, Haverkamp TH, Kumar S, Lanzén A, Nederbragt AJ, Quince C, Kauserud H. Environmental microbiology through the lens of high-throughput DNA sequencing: Synopsis of current platforms and bioinformatics approaches. J Microbiol Methods 2012; 91:106-13. [DOI: 10.1016/j.mimet.2012.07.017] [Citation(s) in RCA: 69] [Impact Index Per Article: 5.8] [Received: 06/10/2012] [Revised: 07/19/2012] [Accepted: 07/23/2012] [Indexed: 10/28/2022]
|
20
|
The user's view on biodiversity data sharing — Investigating facts of acceptance and requirements to realize a sustainable use of research data —. Ecol Inform 2012. [DOI: 10.1016/j.ecoinf.2012.03.004] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.2] [Indexed: 10/28/2022]
|
21
|
Liolios K, Schriml L, Hirschman L, Pagani I, Nosrat B, Sterk P, White O, Rocca-Serra P, Sansone SA, Taylor C, Kyrpides NC, Field D. The Metadata Coverage Index (MCI): A standardized metric for quantifying database metadata richness. Stand Genomic Sci 2012; 6:438-47. [PMID: 23409217 PMCID: PMC3558968 DOI: 10.4056/sigs.2675953] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 11/20/2022] Open
Abstract
Variability in the extent of the descriptions of data ('metadata') held in public repositories forces users to assess the quality of records individually, which rapidly becomes impractical. The scoring of records on the richness of their description provides a simple, objective proxy measure for quality that enables filtering that supports downstream analysis. Pivotally, such descriptions should spur on improvements. Here, we introduce such a measure - the 'Metadata Coverage Index' (MCI): the percentage of available fields actually filled in a record or description. MCI scores can be calculated across a database, for individual records or for their component parts (e.g., fields of interest). There are many potential uses for this simple metric: for example; to filter, rank or search for records; to assess the metadata availability of an ad hoc collection; to determine the frequency with which fields in a particular record type are filled, especially with respect to standards compliance; to assess the utility of specific tools and resources, and of data capture practice more generally; to prioritize records for further curation; to serve as performance metrics of funded projects; or to quantify the value added by curation. Here we demonstrate the utility of MCI scores using metadata from the Genomes Online Database (GOLD), including records compliant with the 'Minimum Information about a Genome Sequence' (MIGS) standard developed by the Genomic Standards Consortium. We discuss challenges and address the further application of MCI scores; to show improvements in annotation quality over time, to inform the work of standards bodies and repository providers on the usability and popularity of their products, and to assess and credit the work of curators. Such an index provides a step towards putting metadata capture practices and in the future, standards compliance, into a quantitative and objective framework.
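The MCI definition in this abstract (the percentage of available fields actually filled in a record) translates directly into a short calculation. The sketch below is only an illustration under stated assumptions: a record is modelled as a plain dictionary, a field counts as "filled" when its value is neither missing nor blank, and the field names are hypothetical rather than taken from GOLD or MIGS.

```python
def is_filled(value):
    """Assumption: None and whitespace-only strings count as unfilled."""
    return value is not None and str(value).strip() != ""


def metadata_coverage_index(record, fields):
    """Percentage of the available `fields` that are filled in `record`."""
    if not fields:
        raise ValueError("fields must not be empty")
    filled = sum(1 for name in fields if is_filled(record.get(name)))
    return 100.0 * filled / len(fields)


# Hypothetical field list and record, for illustration only.
fields = ["project_name", "collection_date", "geo_location", "habitat"]
record = {
    "project_name": "Soil survey",
    "collection_date": "2011-05-02",
    "geo_location": "",  # present but blank, so not counted as filled
    "habitat": "soil",
}
mci = metadata_coverage_index(record, fields)
print(f"MCI = {mci:.1f}%")
```

Averaging this score over all records gives a database-level figure, and restricting `fields` to a subset gives the per-component scores the abstract mentions.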
Affiliation(s)
- Konstantinos Liolios
- Microbial Genomics and Metagenomic Super Program, Department of Energy Joint Genome Institute, Walnut Creek, CA, USA
| | - Lynn Schriml
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
| | | | - Ioanna Pagani
- Microbial Genomics and Metagenomic Super Program, Department of Energy Joint Genome Institute, Walnut Creek, CA, USA
| | - Bahador Nosrat
- Microbial Genomics and Metagenomic Super Program, Department of Energy Joint Genome Institute, Walnut Creek, CA, USA
| | - Peter Sterk
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK
| | - Owen White
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
| | | | | | - Chris Taylor
- European Molecular Biology Laboratory (EMBL) Outstation, European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Cambridge, UK
| | - Nikos C. Kyrpides
- Microbial Genomics and Metagenomic Super Program, Department of Energy Joint Genome Institute, Walnut Creek, CA, USA
| | - Dawn Field
- University of Oxford, Oxford e-Research Centre, Oxford, UK
- Centre for Ecology & Hydrology, Wallingford, Oxfordshire, UK
| |
|
22
|
Gilbert JA, Bao Y, Wang H, Sansone SA, Edmunds SC, Morrison N, Meyer F, Schriml LM, Davies N, Sterk P, Wilkening J, Garrity GM, Field D, Robbins R, Smith DP, Mizrachi I, Moreau C. Report of the 13th Genomic Standards Consortium Meeting, Shenzhen, China, March 4-7, 2012. Stand Genomic Sci 2012; 6:276-86. [PMID: 22768370 PMCID: PMC3387801 DOI: 10.4056/sigs.2876184] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/25/2022] Open
Abstract
This report details the outcome of the 13th Meeting of the Genomic Standards Consortium. The three-day conference was held at the Kingkey Palace Hotel, Shenzhen, China, on March 5-7, 2012, and was hosted by the Beijing Genomics Institute. The meeting, titled From Genomes to Interactions to Communities to Models, highlighted the role of data standards associated with genomic, metagenomic, and amplicon sequence data and the contextual information associated with the sample. To this end the meeting focused on genomic projects for animals, plants, fungi, and viruses; metagenomic studies in host-microbe interactions; and the dynamics of microbial communities. In addition, the meeting hosted a Genomic Observatories Network session, a Genomic Standards Consortium biodiversity working group session, and a Microbiology of the Built Environment session sponsored by the Alfred P. Sloan Foundation.
|
23
|
Zaneveld JRR, Parfrey LW, Van Treuren W, Lozupone C, Clemente JC, Knights D, Stombaugh J, Kuczynski J, Knight R. Combined phylogenetic and genomic approaches for the high-throughput study of microbial habitat adaptation. Trends Microbiol 2011; 19:472-82. [PMID: 21872475 PMCID: PMC3184378 DOI: 10.1016/j.tim.2011.07.006] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 05/02/2011] [Revised: 07/22/2011] [Accepted: 07/25/2011] [Indexed: 01/21/2023]
Abstract
High-throughput sequencing technologies provide new opportunities to address longstanding questions about habitat adaptation in microbial organisms. How have microbes managed to adapt to such a wide range of environments, and what genomic features allow for such adaptation? We review recent large-scale studies of habitat adaptation, with emphasis on those that utilize phylogenetic techniques. On the basis of current trends, we summarize methodological challenges faced by investigators, and the tools, techniques and analytical approaches available to overcome them. Phylogenetic approaches and detailed information about each environmental sample will be crucial as the ability to collect genome sequences continues to expand.
Affiliation(s)
- Jesse R R Zaneveld
- Department of Molecular, Cellular and Developmental Biology, University of Colorado, Boulder, CO 80309, USA
| |
|
24
|
Hankeln W, Wendel NJ, Gerken J, Waldmann J, Buttigieg PL, Kostadinov I, Kottmann R, Yilmaz P, Glöckner FO. CDinFusion--submission-ready, on-line integration of sequence and contextual data. PLoS One 2011; 6:e24797. [PMID: 21935468 PMCID: PMC3172294 DOI: 10.1371/journal.pone.0024797] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 04/19/2011] [Accepted: 08/19/2011] [Indexed: 11/19/2022] Open
Abstract
State of the art (DNA) sequencing methods applied in "Omics" studies grant insight into the 'blueprints' of organisms from all domains of life. Sequencing is carried out around the globe and the data is submitted to the public repositories of the International Nucleotide Sequence Database Collaboration. However, the context in which these studies are conducted often gets lost, because experimental data, as well as information about the environment are rarely submitted along with the sequence data. If these contextual or metadata are missing, key opportunities of comparison and analysis across studies and habitats are hampered or even impossible. To address this problem, the Genomic Standards Consortium (GSC) promotes checklists and standards to better describe our sequence data collection and to promote the capturing, exchange and integration of sequence data with contextual data. In a recent community effort the GSC has developed a series of recommendations for contextual data that should be submitted along with sequence data. To support the scientific community to significantly enhance the quality and quantity of contextual data in the public sequence data repositories, specialized software tools are needed. In this work we present CDinFusion, a web-based tool to integrate contextual and sequence data in (Multi)FASTA format prior to submission. The tool is open source and available under the Lesser GNU Public License 3. A public installation is hosted and maintained at the Max Planck Institute for Marine Microbiology at http://www.megx.net/cdinfusion. The tool may also be installed locally using the open source code available at http://code.google.com/p/cdinfusion.
Affiliation(s)
- Wolfgang Hankeln
- Max Planck Institute for Marine Microbiology, Bremen, Germany
- Jacobs University gGmbH, Bremen, Germany
- Norma Johanna Wendel
- Max Planck Institute for Marine Microbiology, Bremen, Germany
- Fachhochschule Bingen, Bingen am Rhein, Germany
- Jan Gerken
- Max Planck Institute for Marine Microbiology, Bremen, Germany
- Jacobs University gGmbH, Bremen, Germany
- Jost Waldmann
- Max Planck Institute for Marine Microbiology, Bremen, Germany
- Pier Luigi Buttigieg
- Max Planck Institute for Marine Microbiology, Bremen, Germany
- Jacobs University gGmbH, Bremen, Germany
- Ivaylo Kostadinov
- Max Planck Institute for Marine Microbiology, Bremen, Germany
- Jacobs University gGmbH, Bremen, Germany
- Renzo Kottmann
- Max Planck Institute for Marine Microbiology, Bremen, Germany
- Pelin Yilmaz
- Max Planck Institute for Marine Microbiology, Bremen, Germany
- Jacobs University gGmbH, Bremen, Germany
- Frank Oliver Glöckner
- Max Planck Institute for Marine Microbiology, Bremen, Germany
- Jacobs University gGmbH, Bremen, Germany
|
25
|
Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications. Nat Biotechnol 2011; 29:415-20. [PMID: 21552244 DOI: 10.1038/nbt.1823] [Citation(s) in RCA: 452] [Impact Index Per Article: 34.8] [Indexed: 11/08/2022]
Abstract
Here we present a standard developed by the Genomic Standards Consortium (GSC) for reporting marker gene sequences--the minimum information about a marker gene sequence (MIMARKS). We also introduce a system for describing the environment from which a biological sample originates. The 'environmental packages' apply to any genome sequence of known origin and can be used in combination with MIMARKS and other GSC checklists. Finally, to establish a unified standard for describing sequence data and to provide a single point of entry for the scientific community to access and learn about GSC checklists, we present the minimum information about any (x) sequence (MIxS). Adoption of MIxS will enhance our ability to analyze natural genetic diversity documented by massive DNA sequencing efforts from myriad ecosystems in our ever-changing biosphere.
|
26
|
Field D, Amaral-Zettler L, Cochrane G, Cole JR, Dawyndt P, Garrity GM, Gilbert J, Glöckner FO, Hirschman L, Karsch-Mizrachi I, Klenk HP, Knight R, Kottmann R, Kyrpides N, Meyer F, San Gil I, Sansone SA, Schriml LM, Sterk P, Tatusova T, Ussery DW, White O, Wooley J. The Genomic Standards Consortium. PLoS Biol 2011; 9:e1001088. [PMID: 21713030 PMCID: PMC3119656 DOI: 10.1371/journal.pbio.1001088] [Citation(s) in RCA: 135] [Impact Index Per Article: 10.4] [Indexed: 11/30/2022] Open
Abstract
A vast and rich body of information has grown up as a result of the world's enthusiasm for 'omics technologies. Finding ways to describe and make available this information that maximise its usefulness has become a major effort across the 'omics world. At the heart of this effort is the Genomic Standards Consortium (GSC), an open-membership organization that drives community-based standardization activities. Here we provide a short history of the GSC, provide an overview of its range of current activities, and make a call for the scientific community to join forces to improve the quality and quantity of contextual information about our public collections of genomes, metagenomes, and marker gene sequences.
Affiliation(s)
- Dawn Field
- Centre for Ecology & Hydrology, Maclean Building, Crowmarsh Gifford, Wallingford, Oxfordshire, United Kingdom.
|
27
|
Glöckner FO, Joint I. Marine microbial genomics in Europe: current status and perspectives. Microb Biotechnol 2011; 3:523-30. [PMID: 20953416 PMCID: PMC2948668 DOI: 10.1111/j.1751-7915.2010.00169.x] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Received: 12/17/2009] [Accepted: 02/06/2010] [Indexed: 11/29/2022] Open
Abstract
The oceans are the Earth's largest ecosystem, covering 70% of our planet and providing goods and services for the majority of the world's population. Understanding the complex abiotic and biotic processes on the micro‐ to macroscale is the key to protect and sustain the marine ecosystem. Marine microorganisms are the ‘gatekeepers’ of the biotic processes that control the global cycles of energy and organic matter. A multinational, multidisciplinary approach, bringing together research on oceanography, biodiversity and genomics, is now needed to understand and finally predict the complex responses of the marine ecosystem to ongoing global changes. Such an integrative approach will not only bring better understanding of the complex interplay of the organisms with their environment, but will reveal a wealth of new metabolic processes and functions, which have a high potential for biotechnological applications. This potential has already been recognized by the European commission which funded a series of workshops and projects on marine genomics in the sixth and seventh framework programme. Nevertheless, there remain many obstacles to achieving the goal – such as a lack of bioinformatics tailored for the marine field, consistent data acquisition and exchange, as well as continuous monitoring programmes and a lack of relevant marine bacterial models. Marine ecosystems research is complex and challenging, but it also harbours the opportunity to cross the borders between disciplines and countries to finally create a rewarding marine research era that is more than the sum of its parts.
Affiliation(s)
- Frank Oliver Glöckner
- Microbial Genomics Group, Max Planck Institute for Marine Microbiology, D-28359 Bremen, Germany.
|
28
|
Duhaime MB, Kottmann R, Field D, Glöckner FO. Enriching public descriptions of marine phages using the Genomic Standards Consortium MIGS standard. Stand Genomic Sci 2011; 4:271-85. [PMID: 21677864 PMCID: PMC3111985 DOI: 10.4056/sigs.621069] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/20/2022] Open
Abstract
In any sequencing project, the possible depth of comparative analysis is determined largely by the amount and quality of the accompanying contextual data. The structure, content, and storage of this contextual data should be standardized to ensure consistent coverage of all sequenced entities and facilitate comparisons. The Genomic Standards Consortium (GSC) has developed the “Minimum Information about Genome/Metagenome Sequences (MIGS/MIMS)” checklist for the description of genomes and here we annotate all 30 publicly available marine bacteriophage sequences to the MIGS standard. These annotations build on existing International Nucleotide Sequence Database Collaboration (INSDC) records, and confirm, as expected that current submissions lack most MIGS fields. MIGS fields were manually curated from the literature and placed in XML format as specified by the Genomic Contextual Data Markup Language (GCDML). These “machine-readable” reports were then analyzed to highlight patterns describing this collection of genomes. Completed reports are provided in GCDML. This work represents one step towards the annotation of our complete collection of genome sequences and shows the utility of capturing richer metadata along with raw sequences.
|
29
|
Yilmaz P, Gilbert JA, Knight R, Amaral-Zettler L, Karsch-Mizrachi I, Cochrane G, Nakamura Y, Sansone SA, Glöckner FO, Field D. The genomic standards consortium: bringing standards to life for microbial ecology. ISME JOURNAL 2011; 5:1565-7. [PMID: 21472015 PMCID: PMC3176512 DOI: 10.1038/ismej.2011.39] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.9] [Indexed: 11/26/2022]
Affiliation(s)
- Pelin Yilmaz
- Microbial Genomics and Bioinformatics Group, Max Planck Institute for Marine Microbiology, Bremen, Germany
|
30
|
Glass E, Meyer F, Gilbert JA, Field D, Hunter S, Kottmann R, Kyrpides N, Sansone S, Schriml L, Sterk P, White O, Wooley J. Meeting Report from the Genomic Standards Consortium (GSC) Workshop 10. Stand Genomic Sci 2010; 3:225-31. [PMID: 21304723 PMCID: PMC3035307 DOI: 10.4056/sigs.1423520] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/20/2022] Open
Abstract
This report summarizes the proceedings of the 10th workshop of the Genomic Standards Consortium (GSC), held at Argonne National Laboratory, IL, USA. It was the second GSC workshop to have open registration and attracted over 60 participants who worked together to progress the full range of projects ongoing within the GSC. Overall, the primary focus of the workshop was on advancing the M5 platform for next-generation collaborative computational infrastructures. Other key outcomes included the formation of a GSC working group focused on MIGS/MIMS/MIENS compliance using the ISA software suite and the formal launch of the GSC Developer Working Group. Further information about the GSC and its range of activities can be found at http://gensc.org/.
|
31
|
Gilbert JA, Meyer F, Knight R, Field D, Kyrpides N, Yilmaz P, Wooley J. Meeting report: GSC M5 roundtable at the 13th International Society for Microbial Ecology meeting in Seattle, WA, USA August 22-27, 2010. Stand Genomic Sci 2010; 3:235-9. [PMID: 21304725 PMCID: PMC3035306 DOI: 10.4056/sigs.1333437] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/29/2022] Open
Abstract
This report summarizes the proceedings of the Metagenomics, Metadata, Metaanalysis, Models and Metainfrastructure (M5) Roundtable at the 13th International Society for Microbial Ecology Meeting in Seattle, WA, USA August 22-27, 2010. The Genomic Standards Consortium (GSC) hosted this meeting as a community engagement exercise to describe the GSC to the microbial ecology community during this important international meeting. The roundtable included five talks given by members of the GSC, and was followed by audience participation in the form of a roundtable discussion. This report summarizes this event. Further information on the GSC and its range of activities can be found at http://www.gensc.org.
|
32
|
Davidsen T, Madupu R, Sterk P, Field D, Garrity G, Gilbert J, Glöckner FO, Hirschman L, Kolker E, Kottmann R, Kyrpides N, Meyer F, Morrison N, Schriml L, Tatusova T, Wooley J. Meeting Report from the Genomic Standards Consortium (GSC) Workshop 9. Stand Genomic Sci 2010; 3:216-24. [PMID: 21304722 PMCID: PMC3035308 DOI: 10.4056/sigs.1353455] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/29/2022] Open
Abstract
This report summarizes the proceedings of the 9th workshop of the Genomic Standards Consortium (GSC), held at the J. Craig Venter Institute, Rockville, MD, USA. It was the first GSC workshop to have open registration and attracted over 90 participants. This workshop featured sessions that provided overviews of the full range of ongoing GSC projects. It included sessions on Standards in Genomic Sciences, the open access journal of the GSC, building standards for genome annotation, the M5 platform for next-generation collaborative computational infrastructures, building ties with the biodiversity research community and two discussion panels with government and industry participants. Progress was made on all fronts, and major outcomes included the completion of the MIENS specification for publication and the formation of the Biodiversity working group.
|
33
|
Kalas M, Puntervoll P, Joseph A, Bartaseviciūte E, Töpfer A, Venkataraman P, Pettifer S, Bryne JC, Ison J, Blanchet C, Rapacki K, Jonassen I. BioXSD: the common data-exchange format for everyday bioinformatics web services. Bioinformatics 2010; 26:i540-6. [PMID: 20823319 PMCID: PMC2935419 DOI: 10.1093/bioinformatics/btq391] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Indexed: 11/15/2022] Open
Abstract
Motivation: The world-wide community of life scientists has access to a large number of public bioinformatics databases and tools, which are developed and deployed using diverse technologies and designs. More and more of the resources offer a programmatic web-service interface. However, efficient use of the resources is hampered by the lack of widely used, standard data-exchange formats for the basic, everyday bioinformatics data types. Results: BioXSD has been developed as a candidate for a standard, canonical exchange format for basic bioinformatics data. BioXSD is represented by a dedicated XML Schema and defines syntax for biological sequences, sequence annotations, alignments and references to resources. We have adapted a set of web services to use BioXSD as the input and output format, and implemented a test-case workflow. This demonstrates that the approach is feasible and provides smooth interoperability. Semantics for BioXSD is provided by annotation with the EDAM ontology. We discuss in a separate section how BioXSD relates to other initiatives and approaches, including existing standards and the Semantic Web. Availability: The BioXSD 1.0 XML Schema is freely available at http://www.bioxsd.org/BioXSD-1.0.xsd under the Creative Commons BY-ND 3.0 license. The http://bioxsd.org web page offers documentation, examples of data in BioXSD format, example workflows with source codes in common programming languages, an updated list of compatible web services and tools and a repository of feature requests from the community. Contact: matus.kalas@bccs.uib.no; developers@bioxsd.org; support@bioxsd.org
Affiliation(s)
- Matús Kalas
- Bergen Center for Computational Science, Uni Research, Bergen, Norway.
|
34
|
Verslyppe B, Kottmann R, De Smet W, De Baets B, De Vos P, Dawyndt P. Microbiological Common Language (MCL): a standard for electronic information exchange in the Microbial Commons. Res Microbiol 2010; 161:439-45. [DOI: 10.1016/j.resmic.2010.02.005] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 10/14/2009] [Revised: 01/22/2010] [Accepted: 02/12/2010] [Indexed: 10/19/2022]
|
35
|
Hankeln W, Buttigieg PL, Fink D, Kottmann R, Yilmaz P, Glöckner FO. MetaBar - a tool for consistent contextual data acquisition and standards compliant submission. BMC Bioinformatics 2010; 11:358. [PMID: 20591175 PMCID: PMC2912304 DOI: 10.1186/1471-2105-11-358] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 03/11/2010] [Accepted: 06/30/2010] [Indexed: 11/10/2022] Open
Abstract
Background: Environmental sequence datasets are increasing at an exponential rate; however, the vast majority of them lack appropriate descriptors like sampling location, time and depth/altitude, generally referred to as metadata or contextual data. The consistent capture and structured submission of these data is crucial for integrated data analysis and ecosystems modeling. The application MetaBar has been developed to support consistent contextual data acquisition. Results: MetaBar is a spreadsheet and web-based software tool designed to assist users in the consistent acquisition, electronic storage, and submission of contextual data associated with their samples. A preconfigured Microsoft® Excel® spreadsheet is used to initiate structured contextual data storage in the field or laboratory. Each sample is given a unique identifier and at any stage the sheets can be uploaded to the MetaBar database server. To label samples, identifiers can be printed as barcodes. An intuitive web interface provides quick access to the contextual data in the MetaBar database as well as user and project management capabilities. Export functions facilitate contextual and sequence data submission to the International Nucleotide Sequence Database Collaboration (INSDC), comprising the DNA DataBase of Japan (DDBJ), the European Molecular Biology Laboratory database (EMBL) and GenBank. MetaBar requests and stores contextual data in compliance with the Genomic Standards Consortium specifications. The MetaBar open source code base for local installation is available under the GNU General Public License version 3 (GNU GPL3). Conclusion: The MetaBar software supports the typical workflow from data acquisition and field-sampling to contextual data enriched sequence submission to an INSDC database. The integration with the megx.net marine Ecological Genomics database and portal facilitates georeferenced data integration and metadata-based comparisons of sampling sites as well as interactive data visualization. The ample export functionalities and the INSDC submission support enable exchange of data across disciplines and safeguarding contextual data.
|
36
|
Valdivia-Granda WA. Bioinformatics for biodefense: challenges and opportunities. Biosecur Bioterror 2010; 8:69-77. [PMID: 20230234 DOI: 10.1089/bsp.2009.0024] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 01/29/2023]
Abstract
The intentional release of traditional or combinatorial bioweapons remains one of the most important challenges that will continue to shape homeland security. The misuse of dual-use and how-to methods and techniques in the fields of molecular, synthetic, and computational biology can lessen the technical barriers for launching attacks, even for small groups or individuals. Bioinformatics is guiding the implementation of several biodefense countermeasures. However, existing algorithms have not effectively translated available pathogen genomic data into standardized diagnostics, rational vaccine development, or broad spectrum therapeutics. Despite its potential, bioinformatics has a limited impact on forensic and intelligence operations. More than 12 biodefense databases and information exchange architectures lack interoperability and a common layer that restricts scalability and the development of biodefense enterprises. Therefore, in order to use next-generation genome sequencing for medical intelligence, forensic operations, biothreat awareness, and mitigation, the attention has to be redirected toward the development of computational biology applications. This article debates some of the challenges that the bioinformatics field confronts in terms of biodefense problems and proposes potential opportunities to use pathogen genomic data. Issues related to the analysis of pathogen genomes and emerging methods including genomic barcoding, active curation, and knowledge management and their impact on intelligence, forensics, and policymaking are discussed.
|
37
|
Tamames J, de Lorenzo V. EnvMine: a text-mining system for the automatic extraction of contextual information. BMC Bioinformatics 2010; 11:294. [PMID: 20515448 PMCID: PMC2901371 DOI: 10.1186/1471-2105-11-294] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 10/16/2009] [Accepted: 06/01/2010] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND: For ecological studies, it is crucial to count on adequate descriptions of the environments and samples being studied. Such a description must be done in terms of their physicochemical characteristics, allowing a direct comparison between different environments that would be difficult to do otherwise. The characterization must also include the precise geographical location, to make possible the study of geographical distributions and biogeographical patterns. Currently, there is no schema for annotating these environmental features, and these data have to be extracted from textual sources (published articles). So far, this had to be performed by manual inspection of the corresponding documents. To facilitate this task, we have developed EnvMine, a set of text-mining tools devoted to retrieving contextual information (physicochemical variables and geographical locations) from textual sources of any kind. RESULTS: EnvMine is capable of retrieving the physicochemical variables cited in the text, by means of the accurate identification of their associated units of measurement. In this task, the system achieves a recall (percentage of items retrieved) of 92% with less than 1% error. A Bayesian classifier was also tested for distinguishing parts of the text describing environmental characteristics from others dealing with, for instance, experimental settings. Regarding the identification of geographical locations, the system takes advantage of existing databases such as GeoNames to achieve 86% recall with 92% precision. The identification of a location also includes the determination of its exact coordinates (latitude and longitude), thus allowing the calculation of distance between the individual locations. CONCLUSION: EnvMine is a very efficient method for extracting contextual information from different text sources, like published articles or web pages. This tool can help in determining the precise location and physicochemical variables of sampling sites, thus facilitating the performance of ecological analyses. EnvMine can also help in the development of standards for the annotation of environmental features.
Affiliation(s)
- Javier Tamames
- Centro Nacional de Biotecnología (CNB), CSIC, C/Darwin 3, 28049 Madrid, Spain.
|
38
|
Pfister CA, Meyer F, Antonopoulos DA. Metagenomic profiling of a microbial assemblage associated with the California mussel: a node in networks of carbon and nitrogen cycling. PLoS One 2010; 5:e10518. [PMID: 20463896 PMCID: PMC2865538 DOI: 10.1371/journal.pone.0010518] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.8] [Received: 11/25/2009] [Accepted: 04/06/2010] [Indexed: 11/19/2022] Open
Abstract
Mussels are conspicuous and often abundant members of rocky shores and may constitute an important site for the nitrogen cycle due to their feeding and excretion activities. We used shotgun metagenomics of the microbial community associated with the surface of mussels (Mytilus californianus) on Tatoosh Island in Washington state to test whether there is a nitrogen-based microbial assemblage associated with mussels. Analyses of both tidepool mussels and those on emergent benches revealed a diverse community of Bacteria and Archaea with approximately 31 million bp from 6 mussels in each habitat. Using MG-RAST, between 22.5–25.6% were identifiable using the SEED non-redundant database for proteins. Of those fragments that were identifiable through MG-RAST, the composition was dominated by Cyanobacteria and Alpha- and Gamma-proteobacteria. Microbial composition was highly similar between the tidepool and emergent bench mussels, suggesting similar functions across these different microhabitats. One percent of the proteins identified in each sample were related to nitrogen cycling. When normalized to protein discovery rate, the high diversity and abundance of enzymes related to the nitrogen cycle in mussel-associated microbes is as great or greater than that described for other marine metagenomes. In some instances, the nitrogen-utilizing profile of this assemblage was more concordant with soil metagenomes in the Midwestern U.S. than for open ocean system. Carbon fixation and Calvin cycle enzymes further represented 0.65 and 1.26% of all proteins and their abundance was comparable to a number of open ocean marine metagenomes. In sum, the diversity and abundance of nitrogen and carbon cycle related enzymes in the microbes occupying the shells of Mytilus californianus suggest these mussels provide a node for microbial populations and thus biogeochemical processes.
Affiliation(s)
- Catherine A Pfister
- Department of Ecology and Evolution, University of Chicago, Chicago, Illinois, United States of America.
|
39
|
Abstract
Metagenomics is a discipline that enables the genomic study of uncultured microorganisms. Faster, cheaper sequencing technologies and the ability to sequence uncultured microbes sampled directly from their habitats are expanding and transforming our view of the microbial world. Distilling meaningful information from the millions of new genomic sequences presents a serious challenge to bioinformaticians. In cultured microbes, the genomic data come from a single clone, making sequence assembly and annotation tractable. In metagenomics, the data come from heterogeneous microbial communities, sometimes containing more than 10,000 species, with the sequence data being noisy and partial. From sampling, to assembly, to gene calling and function prediction, bioinformatics faces new demands in interpreting voluminous, noisy, and often partial sequence data. Although metagenomics is a relative newcomer to science, the past few years have seen an explosion in computational methods applied to metagenomic-based research. It is therefore not within the scope of this article to provide an exhaustive review. Rather, we provide here a concise yet comprehensive introduction to the current computational requirements presented by metagenomics, and review the recent progress made. We also note whether there is software that implements any of the methods presented here, and briefly review its utility. Nevertheless, it would be useful if readers of this article would avail themselves of the comment section provided by this journal, and relate their own experiences. Finally, the last section of this article provides a few representative studies illustrating different facets of recent scientific discoveries made using metagenomics.
Affiliation(s)
- John C. Wooley
- Community Cyberinfrastructure for Marine Microbial Ecology Research and Analysis, California Institute for Telecommunications and Information Technology, University of California San Diego, La Jolla, California, United States of America
- Adam Godzik
- Community Cyberinfrastructure for Marine Microbial Ecology Research and Analysis, California Institute for Telecommunications and Information Technology, University of California San Diego, La Jolla, California, United States of America
- Program in Bioinformatics and Systems Biology, Burnham Institute for Medical Research, La Jolla, California, United States of America
- Iddo Friedberg
- Department of Microbiology, Miami University, Oxford, Ohio, United States of America
- Department of Computer Science and Software Engineering, Miami University, Oxford, Ohio, United States of America
|
40
|
Kottmann R, Kostadinov I, Duhaime MB, Buttigieg PL, Yilmaz P, Hankeln W, Waldmann J, Glöckner FO. Megx.net: integrated database resource for marine ecological genomics. Nucleic Acids Res 2010; 38:D391-5. [PMID: 19858098 PMCID: PMC2808895 DOI: 10.1093/nar/gkp918] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.4] [Received: 09/15/2009] [Accepted: 10/08/2009] [Indexed: 11/28/2022] Open
Abstract
Megx.net is a database and portal that provides integrated access to georeferenced marker genes, environment data and marine genome and metagenome projects for microbial ecological genomics. All data are stored in the Microbial Ecological Genomics DataBase (MegDB), which is subdivided to hold both sequence and habitat data and global environmental data layers. The extended system provides access to several hundreds of genomes and metagenomes from prokaryotes and phages, as well as over a million small and large subunit ribosomal RNA sequences. With the refined Genes Mapserver, all data can be interactively visualized on a world map and statistics describing environmental parameters can be calculated. Sequence entries have been curated to comply with the proposed minimal standards for genomes and metagenomes (MIGS/MIMS) of the Genomic Standards Consortium. Access to data is facilitated by Web Services. The updated megx.net portal offers microbial ecologists greatly enhanced database content, and new features and tools for data analysis, all of which are freely accessible from our webpage http://www.megx.net.
Affiliation(s)
- Renzo Kottmann
- Microbial Genomics Group, Max Planck Institute for Marine Microbiology, D-28359 Bremen and Jacobs University Bremen gGmbH, D-28759 Bremen, Germany
- Ivaylo Kostadinov
- Microbial Genomics Group, Max Planck Institute for Marine Microbiology, D-28359 Bremen and Jacobs University Bremen gGmbH, D-28759 Bremen, Germany
- Melissa Beth Duhaime
- Microbial Genomics Group, Max Planck Institute for Marine Microbiology, D-28359 Bremen and Jacobs University Bremen gGmbH, D-28759 Bremen, Germany
- Pier Luigi Buttigieg
- Microbial Genomics Group, Max Planck Institute for Marine Microbiology, D-28359 Bremen and Jacobs University Bremen gGmbH, D-28759 Bremen, Germany
- Pelin Yilmaz
- Microbial Genomics Group, Max Planck Institute for Marine Microbiology, D-28359 Bremen and Jacobs University Bremen gGmbH, D-28759 Bremen, Germany
- Wolfgang Hankeln
- Microbial Genomics Group, Max Planck Institute for Marine Microbiology, D-28359 Bremen and Jacobs University Bremen gGmbH, D-28759 Bremen, Germany
- Jost Waldmann
- Microbial Genomics Group, Max Planck Institute for Marine Microbiology, D-28359 Bremen and Jacobs University Bremen gGmbH, D-28759 Bremen, Germany
- Frank Oliver Glöckner
- Microbial Genomics Group, Max Planck Institute for Marine Microbiology, D-28359 Bremen and Jacobs University Bremen gGmbH, D-28759 Bremen, Germany
|
41
|
Field D, Friedberg I, Sterk P, Kottmann R, Glöckner FO, Hirschman L, Garrity GM, Cochrane G, Wooley J, Gilbert J. Meeting Report: "Metagenomics, Metadata and Meta-analysis" (M3) Special Interest Group at ISMB 2009. Stand Genomic Sci 2009; 1:278-82. [PMID: 21304668 PMCID: PMC3035241 DOI: 10.4056/sigs.641096] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/20/2022] Open
Abstract
This report summarizes the proceedings of the “Metagenomics, Metadata and Meta-analysis” (M3) Special Interest Group (SIG) meeting held at the Intelligent Systems for Molecular Biology 2009 conference. The Genomic Standards Consortium (GSC) hosted this meeting to explore the bottlenecks and emerging solutions for obtaining biological insights through large-scale comparative analysis of metagenomic datasets. The M3 SIG included 16 talks, half of which were selected from submitted abstracts, a poster session and a panel discussion involving members of the GSC Board. This report summarizes this one-day SIG, attempts to identify shared themes and recapitulates community recommendations for the future of this field. The GSC will also host an M3 workshop at the Pacific Symposium on Biocomputing (PSB) in January 2010. Further information about the GSC and its range of activities can be found at http://gensc.org/.
|
42
|
Wagener J, Spjuth O, Willighagen EL, Wikberg JES. XMPP for cloud computing in bioinformatics supporting discovery and invocation of asynchronous web services. BMC Bioinformatics 2009; 10:279. [PMID: 19732427 PMCID: PMC2755485 DOI: 10.1186/1471-2105-10-279] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.3] [Received: 05/20/2009] [Accepted: 09/04/2009] [Indexed: 01/12/2023] Open
Abstract
BACKGROUND Life sciences make heavy use of the web for both data provision and analysis. However, the increasing amount of available data and the diversity of analysis tools call for machine accessible interfaces in order to be effective. HTTP-based Web service technologies, like the Simple Object Access Protocol (SOAP) and REpresentational State Transfer (REST) services, are today the most common technologies for this in bioinformatics. However, these methods have severe drawbacks, including lack of discoverability and the inability of services to send status notifications. Several complementary workarounds have been proposed, but the results are ad-hoc solutions of varying quality that can be difficult to use. RESULTS We present a novel approach based on the open standard Extensible Messaging and Presence Protocol (XMPP), consisting of an extension (IO Data) to comprise discovery, asynchronous invocation, and definition of data types in the service. That XMPP cloud services are capable of asynchronous communication implies that clients do not have to poll repetitively for status, but the service sends the results back to the client upon completion. Implementations for Bioclipse and Taverna are presented, as are various XMPP cloud services in bio- and cheminformatics. CONCLUSION XMPP with its extensions is a powerful protocol for cloud services that demonstrates several advantages over traditional HTTP-based Web services: 1) services are discoverable without the need of an external registry, 2) asynchronous invocation eliminates the need for ad-hoc solutions like polling, and 3) input and output types defined in the service allow for generation of clients on the fly without the need of an external semantics description. The many advantages over existing technologies make XMPP a highly interesting candidate for next generation online services in bioinformatics.
Affiliation(s)
- Johannes Wagener
- Max von Pettenkofer-Institut, Ludwig-Maximilians-Universität, Munich, Germany.
|
43
|
Nelson OW, Harrison SH, Garrity GM. Meeting report for SIGS1: First Conference of the Standards in Genomic Sciences eJournal. Stand Genomic Sci 2009; 1:72-6. [PMID: 21304640 PMCID: PMC3035209 DOI: 10.4056/sigs.328] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/20/2022] Open
Affiliation(s)
- Oranmiyan W Nelson
- Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, Michigan, USA
|
44
|
Field D, Sterk P, Kyrpides N, Kottmann R, Glöckner FO, Hirschman L, Garrity GM, Wooley J, Gilna P. Meeting Report from the Genomic Standards Consortium (GSC) Workshops 6 and 7. Stand Genomic Sci 2009; 1:68-71. [PMID: 21304639 PMCID: PMC3035212 DOI: 10.4056/sigs.25165] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/20/2022] Open
Abstract
This report summarizes the proceedings of the 6th and 7th workshops of the Genomic Standards Consortium (GSC), held back-to-back in 2008. GSC 6 focused on furthering the activities of GSC working groups; GSC 7 focused on outreach to the wider community. GSC 6 was held October 10-14, 2008 at the European Bioinformatics Institute, Cambridge, United Kingdom and included a two-day workshop focused on the refinement of the Genomic Contextual Data Markup Language (GCDML). GSC 7 was held as the opening day of the International Congress on Metagenomics 2008 in San Diego, California. Major achievements of these combined meetings included an agreement from the International Nucleotide Sequence Database Consortium (INSDC) to create a "MIGS" keyword for capturing "Minimum Information about a Genome Sequence" compliant information within INSDC (DDBJ/EMBL/GenBank) records, launch of GCDML 1.0, MIGS compliance of the first set of "Genomic Encyclopedia of Bacteria and Archaea" project genomes, approval of a proposal to extend MIGS to 16S rRNA sequences within a "Minimum Information about an Environmental Sequence", finalization of plans for the GSC eJournal, "Standards in Genomic Sciences" (SIGS), and the formation of a GSC Board. Subsequently, the GSC has been awarded a Research Co-ordination Network (RCN4GSC) grant from the National Science Foundation, held the first SIGS workshop and launched the journal. The GSC will also be hosting outreach workshops at both ISMB 2009 and PSB 2010 focused on "Metagenomics, Metadata and MetaAnalysis" (M3). Further information about the GSC and its range of activities can be found at http://gensc.org, including videos of all the presentations at GSC 7.
|
45
|
Wooley JC, Field D, Glöckner FO. Extending Standards for Genomics and Metagenomics Data: A Research Coordination Network for the Genomic Standards Consortium (RCN4GSC). Stand Genomic Sci 2009; 1:87-90. [PMID: 21304642 PMCID: PMC3035207 DOI: 10.4056/sigs.26218] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/20/2022] Open
Abstract
Through a newly established Research Coordination Network for the Genomic Standards Consortium (RCN4GSC), the GSC will continue its leadership in establishing and integrating genomic standards through community-based efforts. These efforts, undertaken in the context of genomic and metagenomic research, aim to ensure the electronic capture of all genomic data and to facilitate the achievement of a community consensus around collecting and managing relevant contextual information connected to the sequence data. The GSC operates as an open, inclusive organization, welcoming inspired biologists with a commitment to community service. Within the collaborative framework of the ongoing, international activities of the GSC, the RCN will expand the range of research domains engaged in these standardization efforts and sustain scientific networking to encourage active participation by the broader community. The RCN4GSC, funded for five years by the US National Science Foundation, will primarily support outcome-focused working meetings and the exchange of early-career scientists between GSC research groups in order to advance key standards contributions such as GCDML. Focusing on the timely delivery of the extant GSC core projects, the RCN will also extend the pioneering efforts of the GSC to engage researchers active in developing ecological, environmental and biodiversity data standards. As the initial goals of the GSC are increasingly achieved, promoting the comprehensive use of effective standards will be essential to ensure the effective use of sequence and associated data, to provide access for all biologists to all of the information, and to create interdisciplinary opportunities for discovery. The RCN will facilitate these implementation activities through participation in major scientific conferences and presentations on scientific advances enabled by community usage of genomic standards.
|
46
|
Chervitz SA, Parkinson H, Fostel JM, Causton HC, Sanson SA, Deutsch EW, Field D, Taylor CF, Rocca-Serra P, White J, Stoeckert CJ. Standards for Functional Genomics. Bioinformatics 2009. [DOI: 10.1007/978-0-387-92738-1_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022] Open
|
47
|
Genomes and knowledge - a questionable relationship? Trends Microbiol 2008; 16:512-9. [PMID: 18819801 DOI: 10.1016/j.tim.2008.08.001] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.6] [Received: 07/04/2008] [Revised: 08/15/2008] [Accepted: 08/21/2008] [Indexed: 11/22/2022]
Abstract
The availability of bacterial genome sequences has ushered in an era of post-genomic research - accelerating and often enabling molecular genetic analyses. For bacteriologists focussing on an individual bacterium, comparing genomes has also led to a greater understanding of their favoured organism through contextualization. But how does the value of such contextualization vary with the number of available genomes? It seems that for most genome metrics, comparison against approximately 100 genomes is sufficient, with comparison against further genomes not considerably affecting the contextual knowledge gained. It appears that quality, rather than quantity, might be the most important factor when comparing genomes.
|