Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Kottmann R, Gray T, Murphy S, Kagan L, Kravitz S, Lombardot T, Field D, Glöckner FO. A standard MIGS/MIMS compliant XML Schema: toward the development of the Genomic Contextual Data Markup Language (GCDML). OMICS 2008;12:115-21. [PMID: 18479204 DOI: 10.1089/omi.2008.0a10] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.1] [Indexed: 11/12/2022]

Cited by Other Article(s)

Eloe-Fadrosh EA, Mungall CJ, Miller MA, Smith M, Patil SS, Kelliher JM, Johnson LYD, Rodriguez FE, Chain PSG, Hu B, Thornton MB, McCue LA, McHardy AC, Harris NL, Reddy TBK, Mukherjee S, Hunter CI, Walls R, Schriml LM. A Practical Approach to Using the Genomic Standards Consortium MIxS Reporting Standard for Comparative Genomics and Metagenomics. Methods Mol Biol 2024;2802:587-609. [PMID: 38819573 DOI: 10.1007/978-1-0716-3838-5_20] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]

Affiliation(s)

Emiley A Eloe-Fadrosh Environmental Genomics and System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
Christopher J Mungall Environmental Genomics and System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
Mark Andrew Miller Environmental Genomics and System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
Montana Smith Pacific Northwest National Laboratory, Richland, WA, USA
Sujay Sanjeev Patil Environmental Genomics and System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
Julia M Kelliher Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, USA
Leah Y D Johnson Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, USA
Francisca E Rodriguez Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, USA
Patrick S G Chain Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, USA
Bin Hu Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, USA
Michael B Thornton Environmental Genomics and System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
Lee Ann McCue Pacific Northwest National Laboratory, Richland, WA, USA
Alice Carolyn McHardy Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
Nomi L Harris Environmental Genomics and System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
T B K Reddy DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
Supratim Mukherjee DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
Christopher I Hunter GigaScience Press, Hong Kong Science Park, Pak Shek Kok, New Territories, Hong Kong
Ramona Walls Critical Path Institute, Tucson, AZ, USA
Lynn M Schriml University of Maryland School of Medicine, Institute for Genome Sciences, Baltimore, MD, USA

Baltoumas FA, Karatzas E, Paez-Espino D, Venetsianou NK, Aplakidou E, Oulas A, Finn RD, Ovchinnikov S, Pafilis E, Kyrpides NC, Pavlopoulos GA. Exploring microbial functional biodiversity at the protein family level-From metagenomic sequence reads to annotated protein clusters. FRONTIERS IN BIOINFORMATICS 2023;3:1157956. [PMID: 36959975 PMCID: PMC10029925 DOI: 10.3389/fbinf.2023.1157956] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Accepted: 02/21/2023] [Indexed: 03/06/2023] Open

Zafeiropoulos H, Beracochea M, Ninidakis S, Exter K, Potirakis A, De Moro G, Richardson L, Corre E, Machado J, Pafilis E, Kotoulas G, Santi I, Finn RD, Cox CJ, Pavloudi C. metaGOflow: a workflow for the analysis of marine Genomic Observatories shotgun metagenomics data. Gigascience 2022;12:giad078. [PMID: 37850871 PMCID: PMC10583283 DOI: 10.1093/gigascience/giad078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Revised: 06/30/2023] [Accepted: 09/11/2023] [Indexed: 10/19/2023] Open

Affiliation(s)

Haris Zafeiropoulos Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Former U.S. Base of Gournes, 71003 Heraklion, Crete, Greece KU Leuven, Department of Microbiology, Immunology and Transplantation, Rega Institute for Medical Research, Laboratory of Molecular Bacteriology, 3000 Leuven, Belgium
Martin Beracochea European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Stelios Ninidakis Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Former U.S. Base of Gournes, 71003 Heraklion, Crete, Greece
Katrina Exter Flanders Marine Institute (VLIZ), 8400 Oostende, Belgium
Antonis Potirakis Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Former U.S. Base of Gournes, 71003 Heraklion, Crete, Greece
Gianluca De Moro Centro de Ciências do Mar (CCMAR), Universidade do Algarve, Campus de Gambelas, 8005-139 Faro, Portugal
Lorna Richardson European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Erwan Corre CNRS, FR 2424, ABiMS Platform, Station Biologique de Roscoff (SBR), 29680 Roscoff, France
João Machado Centro de Ciências do Mar (CCMAR), Universidade do Algarve, Campus de Gambelas, 8005-139 Faro, Portugal
Evangelos Pafilis Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Former U.S. Base of Gournes, 71003 Heraklion, Crete, Greece
Georgios Kotoulas Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Former U.S. Base of Gournes, 71003 Heraklion, Crete, Greece
Ioulia Santi Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Former U.S. Base of Gournes, 71003 Heraklion, Crete, Greece European Marine Biological Resource Centre (EMBRC-ERIC), 75005 Paris, France
Robert D Finn European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Cymon J Cox Centro de Ciências do Mar (CCMAR), Universidade do Algarve, Campus de Gambelas, 8005-139 Faro, Portugal
Christina Pavloudi Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Former U.S. Base of Gournes, 71003 Heraklion, Crete, Greece Department of Biological Sciences, The George Washington University, 20052 Washington, DC, USA

Poulsen CS, Kaas RS, Aarestrup FM, Pamp SJ. Standard Sample Storage Conditions Have an Impact on Inferred Microbiome Composition and Antimicrobial Resistance Patterns. Microbiol Spectr 2021;9:e0138721. [PMID: 34612701 DOI: 10.1101/2021.05.24.445395] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 05/22/2023] Open

Abstract

Storage of biological specimens is crucial in the life and medical sciences. Storage conditions for samples can be different for a number of reasons, and it is unclear what effect this can have on the inferred microbiome composition in metagenomics analyses. Here, we assess the effect of common storage temperatures (deep freezer, -80°C; freezer, -20°C; refrigerator, 5°C; room temperature, 22°C) and storage times (immediate sample processing, 0 h; next day, 16 h; over weekend, 64 h; longer term, 4, 8, and 12 months) as well as repeated sample freezing and thawing (2 to 4 freeze-thaw cycles). We examined two different pig feces and sewage samples, unspiked and spiked with a mock community, in triplicate, respectively, amounting to a total of 438 samples (777 Gbp; 5.1 billion reads). Storage conditions had a significant and systematic effect on the taxonomic and functional composition of microbiomes. Distinct microbial taxa and antimicrobial resistance classes were, in some situations, similarly affected across samples, while others were not, suggesting an impact of individual inherent sample characteristics. With an increasing number of freeze-thaw cycles, an increasing abundance of Firmicutes, Actinobacteria, and eukaryotic microorganisms was observed. We provide recommendations for sample storage and strongly suggest including more detailed information in the metadata together with the DNA sequencing data in public repositories to better facilitate meta-analyses and reproducibility of findings.

IMPORTANCE Previous research has reported effects of DNA isolation, library preparation, and sequencing technology on metagenomics-based microbiome composition; however, the effect of biospecimen storage conditions has not been thoroughly assessed. We examined the effect of common sample storage conditions on metagenomics-based microbiome composition and found significant and, in part, systematic effects. Repeated freeze-thaw cycles could be used to improve the detection of microorganisms with more rigid cell walls, including parasites. We provide a data set that could also be used for benchmarking algorithms to identify and correct for unwanted batch effects. Overall, the findings suggest that all samples of a microbiome study should be stored in the same way. Furthermore, there is a need to mandate more detailed information about sample storage and processing be published together with DNA sequencing data at the International Nucleotide Sequence Database Collaboration (ENA/EBI, NCBI, DDBJ) or other repositories.

Poulsen CS, Kaas RS, Aarestrup FM, Pamp SJ. Standard Sample Storage Conditions Have an Impact on Inferred Microbiome Composition and Antimicrobial Resistance Patterns. Microbiol Spectr 2021;9:e0138721. [PMID: 34612701 PMCID: PMC8510183 DOI: 10.1128/spectrum.01387-21] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 08/31/2021] [Accepted: 09/02/2021] [Indexed: 12/11/2022] Open

Marks PC, Bigler M, Alsop EB, Vigneron A, Lomans BP, De Paula R, Geissler B, Tsesmetzis N. MetaHCR: a web-enabled metagenome data management system for hydrocarbon resources. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2018;2018:1-10. [PMID: 30212909 PMCID: PMC6146120 DOI: 10.1093/database/bay087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2018] [Accepted: 07/24/2018] [Indexed: 11/16/2022]

Tsesmetzis N, Yilmaz P, Marks PC, Kyrpides NC, Head IM, Lomans BP. MIxS-HCR: a MIxS extension defining a minimal information standard for sequence data from environments pertaining to hydrocarbon resources. Stand Genomic Sci 2016;11:78. [PMID: 27777648 PMCID: PMC5059931 DOI: 10.1186/s40793-016-0203-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/28/2016] [Accepted: 10/05/2016] [Indexed: 11/21/2022] Open

Droege G, Barker K, Seberg O, Coddington J, Benson E, Berendsohn WG, Bunk B, Butler C, Cawsey EM, Deck J, Döring M, Flemons P, Gemeinholzer B, Güntsch A, Hollowell T, Kelbert P, Kostadinov I, Kottmann R, Lawlor RT, Lyal C, Mackenzie-Dodds J, Meyer C, Mulcahy D, Nussbeck SY, O'Tuama É, Orrell T, Petersen G, Robertson T, Söhngen C, Whitacre J, Wieczorek J, Yilmaz P, Zetzsche H, Zhang Y, Zhou X. The Global Genome Biodiversity Network (GGBN) Data Standard specification. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2016;2016:baw125. [PMID: 27694206 PMCID: PMC5045859 DOI: 10.1093/database/baw125] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Received: 03/04/2016] [Accepted: 08/09/2016] [Indexed: 11/24/2022]

Affiliation(s)

G Droege Botanic Garden and Botanical Museum Berlin-Dahlem, Freie Universität Berlin, Königin-Luise-Str. 6-8, Berlin 14195, Germany
K Barker National Museum of Natural History, Smithsonian Institution, Washington, DC 20560, USA
O Seberg Natural History Museum of Denmark, University of Copenhagen, Sølvgade 83, opg. S, Copenhagen DK-1307, Denmark
J Coddington National Museum of Natural History, Smithsonian Institution, Washington, DC 20560, USA
E Benson Damar Research Scientists, Damar, Drum Road, Cuparmuir, Fife KY15 5RJ, UK
W G Berendsohn Botanic Garden and Botanical Museum Berlin-Dahlem, Freie Universität Berlin, Königin-Luise-Str. 6-8, Berlin 14195, Germany
B Bunk Leibniz Institute DSMZ - German Collection of Microorganisms and Cell Cultures, Inhoffenstr. 7B, Braunschweig 38124, Germany
C Butler National Museum of Natural History, Smithsonian Institution, Washington, DC 20560, USA
E M Cawsey Australian National Wildlife Collection, CSIRO National Research Collections Australia, Canberra, Australia
J Deck Berkeley Natural History Museums, University of California at Berkeley, Berkeley, CA 94720, USA
M Döring Global Biodiversity Information Facility Secretariat, Universitetsparken 15, Copenhagen DK-2100, Denmark
P Flemons Australian Museum, Sydney 2010, NSW, Australia
B Gemeinholzer Systematic Botany, Justus Liebig University, Giessen 35392, Germany
A Güntsch Botanic Garden and Botanical Museum Berlin-Dahlem, Freie Universität Berlin, Königin-Luise-Str. 6-8, Berlin 14195, Germany
T Hollowell National Museum of Natural History, Smithsonian Institution, Washington, DC 20560, USA
P Kelbert Botanic Garden and Botanical Museum Berlin-Dahlem, Freie Universität Berlin, Königin-Luise-Str. 6-8, Berlin 14195, Germany
I Kostadinov Department of Life Sciences & Chemistry, Jacobs University Bremen gGmbH, Campus Ring 1, Bremen 28759, Germany
R Kottmann Microbial Genomics and Bioinformatics Research Group, Max Planck Institute for Marine Microbiology, Celsiusstrasse 1, Bremen 28359, Germany
R T Lawlor ARC-Net Applied Research on Cancer Centre, Department of Pathology and Diagnostics, University of Verona, Verona 37134, Italy
C Lyal Natural History Museum, Cromwell Road, London SW7 5BD, UK
J Mackenzie-Dodds Natural History Museum, Cromwell Road, London SW7 5BD, UK
C Meyer National Museum of Natural History, Smithsonian Institution, Washington, DC 20560, USA
D Mulcahy National Museum of Natural History, Smithsonian Institution, Washington, DC 20560, USA
S Y Nussbeck Department of Medical Informatics and UMG Biobank, University Medical Center Göttingen, Robert-Koch-Str. 40, Göttingen 37075, Germany
É O'Tuama Global Biodiversity Information Facility Secretariat, Universitetsparken 15, Copenhagen DK-2100, Denmark
T Orrell National Museum of Natural History, Smithsonian Institution, Washington, DC 20560, USA
G Petersen Natural History Museum of Denmark, University of Copenhagen, Sølvgade 83, opg. S, Copenhagen DK-1307, Denmark
T Robertson Global Biodiversity Information Facility Secretariat, Universitetsparken 15, Copenhagen DK-2100, Denmark
C Söhngen Leibniz Institute DSMZ - German Collection of Microorganisms and Cell Cultures, Inhoffenstr. 7B, Braunschweig 38124, Germany
J Whitacre National Museum of Natural History, Smithsonian Institution, Washington, DC 20560, USA
J Wieczorek Museum of Vertebrate Zoology, University of California at Berkeley, Berkeley, CA 94720, USA
P Yilmaz Microbial Genomics and Bioinformatics Research Group, Max Planck Institute for Marine Microbiology, Celsiusstrasse 1, Bremen 28359, Germany
H Zetzsche Julius Kuehn-Institute (JKI), Federal Research Centre for Cultivated Plants, Institute for Resistance Research and Stress Tolerance, Erwin-Baur-Str. 27, Quedlinburg 06484, Germany
Y Zhang China National GeneBank, BGI-Shenzhen, Shenzhen, Guangdong 518083, China
X Zhou China National GeneBank, BGI-Shenzhen, Shenzhen, Guangdong 518083, China

Li X, Song L, Wang G, Ren L, Yu D, Chen G, Wang X, Yu J, Liu G, Du Z. Complete genome sequence of a deeply branched marine Bacteroidia bacterium Draconibacterium orientale type strain FH5(T). Mar Genomics 2016;26:13-6. [PMID: 26796622 DOI: 10.1016/j.margen.2016.01.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 11/27/2015] [Revised: 01/05/2016] [Accepted: 01/05/2016] [Indexed: 10/22/2022]

Land M, Hauser L, Jun SR, Nookaew I, Leuze MR, Ahn TH, Karpinets T, Lund O, Kora G, Wassenaar T, Poudel S, Ussery DW. Insights from 20 years of bacterial genome sequencing. Funct Integr Genomics 2015;15:141-61. [PMID: 25722247 PMCID: PMC4361730 DOI: 10.1007/s10142-015-0433-4] [Citation(s) in RCA: 391] [Impact Index Per Article: 43.4] [Received: 01/19/2015] [Revised: 02/11/2015] [Accepted: 02/12/2015] [Indexed: 12/18/2022]

Abstract

Since the first two complete bacterial genome sequences were published in 1995, the science of bacteria has dramatically changed. Using third-generation DNA sequencing, it is possible to completely sequence a bacterial genome in a few hours and identify some types of methylation sites along the genome as well. Sequencing of bacterial genome sequences is now a standard procedure, and the information from tens of thousands of bacterial genomes has had a major impact on our views of the bacterial world. In this review, we explore a series of questions to highlight some insights that comparative genomics has produced. To date, there are genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. However, the distribution is quite skewed towards a few phyla that contain model organisms. But the breadth is continuing to improve, with projects dedicated to filling in less characterized taxonomic groups. The clustered regularly interspaced short palindromic repeats (CRISPR)-Cas system provides bacteria with immunity against viruses, which outnumber bacteria by tenfold. How fast can we go? Second-generation sequencing has produced a large number of draft genomes (close to 90 % of bacterial genomes in GenBank are currently not complete); third-generation sequencing can potentially produce a finished genome in a few hours, and at the same time provide methylation sites along the entire chromosome. The diversity of bacterial communities is extensive as is evident from the genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. Genome sequencing can help in classifying an organism, and in the case where multiple genomes of the same species are available, it is possible to calculate the pan- and core genomes; comparison of more than 2000 Escherichia coli genomes finds an E. coli core genome of about 3100 gene families and a total of about 89,000 different gene families. Why do we care about bacterial genome sequencing? There are many practical applications, such as genome-scale metabolic modeling, biosurveillance, bioforensics, and infectious disease epidemiology. In the near future, high-throughput sequencing of patient metagenomic samples could revolutionize medicine in terms of speed and accuracy of finding pathogens and knowing how to treat them.

Affiliation(s)

Miriam Land Comparative Genomics Group, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
Loren Hauser Comparative Genomics Group, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA Joint Institute for Biological Sciences, University of Tennessee, Knoxville, TN 37996 USA Department of Microbiology, University of Tennessee, Knoxville, TN 37996 USA
Se-Ran Jun Comparative Genomics Group, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
Intawat Nookaew Comparative Genomics Group, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
Michael R. Leuze Computer Science and Mathematics Division, Computer Science Research Group, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
Tae-Hyuk Ahn Comparative Genomics Group, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA Computer Science and Mathematics Division, Computer Science Research Group, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
Tatiana Karpinets Comparative Genomics Group, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
Ole Lund Center for Biological Sequence Analysis, Department of Systems Biology, The Technical University of Denmark, Kgs. Lyngby, 2800 Denmark
Guruprased Kora Computer Science and Mathematics Division, Computer Science Research Group, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
Trudy Wassenaar Molecular Microbiology and Genomics Consultants, Tannenstr 7, 55576 Zotzenheim, Germany
Suresh Poudel Comparative Genomics Group, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA Genome Science and Technology, University of Tennessee, Knoxville, TN 37996 USA
David W. Ussery Comparative Genomics Group, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA Joint Institute for Biological Sciences, University of Tennessee, Knoxville, TN 37996 USA Center for Biological Sequence Analysis, Department of Systems Biology, The Technical University of Denmark, Kgs. Lyngby, 2800 Denmark Genome Science and Technology, University of Tennessee, Knoxville, TN 37996 USA

Bischof J, Harrison T, Paczian T, Glass E, Wilke A, Meyer F. Metazen - metadata capture for metagenomes. Stand Genomic Sci 2014;9:18. [PMID: 25780508 PMCID: PMC4334943 DOI: 10.1186/1944-3277-9-18] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 06/11/2014] [Accepted: 11/03/2014] [Indexed: 11/30/2022] Open

Walls RL, Deck J, Guralnick R, Baskauf S, Beaman R, Blum S, Bowers S, Buttigieg PL, Davies N, Endresen D, Gandolfo MA, Hanner R, Janning A, Krishtalka L, Matsunaga A, Midford P, Morrison N, Tuama ÉÓ, Schildhauer M, Smith B, Stucky BJ, Thomer A, Wieczorek J, Whitacre J, Wooley J. Semantics in support of biodiversity knowledge discovery: an introduction to the biological collections ontology and related ontologies. PLoS One 2014;9:e89606. [PMID: 24595056 PMCID: PMC3940615 DOI: 10.1371/journal.pone.0089606] [Citation(s) in RCA: 64] [Impact Index Per Article: 6.4] [Received: 06/08/2013] [Accepted: 01/24/2014] [Indexed: 11/19/2022] Open

Abstract

The study of biodiversity spans many disciplines and includes data pertaining to species distributions and abundances, genetic sequences, trait measurements, and ecological niches, complemented by information on collection and measurement protocols. A review of the current landscape of metadata standards and ontologies in biodiversity science suggests that existing standards such as the Darwin Core terminology are inadequate for describing biodiversity data in a semantically meaningful and computationally useful way. Existing ontologies, such as the Gene Ontology and others in the Open Biological and Biomedical Ontologies (OBO) Foundry library, provide a semantic structure but lack many of the necessary terms to describe biodiversity data in all its dimensions. In this paper, we describe the motivation for and ongoing development of a new Biological Collections Ontology, the Environment Ontology, and the Population and Community Ontology. These ontologies share the aim of improving data aggregation and integration across the biodiversity domain and can be used to describe physical samples and sampling processes (for example, collection, extraction, and preservation techniques), as well as biodiversity observations that involve no physical sampling. Together they encompass studies of: 1) individual organisms, including voucher specimens from ecological studies and museum specimens, 2) bulk or environmental samples (e.g., gut contents, soil, water) that include DNA, other molecules, and potentially many organisms, especially microbes, and 3) survey-based ecological observations. We discuss how these ontologies can be applied to biodiversity use cases that span genetic, organismal, and ecosystem levels of organization. We argue that if adopted as a standard and rigorously applied and enriched by the biodiversity community, these ontologies would significantly reduce barriers to data discovery, integration, and exchange among biodiversity resources and researchers.

Affiliation(s)

Ramona L. Walls The iPlant Collaborative, University of Arizona, Tucson, Arizona, United States of America
John Deck University of California, Berkeley, Berkeley, California, United States of America
Robert Guralnick Department of Ecology and Evolutionary Biology and the CU Museum of Natural History, University of Colorado at Boulder, Boulder, Colorado, United States of America
Steve Baskauf Department of Biological Sciences, Vanderbilt University, Nashville, Tennessee, United States of America
Reed Beaman University of Florida, Florida Museum of Natural History, Gainesville, Florida, United States of America
Stanley Blum Research Informatics, California Academy of Sciences, San Francisco, California, United States of America
Shawn Bowers Gonzaga University, Computer Science, Spokane, Washington, United States of America
Pier Luigi Buttigieg Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research, Bremerhaven, Germany
Neil Davies University of California, Berkeley, Gump South Pacific Research Station, Moorea, French Polynesia
Dag Endresen GBIF Norway, Natural History Museum, University in Oslo, Oslo, Norway
Maria Alejandra Gandolfo LH Bailey Hortorium, Department of Plant Biology, Cornell University, Ithaca, New York, United States of America
Robert Hanner Biodiversity Institute of Ontario, University of Guelph, Guelph, ON, Canada
Alyssa Janning School of Information Resources and Library Science, University of Arizona, Tucson, Arizona, United States of America
Leonard Krishtalka Biodiversity Institute and Ecology & Evolutionary Biology, The University of Kansas, Lawrence, Kansas, United States of America
Andréa Matsunaga University of Florida, Gainesville, Florida, United States of America
Peter Midford Ecology and Evolutionary Biology, University of Kansas, Lawrence, Kansas, United States of America
Norman Morrison The BioVeL Project, School of Computer Science, The University of Manchester, Manchester, United Kingdom
Éamonn Ó. Tuama GBIF Secretariat, Copenhagen, Denmark
Mark Schildhauer National Center for Ecological Analysis and Synthesis, Santa Barbara, California, United States of America
Barry Smith Department of Philosophy, University at Buffalo, Buffalo, New York, United States of America
Brian J. Stucky Department of Ecology and Evolutionary Biology, University of Colorado, Boulder, Colorado, United States of America
Andrea Thomer Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign, Urbana-Champaign, Illinois, United States of America
John Wieczorek 3101 VLSB, Museum of Vertebrate Zoology, University of California, Berkeley, Berkeley, California, United States of America
Jamie Whitacre Informatics Branch, Information Technology Office, National Museum of Natural History, Smithsonian Institution, Washington, DC, United States of America
John Wooley University of California San Diego, La Jolla, California, United States of America

Ison J, Kalas M, Jonassen I, Bolser D, Uludag M, McWilliam H, Malone J, Lopez R, Pettifer S, Rice P. EDAM: an ontology of bioinformatics operations, types of data and identifiers, topics and formats. Bioinformatics 2013;29:1325-32. [PMID: 23479348 PMCID: PMC3654706 DOI: 10.1093/bioinformatics/btt113] [Citation(s) in RCA: 126] [Impact Index Per Article: 11.5] [Received: 07/16/2012] [Revised: 02/28/2013] [Accepted: 03/01/2013] [Indexed: 11/14/2022] Open

Radom M, Rybarczyk A, Kottmann R, Formanowicz P, Szachniuk M, Glöckner FO, Rebholz-Schuhmann D, Błażewicz J. Poseidon: An information retrieval and extraction system for metagenomic marine science. ECOL INFORM 2012. [DOI: 10.1016/j.ecoinf.2012.07.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/28/2022]

Teeling H, Glöckner FO. Current opportunities and challenges in microbial metagenome analysis--a bioinformatic perspective. Brief Bioinform 2012;13:728-42. [PMID: 22966151 PMCID: PMC3504927 DOI: 10.1093/bib/bbs039] [Citation(s) in RCA: 148] [Impact Index Per Article: 12.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2012] [Accepted: 06/09/2012] [Indexed: 12/21/2022] Open

Kalyana-Sundaram S, Shanmugam A, Chinnaiyan AM. Gene Fusion Markup Language: a prototype for exchanging gene fusion data. BMC Bioinformatics 2012;13:269. [PMID: 23072312 PMCID: PMC3607969 DOI: 10.1186/1471-2105-13-269] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/21/2011] [Accepted: 10/11/2012] [Indexed: 12/26/2022] Open

Robbins RJ, Beach J, Blum S, Dawyndt P, Deck J, Kottmann R, Morrison N, Tuama EÓ, San Gil I, Vieglas D, Wieczorek J, Wooley J. RCN4GSC Meeting Report: Initiating a Testbed for Managing Data at the Interface of Biodiversity and Genomics/Metagenomics, May 2011. Stand Genomic Sci 2012;7:171-4. [PMID: 23409219 PMCID: PMC3558955 DOI: 10.4056/sigs.3176515] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022] Open

Logares R, Haverkamp TH, Kumar S, Lanzén A, Nederbragt AJ, Quince C, Kauserud H. Environmental microbiology through the lens of high-throughput DNA sequencing: Synopsis of current platforms and bioinformatics approaches. J Microbiol Methods 2012;91:106-13. [DOI: 10.1016/j.mimet.2012.07.017] [Citation(s) in RCA: 69] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/10/2012] [Revised: 07/19/2012] [Accepted: 07/23/2012] [Indexed: 10/28/2022]

Data platforms in integrative biodiversity research. ECOL INFORM 2012. [DOI: 10.1016/j.ecoinf.2012.04.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]

The user's view on biodiversity data sharing — Investigating facts of acceptance and requirements to realize a sustainable use of research data —. ECOL INFORM 2012. [DOI: 10.1016/j.ecoinf.2012.03.004] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/28/2022]

Liolios K, Schriml L, Hirschman L, Pagani I, Nosrat B, Sterk P, White O, Rocca-Serra P, Sansone SA, Taylor C, Kyrpides NC, Field D. The Metadata Coverage Index (MCI): A standardized metric for quantifying database metadata richness. Stand Genomic Sci 2012;6:438-47. [PMID: 23409217 PMCID: PMC3558968 DOI: 10.4056/sigs.2675953] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open

Abstract

Variability in the extent of the descriptions of data ('metadata') held in public repositories forces users to assess the quality of records individually, which rapidly becomes impractical. The scoring of records on the richness of their description provides a simple, objective proxy measure for quality that enables filtering that supports downstream analysis. Pivotally, such descriptions should spur on improvements. Here, we introduce such a measure - the 'Metadata Coverage Index' (MCI): the percentage of available fields actually filled in a record or description. MCI scores can be calculated across a database, for individual records or for their component parts (e.g., fields of interest). There are many potential uses for this simple metric, for example: to filter, rank or search for records; to assess the metadata availability of an ad hoc collection; to determine the frequency with which fields in a particular record type are filled, especially with respect to standards compliance; to assess the utility of specific tools and resources, and of data capture practice more generally; to prioritize records for further curation; to serve as performance metrics of funded projects; or to quantify the value added by curation. Here we demonstrate the utility of MCI scores using metadata from the Genomes Online Database (GOLD), including records compliant with the 'Minimum Information about a Genome Sequence' (MIGS) standard developed by the Genomic Standards Consortium. We discuss challenges and address the further application of MCI scores: to show improvements in annotation quality over time, to inform the work of standards bodies and repository providers on the usability and popularity of their products, and to assess and credit the work of curators. Such an index provides a step towards putting metadata capture practices and, in the future, standards compliance into a quantitative and objective framework.
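
The MCI definition in this abstract (the percentage of available fields actually filled in a record) can be illustrated with a minimal Python sketch. The field names, the sample record, and the rule that `None` or a blank string counts as unfilled are assumptions made for illustration; they are not taken from GOLD or from the paper.

```python
def is_filled(value):
    """Assumed rule: a field is filled unless it is None or a blank string."""
    return value is not None and str(value).strip() != ""


def metadata_coverage_index(record, fields):
    """Return the percentage of `fields` that are filled in `record`."""
    if not fields:
        raise ValueError("fields must not be empty")
    filled = sum(1 for name in fields if is_filled(record.get(name)))
    return 100.0 * filled / len(fields)


def field_fill_frequency(records, fields):
    """Return, per field, the percentage of records in which it is filled."""
    if not records:
        raise ValueError("records must not be empty")
    return {
        name: 100.0 * sum(1 for r in records if is_filled(r.get(name))) / len(records)
        for name in fields
    }


# Hypothetical field list and records, for illustration only.
FIELDS = ["organism", "isolation_source", "latitude", "longitude"]
record_a = {"organism": "Example sp.", "isolation_source": "", "latitude": 54.1, "longitude": None}
record_b = {"organism": "Example sp.", "isolation_source": "sea water", "latitude": 54.1, "longitude": 7.9}

mci_a = metadata_coverage_index(record_a, FIELDS)  # 2 of 4 fields filled
mci_b = metadata_coverage_index(record_b, FIELDS)  # 4 of 4 fields filled
frequencies = field_fill_frequency([record_a, record_b], FIELDS)
```

Scoring per record supports filtering and ranking, while the per-field frequencies correspond to the abstract's use case of determining how often fields in a record type are filled.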

Gilbert JA, Bao Y, Wang H, Sansone SA, Edmunds SC, Morrison N, Meyer F, Schriml LM, Davies N, Sterk P, Wilkening J, Garrity GM, Field D, Robbins R, Smith DP, Mizrachi I, Moreau C. Report of the 13th Genomic Standards Consortium Meeting, Shenzhen, China, March 4-7, 2012. Stand Genomic Sci 2012;6:276-86. [PMID: 22768370 PMCID: PMC3387801 DOI: 10.4056/sigs.2876184] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022] Open

Zaneveld JRR, Parfrey LW, Van Treuren W, Lozupone C, Clemente JC, Knights D, Stombaugh J, Kuczynski J, Knight R. Combined phylogenetic and genomic approaches for the high-throughput study of microbial habitat adaptation. Trends Microbiol 2011;19:472-82. [PMID: 21872475 PMCID: PMC3184378 DOI: 10.1016/j.tim.2011.07.006] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/02/2011] [Revised: 07/22/2011] [Accepted: 07/25/2011] [Indexed: 01/21/2023]

Hankeln W, Wendel NJ, Gerken J, Waldmann J, Buttigieg PL, Kostadinov I, Kottmann R, Yilmaz P, Glöckner FO. CDinFusion--submission-ready, on-line integration of sequence and contextual data. PLoS One 2011;6:e24797. [PMID: 21935468 PMCID: PMC3172294 DOI: 10.1371/journal.pone.0024797] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/19/2011] [Accepted: 08/19/2011] [Indexed: 11/19/2022] Open

Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications. Nat Biotechnol 2011;29:415-20. [PMID: 21552244 DOI: 10.1038/nbt.1823] [Citation(s) in RCA: 452] [Impact Index Per Article: 34.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]

Field D, Amaral-Zettler L, Cochrane G, Cole JR, Dawyndt P, Garrity GM, Gilbert J, Glöckner FO, Hirschman L, Karsch-Mizrachi I, Klenk HP, Knight R, Kottmann R, Kyrpides N, Meyer F, San Gil I, Sansone SA, Schriml LM, Sterk P, Tatusova T, Ussery DW, White O, Wooley J. The Genomic Standards Consortium. PLoS Biol 2011;9:e1001088. [PMID: 21713030 PMCID: PMC3119656 DOI: 10.1371/journal.pbio.1001088] [Citation(s) in RCA: 135] [Impact Index Per Article: 10.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022] Open

Glöckner FO, Joint I. Marine microbial genomics in Europe: current status and perspectives. Microb Biotechnol 2011;3:523-30. [PMID: 20953416 PMCID: PMC2948668 DOI: 10.1111/j.1751-7915.2010.00169.x] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/17/2009] [Accepted: 02/06/2010] [Indexed: 11/29/2022] Open

Duhaime MB, Kottmann R, Field D, Glöckner FO. Enriching public descriptions of marine phages using the Genomic Standards Consortium MIGS standard. Stand Genomic Sci 2011;4:271-85. [PMID: 21677864 PMCID: PMC3111985 DOI: 10.4056/sigs.621069] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open

Yilmaz P, Gilbert JA, Knight R, Amaral-Zettler L, Karsch-Mizrachi I, Cochrane G, Nakamura Y, Sansone SA, Glöckner FO, Field D. The genomic standards consortium: bringing standards to life for microbial ecology. ISME JOURNAL 2011;5:1565-7. [PMID: 21472015 PMCID: PMC3176512 DOI: 10.1038/ismej.2011.39] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]

Glass E, Meyer F, Gilbert JA, Field D, Hunter S, Kottmann R, Kyrpides N, Sansone S, Schriml L, Sterk P, White O, Wooley J. Meeting Report from the Genomic Standards Consortium (GSC) Workshop 10. Stand Genomic Sci 2010;3:225-31. [PMID: 21304723 PMCID: PMC3035307 DOI: 10.4056/sigs.1423520] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open

Gilbert JA, Meyer F, Knight R, Field D, Kyrpides N, Yilmaz P, Wooley J. Meeting report: GSC M5 roundtable at the 13th International Society for Microbial Ecology meeting in Seattle, WA, USA August 22-27, 2010. Stand Genomic Sci 2010;3:235-9. [PMID: 21304725 PMCID: PMC3035306 DOI: 10.4056/sigs.1333437] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022] Open

Davidsen T, Madupu R, Sterk P, Field D, Garrity G, Gilbert J, Glöckner FO, Hirschman L, Kolker E, Kottmann R, Kyrpides N, Meyer F, Morrison N, Schriml L, Tatusova T, Wooley J. Meeting Report from the Genomic Standards Consortium (GSC) Workshop 9. Stand Genomic Sci 2010;3:216-24. [PMID: 21304722 PMCID: PMC3035308 DOI: 10.4056/sigs.1353455] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022] Open

Kalas M, Puntervoll P, Joseph A, Bartaseviciūte E, Töpfer A, Venkataraman P, Pettifer S, Bryne JC, Ison J, Blanchet C, Rapacki K, Jonassen I. BioXSD: the common data-exchange format for everyday bioinformatics web services. Bioinformatics 2010;26:i540-6. [PMID: 20823319 PMCID: PMC2935419 DOI: 10.1093/bioinformatics/btq391] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open

Verslyppe B, Kottmann R, De Smet W, De Baets B, De Vos P, Dawyndt P. Microbiological Common Language (MCL): a standard for electronic information exchange in the Microbial Commons. Res Microbiol 2010;161:439-45. [DOI: 10.1016/j.resmic.2010.02.005] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/14/2009] [Revised: 01/22/2010] [Accepted: 02/12/2010] [Indexed: 10/19/2022]

Hankeln W, Buttigieg PL, Fink D, Kottmann R, Yilmaz P, Glöckner FO. MetaBar - a tool for consistent contextual data acquisition and standards compliant submission. BMC Bioinformatics 2010;11:358. [PMID: 20591175 PMCID: PMC2912304 DOI: 10.1186/1471-2105-11-358] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2010] [Accepted: 06/30/2010] [Indexed: 11/10/2022] Open

Abstract

Background

Environmental sequence datasets are increasing at an exponential rate; however, the vast majority of them lack appropriate descriptors like sampling location, time and depth/altitude, generally referred to as metadata or contextual data. The consistent capture and structured submission of these data is crucial for integrated data analysis and ecosystems modeling. The application MetaBar has been developed to support consistent contextual data acquisition.

Results

MetaBar is a spreadsheet and web-based software tool designed to assist users in the consistent acquisition, electronic storage, and submission of contextual data associated with their samples. A preconfigured Microsoft® Excel® spreadsheet is used to initiate structured contextual data storage in the field or laboratory. Each sample is given a unique identifier and at any stage the sheets can be uploaded to the MetaBar database server. To label samples, identifiers can be printed as barcodes. An intuitive web interface provides quick access to the contextual data in the MetaBar database as well as user and project management capabilities. Export functions facilitate contextual and sequence data submission to the International Nucleotide Sequence Database Collaboration (INSDC), comprising the DNA DataBase of Japan (DDBJ), the European Molecular Biology Laboratory database (EMBL) and GenBank. MetaBar requests and stores contextual data in compliance with the Genomic Standards Consortium specifications. The MetaBar open source code base for local installation is available under the GNU General Public License version 3 (GNU GPL3).

Conclusion

The MetaBar software supports the typical workflow from data acquisition and field-sampling to contextual data enriched sequence submission to an INSDC database. The integration with the megx.net marine Ecological Genomics database and portal facilitates georeferenced data integration and metadata-based comparisons of sampling sites as well as interactive data visualization. The ample export functionalities and the INSDC submission support enable exchange of data across disciplines and safeguarding contextual data.

Valdivia-Granda WA. Bioinformatics for biodefense: challenges and opportunities. Biosecur Bioterror 2010;8:69-77. [PMID: 20230234 DOI: 10.1089/bsp.2009.0024] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/29/2023]

Tamames J, de Lorenzo V. EnvMine: a text-mining system for the automatic extraction of contextual information. BMC Bioinformatics 2010;11:294. [PMID: 20515448 PMCID: PMC2901371 DOI: 10.1186/1471-2105-11-294] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/16/2009] [Accepted: 06/01/2010] [Indexed: 12/13/2022] Open

Abstract

BACKGROUND

For ecological studies, it is crucial to count on adequate descriptions of the environments and samples being studied. Such a description must be done in terms of their physicochemical characteristics, allowing a direct comparison between different environments that would be difficult to do otherwise. The characterization must also include the precise geographical location, to make possible the study of geographical distributions and biogeographical patterns. Currently, there is no schema for annotating these environmental features, and these data have to be extracted from textual sources (published articles). So far, this had to be performed by manual inspection of the corresponding documents. To facilitate this task, we have developed EnvMine, a set of text-mining tools devoted to retrieving contextual information (physicochemical variables and geographical locations) from textual sources of any kind.

RESULTS

EnvMine is capable of retrieving the physicochemical variables cited in the text, by means of the accurate identification of their associated units of measurement. In this task, the system achieves a recall (percentage of items retrieved) of 92% with less than 1% error. A Bayesian classifier was also tested for distinguishing parts of the text describing environmental characteristics from others dealing with, for instance, experimental settings. Regarding the identification of geographical locations, the system takes advantage of existing databases such as GeoNames to achieve 86% recall with 92% precision. The identification of a location also includes the determination of its exact coordinates (latitude and longitude), thus allowing the calculation of distance between the individual locations.
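
Once each location has latitude and longitude, a distance between two locations can be computed. The abstract does not say which formula EnvMine uses, so the Python sketch below substitutes the standard haversine great-circle formula as an assumption. The function name, the mean Earth radius of 6371 km, and the sample coordinates are illustrative only.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points given in decimal degrees.

    Assumes a spherical Earth with the given mean radius.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point rounding before asin.
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, a)))


# Hypothetical sampling sites, for illustration only.
same_site = haversine_km(48.37, -124.74, 48.37, -124.74)
antipodal_on_equator = haversine_km(0.0, 0.0, 0.0, 180.0)
quarter_circle = haversine_km(0.0, 0.0, 90.0, 0.0)
```

The spherical approximation is usually adequate for comparing sampling sites; an ellipsoidal model would be needed only where sub-kilometre accuracy over long distances matters.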

CONCLUSION

EnvMine is a very efficient method for extracting contextual information from different text sources, like published articles or web pages. This tool can help in determining the precise location and physicochemical variables of sampling sites, thus facilitating the performance of ecological analyses. EnvMine can also help in the development of standards for the annotation of environmental features.

Pfister CA, Meyer F, Antonopoulos DA. Metagenomic profiling of a microbial assemblage associated with the California mussel: a node in networks of carbon and nitrogen cycling. PLoS One 2010;5:e10518. [PMID: 20463896 PMCID: PMC2865538 DOI: 10.1371/journal.pone.0010518] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/25/2009] [Accepted: 04/06/2010] [Indexed: 11/19/2022] Open

Abstract

Mussels are conspicuous and often abundant members of rocky shores and may constitute an important site for the nitrogen cycle due to their feeding and excretion activities. We used shotgun metagenomics of the microbial community associated with the surface of mussels (Mytilus californianus) on Tatoosh Island in Washington state to test whether there is a nitrogen-based microbial assemblage associated with mussels. Analyses of both tidepool mussels and those on emergent benches revealed a diverse community of Bacteria and Archaea with approximately 31 million bp from 6 mussels in each habitat. Using MG-RAST, between 22.5–25.6% were identifiable using the SEED non-redundant database for proteins. Of those fragments that were identifiable through MG-RAST, the composition was dominated by Cyanobacteria and Alpha- and Gamma-proteobacteria. Microbial composition was highly similar between the tidepool and emergent bench mussels, suggesting similar functions across these different microhabitats. One percent of the proteins identified in each sample were related to nitrogen cycling. When normalized to protein discovery rate, the diversity and abundance of enzymes related to the nitrogen cycle in mussel-associated microbes are as great as or greater than those described for other marine metagenomes. In some instances, the nitrogen-utilizing profile of this assemblage was more concordant with soil metagenomes in the Midwestern U.S. than with open ocean systems. Carbon fixation and Calvin cycle enzymes further represented 0.65 and 1.26% of all proteins and their abundance was comparable to a number of open ocean marine metagenomes. In sum, the diversity and abundance of nitrogen and carbon cycle related enzymes in the microbes occupying the shells of Mytilus californianus suggest these mussels provide a node for microbial populations and thus biogeochemical processes.

Wooley JC, Godzik A, Friedberg I. A primer on metagenomics. PLoS Comput Biol 2010;6:e1000667. [PMID: 20195499 PMCID: PMC2829047 DOI: 10.1371/journal.pcbi.1000667] [Citation(s) in RCA: 455] [Impact Index Per Article: 32.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/01/2023] Open

Kottmann R, Kostadinov I, Duhaime MB, Buttigieg PL, Yilmaz P, Hankeln W, Waldmann J, Glöckner FO. Megx.net: integrated database resource for marine ecological genomics. Nucleic Acids Res 2010;38:D391-5. [PMID: 19858098 PMCID: PMC2808895 DOI: 10.1093/nar/gkp918] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2009] [Accepted: 10/08/2009] [Indexed: 11/28/2022] Open

Field D, Friedberg I, Sterk P, Kottmann R, Glöckner FO, Hirschman L, Garrity GM, Cochrane G, Wooley J, Gilbert J. Meeting Report: "Metagenomics, Metadata and Meta-analysis" (M3) Special Interest Group at ISMB 2009. Stand Genomic Sci 2009;1:278-82. [PMID: 21304668 PMCID: PMC3035241 DOI: 10.4056/sigs.641096] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open

Wagener J, Spjuth O, Willighagen EL, Wikberg JES. XMPP for cloud computing in bioinformatics supporting discovery and invocation of asynchronous web services. BMC Bioinformatics 2009;10:279. [PMID: 19732427 PMCID: PMC2755485 DOI: 10.1186/1471-2105-10-279] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2009] [Accepted: 09/04/2009] [Indexed: 01/12/2023] Open

Abstract

BACKGROUND

Life sciences make heavy use of the web for both data provision and analysis. However, the increasing amount of available data and the diversity of analysis tools call for machine accessible interfaces in order to be effective. HTTP-based Web service technologies, like the Simple Object Access Protocol (SOAP) and REpresentational State Transfer (REST) services, are today the most common technologies for this in bioinformatics. However, these methods have severe drawbacks, including lack of discoverability, and the inability for services to send status notifications. Several complementary workarounds have been proposed, but the results are ad-hoc solutions of varying quality that can be difficult to use.

RESULTS

We present a novel approach based on the open standard Extensible Messaging and Presence Protocol (XMPP), consisting of an extension (IO Data) to comprise discovery, asynchronous invocation, and definition of data types in the service. That XMPP cloud services are capable of asynchronous communication implies that clients do not have to poll repetitively for status, but the service sends the results back to the client upon completion. Implementations for Bioclipse and Taverna are presented, as are various XMPP cloud services in bio- and cheminformatics.

CONCLUSION

XMPP with its extensions is a powerful protocol for cloud services that demonstrates several advantages over traditional HTTP-based Web services: 1) services are discoverable without the need of an external registry, 2) asynchronous invocation eliminates the need for ad-hoc solutions like polling, and 3) input and output types defined in the service allow for generation of clients on the fly without the need of an external semantics description. The many advantages over existing technologies make XMPP a highly interesting candidate for next generation online services in bioinformatics.

Nelson OW, Harrison SH, Garrity GM. Meeting report for SIGS1: First Conference of the Standards in Genomic Sciences eJournal. Stand Genomic Sci 2009;1:72-6. [PMID: 21304640 PMCID: PMC3035209 DOI: 10.4056/sigs.328] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open

Field D, Sterk P, Kyrpides N, Kottmann R, Glöckner FO, Hirschman L, Garrity GM, Wooley J, Gilna P. Meeting Report from the Genomic Standards Consortium (GSC) Workshops 6 and 7. Stand Genomic Sci 2009;1:68-71. [PMID: 21304639 PMCID: PMC3035212 DOI: 10.4056/sigs.25165] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open

Wooley JC, Field D, Glöckner FO. Extending Standards for Genomics and Metagenomics Data: A Research Coordination Network for the Genomic Standards Consortium (RCN4GSC). Stand Genomic Sci 2009;1:87-90. [PMID: 21304642 PMCID: PMC3035207 DOI: 10.4056/sigs.26218] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open

Abstract

Through a newly established Research Coordination Network for the Genomic Standards Consortium (RCN4GSC), the GSC will continue its leadership in establishing and integrating genomic standards through community-based efforts. These efforts, undertaken in the context of genomic and metagenomic research, aim to ensure the electronic capture of all genomic data and to facilitate the achievement of a community consensus around collecting and managing relevant contextual information connected to the sequence data. The GSC operates as an open, inclusive organization, welcoming inspired biologists with a commitment to community service. Within the collaborative framework of the ongoing, international activities of the GSC, the RCN will expand the range of research domains engaged in these standardization efforts and sustain scientific networking to encourage active participation by the broader community. The RCN4GSC, funded for five years by the US National Science Foundation, will primarily support outcome-focused working meetings and the exchange of early-career scientists between GSC research groups in order to advance key standards contributions such as GCDML. Focusing on the timely delivery of the extant GSC core projects, the RCN will also extend the pioneering efforts of the GSC to engage researchers active in developing ecological, environmental and biodiversity data standards. As the initial goals of the GSC are increasingly achieved, promoting the comprehensive use of effective standards will be essential to ensure the effective use of sequence and associated data, to provide access for all biologists to all of the information, and to create interdisciplinary opportunities for discovery. The RCN will facilitate these implementation activities through participation in major scientific conferences and presentations on scientific advances enabled by community usage of genomic standards.

Chervitz SA, Parkinson H, Fostel JM, Causton HC, Sanson SA, Deutsch EW, Field D, Taylor CF, Rocca-Serra P, White J, Stoeckert CJ. Standards for Functional Genomics. Bioinformatics 2009. [DOI: 10.1007/978-0-387-92738-1_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022] Open

Genomes and knowledge - a questionable relationship? Trends Microbiol 2008;16:512-9. [PMID: 18819801 DOI: 10.1016/j.tim.2008.08.001] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2008] [Revised: 08/15/2008] [Accepted: 08/21/2008] [Indexed: 11/22/2022]