1
Schmidt C, Boissonnet T, Dohle J, Bernhardt K, Ferrando-May E, Wernet T, Nitschke R, Kunis S, Weidtkamp-Peters S. A practical guide to bioimaging research data management in core facilities. J Microsc 2024; 294:350-371. [PMID: 38752662 DOI: 10.1111/jmi.13317] [Received: 04/09/2024] [Revised: 04/29/2024] [Accepted: 04/30/2024] [Indexed: 05/21/2024]
Abstract
Bioimage data are generated in diverse research fields throughout the life and biomedical sciences. Their potential for advancing scientific progress via modern, data-driven discovery approaches reaches beyond disciplinary borders. To fully exploit this potential, it is necessary to make bioimaging data (generally multidimensional microscopy images and image series) FAIR, that is, findable, accessible, interoperable and reusable. These FAIR principles for research data management are now widely accepted in the scientific community and have been adopted by funding agencies, policymakers and publishers. To remain competitive and at the forefront of research, implementing the FAIR principles into daily routines is an essential but challenging task for researchers and research infrastructures. Imaging core facilities, well-established providers of access to imaging equipment and expertise, are in an excellent position to lead this transformation in bioimaging research data management. They are positioned at the intersection of research groups, IT infrastructure providers, the institution's administration, and microscope vendors. Within the framework of German BioImaging - Society for Microscopy and Image Analysis (GerBI-GMB), cross-institutional working groups and third-party funded projects were initiated in recent years to advance the bioimaging community's capability and capacity for FAIR bioimage data management. Here, we provide an imaging-core-facility-centric perspective outlining the experience and current strategies in Germany to facilitate the practical adoption of the FAIR principles, closely aligned with the international bioimaging community. We highlight which tools and services are ready to be implemented and what the future directions for FAIR bioimage data have to offer.
Affiliation(s)
- Christian Schmidt
- Enabling Technology Department, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Tom Boissonnet
- Center for Advanced Imaging, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Julia Dohle
- Center of Cellular Nanoanalytics, Integrated Bioimaging Facility iBiOs, University of Osnabrück, Osnabrück, Germany
- Karen Bernhardt
- Center of Cellular Nanoanalytics, Integrated Bioimaging Facility iBiOs, University of Osnabrück, Osnabrück, Germany
- Elisa Ferrando-May
- Enabling Technology Department, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Department of Biology, University of Konstanz, Konstanz, Germany
- Tobias Wernet
- Life Imaging Center, University of Freiburg, Freiburg, Germany
- Roland Nitschke
- Life Imaging Center, University of Freiburg, Freiburg, Germany
- CIBSS and BIOSS - Centres for Biological Signalling Studies, University of Freiburg, Freiburg, Germany
- Susanne Kunis
- Center of Cellular Nanoanalytics, Integrated Bioimaging Facility iBiOs, University of Osnabrück, Osnabrück, Germany
2
Minamiyama Y, Takeda H, Hayashi M, Asaoka M, Yamaji K. A study on formalizing the knowledge of data curation activities across different fields. PLoS One 2024; 19:e0301772. [PMID: 38662657 PMCID: PMC11045097 DOI: 10.1371/journal.pone.0301772] [Received: 10/03/2023] [Accepted: 03/21/2024] [Indexed: 04/28/2024]
Abstract
In recent years, with the trend toward open science, there have been many efforts to share research data on the internet. To promote research data sharing, data curation is essential to make the data interpretable and reusable. In research fields such as life sciences, earth sciences, and social sciences, tasks and procedures have already been developed to implement efficient data curation that meets the needs and customs of individual research fields. However, open science requires not only data sharing within research fields but also interdisciplinary data sharing. For this purpose, this paper surveys, analyzes, and organizes knowledge of data curation across research fields as an ontology. For the survey, existing vocabularies and procedures are collected and compared, and interviews with data curators at research institutes in different fields are conducted to clarify commonalities and differences in data curation across research fields. It turned out that the granularity of the tasks and procedures that constitute the building blocks of data curation is not formalized. Without a method to overcome this gap, it will be challenging to promote interdisciplinary reuse of research data. Based on this analysis, an ontology for the data curation process is proposed to describe data curation processes in different fields universally. It is described in OWL and shown to be logically valid and consistent. The ontology successfully represents, as processes, the data curation activities in the different fields obtained from the interviews. It is also helpful for identifying the functions of systems that support the data curation process. This study contributes to building a knowledge framework for an interdisciplinary understanding of data curation activities in different fields.
Affiliation(s)
- Yasuyuki Minamiyama
- Research Center for Open Science and Data Platform, National Institute of Informatics, Chiyoda-City, Tokyo, Japan
- Hideaki Takeda
- Principles of Informatics Research Division, National Institute of Informatics, Chiyoda-City, Tokyo, Japan
- Masaharu Hayashi
- Research Center for Open Science and Data Platform, National Institute of Informatics, Chiyoda-City, Tokyo, Japan
- Makoto Asaoka
- Research Center for Open Science and Data Platform, National Institute of Informatics, Chiyoda-City, Tokyo, Japan
- Kazutsuna Yamaji
- Research Center for Open Science and Data Platform, National Institute of Informatics, Chiyoda-City, Tokyo, Japan
3
Chong R, Tipton L. The Pacific Innovations, Knowledge, and Opportunities (PIKO) Program: A Data Lifecycle Research Experience. Hawai'i Journal of Health & Social Welfare 2023; 82:117-120. [PMID: 37901670 PMCID: PMC10612409] [Indexed: 10/31/2023]
Abstract
Pacific evidence-based clinical and translational research is greatly needed. However, there are research challenges that stem from the creation, accessibility, availability, usability, and compliance of data in the Pacific. As a result, there is a growing demand for a complementary approach to the traditional Western research process in clinical and translational research. The data lifecycle is one such approach with a history of use in various other disciplines. It was designed as a data management tool with a set of activities that guide researchers and organizations on the creation, management, usage, and distribution of data. This manuscript describes the data lifecycle and its use by the Biostatistics, Epidemiology, and Research Design core data science team in support of the Center for Pacific Innovations, Knowledge, and Opportunities program.
Affiliation(s)
- Rylan Chong
- School of Natural Sciences and Mathematics, Department of Data Science, Analytics and Visualization, Chaminade University of Honolulu, Honolulu, HI
- Laura Tipton
- School of Natural Sciences and Mathematics, Department of Data Science, Analytics and Visualization, Chaminade University of Honolulu, Honolulu, HI
4
Novichkov PS, Chandonia JM, Arkin AP. CORAL: A framework for rigorous self-validated data modeling and integrative, reproducible data analysis. Gigascience 2022; 11:6762021. [PMID: 36251274 PMCID: PMC9575582 DOI: 10.1093/gigascience/giac089] [Received: 10/25/2021] [Revised: 04/12/2022] [Accepted: 08/31/2022] [Indexed: 11/04/2022]
Abstract
BACKGROUND Many organizations face challenges in managing and analyzing data, especially when relevant datasets arise from multiple sources and methods. Analyzing heterogeneous datasets and additional derived data requires rigorous tracking of their interrelationships and provenance. This task has long been a Grand Challenge of data science and has more recently been formalized in the FAIR principles: that all data objects be Findable, Accessible, Interoperable, and Reusable, both for machines and for people. Adherence to these principles is necessary for proper stewardship of information, for testing regulatory compliance, for measuring the efficiency of processes, and for facilitating reuse of data-analytical frameworks. FINDINGS We present the Contextual Ontology-based Repository Analysis Library (CORAL), a platform that greatly facilitates adherence to all 4 of the FAIR principles, including the especially difficult challenge of making heterogeneous datasets Interoperable and Reusable across all parts of a large, long-lasting organization. To achieve this, CORAL's data model requires that data generators extensively document the context for all data, and our tools maintain that context throughout the entire analysis pipeline. CORAL also features a web interface for data generators to upload and explore data, as well as a Jupyter notebook interface for data analysts, both backed by a common API. CONCLUSIONS CORAL enables organizations to build FAIR data types on the fly as they are needed, avoiding the expense of bespoke data modeling. CORAL provides a uniquely powerful platform to enable integrative cross-dataset analyses, generating deeper insights than are possible using traditional analysis tools.
Affiliation(s)
- John-Marc Chandonia
- Correspondence address. John-Marc Chandonia, Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Lab, Mailstop Donner, Berkeley, CA 94720-3102, USA.
- Adam P Arkin
- Correspondence address. Adam P. Arkin, Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Mailstop 955-512L, Berkeley, CA 94720, USA.
5
Rübel O, Tritt A, Ly R, Dichter BK, Ghosh S, Niu L, Baker P, Soltesz I, Ng L, Svoboda K, Frank L, Bouchard KE. The Neurodata Without Borders ecosystem for neurophysiological data science. eLife 2022; 11:e78362. [PMID: 36193886 PMCID: PMC9531949 DOI: 10.7554/elife.78362] [Received: 03/11/2022] [Accepted: 05/13/2022] [Indexed: 01/21/2023]
Abstract
The neurophysiology of cells and tissues is monitored electrophysiologically and optically in diverse experiments and species, ranging from flies to humans. Understanding the brain requires integration of data across this diversity, and thus these data must be findable, accessible, interoperable, and reusable (FAIR). This requires a standard language for data and metadata that can coevolve with neuroscience. We describe design and implementation principles for a language for neurophysiology data. Our open-source software (Neurodata Without Borders, NWB) defines and modularizes the interdependent, yet separable, components of a data language. We demonstrate NWB's impact through unified description of neurophysiology data across diverse modalities and species. NWB exists in an ecosystem, which includes data management, analysis, visualization, and archive tools. Thus, the NWB data language enables reproduction, interchange, and reuse of diverse neurophysiology data. More broadly, the design principles of NWB are generally applicable to enhance discovery across biology through data FAIRness.
Affiliation(s)
- Oliver Rübel
- Scientific Data Division, Lawrence Berkeley National Laboratory, Berkeley, United States
- Andrew Tritt
- Applied Mathematics and Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, United States
- Ryan Ly
- Scientific Data Division, Lawrence Berkeley National Laboratory, Berkeley, United States
- Satrajit Ghosh
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, United States
- Department of Otolaryngology - Head and Neck Surgery, Harvard Medical School, Boston, United States
- Pamela Baker
- Allen Institute for Brain Science, Seattle, United States
- Ivan Soltesz
- Department of Neurosurgery, Stanford University, Stanford, United States
- Lydia Ng
- Allen Institute for Brain Science, Seattle, United States
- Karel Svoboda
- Allen Institute for Brain Science, Seattle, United States
- Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, United States
- Loren Frank
- Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, United States
- Kavli Institute for Fundamental Neuroscience, San Francisco, United States
- Departments of Physiology and Psychiatry, University of California, San Francisco, San Francisco, United States
- Kristofer E Bouchard
- Scientific Data Division, Lawrence Berkeley National Laboratory, Berkeley, United States
- Kavli Institute for Fundamental Neuroscience, San Francisco, United States
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, United States
- Helen Wills Neuroscience Institute and Redwood Center for Theoretical Neuroscience, University of California, Berkeley, Berkeley, United States
- Weill Neurohub, Berkeley, United States
6
Paulhe N, Canlet C, Damont A, Peyriga L, Durand S, Deborde C, Alves S, Bernillon S, Berton T, Bir R, Bouville A, Cahoreau E, Centeno D, Costantino R, Debrauwer L, Delabrière A, Duperier C, Emery S, Flandin A, Hohenester U, Jacob D, Joly C, Jousse C, Lagree M, Lamari N, Lefebvre M, Lopez-Piffet C, Lyan B, Maucourt M, Migne C, Olivier MF, Rathahao-Paris E, Petriacq P, Pinelli J, Roch L, Roger P, Roques S, Tabet JC, Tremblay-Franco M, Traïkia M, Warnet A, Zhendre V, Rolin D, Jourdan F, Thévenot E, Moing A, Jamin E, Fenaille F, Junot C, Pujos-Guillot E, Giacomoni F. PeakForest: a multi-platform digital infrastructure for interoperable metabolite spectral data and metadata management. Metabolomics 2022; 18:40. [PMID: 35699774 PMCID: PMC9197906 DOI: 10.1007/s11306-022-01899-3] [Received: 02/22/2022] [Accepted: 05/22/2022] [Indexed: 01/02/2023]
Abstract
INTRODUCTION Accuracy of feature annotation and metabolite identification in biological samples is a key element in metabolomics research. However, the annotation process is often hampered by the lack of spectral reference data in experimental conditions, as well as logistical difficulties in the spectral data management and exchange of annotations between laboratories. OBJECTIVES To design an open-source infrastructure capable of hosting both nuclear magnetic resonance (NMR) and mass spectra (MS), with an ergonomic Web interface and Web services to support metabolite annotation and laboratory data management. METHODS We developed the PeakForest infrastructure, an open-source Java tool with automatic programming interfaces that can be deployed locally to organize spectral data for metabolome annotation in laboratories. Standardized operating procedures and formats were included to ensure data quality and interoperability, in line with international recommendations and FAIR principles. RESULTS PeakForest is able to capture and store experimental spectral MS and NMR metadata as well as collect and display signal annotations. This modular system provides a structured database with inbuilt tools to curate information, browse and reuse spectral information in data treatment. PeakForest offers data formalization and centralization at the laboratory level, facilitating shared spectral data across laboratories and integration into public databases. CONCLUSION PeakForest is a comprehensive resource which addresses a technical bottleneck, namely large-scale spectral data annotation and metabolite identification for metabolomics laboratories with multiple instruments. PeakForest databases can be used in conjunction with bespoke data analysis pipelines in the Galaxy environment, offering the opportunity to meet the evolving needs of metabolomics research. Developed and tested by the French metabolomics community, PeakForest is freely available at https://github.com/peakforest.
Affiliation(s)
- Nils Paulhe
- Université Clermont Auvergne, INRAE, UNH, Plateforme d'Exploration du Métabolisme, MetaboHUB Clermont, Clermont-Ferrand, France
- Cécile Canlet
- Toxalim (Research Center in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, MetaboHUB, 31300, Toulouse, France
- Annelaure Damont
- Département Médicaments et Technologies pour la Santé (DMTS), Université Paris-Saclay, CEA, INRAE, MetaboHUB, 91191, Gif sur Yvette, France
- Lindsay Peyriga
- MetaboHUB-MetaToul, National Infrastructure of Metabolomics & Fluxomics (ANR-11-INBS-0010), 31077, Toulouse, France
- Stéphanie Durand
- Université Clermont Auvergne, INRAE, UNH, Plateforme d'Exploration du Métabolisme, MetaboHUB Clermont, Clermont-Ferrand, France
- Catherine Deborde
- Université de Bordeaux, INRAE, Biologie du Fruit et Pathologie, UMR 1332, Bordeaux Metabolome, MetaboHUB, PHENOME-EMPHASIS, 71 av E. Bourlaux, 33140, Villenave d'Ornon, France
- Sandra Alves
- Département Médicaments et Technologies pour la Santé (DMTS), Université Paris-Saclay, CEA, INRAE, MetaboHUB, 91191, Gif sur Yvette, France
- Stephane Bernillon
- Université de Bordeaux, INRAE, Biologie du Fruit et Pathologie, UMR 1332, Bordeaux Metabolome, MetaboHUB, PHENOME-EMPHASIS, 71 av E. Bourlaux, 33140, Villenave d'Ornon, France
- Thierry Berton
- Université de Bordeaux, INRAE, Biologie du Fruit et Pathologie, UMR 1332, Bordeaux Metabolome, MetaboHUB, PHENOME-EMPHASIS, 71 av E. Bourlaux, 33140, Villenave d'Ornon, France
- Raphael Bir
- Université Clermont Auvergne, INRAE, UNH, Plateforme d'Exploration du Métabolisme, MetaboHUB Clermont, Clermont-Ferrand, France
- Alyssa Bouville
- Toxalim (Research Center in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, MetaboHUB, 31300, Toulouse, France
- Edern Cahoreau
- MetaboHUB-MetaToul, National Infrastructure of Metabolomics & Fluxomics (ANR-11-INBS-0010), 31077, Toulouse, France
- Delphine Centeno
- Université Clermont Auvergne, INRAE, UNH, Plateforme d'Exploration du Métabolisme, MetaboHUB Clermont, Clermont-Ferrand, France
- Robin Costantino
- Toxalim (Research Center in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, MetaboHUB, 31300, Toulouse, France
- Laurent Debrauwer
- Toxalim (Research Center in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, MetaboHUB, 31300, Toulouse, France
- Alexis Delabrière
- Département Médicaments et Technologies pour la Santé (DMTS), Université Paris-Saclay, CEA, INRAE, MetaboHUB, 91191, Gif sur Yvette, France
- Christophe Duperier
- Université Clermont Auvergne, INRAE, UNH, Plateforme d'Exploration du Métabolisme, MetaboHUB Clermont, Clermont-Ferrand, France
- Sylvain Emery
- Université Clermont Auvergne, INRAE, UNH, Plateforme d'Exploration du Métabolisme, MetaboHUB Clermont, Clermont-Ferrand, France
- Amelie Flandin
- Université de Bordeaux, INRAE, Biologie du Fruit et Pathologie, UMR 1332, Bordeaux Metabolome, MetaboHUB, PHENOME-EMPHASIS, 71 av E. Bourlaux, 33140, Villenave d'Ornon, France
- Ulli Hohenester
- Département Médicaments et Technologies pour la Santé (DMTS), Université Paris-Saclay, CEA, INRAE, MetaboHUB, 91191, Gif sur Yvette, France
- Daniel Jacob
- Université de Bordeaux, INRAE, Biologie du Fruit et Pathologie, UMR 1332, Bordeaux Metabolome, MetaboHUB, PHENOME-EMPHASIS, 71 av E. Bourlaux, 33140, Villenave d'Ornon, France
- Charlotte Joly
- Université Clermont Auvergne, INRAE, UNH, Plateforme d'Exploration du Métabolisme, MetaboHUB Clermont, Clermont-Ferrand, France
- Cyril Jousse
- Université Clermont Auvergne, INRAE, UNH, Plateforme d'Exploration du Métabolisme, MetaboHUB Clermont, Clermont-Ferrand, France
- Marie Lagree
- Université Clermont Auvergne, INRAE, UNH, Plateforme d'Exploration du Métabolisme, MetaboHUB Clermont, Clermont-Ferrand, France
- Nadia Lamari
- Université de Bordeaux, INRAE, Biologie du Fruit et Pathologie, UMR 1332, Bordeaux Metabolome, MetaboHUB, PHENOME-EMPHASIS, 71 av E. Bourlaux, 33140, Villenave d'Ornon, France
- Marie Lefebvre
- Université de Bordeaux, INRAE, Biologie du Fruit et Pathologie, UMR 1332, Bordeaux Metabolome, MetaboHUB, PHENOME-EMPHASIS, 71 av E. Bourlaux, 33140, Villenave d'Ornon, France
- Claire Lopez-Piffet
- Université Clermont Auvergne, INRAE, UNH, Plateforme d'Exploration du Métabolisme, MetaboHUB Clermont, Clermont-Ferrand, France
- Bernard Lyan
- Université Clermont Auvergne, INRAE, UNH, Plateforme d'Exploration du Métabolisme, MetaboHUB Clermont, Clermont-Ferrand, France
- Mickael Maucourt
- Université de Bordeaux, INRAE, Biologie du Fruit et Pathologie, UMR 1332, Bordeaux Metabolome, MetaboHUB, PHENOME-EMPHASIS, 71 av E. Bourlaux, 33140, Villenave d'Ornon, France
- Carole Migne
- Université Clermont Auvergne, INRAE, UNH, Plateforme d'Exploration du Métabolisme, MetaboHUB Clermont, Clermont-Ferrand, France
- Marie-Francoise Olivier
- Département Médicaments et Technologies pour la Santé (DMTS), Université Paris-Saclay, CEA, INRAE, MetaboHUB, 91191, Gif sur Yvette, France
- Estelle Rathahao-Paris
- Département Médicaments et Technologies pour la Santé (DMTS), Université Paris-Saclay, CEA, INRAE, MetaboHUB, 91191, Gif sur Yvette, France
- Pierre Petriacq
- Université de Bordeaux, INRAE, Biologie du Fruit et Pathologie, UMR 1332, Bordeaux Metabolome, MetaboHUB, PHENOME-EMPHASIS, 71 av E. Bourlaux, 33140, Villenave d'Ornon, France
- Julie Pinelli
- Université de Bordeaux, INRAE, Biologie du Fruit et Pathologie, UMR 1332, Bordeaux Metabolome, MetaboHUB, PHENOME-EMPHASIS, 71 av E. Bourlaux, 33140, Villenave d'Ornon, France
- Léa Roch
- Université de Bordeaux, INRAE, Biologie du Fruit et Pathologie, UMR 1332, Bordeaux Metabolome, MetaboHUB, PHENOME-EMPHASIS, 71 av E. Bourlaux, 33140, Villenave d'Ornon, France
- Pierrick Roger
- Département Médicaments et Technologies pour la Santé (DMTS), Université Paris-Saclay, CEA, INRAE, MetaboHUB, 91191, Gif sur Yvette, France
- Simon Roques
- Université de Bordeaux, INRAE, Biologie du Fruit et Pathologie, UMR 1332, Bordeaux Metabolome, MetaboHUB, PHENOME-EMPHASIS, 71 av E. Bourlaux, 33140, Villenave d'Ornon, France
- Jean-Claude Tabet
- Département Médicaments et Technologies pour la Santé (DMTS), Université Paris-Saclay, CEA, INRAE, MetaboHUB, 91191, Gif sur Yvette, France
- Marie Tremblay-Franco
- Toxalim (Research Center in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, MetaboHUB, 31300, Toulouse, France
- Mounir Traïkia
- Université Clermont Auvergne, INRAE, UNH, Plateforme d'Exploration du Métabolisme, MetaboHUB Clermont, Clermont-Ferrand, France
- Anna Warnet
- Département Médicaments et Technologies pour la Santé (DMTS), Université Paris-Saclay, CEA, INRAE, MetaboHUB, 91191, Gif sur Yvette, France
- Vanessa Zhendre
- Université de Bordeaux, INRAE, Biologie du Fruit et Pathologie, UMR 1332, Bordeaux Metabolome, MetaboHUB, PHENOME-EMPHASIS, 71 av E. Bourlaux, 33140, Villenave d'Ornon, France
- Dominique Rolin
- Université de Bordeaux, INRAE, Biologie du Fruit et Pathologie, UMR 1332, Bordeaux Metabolome, MetaboHUB, PHENOME-EMPHASIS, 71 av E. Bourlaux, 33140, Villenave d'Ornon, France
- Fabien Jourdan
- Toxalim (Research Center in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, MetaboHUB, 31300, Toulouse, France
- Etienne Thévenot
- Département Médicaments et Technologies pour la Santé (DMTS), Université Paris-Saclay, CEA, INRAE, MetaboHUB, 91191, Gif sur Yvette, France
- Annick Moing
- Université de Bordeaux, INRAE, Biologie du Fruit et Pathologie, UMR 1332, Bordeaux Metabolome, MetaboHUB, PHENOME-EMPHASIS, 71 av E. Bourlaux, 33140, Villenave d'Ornon, France
- Emilien Jamin
- Toxalim (Research Center in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, MetaboHUB, 31300, Toulouse, France
- François Fenaille
- Département Médicaments et Technologies pour la Santé (DMTS), Université Paris-Saclay, CEA, INRAE, MetaboHUB, 91191, Gif sur Yvette, France
- Christophe Junot
- Département Médicaments et Technologies pour la Santé (DMTS), Université Paris-Saclay, CEA, INRAE, MetaboHUB, 91191, Gif sur Yvette, France
- Estelle Pujos-Guillot
- Université Clermont Auvergne, INRAE, UNH, Plateforme d'Exploration du Métabolisme, MetaboHUB Clermont, Clermont-Ferrand, France
- Franck Giacomoni
- Université Clermont Auvergne, INRAE, UNH, Plateforme d'Exploration du Métabolisme, MetaboHUB Clermont, Clermont-Ferrand, France
7
Righelli D, Angelini C. Easyreporting simplifies the implementation of Reproducible Research layers in R software. PLoS One 2021; 16:e0244122. [PMID: 33970927 PMCID: PMC8109797 DOI: 10.1371/journal.pone.0244122] [Received: 12/01/2020] [Accepted: 04/20/2021] [Indexed: 11/19/2022]
Abstract
In recent years, "irreproducibility" has become a general problem in omics data analysis due to the use of sophisticated and poorly described computational procedures. To avoid misleading results, it is necessary to inspect and reproduce the entire data analysis as a unified product. Reproducible Research (RR) provides general guidelines for public access to the analytic data and related analysis code combined with natural language documentation, allowing third parties to reproduce the findings. We developed easyreporting, a novel R/Bioconductor package, to facilitate the implementation of an RR layer inside reports/tools. We describe the main functionalities and illustrate the organization of an analysis report using a typical case study concerning the analysis of RNA-seq data. Then, we show how to use easyreporting in other projects to trace R functions automatically. This latter feature helps developers implement procedures that automatically keep track of the analysis steps. Easyreporting can be useful in supporting the reproducibility of any data analysis project and shows great advantages for the implementation of R packages and GUIs. It turns out to be very helpful in bioinformatics, where the complexity of the analyses makes it extremely difficult to trace all the steps and parameters used in the study.
Affiliation(s)
- Dario Righelli
- Department of Statistical Sciences, University of Padova, Padua, Italy
- Istituto per le Applicazioni del Calcolo “Mauro Picone”, National Research Council, Naples, Italy
- Claudia Angelini
- Istituto per le Applicazioni del Calcolo “Mauro Picone”, National Research Council, Naples, Italy
8
Hamdi Y, Zass L, Othman H, Radouani F, Allali I, Hanachi M, Okeke CJ, Chaouch M, Tendwa MB, Samtal C, Mohamed Sallam R, Alsayed N, Turkson M, Ahmed S, Benkahla A, Romdhane L, Souiai O, Tastan Bishop Ö, Ghedira K, Mohamed Fadlelmola F, Mulder N, Kamal Kassim S. Human OMICs and Computational Biology Research in Africa: Current Challenges and Prospects. OMICS: A Journal of Integrative Biology 2021; 25:213-233. [PMID: 33794662 DOI: 10.1089/omi.2021.0004] [Indexed: 02/06/2023]
Abstract
Following the publication of the first human genome, OMICs research, including genomics, transcriptomics, proteomics, and metagenomics, has been on the rise. OMICs studies revealed the complex genetic diversity among human populations and challenged our understanding of genotype-phenotype correlations. Africa, being the cradle of the first modern humans, is distinguished by a large genetic diversity within its populations and rich ethnolinguistic history. However, the available human OMICs tools and databases are not representative of this diversity, therefore creating significant gaps in biomedical research. African scientists, students, and publics are among the key contributors to OMICs systems science. This expert review examines the pressing issues in human OMICs research, education, and development in Africa, as seen through a lens of computational biology, public health relevant technology innovation, critically informed science governance, and how best to harness OMICs data to benefit health and societies in Africa and beyond. We underscore the disparities between North and Sub-Saharan Africa at different levels. A harmonized African ethnolinguistic classification would help address annotation challenges associated with population diversity. Finally, building on the existing strategic research initiatives, such as the H3Africa and H3ABioNet Consortia, we strongly recommend addressing large-scale multidisciplinary research challenges, strengthening research collaborations and knowledge transfer, and enhancing the ability of African researchers to influence and shape national and international research, policy, and funding agendas. This article and analysis contribute to a deeper understanding of past and current challenges in the African OMICs innovation ecosystem, while also offering foresight on future innovation trajectories.
Affiliation(s)
- Yosr Hamdi
- Laboratory of Biomedical Genomics and Oncogenetics, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia
- Laboratory of Human and Experimental Pathology, Institut Pasteur de Tunis, Tunis, Tunisia
- Lyndon Zass
- Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, CIDRI Africa Wellcome Trust Centre, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Houcemeddine Othman
- Sydney Brenner Institute for Molecular Bioscience, University of the Witwatersrand, Johannesburg, South Africa
- Fouzia Radouani
- Chlamydiae and Mycoplasmas Laboratory, Institut Pasteur du Maroc, Casablanca, Morocco
- Imane Allali
- Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, CIDRI Africa Wellcome Trust Centre, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Laboratory of Human Pathologies Biology, Department of Biology, Faculty of Sciences, and Genomic Center of Human Pathologies, Faculty of Medicine and Pharmacy, Mohammed V University in Rabat, Rabat, Morocco
- Mariem Hanachi
- Laboratory of Bioinformatics, Biomathematics and Biostatistics, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia
- Faculty of Science of Bizerte, Zarzouna, University of Carthage, Tunis, Tunisia
- Chiamaka Jessica Okeke
- Research Unit in Bioinformatics (RUBi), Department of Biochemistry and Microbiology, Rhodes University, Makhanda, South Africa
- Melek Chaouch
- Laboratory of Bioinformatics, Biomathematics and Biostatistics, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia
- Maureen Bilinga Tendwa
- Research Unit in Bioinformatics (RUBi), Department of Biochemistry and Microbiology, Rhodes University, Makhanda, South Africa
- Chaimae Samtal
- Laboratory of Biotechnology, Environment, Agri-food and Health, Faculty of Sciences Dhar El Mahraz-Sidi Mohammed Ben Abdellah University, Fez, Morocco
- University of Mohamed Premier, Oujda, Morocco
- Reem Mohamed Sallam
- Department of Medical Biochemistry and Molecular Biology, Faculty of Medicine, Ain Shams University, Cairo, Egypt
- Department of Basic Medical Sciences, Faculty of Medicine, Galala University, Suez, Egypt
- Nihad Alsayed
- Centre for Bioinformatics and Systems Biology, Faculty of Science, University of Khartoum, Khartoum, Sudan
- Michael Turkson
- The National Institute for Mathematical Sciences, Kwame Nkrumah University of Science and Technology, Kumasi, Ghana
- Samah Ahmed
- Centre for Bioinformatics and Systems Biology, Faculty of Science, University of Khartoum, Khartoum, Sudan
- Alia Benkahla
- Laboratory of Bioinformatics, Biomathematics and Biostatistics, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia
- Lilia Romdhane
- Laboratory of Biomedical Genomics and Oncogenetics, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia
- Faculty of Science of Bizerte, Zarzouna, University of Carthage, Tunis, Tunisia
- Oussema Souiai
- Laboratory of Bioinformatics, Biomathematics and Biostatistics, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia
- Özlem Tastan Bishop
- Research Unit in Bioinformatics (RUBi), Department of Biochemistry and Microbiology, Rhodes University, Makhanda, South Africa
- Kais Ghedira
- Laboratory of Bioinformatics, Biomathematics and Biostatistics, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia
- Faisal Mohamed Fadlelmola
- Centre for Bioinformatics and Systems Biology, Faculty of Science, University of Khartoum, Khartoum, Sudan
- Nicola Mulder
- Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, CIDRI Africa Wellcome Trust Centre, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Samar Kamal Kassim
- Department of Medical Biochemistry and Molecular Biology, Faculty of Medicine, Ain Shams University, Cairo, Egypt
9
Yoon A, Kim Y. The role of data-reuse experience in biological scientists’ data sharing: an empirical analysis. Electronic Library 2020. [DOI: 10.1108/el-06-2019-0146] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/17/2022]
Abstract
Purpose
The purpose of this paper is to investigate how scientists’ prior data-reuse experience affects their data-sharing intention by updating diverse attitudinal, control and normative beliefs about data sharing.
Design/methodology/approach
This paper used a survey method and the research model was evaluated by applying structural equation modelling to 476 survey responses from biological scientists in the USA.
Findings
The results show that prior data-reuse experience significantly increases the perceived community and career benefits and subjective norms of data sharing and significantly decreases the perceived risk and effort involved in data sharing. The perceived community benefits and subjective norms of data sharing positively influence scientists’ data-sharing intention, whereas the perceived risk and effort negatively influence scientists’ data-sharing intention.
Research limitations/implications
Based on the theory of planned behaviour, the research model was developed by connecting scientists’ prior data-reuse experience and data-sharing intention mediated through diverse attitudinal, control and normative perceptions of data sharing.
Practical implications
This research suggests that to facilitate scientists’ data-sharing behaviours, data reuse needs to be encouraged. Data sharing and reuse are interconnected, so scientists’ data sharing can be better promoted by providing them with data-reuse experience.
Originality/value
This is one of the first studies to examine the relationship between data-reuse experience and data-sharing behaviour, and it considered the following mediating factors: perceived community benefit, career benefit, career risk, effort and subjective norm of data sharing. This research provides an advanced investigation of data-sharing behaviour in relation to data-reuse experience and suggests significant implications for fostering data-sharing behaviour.
10
Hackett RA, Belitz MW, Gilbert EE, Monfils AK. A data management workflow of biodiversity data from the field to data users. Applications in Plant Sciences 2019; 7:e11310. [PMID: 31890356 PMCID: PMC6923704 DOI: 10.1002/aps3.11310] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 07/21/2019] [Accepted: 10/21/2019] [Indexed: 06/10/2023]
Abstract
Premise
Heterogeneity of biodiversity data from the collections, research, and management communities presents challenges for data findability, accessibility, interoperability, and reusability. Workflows designed with data collection, standards, dissemination, and reuse in mind will generate better information across geopolitical, administrative, and institutional boundaries. Here, we present our data workflow as a case study of how we collected, shared, and used data from multiple sources.
Methods
In 2012, we initiated the collection of biodiversity data relating to Michigan prairie fens, including data on plant communities and the federally endangered Poweshiek skipperling (Oarisma poweshiek).
Results
Over 23,000 occurrence records were compiled in a database following Darwin Core standards. The records were linked with media and biological, chemical, and geometric measurements. We published the data as Global Biodiversity Information Facility data sets and in Symbiota SEINet portals.
Discussion
We highlight data collection techniques that optimized transcription time, including the use of predetermined and controlled vocabulary, Darwin Core terms, and data dictionaries. The validity and longevity of our data were supported by voucher specimens, metadata with measurement records, and published manuscripts detailing methods and data sets. Key to our data dissemination was cooperation among partners and the utilization of dynamic tools. To increase data interoperability, we need flexible and customizable data collection templates, coding, and enhanced communication among communities using biodiversity data.
Affiliation(s)
- Rachel A. Hackett
- Department of Biology, Institute for Great Lakes Research, Central Michigan University, Bioscience Building 2100, 1455 Calumet Court, Mount Pleasant, Michigan 48859, USA
- Michigan Natural Features Inventory, Michigan State University Extension, P.O. Box 13036, Lansing, Michigan 48901-3036, USA
- Michael W. Belitz
- Department of Biology, Institute for Great Lakes Research, Central Michigan University, Bioscience Building 2100, 1455 Calumet Court, Mount Pleasant, Michigan 48859, USA
- Florida Museum of Natural History, University of Florida, Gainesville, Florida 32611, USA
- Anna K. Monfils
- Department of Biology, Institute for Great Lakes Research, Central Michigan University, Bioscience Building 2100, 1455 Calumet Court, Mount Pleasant, Michigan 48859, USA
11
Emam I, Elyasigomari V, Matthews A, Pavlidis S, Rocca-Serra P, Guitton F, Verbeeck D, Grainger L, Borgogni E, Del Giudice G, Saqi M, Houston P, Guo Y. PlatformTM, a standards-based data custodianship platform for translational medicine research. Sci Data 2019; 6:149. [PMID: 31409798 PMCID: PMC6692384 DOI: 10.1038/s41597-019-0156-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 10/23/2018] [Accepted: 07/25/2019] [Indexed: 12/20/2022]
Abstract
Biomedical informatics has traditionally adopted a linear view of the informatics process (collect, store and analyse) in translational medicine (TM) studies, focusing primarily on the challenges in data integration and analysis. However, a data management challenge presents itself with the new lifecycle view of data emphasized by the recent calls for data re-use, long-term data preservation, and data sharing. There is currently a lack of dedicated infrastructure focused on the ‘manageability’ of the data lifecycle in TM research between data collection and analysis. Current community efforts towards establishing a culture of open science prompt the creation of a data custodianship environment for managing TM data assets to support data reuse and the reproducibility of research results. Here we present the development of a lifecycle-based methodology to create a metadata management framework based on community-driven standards for the standardisation, consolidation and integration of TM research data. Based on this framework, we also present the development of a new platform (PlatformTM) focused on managing the lifecycle of translational research data assets.
Affiliation(s)
- Ibrahim Emam
- Data Science Institute, Imperial College London, London, UK
- Alex Matthews
- Clinical Research Centre, University of Surrey, Guildford, UK
- Mansoor Saqi
- Data Science Institute, Imperial College London, London, UK
- Paul Houston
- CDISC, Clinical Data Interchange Standards Consortium and CDISC EU Foundation, London, UK
- Yike Guo
- Data Science Institute, Imperial College London, London, UK
12
NetR and AttR, Two New Bioinformatic Tools to Integrate Diverse Datasets into Cytoscape Network and Attribute Files. Genes (Basel) 2019; 10:genes10060423. [PMID: 31159440 PMCID: PMC6628208 DOI: 10.3390/genes10060423] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2019] [Revised: 05/25/2019] [Accepted: 05/27/2019] [Indexed: 11/17/2022]
Abstract
High-throughput technologies have allowed researchers to obtain genome-wide data from a wide array of experimental model systems. However, new data generation tends to significantly outpace data re-utilization, and most high-throughput datasets are only rarely used in subsequent studies or to generate new hypotheses to be tested experimentally. One reason for this underutilization is a widespread lack of programming expertise among experimental biologists, who often need to reformat files before they can integrate published data from disparate sources. We have developed two programs (NetR and AttR), which allow experimental biologists with little to no programming background to integrate publicly available datasets into files that can later be visualized with Cytoscape to display hypothetical networks that result from combining individual datasets, as well as a series of published attributes related to the genes or proteins in the network. NetR also allows users to import protein and genetic interaction data from InterMine, which can further enrich a network model based on curated information. We expect that NetR/AttR will allow experimental biologists to mine a largely unexploited wealth of data in their fields and facilitate their integration into hypothetical models to be tested experimentally.
13
The Gene Ontology Resource: 20 years and still GOing strong. Nucleic Acids Res 2019; 47:D330-D338. [PMID: 30395331 PMCID: PMC6323945 DOI: 10.1093/nar/gky1055] [Citation(s) in RCA: 2532] [Impact Index Per Article: 506.4] [Received: 09/22/2018] [Accepted: 10/17/2018] [Indexed: 02/06/2023]
Abstract
The Gene Ontology resource (GO; http://geneontology.org) provides structured, computable knowledge regarding the functions of genes and gene products. Founded in 1998, GO has become widely adopted in the life sciences, and its contents are under continual improvement, both in quantity and in quality. Here, we report the major developments of the GO resource during the past two years. Each monthly release of the GO resource is now packaged and given a unique identifier (DOI), enabling GO-based analyses on a specific release to be reproduced in the future. The molecular function ontology has been refactored to better represent the overall activities of gene products, with a focus on transcription regulator activities. Quality assurance efforts have been ramped up to address potentially out-of-date or inaccurate annotations. New evidence codes for high-throughput experiments now enable users to filter out annotations obtained from these sources. GO-CAM, a new framework for representing gene function that is more expressive than standard GO annotations, has been released, and users can now explore the growing repository of these models. We also provide the 'GO ribbon' widget for visualizing GO annotations to a gene; the widget can be easily embedded in any web page.
14
Watson-Haigh NS, Suchecki R, Kalashyan E, Garcia M, Baumann U. DAWN: a resource for yielding insights into the diversity among wheat genomes. BMC Genomics 2018; 19:941. [PMID: 30558550 PMCID: PMC6296097 DOI: 10.1186/s12864-018-5228-2] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 06/30/2018] [Accepted: 11/06/2018] [Indexed: 12/21/2022]
Abstract
Background
Democratising the growing body of whole genome sequencing data available for Triticum aestivum (bread wheat) has been impeded by the lack of a genome reference and the large computational requirements for analysing these data sets.
Results
DAWN (Diversity Among Wheat geNomes) integrates data from the T. aestivum Chinese Spring (CS) IWGSC RefSeq v1.0 genome with public WGS and exome data from 17 and 62 accessions, respectively, enabling researchers and breeders alike to investigate genotypic differences between wheat accessions at the level of whole chromosomes down to individual genes.
Conclusions
Using DAWN we show that it is possible to visualise small and large chromosomal deletions, identify haplotypes at a glance and spot the consequences of selective breeding. DAWN allows us to detect the break points of alien introgression segments brought into an accession when transferring desired genes. Furthermore, we can find possible explanations for reduced recombination in parts of a chromosome, we can predict regions with linkage drag, and also look at diversity in centromeric regions.
Affiliation(s)
- Nathan S. Watson-Haigh
- School of Agriculture, Food and Wine, University of Adelaide, PMB 1, Glen Osmond, 5064 SA Australia
- Bioinformatics Hub, School of Biological Sciences, University of Adelaide, Adelaide, SA 5005 Australia
- Radosław Suchecki
- School of Agriculture, Food and Wine, University of Adelaide, PMB 1, Glen Osmond, 5064 SA Australia
- CSIRO Agriculture and Food, Glen Osmond, Locked Bag 2, Adelaide, SA 5064 Australia
- Elena Kalashyan
- School of Agriculture, Food and Wine, University of Adelaide, PMB 1, Glen Osmond, 5064 SA Australia
- Melissa Garcia
- School of Agriculture, Food and Wine, University of Adelaide, PMB 1, Glen Osmond, 5064 SA Australia
- Ute Baumann
- School of Agriculture, Food and Wine, University of Adelaide, PMB 1, Glen Osmond, 5064 SA Australia
15
Pascar J, Chandler CH. A bioinformatics approach to identifying Wolbachia infections in arthropods. PeerJ 2018; 6:e5486. [PMID: 30202647 PMCID: PMC6126470 DOI: 10.7717/peerj.5486] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Received: 09/14/2017] [Accepted: 07/30/2018] [Indexed: 11/20/2022]
Abstract
Wolbachia is the most widespread endosymbiont, infecting >20% of arthropod species, and is capable of drastically manipulating the host's reproductive mechanisms. Conventionally, diagnosis has relied on PCR amplification; however, PCR is not always a reliable diagnostic technique due to primer specificity, strain diversity, degree of infection and/or tissue sampled. Here, we look for evidence of Wolbachia infection across a wide array of arthropod species using a bioinformatic approach to detect the Wolbachia genes ftsZ, wsp, and the groE operon in next-generation sequencing samples available through the NCBI Sequence Read Archive. For samples showing signs of infection, we attempted to assemble entire Wolbachia genomes, and, to better understand the relationships between hosts and symbionts, we constructed phylogenies using the assembled gene sequences. Of the 34 species with positively identified infections, eight arthropod species had not previously been recorded to harbor Wolbachia infection. All putative infections cluster with known representative strains belonging to supergroup A or B, which are known to infect only arthropods. This study presents an efficient bioinformatic approach for post-sequencing diagnosis and analysis of Wolbachia infection in arthropods.
Affiliation(s)
- Jane Pascar
- Department of Biological Sciences, State University of New York at Oswego, Oswego, NY, United States of America
- Department of Biology, Syracuse University, Syracuse, NY, United States of America
- Christopher H. Chandler
- Department of Biological Sciences, State University of New York at Oswego, Oswego, NY, United States of America