1. Kapoor M, Ventura ES, Walsh A, Sokolov A, George N, Kumari S, Provart NJ, Cole B, Libault M, Tickle T, Warren WC, Koltes JE, Papatheodorou I, Ware D, Harrison PW, Elsik C, Yordanova G, Burdett T, Tuggle CK. Building a FAIR data ecosystem for incorporating single-cell transcriptomics data into agricultural genome to phenome research. Front Genet 2024; 15:1460351. [PMID: 39678381 PMCID: PMC11638175 DOI: 10.3389/fgene.2024.1460351] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2024] [Accepted: 11/13/2024] [Indexed: 12/17/2024]
Abstract
Introduction: The agriculture genomics community has numerous data submission standards available, but the standards for describing and storing single-cell (SC, e.g., scRNA-seq) data are comparatively underdeveloped.
Methods: To bridge this gap, we leveraged recent advancements in human genomics infrastructure, such as the integration of the Human Cell Atlas (HCA) Data Portal with Terra, a secure, scalable, open-source platform for biomedical researchers to access data, run analysis tools, and collaborate. In parallel, the Single Cell Expression Atlas at EMBL-EBI offers a comprehensive data ingestion portal for high-throughput sequencing datasets from plants, protists, and animals (including humans). Developing data tools that connect these resources would offer significant advantages to the agricultural genomics community. The FAANG data portal at EMBL-EBI emphasizes delivering rich metadata and highly accurate, reliable annotation of farmed animals, but it is not computationally linked to either of these resources.
Results: Herein, we describe a pilot-scale project that determines whether the current FAANG metadata standards for livestock can be used to ingest scRNA-seq datasets into Terra in a manner consistent with HCA Data Portal standards. Importantly, rich scRNA-seq metadata can now be brokered through the FAANG data portal using a semi-automated process, thereby avoiding the need for substantial expert curation. We have further extended the functionality of this tool so that validated and ingested SC files within the HCA Data Portal are transferred to Terra for further analysis. In addition, we verified data ingestion into Terra, hosted on Azure, and demonstrated the use of a workflow to analyze the first ingested porcine scRNA-seq dataset. We have also developed prototype tools to visualize the output of scRNA-seq analyses on genome browsers to compare gene expression patterns across tissues and cell populations. This JBrowse tool now features distinct tracks, showcasing PBMC scRNA-seq alongside two bulk RNA-seq experiments.
Discussion: We intend to further build upon these existing tools to construct a scientist-friendly data resource and analytical ecosystem based on Findable, Accessible, Interoperable, and Reusable (FAIR) SC principles to facilitate SC-level genomic analysis through data ingestion, storage, retrieval, re-use, visualization, and comparative annotation across agricultural species.
Affiliation(s)
- Muskan Kapoor
- Department of Animal Science, Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, United States
- Enrique Sapena Ventura
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, Cambridgeshire, United Kingdom
- Amy Walsh
- Animal Science Research Center, Division of Animal Science and Division of Plant Science and Technology, University of Missouri-Columbia, Columbia, MO, United States
- Alexey Sokolov
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, Cambridgeshire, United Kingdom
- Nancy George
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, Cambridgeshire, United Kingdom
- Sunita Kumari
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, United States
- Nicholas J. Provart
- Department of Cell and Systems Biology/Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, ON, Canada
- Benjamin Cole
- Lawrence Berkeley National Laboratory, DOE-Joint Genome Institute, Berkeley, CA, United States
- Marc Libault
- Plant Science and Technology, University of Missouri, Columbia, MO, United States
- Timothy Tickle
- The Broad Institute of MIT and Harvard, Data Sciences Platform, Cambridge, MA, United States
- Wesley C. Warren
- Division of Animal Science, University of Missouri-Columbia, Columbia, MO, United States
- James E. Koltes
- Department of Animal Science, Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, United States
- Irene Papatheodorou
- Earlham Institute, Norwich Research Park, Norwich, United Kingdom
- Medical School, University of East Anglia, Norwich Research Park, Norwich, United Kingdom
- Doreen Ware
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, United States
- U.S. Department of Agriculture, Agricultural Research Service, NEA Robert W. Holley Center for Agriculture and Health, Cornell University, Ithaca, NY, United States
- Peter W. Harrison
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, Cambridgeshire, United Kingdom
- Christine Elsik
- Animal Science Research Center, Division of Animal Science and Division of Plant Science and Technology, University of Missouri-Columbia, Columbia, MO, United States
- Galabina Yordanova
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, Cambridgeshire, United Kingdom
- Tony Burdett
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, Cambridgeshire, United Kingdom
- Christopher K. Tuggle
- Department of Animal Science, Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, United States
2. Liu W, He H, Chicco D. Gene signatures for cancer research: A 25-year retrospective and future avenues. PLoS Comput Biol 2024; 20:e1012512. [PMID: 39413055 PMCID: PMC11482671 DOI: 10.1371/journal.pcbi.1012512] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2024]
Abstract
Over the past two decades, extensive studies, particularly in cancer analysis through large datasets like The Cancer Genome Atlas (TCGA), have aimed at improving patient therapies and precision medicine. However, limited overlap and inconsistencies among gene signatures across different cohorts pose challenges. The dynamic nature of the transcriptome, encompassing diverse RNA species and functional complexities at gene and isoform levels, introduces intricacies, and current gene signatures face reproducibility issues due to the unique transcriptomic landscape of each patient. In this context, discrepancies arising from diverse sequencing technologies, data analysis algorithms, and software tools further hinder consistency. While careful experimental design, analytical strategies, and standardized protocols could enhance reproducibility, future prospects lie in multiomics data integration, machine learning techniques, open science practices, and collaborative efforts. Standardized metrics, quality control measures, and advancements in single-cell RNA-seq will contribute to unbiased gene signature identification. In this perspective article, we outline some thoughts and insights addressing challenges, standardized practices, and advanced methodologies enhancing the reliability of gene signatures in disease transcriptomic research.
Affiliation(s)
- Wei Liu
- College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou, China
- Huaqin He
- College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou, China
- Davide Chicco
- Dipartimento di Informatica Sistemistica e Comunicazione, Università di Milano-Bicocca, Milan, Italy
- Institute of Health Policy Management and Evaluation, University of Toronto, Toronto, Ontario, Canada
3. Aksenova A, Johny A, Adams T, Gribbon P, Jacobs M, Hofmann-Apitius M. Current state of data stewardship tools in life science. Front Big Data 2024; 7:1428568. [PMID: 39351001 PMCID: PMC11439729 DOI: 10.3389/fdata.2024.1428568] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2024] [Accepted: 08/23/2024] [Indexed: 10/04/2024]
Abstract
In today's data-centric landscape, effective data stewardship is critical for facilitating scientific research and innovation. This article provides an overview of essential tools and frameworks for modern data stewardship practices. In this study, over 300 tools were analyzed and assessed for their utility, relevance to data stewardship, and applicability within the life sciences domain.
Affiliation(s)
- Anna Aksenova
- Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, Bonn, Germany
- Anoop Johny
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin, Germany
- Tim Adams
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin, Germany
- Phil Gribbon
- Fraunhofer Institute for Translational Medicine and Pharmacology, Discovery Research Screening Port, Hamburg, Germany
- Marc Jacobs
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin, Germany
- Martin Hofmann-Apitius
- Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, Bonn, Germany
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin, Germany
4. Conrad TOF, Ferrer E, Mietchen D, Pusch L, Stegmüller J, Schubotz M. Making Mathematical Research Data FAIR: Pathways to Improved Data Sharing. Sci Data 2024; 11:676. [PMID: 38909043 PMCID: PMC11193822 DOI: 10.1038/s41597-024-03480-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Accepted: 06/05/2024] [Indexed: 06/24/2024]
Abstract
The sharing and citation of research data is becoming increasingly recognized as an essential building block in scientific research across various fields and disciplines. Sharing research data allows other researchers to reproduce results, replicate findings, and build on them. Ultimately, this will foster faster cycles in knowledge generation. Some disciplines, such as astronomy or bioinformatics, already have a long history of sharing data; many others do not. The current landscape of available systems for sharing research data is diverse. In this article, we conduct a detailed analysis of existing web-based systems, specifically focusing on mathematical research data.
Affiliation(s)
- Daniel Mietchen
- FIZ Karlsruhe - Leibniz Institute for Information Infrastructure, Berlin, Germany
- Ronin Institute for Independent Scholarship, Montclair, USA
- Johannes Stegmüller
- FIZ Karlsruhe - Leibniz Institute for Information Infrastructure, Berlin, Germany
- Moritz Schubotz
- FIZ Karlsruhe - Leibniz Institute for Information Infrastructure, Berlin, Germany
5. Zhang K, Liang J, Fu Y, Chu J, Fu L, Wang Y, Li W, Zhou Y, Li J, Yin X, Wang H, Liu X, Mou C, Wang C, Wang H, Dong X, Yan D, Yu M, Zhao S, Li X, Ma Y. AGIDB: a versatile database for genotype imputation and variant decoding across species. Nucleic Acids Res 2024; 52:D835-D849. [PMID: 37889051 PMCID: PMC10767904 DOI: 10.1093/nar/gkad913] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/14/2023] [Revised: 10/05/2023] [Accepted: 10/11/2023] [Indexed: 10/28/2023]
Abstract
The high cost of large-scale, high-coverage whole-genome sequencing has limited its application in genomics and genetics research. The common approach has been to impute whole-genome sequence variants obtained from a few individuals into a larger population of interest that has been individually genotyped using SNP chips. An alternative involves low-coverage whole-genome sequencing (lcWGS) of all individuals in the larger population, followed by imputation to sequence resolution. To overcome the limitations of processing lcWGS data and to meet specific genotype imputation requirements, we developed AGIDB (https://agidb.pro), a website comprising tools and a database with an unprecedented sample size and comprehensive variant decoding for animals. AGIDB integrates whole-genome sequencing and chip data from 17,360 and 174,945 individuals, respectively, across 89 species to identify over one billion variants, totaling 688.57 TB of processed data. AGIDB focuses on integrating multiple genotype imputation scenarios. It also provides user-friendly searching and data analysis modules that enable comprehensive annotation of genetic variants for specific populations. To meet a wide range of research requirements, AGIDB offers downloadable reference panels for each species in addition to its extensive dataset, variant decoding and utility tools. We hope that AGIDB will become a key foundational resource in genetics and breeding, providing robust support to researchers.
Affiliation(s)
- Kaili Zhang
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
- Jiete Liang
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
- Yuhua Fu
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
- Jinyu Chu
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
- Liangliang Fu
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
- Hubei Hongshan Laboratory, Wuhan 430070, China
- The Cooperative Innovation Center for Sustainable Pig Production, Huazhong Agricultural University, Wuhan 430070, China
- Yongfei Wang
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
- Wangjiao Li
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
- You Zhou
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
- Jinhua Li
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
- Xiaoxiao Yin
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
- Haiyan Wang
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Xiaolei Liu
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
- Hubei Hongshan Laboratory, Wuhan 430070, China
- Chunyan Mou
- College of Animal Science and Technology, Southwest University, Chongqing 402460, China
- Chonglong Wang
- Key Laboratory of Pig Molecular Quantitative Genetics of Anhui Academy of Agricultural Sciences, Anhui Provincial Key Laboratory of Livestock and Poultry Product Safety Engineering, Institute of Animal Husbandry and Veterinary Medicine, Anhui Academy of Agricultural Sciences, Hefei 230031, China
- Heng Wang
- College of Animal Science and Technology, Shandong Agricultural University, Taian 271018, China
- Xinxing Dong
- Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming 650201, China
- Dawei Yan
- Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming 650201, China
- Mei Yu
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
- Hubei Hongshan Laboratory, Wuhan 430070, China
- Shuhong Zhao
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
- Hubei Hongshan Laboratory, Wuhan 430070, China
- Lingnan Modern Agricultural Science and Technology Guangdong Laboratory, Guangzhou 510642, China
- Xinyun Li
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
- Hubei Hongshan Laboratory, Wuhan 430070, China
- Yunlong Ma
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
- Lingnan Modern Agricultural Science and Technology Guangdong Laboratory, Guangzhou 510642, China
6. Thompson PT, Ojha S, Powell CD, Pennell KG, Moseley HNB. A proposed FAIR approach for disseminating geospatial information system maps. Sci Data 2023; 10:389. [PMID: 37328607 PMCID: PMC10275873 DOI: 10.1038/s41597-023-02281-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2023] [Accepted: 05/31/2023] [Indexed: 06/18/2023]
Abstract
We present a draft Minimum Information About Geospatial Information System (MIAGIS) standard for facilitating public deposition of geospatial information system (GIS) datasets that follows the FAIR (Findable, Accessible, Interoperable and Reusable) principles. The draft MIAGIS standard includes a deposition directory structure and a minimum JavaScript Object Notation (JSON) metadata file designed to capture critical metadata describing GIS layers and maps, as well as their data sources and methods of generation. The associated miagis Python package facilitates the creation of this MIAGIS metadata file and directly supports metadata extraction from both Esri JSON and GeoJSON GIS data formats, plus options for extraction from user-specified JSON formats. We also demonstrate their use in crafting two example depositions of ArcGIS-generated maps. We hope this draft MIAGIS standard, along with the supporting miagis Python package, will assist in establishing a GIS standards group that will develop the draft into a full standard for the wider GIS community, as well as a future public repository for GIS datasets.
Affiliation(s)
- P Travis Thompson
- University of Kentucky Superfund Research Center (UKSRC), Lexington, KY, USA
- Sweta Ojha
- University of Kentucky Superfund Research Center (UKSRC), Lexington, KY, USA
- University of Kentucky, College of Engineering, Department of Civil Engineering, Lexington, KY, USA
- Christian D Powell
- University of Kentucky Superfund Research Center (UKSRC), Lexington, KY, USA
- University of Kentucky, Department of Computer Science (Data Science Program), Lexington, KY, USA
- Kelly G Pennell
- University of Kentucky Superfund Research Center (UKSRC), Lexington, KY, USA
- University of Kentucky, College of Engineering, Department of Civil Engineering, Lexington, KY, USA
- Hunter N B Moseley
- University of Kentucky Superfund Research Center (UKSRC), Lexington, KY, USA
- University of Kentucky, Department of Molecular and Cellular Biochemistry, Lexington, KY, USA
7. Hebart MN, Contier O, Teichmann L, Rockter AH, Zheng CY, Kidder A, Corriveau A, Vaziri-Pashkam M, Baker CI. THINGS-data, a multimodal collection of large-scale datasets for investigating object representations in human brain and behavior. eLife 2023; 12:e82580. [PMID: 36847339 PMCID: PMC10038662 DOI: 10.7554/elife.82580] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 08/09/2022] [Accepted: 02/25/2023] [Indexed: 03/01/2023]
Abstract
Understanding object representations requires a broad, comprehensive sampling of the objects in our visual world with dense measurements of brain activity and behavior. Here, we present THINGS-data, a multimodal collection of large-scale neuroimaging and behavioral datasets in humans, comprising densely sampled functional MRI and magnetoencephalographic recordings, as well as 4.70 million similarity judgments in response to thousands of photographic images for up to 1,854 object concepts. THINGS-data is unique in its breadth of richly annotated objects, allowing for testing countless hypotheses at scale while assessing the reproducibility of previous findings. Beyond the unique insights promised by each individual dataset, the multimodality of THINGS-data allows combining datasets for a much broader view into object processing than previously possible. Our analyses demonstrate the high quality of the datasets and provide five examples of hypothesis-driven and data-driven applications. THINGS-data constitutes the core public release of the THINGS initiative (https://things-initiative.org) for bridging the gap between disciplines and the advancement of cognitive neuroscience.
Affiliation(s)
- Martin N Hebart
- Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, United States
- Vision and Computational Cognition Group, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Department of Medicine, Justus Liebig University Giessen, Giessen, Germany
- Oliver Contier
- Vision and Computational Cognition Group, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Max Planck School of Cognition, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Lina Teichmann
- Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, United States
- Adam H Rockter
- Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, United States
- Charles Y Zheng
- Machine Learning Core, National Institute of Mental Health, National Institutes of Health, Bethesda, United States
- Alexis Kidder
- Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, United States
- Anna Corriveau
- Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, United States
- Maryam Vaziri-Pashkam
- Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, United States
- Chris I Baker
- Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, United States
8. Arend D, Scholz U, Lange M. The Plant Phenomics and Genomics Research Data Repository: An On-Premise Approach for FAIR-Compliant Data Acquisition. Methods Mol Biol 2023; 2703:3-22. [PMID: 37646933 DOI: 10.1007/978-1-0716-3389-2_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]
Abstract
The FAIR data principles, as a commitment to support long-term research data management, are widely accepted in the scientific community. However, although many established infrastructures provide comprehensive and long-term stable services and platforms, a large quantity of research data is still hidden. Currently, high-throughput plant genomics and phenomics technologies are producing research data in abundance, the storage of which is not covered by established core databases. This concerns the data volume, for example, time series of images or high-resolution hyperspectral data; the quality of data formatting and annotation, e.g., with regard to the structure and annotation specifications of core databases; uncovered data domains; or organizational constraints prohibiting primary data storage outside institutional boundaries. To share these potentially dark data in a FAIR way and to master these challenges, the ELIXIR Germany/de.NBI service Plant Genomics and Phenomics Research Data Repository (PGP) implements an on-premise approach, which allows research data to be kept in place and wrapped in FAIR-aware software infrastructure. In this chapter, the e!DAL infrastructure software and the PGP repository are presented as best practice for easily setting up FAIR-compliant and intuitive research data services.
Affiliation(s)
- Daniel Arend
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Seeland, OT Gatersleben, Germany
- Uwe Scholz
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Seeland, OT Gatersleben, Germany
- Matthias Lange
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Seeland, OT Gatersleben, Germany
9. A repository for the publication and sharing of heterogeneous materials data. Sci Data 2022; 9:787. [PMID: 36575234 PMCID: PMC9794830 DOI: 10.1038/s41597-022-01897-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2022] [Accepted: 12/14/2022] [Indexed: 12/28/2022]
Abstract
National Materials Data Management and Service platform (NMDMS) is a materials data repository for the publication and sharing of heterogeneous materials scientific data that follows the FAIR principles: Findable, Accessible, Interoperable, and Reusable. To ensure data are 'Interoperable', NMDMS uses a user-friendly semi-structured scientific data model, named 'dynamic container', to define, exchange, and store heterogeneous scientific data. Then, a personalized yet standardized data submission subsystem, a rigorous project data review and publication subsystem, and a multi-granularity data query and retrieval subsystem collaboratively make data 'Reusable', 'Findable', and 'Accessible'. Finally, China's "National Key R&D Program: Material Genetic Engineering Key Special Project" has adopted NMDMS to publish and share its project data. Since 2018, 12,251,040 pieces of data have been published in NMDMS under 87 categories and 1,912 user-defined schemas from 45 projects. The platform has been accessed 908,875 times, and 2,403,208 pieces of data have been downloaded. In short, NMDMS effectively accelerates the publication and sharing of materials project data in China.
10. Martin EL, Barrote VR, Cawood PA. A resource for automated search and collation of geochemical datasets from journal supplements. Sci Data 2022; 9:724. [PMID: 36433993 PMCID: PMC9700723 DOI: 10.1038/s41597-022-01730-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2021] [Accepted: 09/27/2022] [Indexed: 11/27/2022]
Abstract
This article presents a resource for automated search, extraction and collation of geochemical and geochronological data from the Figshare repository using web scraping code. To answer fundamental questions about the Earth's evolution, such as spatial and temporal evolution and interrelationships between the planet's solid and surficial reservoirs, researchers must utilize global geochemical datasets. Due to the volume of data being published, these datasets quickly become outdated. We present a resource that allows researchers to rapidly curate and update their own databases from existing published data. We use open-source Python code to web scrape the Figshare repository for journal supplementary files through its application programming interface (API), allowing hundreds of supplementary files and their metadata to be collected and downloaded in minutes. Use of this web scraping tool is demonstrated here by collating a zircon geochronology and chemistry database of >150,000 analyses. The database consistently reproduces trends seen in other published zircon compilations. Providing a resource for automated collection of Figshare data files will encourage data sharing and reuse.
Affiliation(s)
- Erin L Martin
- School of Earth, Atmosphere and Environment, Monash University, Clayton, Victoria, 3800, Australia
- Vitor R Barrote
- School of Earth, Atmosphere and Environment, Monash University, Clayton, Victoria, 3800, Australia
- Peter A Cawood
- School of Earth, Atmosphere and Environment, Monash University, Clayton, Victoria, 3800, Australia
11. Wahid KA, Glerean E, Sahlsten J, Jaskari J, Kaski K, Naser MA, He R, Mohamed ASR, Fuller CD. Artificial Intelligence for Radiation Oncology Applications Using Public Datasets. Semin Radiat Oncol 2022; 32:400-414. [PMID: 36202442 PMCID: PMC9587532 DOI: 10.1016/j.semradonc.2022.06.009] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 11/05/2022]
Abstract
Artificial intelligence (AI) has exceptional potential to positively impact the field of radiation oncology. However, large curated datasets, often involving imaging data and corresponding annotations, are required to develop radiation oncology AI models. Importantly, the recent establishment of the Findable, Accessible, Interoperable, Reusable (FAIR) principles for scientific data management has enabled an increasing number of radiation oncology-related datasets to be disseminated through data repositories, which act as a rich source of data for AI model building. This manuscript reviews the current and future state of radiation oncology data dissemination, with a particular emphasis on published imaging datasets, AI data challenges, and associated infrastructure. Moreover, we provide historical context for FAIR data dissemination protocols, difficulties in the current distribution of radiation oncology data, and recommendations regarding data dissemination for eventual use in AI models. Through FAIR principles and standardized approaches to data dissemination, radiation oncology AI research has nothing to lose and everything to gain.
Affiliation(s)
- Kareem A Wahid
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
- Enrico Glerean
- Department of Neuroscience and Biomedical Engineering, Aalto University School of Science, Espoo, Finland; Department of Computer Science, Aalto University School of Science, Espoo, Finland
- Jaakko Sahlsten
- Department of Computer Science, Aalto University School of Science, Espoo, Finland
- Joel Jaskari
- Department of Computer Science, Aalto University School of Science, Espoo, Finland
- Kimmo Kaski
- Department of Computer Science, Aalto University School of Science, Espoo, Finland
- Mohamed A Naser
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
- Renjie He
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
- Abdallah S R Mohamed
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
- Clifton D Fuller
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA.
12
Garijo D, Ménager H, Hwang L, Trisovic A, Hucka M, Morrell T, Allen A. Nine best practices for research software registries and repositories. PeerJ Comput Sci 2022; 8:e1023. [PMID: 36092012 PMCID: PMC9455149 DOI: 10.7717/peerj-cs.1023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2021] [Accepted: 06/09/2022] [Indexed: 06/15/2023]
Abstract
Scientific software registries and repositories improve software findability and research transparency, provide information for software citations, and foster preservation of computational methods in a wide range of disciplines. Registries and repositories play a critical role by supporting research reproducibility and replicability, but developing them takes effort and few guidelines are available to help prospective creators of these resources. To address this need, the FORCE11 Software Citation Implementation Working Group convened a Task Force to distill the experiences of the managers of existing resources in setting expectations for all stakeholders. In this article, we describe the resultant best practices which include defining the scope, policies, and rules that govern individual registries and repositories, along with the background, examples, and collaborative work that went into their development. We believe that establishing specific policies such as those presented here will help other scientific software registries and repositories better serve their users and their disciplines.
Affiliation(s)
- Hervé Ménager
- Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, Paris, France
- Lorraine Hwang
- University of California, Davis, Davis, California, United States
- Ana Trisovic
- Harvard University, Boston, Massachusetts, United States
- Michael Hucka
- California Institute of Technology, Pasadena, California, United States
- Thomas Morrell
- California Institute of Technology, Pasadena, California, United States
- Alice Allen
- University of Maryland, College Park, MD, United States
13
Jiao C, Li K, Fang Z. Data sharing practices across knowledge domains: A dynamic examination of data availability statements in PLOS ONE publications. J Inf Sci 2022. [DOI: 10.1177/01655515221101830] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/17/2022]
Abstract
As the importance of research data gradually grows in sciences, data sharing has come to be encouraged and even mandated by journals and funders in recent years. Following this trend, the data availability statement has been increasingly embraced by academic communities as a means of sharing research data as part of research articles. This article presents a quantitative study of which mechanisms and repositories are used to share research data in PLOS ONE articles. We offer a dynamic examination of this topic from the disciplinary and temporal perspectives based on all statements in English-language research articles published between 2014 and 2020 in the journal. We find a slow yet steady growth in the use of data repositories to share data over time, as opposed to sharing data in the article and/or supplementary materials; this indicates improved compliance with the journal’s data sharing policies. We also find that multidisciplinary data repositories have been increasingly used over time, whereas some disciplinary repositories show a decreasing trend. Our findings can help academic publishers and funders to improve their data sharing policies and serve as an important baseline dataset for future studies on data sharing activities.
Affiliation(s)
- Chenyue Jiao
- School of Information Sciences, University of Illinois Urbana-Champaign, USA
- Kai Li
- School of Information Resource Management, Renmin University of China, China
- Zhichao Fang
- Centre for Science and Technology Studies, Leiden University, The Netherlands
14
Thomas BR, Tan XL, Javadzadeh S, Robinson EJ, McDonald BS, Krupiczojc MA, Rahman SR, Rahman S, Ahmed RA, Begum R, Khanam H, Kelsell DP, Grigg J, Knell RJ, O'Toole EA. Modeling of Temporal Exposure to the Ambient Environment and Eczema Severity. JID Innovations 2022; 2:100062. [PMID: 34993502 PMCID: PMC8713123 DOI: 10.1016/j.xjidi.2021.100062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2021] [Revised: 09/10/2021] [Accepted: 09/13/2021] [Indexed: 11/28/2022] Open
Abstract
Atopic eczema is a common and complex disease. Missing genetic heritability and increasing prevalence in industrializing nations point toward an environmental driver. We investigated the temporal association of weather and pollution parameters with eczema severity. This cross-sectional clinical study was performed between May 2018 and March 2020 and is part of the Tower Hamlets Eczema Assessment. All participants had a diagnosis of eczema, lived in East London, were of Bangladeshi ethnicity, and were aged <31 years. The primary outcome was the probability of having an Eczema Area and Severity Index score > 10 after previous ambient exposure to commonly studied meteorological variables and pollutants. There were 430 participants in the group with Eczema Area and Severity Index ≤ 10 and 149 in the group with Eczema Area and Severity Index > 10. Using logistic generalized additive models and a model selection process, we found that tropospheric ozone averaged over the preceding 270 days was strongly associated with eczema severity, alongside exposure to fine particles with diameters of 2.5 μm or less (fine particulate matter) averaged over the preceding 120 days. In our models and analyses, fine particulate matter appeared largely to act in a supporting role to ozone. We show that long-term exposure to high levels of ground-level ozone has the strongest association with eczema severity.
Key Words
- AIC, Akaike Information Criterion
- EASI, Eczema Area and Severity Index
- ESeC, European Socio-Economic Classification
- GAM, generalized additive model
- IGA, Investigator's Global Assessment
- MAv, moving average
- NO, nitric oxide
- NO2, nitrogen dioxide
- NOx, nitrogen oxides
- O3, ozone
- PM, particulate matter
- SCORAD, SCORing Atopic Dermatitis
- SE, standard error
- THEA, Tower Hamlets Eczema Assessment
- VOC, volatile organic compound
Affiliation(s)
- Bjorn R Thomas
- Centre for Cell Biology and Cutaneous Research, Blizard Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom; Department of Dermatology, The Royal London Hospital, Barts Health NHS Trust, London, United Kingdom
- Xiang L Tan
- Centre for Cell Biology and Cutaneous Research, Blizard Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom
- Shagayegh Javadzadeh
- Centre for Cell Biology and Cutaneous Research, Blizard Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom
- Elizabeth J Robinson
- Department of Dermatology, The Royal London Hospital, Barts Health NHS Trust, London, United Kingdom
- Bryan S McDonald
- Centre for Cell Biology and Cutaneous Research, Blizard Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom; Department of Dermatology, The Royal London Hospital, Barts Health NHS Trust, London, United Kingdom
- Malvina A Krupiczojc
- Centre for Cell Biology and Cutaneous Research, Blizard Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom; Department of Dermatology, The Royal London Hospital, Barts Health NHS Trust, London, United Kingdom
- Syedia R Rahman
- Centre for Cell Biology and Cutaneous Research, Blizard Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom; Department of Dermatology, The Royal London Hospital, Barts Health NHS Trust, London, United Kingdom
- Samiha Rahman
- Centre for Cell Biology and Cutaneous Research, Blizard Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom; Department of Dermatology, The Royal London Hospital, Barts Health NHS Trust, London, United Kingdom
- Rehana A Ahmed
- Centre for Cell Biology and Cutaneous Research, Blizard Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom; Department of Dermatology, The Royal London Hospital, Barts Health NHS Trust, London, United Kingdom
- Rubina Begum
- Centre for Cell Biology and Cutaneous Research, Blizard Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom; Department of Dermatology, The Royal London Hospital, Barts Health NHS Trust, London, United Kingdom
- Habiba Khanam
- Centre for Cell Biology and Cutaneous Research, Blizard Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom; Department of Dermatology, The Royal London Hospital, Barts Health NHS Trust, London, United Kingdom
- David P Kelsell
- Centre for Cell Biology and Cutaneous Research, Blizard Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom
- Jonathan Grigg
- Centre for Cell Biology and Cutaneous Research, Blizard Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom; Department of Dermatology, The Royal London Hospital, Barts Health NHS Trust, London, United Kingdom
- Robert J Knell
- School of Biological and Behavioural Sciences, Queen Mary University of London, London, United Kingdom
- Edel A O'Toole
- Centre for Cell Biology and Cutaneous Research, Blizard Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom; Department of Dermatology, The Royal London Hospital, Barts Health NHS Trust, London, United Kingdom
15
Hansson K, Dahlgren A. Open research data repositories: Practices, norms, and metadata for sharing images. J Assoc Inf Sci Technol 2021. [DOI: 10.1002/asi.24571] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
Affiliation(s)
- Karin Hansson
- Department of Culture and Aesthetics, Stockholm University, Stockholm, Sweden
- Anna Dahlgren
- Department of Culture and Aesthetics, Stockholm University, Stockholm, Sweden
16
Porubsky V, Smith L, Sauro HM. Publishing reproducible dynamic kinetic models. Brief Bioinform 2021; 22:bbaa152. [PMID: 32793969 PMCID: PMC8138891 DOI: 10.1093/bib/bbaa152] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/07/2020] [Revised: 05/19/2020] [Accepted: 06/17/2020] [Indexed: 11/14/2022] Open
Abstract
Publishing repeatable and reproducible computational models is a crucial aspect of the scientific method in computational biology, and one that is often forgotten in the rush to publish. The pressures of academic life and the lack of any reward system at institutions, granting agencies and journals mean that reproducible science is often either not published at all or, at best, presented in the form of an incomplete description. In this article, we focus on repeatability and reproducibility in the systems biology field, where a great many published models cannot be reproduced or, in many cases, even repeated. This review describes the current landscape of software tooling, model repositories, model standards and best practices for publishing repeatable and reproducible kinetic models. The review also discusses possible future remedies, including working more closely with journals to help reviewers and editors ensure that published kinetic models are, at a minimum, repeatable. Contact: hsauro@uw.edu.
Affiliation(s)
- Veronica Porubsky
- Department of Bioengineering, University of Washington, Seattle, 98105, USA
- Lucian Smith
- Department of Bioengineering, University of Washington, Seattle, 98105, USA
- Herbert M Sauro
- Department of Bioengineering, University of Washington, Seattle, 98105, USA
17
Boyd C. Use of Optional Data Curation Features by Users of Harvard Dataverse Repository. Journal of eScience Librarianship 2021. [DOI: 10.7191/jeslib.2021.1191] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022] Open
Abstract
Objective: Investigate how different groups of depositors vary in their use of optional data curation features that provide support for FAIR research data in the Harvard Dataverse repository.
Methods: A numerical score based upon the presence or absence of characteristics associated with the use of optional features was assigned to each of the 29,295 datasets deposited in Harvard Dataverse between 2007 and 2019. Statistical analyses were performed to investigate patterns of optional feature use amongst different groups of depositors and their relationship to other dataset characteristics.
Results: Members of groups make greater use of Harvard Dataverse's optional features than individual researchers. Datasets that undergo a data curation review before submission to Harvard Dataverse, are associated with a publication, or contain restricted files also make greater use of optional features.
Conclusions: Individual researchers might benefit from increased outreach and improved documentation about the benefits and use of optional features to improve their datasets' level of curation beyond the FAIR-informed support that the Harvard Dataverse repository provides by default. Platform designers, developers, and managers may also use the numerical scoring approach to explore how different user groups use optional application features.
18
Zeng T, Wu L, Bratt S, Acuna DE. Assigning credit to scientific datasets using article citation networks. J Informetr 2020. [DOI: 10.1016/j.joi.2020.101013] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 01/02/2023]
19
20
Dhillon V. Blockchain Based Peer-Review Interfaces for Digital Medicine. Frontiers in Blockchain 2020. [DOI: 10.3389/fbloc.2020.00008] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 01/22/2023]
21
Which Are the Tools Available for Scholars? A Review of Assisting Software for Authors during Peer Reviewing Process. Publications 2019. [DOI: 10.3390/publications7030059] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/16/2022] Open
Abstract
There is a large number of Information Technology and Communication (ITC) tools that surround scholarly activity. The prominent place of the peer-review process in publication has promoted a crowded market of technological tools in several formats. Despite this abundance, many tools are unexploited or underused because they are unknown to the academic community. In this study, we explored the availability and characteristics of assisting tools for the peer-review process. The aim was to provide a more comprehensive understanding of the tools currently available and to point to new trends for further development. An examination of the literature informed the creation of a novel taxonomy of the types of software available on the market. This new classification is divided into nine categories: (I) Identification and social media, (II) Academic search engines, (III) Journal-abstract matchmakers, (IV) Collaborative text editors, (V) Data visualization and analysis tools, (VI) Reference management, (VII) Proofreading and plagiarism detection, (VIII) Data archiving, and (IX) Scientometrics and Altmetrics. Considering these categories and their defining traits, a curated list of 220 software tools was compiled using a crowdsourced database (AlternativeTo) to identify relevant programs, as well as ongoing trends and perspectives in the tools developed and used by scholars.
22
Misra BB, Langefeld CD, Olivier M, Cox LA. Integrated Omics: Tools, Advances, and Future Approaches. J Mol Endocrinol 2018; 62:JME-18-0055. [PMID: 30006342 DOI: 10.1530/jme-18-0055] [Citation(s) in RCA: 248] [Impact Index Per Article: 35.4] [Received: 02/24/2018] [Revised: 07/02/2018] [Accepted: 07/12/2018] [Indexed: 12/13/2022]
Abstract
With the rapid adoption of high-throughput omic approaches such as genomics, transcriptomics, proteomics, and metabolomics to analyze biological samples, each analysis can generate terabyte- to petabyte-sized data files on a daily basis. These data file sizes, together with differences in nomenclature among these data types, make it challenging to integrate these multi-dimensional omics data into a biologically meaningful context. In this field, variously named integrated omics, multi-omics, poly-omics, trans-omics, pan-omics, or shortened to just 'omics', the challenges include differences in data cleaning, normalization, biomolecule identification, data dimensionality reduction, biological contextualization, statistical validation, data storage and handling, sharing, and data archiving. The ultimate goal is the holistic realization of a 'systems biology' understanding of the biological question at hand. Commonly used approaches in these efforts are currently limited by the 3 i's: integration, interpretation, and insights. After integration, these very large datasets aim to yield unprecedented views of cellular systems at exquisite resolution for transformative insights into processes, events, and diseases through various computational and informatics frameworks. With the continued reduction in costs and processing time for sample analyses, and the growing range of omics datasets generated, such as glycomics, lipidomics, microbiomics, and phenomics, an increasing number of scientists in this interdisciplinary domain of bioinformatics face these challenges. We discuss recent approaches, existing tools, and potential caveats in the integration of omics datasets for the development of standardized analytical pipelines that could be adopted by the global omics research community.
Affiliation(s)
- Biswapriya B Misra
- B Misra, Internal Medicine, Wake Forest University School of Medicine, Winston-Salem, United States
- Carl D Langefeld
- C Langefeld, Biostatistical Sciences, Wake Forest University School of Medicine, Winston-Salem, United States
- Michael Olivier
- M Olivier, Internal Medicine, Wake Forest University School of Medicine, Winston-Salem, United States
- Laura A Cox
- L Cox, Internal Medicine, Wake Forest University School of Medicine, Winston-Salem, United States
23
Misra BB. Updates on resources, software tools, and databases for plant proteomics in 2016-2017. Electrophoresis 2018; 39:1543-1557. [PMID: 29420853 DOI: 10.1002/elps.201700401] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 10/21/2017] [Revised: 01/23/2018] [Accepted: 02/02/2018] [Indexed: 11/05/2022]
Abstract
Proteomics data processing, annotation, and analysis can often present major hurdles in large-scale, high-throughput bottom-up proteomics experiments. Given the recent rise in protein-based big datasets, in silico tool development has increased at an unprecedented rate, so much so that it has become increasingly difficult to keep track of all the advances in a particular academic year. However, these tools benefit the plant proteomics community by circumventing critical issues in data analysis and visualization, and these continually developing open-source and community-developed tools hold potential for future research efforts. This review introduces and summarizes more than 50 software tools, databases, and resources developed and published during 2016-2017 under the following categories: tools for data pre-processing and analysis, statistical analysis tools, peptide identification tools, databases and spectral libraries, and data visualization and interpretation tools. Finally, for a well-informed proteomics community, efforts in data archiving and validation datasets for the community are also discussed. Additionally, the author delineates the current and most commonly used proteomics tools in order to introduce novice readers to this -omics discovery platform.
Affiliation(s)
- Biswapriya B Misra
- Department of Internal Medicine, Section of Molecular Medicine, Medical Center Boulevard, Winston-Salem, NC, USA
24
Peters I, Kraker P, Lex E, Gumpenberger C, Gorraiz JI. Zenodo in the Spotlight of Traditional and New Metrics. Front Res Metr Anal 2017. [DOI: 10.3389/frma.2017.00013] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Indexed: 11/13/2022] Open
25
He L, Han Z. Do usage counts of scientific data make sense? An investigation of the Dryad repository. Library Hi Tech 2017. [DOI: 10.1108/lht-12-2016-0158] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Indexed: 11/17/2022]
Abstract
Purpose
The purpose of this paper is to evaluate the impact of scientific data in order to assess the reliability of data to support data curation, to establish trust between researchers to support reuse of digital data and encourage researchers to share more data.
Design/methodology/approach
The authors compared the correlations between usage counts of associated data in Dryad and citation counts of articles in Web of Science in different subject areas in order to assess the possibility of using altmetric indicators to evaluate scientific data.
Findings
There are high positive correlations between usage counts of data and citation counts of the associated articles. In most of the subject areas examined by the authors, articles that share data have higher citation counts than the average.
Practical implications
The paper suggests that usage counts of data could be potentially used to evaluate scholarly impact of scientific data, especially for those subject areas without special data repositories.
Originality/value
The study examines the possibility of using usage counts to evaluate the impact of scientific data in the generic repository Dryad across different subject categories.
26
Abstract
Purpose
The purpose of this paper is to highlight the problem of establishing metrics for the impact of research data when norms of behaviour have not yet become established.
Design/methodology/approach
The paper considers existing research into data citation and explores the citation of data journals.
Findings
The paper finds that the diversity of data and its citation precludes the drawing of any simple conclusions about how to measure the impact of data, and that an overemphasis on metrics before norms of behaviour have become established may adversely affect the data ecosystem.
Originality/value
The paper considers multiple different types of data citation, including for the first time the citation of data journals.
27
Abstract
Purpose
Data sharing is widely thought to help research quality and efficiency. Data sharing mandates are increasingly being adopted by journals and the purpose of this paper is to assess whether they work.
Design/methodology/approach
This study examines two evolutionary biology journals, Evolution and Heredity, that have data sharing mandates and make extensive use of Dryad. It uses a quantitative analysis of presence in Dryad, downloads and citations.
Findings
Within both journals, data sharing seems to be complete, showing that the mandates work on a technical level. Low correlations (0.15-0.18) between data downloads and article citation counts for articles published in 2012 within these journals indicate a weak relationship between data sharing and research impact. An average of 40-55 data downloads per article after a few years suggests that some use is found for shared life sciences data.
Research limitations/implications
The value of the uses made of shared data is unclear.
Practical implications
Data sharing mandates should be encouraged as an effective strategy.
Originality/value
This is the first analysis of the effectiveness of data sharing mandates.