1
Cunha-Oliveira T, Ioannidis JPA, Oliveira PJ. Best practices for data management and sharing in experimental biomedical research. Physiol Rev 2024; 104:1387-1408. [PMID: 38451234 DOI: 10.1152/physrev.00043.2023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Revised: 02/07/2024] [Accepted: 02/29/2024] [Indexed: 03/08/2024]
Abstract
Effective data management is crucial for scientific integrity and reproducibility, a cornerstone of scientific progress. Well-organized and well-documented data enable validation and building on results. Data management encompasses activities including organization, documentation, storage, sharing, and preservation. Robust data management establishes credibility, fostering trust within the scientific community and benefiting researchers' careers. In experimental biomedicine, comprehensive data management is vital due to the typically intricate protocols, extensive metadata, and large datasets. Low-throughput experiments, in particular, require careful management to address variations and errors in protocols and raw data quality. Transparent and accountable research practices rely on accurate documentation of procedures, data collection, and analysis methods. Proper data management ensures long-term preservation and accessibility of valuable datasets. Well-managed data can be revisited, contributing to cumulative knowledge and potential new discoveries. Publicly funded research has an added responsibility for transparency, resource allocation, and avoiding redundancy. Meeting funding agency expectations increasingly requires rigorous methodologies, adherence to standards, comprehensive documentation, and widespread sharing of data, code, and other auxiliary resources. This review provides critical insights into raw and processed data, metadata, high-throughput versus low-throughput datasets, a common language for documentation, experimental and reporting guidelines, efficient data management systems, sharing practices, and relevant repositories. We systematically present available resources and optimal practices for wide use by experimental biomedical researchers.
Affiliation(s)
- Teresa Cunha-Oliveira
  - Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal
  - Center for Innovative Biomedicine and Biotechnology, University of Coimbra, Coimbra, Portugal
- John P A Ioannidis
  - Meta-Research Innovation Center at Stanford (METRICS), Stanford, California, United States
  - Department of Statistics, Stanford University, Stanford, California, United States
- Paulo J Oliveira
  - Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal
  - Center for Innovative Biomedicine and Biotechnology, University of Coimbra, Coimbra, Portugal
2
Ferrena A, Zheng XY, Jackson K, Hoang B, Morrow B, Zheng D. scDAPP: a comprehensive single-cell transcriptomics analysis pipeline optimized for cross-group comparison. bioRxiv: the preprint server for biology 2024:2024.05.06.592708. [PMID: 38766089 PMCID: PMC11100619 DOI: 10.1101/2024.05.06.592708] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2024]
Abstract
Single-cell transcriptomic profiling is increasingly used to evaluate cross-group differences in cell populations and cell-type gene expression. This often produces large datasets with complex experimental designs that require advanced comparative analysis. Meanwhile, bioinformatics software and analytic approaches have become more diverse and are constantly being improved. There is therefore a growing need for automated, standardized data processing and analysis pipelines that are also efficient and flexible. To address this, we developed the single-cell Differential Analysis and Processing Pipeline (scDAPP), an R-based workflow for comparative analysis of single-cell (or single-nucleus) transcriptomic data between two or more groups, at the level of either single cells or "pseudobulked" samples. The pipeline automates many pre-processing steps using data-learnt parameters, uses previously benchmarked software, and generates comprehensive intermediate data and final results that are valuable for both beginners and experts in scRNA-seq analysis. Moreover, the analytic reports, augmented by extensive data visualization, increase the transparency of the computational analysis and parameter choices, while helping users move seamlessly from raw data to biological interpretation. Availability and Implementation: scDAPP is freely available for non-commercial usage as an R package under the MIT license. Source code, documentation and sample data are available on GitHub (https://github.com/bioinfoDZ/scDAPP).
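The abstract mentions comparing groups at the level of "pseudobulked" samples. The sketch below illustrates the general pseudobulk idea, not scDAPP's R code: raw counts from all cells sharing a sample ID and cell-type label are summed (a common choice) so that each sample contributes one profile per cell type. The record layout, gene names, and function name are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

# Hypothetical per-cell record: (sample_id, cell_type, {gene: raw_count})
Cell = Tuple[str, str, Dict[str, int]]


def pseudobulk(cells: List[Cell]) -> Dict[Tuple[str, str], Dict[str, int]]:
    """Sum raw counts per (sample, cell type) so each pair yields one profile."""
    totals: DefaultDict[Tuple[str, str], DefaultDict[str, int]] = defaultdict(
        lambda: defaultdict(int)
    )
    for sample_id, cell_type, counts in cells:
        for gene, n in counts.items():
            totals[(sample_id, cell_type)][gene] += n
    # Convert nested defaultdicts to plain dicts for a clean return value
    return {key: dict(genes) for key, genes in totals.items()}


if __name__ == "__main__":
    toy_cells = [
        ("ctrl_1", "T cell", {"CD3E": 5, "GZMB": 0}),
        ("ctrl_1", "T cell", {"CD3E": 3, "GZMB": 1}),
        ("case_1", "T cell", {"CD3E": 4, "GZMB": 6}),
        ("case_1", "B cell", {"MS4A1": 7}),
    ]
    for key, profile in sorted(pseudobulk(toy_cells).items()):
        print(key, profile)
```

In this toy input the two ctrl_1 T cells collapse into one profile with CD3E = 8 and GZMB = 1. The resulting sample-level profiles treat biological replicates, rather than individual cells, as the unit of comparison, and can then be passed to a bulk differential-expression method.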
Affiliation(s)
- Alexander Ferrena
  - Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, USA
  - Institute for Clinical and Translational Research, Albert Einstein College of Medicine, Bronx, NY, USA
- Xiang Yu Zheng
  - Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, USA
- Kevyn Jackson
  - Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, USA
- Bang Hoang
  - Department of Orthopedic Surgery, Montefiore Medical Center, Albert Einstein College of Medicine, Bronx, NY, USA
- Bernice Morrow
  - Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, USA
  - Departments of Obstetrics and Gynecology, and Pediatrics, Albert Einstein College of Medicine, Bronx, NY, USA
- Deyou Zheng
  - Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, USA
  - Department of Neurology, Albert Einstein College of Medicine, Bronx, NY, USA
  - Department of Neuroscience, Albert Einstein College of Medicine, Bronx, NY, USA
3
Alser M, Lawlor B, Abdill RJ, Waymost S, Ayyala R, Rajkumar N, LaPierre N, Brito J, Ribeiro-Dos-Santos AM, Almadhoun N, Sarwal V, Firtina C, Osinski T, Eskin E, Hu Q, Strong D, Kim BDBD, Abedalthagafi MS, Mutlu O, Mangul S. Packaging and containerization of computational methods. Nat Protoc 2024:10.1038/s41596-024-00986-0. [PMID: 38565959 DOI: 10.1038/s41596-024-00986-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2022] [Accepted: 02/12/2024] [Indexed: 04/04/2024]
Abstract
Methods for analyzing the full complement of a biomolecule type, e.g., proteomics or metabolomics, generate large amounts of complex data. The software tools used to analyze omics data have reshaped the landscape of modern biology and become an essential component of biomedical research. These tools are themselves quite complex and often require the installation of other supporting software, libraries and/or databases. A researcher may also be using multiple different tools that require different versions of the same supporting materials. The increasing dependence of biomedical scientists on these powerful tools creates a need for easier installation and greater usability. Packaging and containerization are different approaches to satisfy this need by delivering omics tools already wrapped in additional software that makes the tools easier to install and use. In this systematic review, we describe and compare the features of prominent packaging and containerization platforms. We outline the challenges, advantages and limitations of each approach and some of the most widely used platforms from the perspectives of users, software developers and system administrators. We also propose principles to make the distribution of omics software more sustainable and robust to increase the reproducibility of biomedical and life science research.
Affiliation(s)
- Mohammed Alser
  - Department of Information Technology and Electrical Engineering, ETH Zürich, Zurich, Switzerland
- Brendan Lawlor
  - Department of Computer Science, Munster Technological University, Cork, Ireland
  - Department of Biological Sciences, Munster Technological University, Cork, Ireland
- Richard J Abdill
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
- Sharon Waymost
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
- Ram Ayyala
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
  - Titus Family Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA, USA
- Neha Rajkumar
  - Department of Bioengineering, University of California, Los Angeles, Los Angeles, CA, USA
- Nathan LaPierre
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Jaqueline Brito
  - Titus Family Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA, USA
- Nour Almadhoun
  - Department of Information Technology and Electrical Engineering, ETH Zürich, Zurich, Switzerland
- Varuni Sarwal
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
- Can Firtina
  - Department of Information Technology and Electrical Engineering, ETH Zürich, Zurich, Switzerland
- Tomasz Osinski
  - Center for Advanced Research Computing, University of Southern California, Los Angeles, CA, USA
- Eleazar Eskin
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
  - Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA, USA
  - Department of Human Genetics, University of California, Los Angeles, CA, USA
- Qiyang Hu
  - Office of Advanced Research Computing, University of California, Los Angeles, CA, USA
- Derek Strong
  - Center for Advanced Research Computing, University of Southern California, Los Angeles, CA, USA
- Byoung-Do B D Kim
  - Center for Advanced Research Computing, University of Southern California, Los Angeles, CA, USA
- Malak S Abedalthagafi
  - Department of Pathology & Laboratory Medicine, Emory University Hospital, Atlanta, GA, USA
  - King Salman Center for Disability Research, Riyadh, Saudi Arabia
- Onur Mutlu
  - Department of Information Technology and Electrical Engineering, ETH Zürich, Zurich, Switzerland
- Serghei Mangul
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
  - Titus Family Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA, USA
4
Alessandri S, Ratto ML, Rabellino S, Piacenti G, Contaldo SG, Pernice S, Beccuti M, Calogero RA, Alessandri L. CREDO: a friendly Customizable, REproducible, DOcker file generator for bioinformatics applications. BMC Bioinformatics 2024; 25:110. [PMID: 38475691 DOI: 10.1186/s12859-024-05695-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Accepted: 02/09/2024] [Indexed: 03/14/2024]
Abstract
BACKGROUND The analysis of large and complex biological datasets in bioinformatics poses a significant challenge to achieving reproducible research outcomes, owing to inconsistencies and a lack of standardization in the analysis process. These issues can lead to discrepancies in results, undermining the credibility and impact of bioinformatics research and creating mistrust in the scientific process. To address these challenges, open science practices such as sharing data, code, and methods have been encouraged. RESULTS CREDO, a Customizable, REproducible, DOcker file generator for bioinformatics applications, has been developed as a tool to mitigate reproducibility issues by building and distributing Docker containers with embedded bioinformatics tools. CREDO simplifies the process of generating Docker images, facilitating reproducibility and efficient research in bioinformatics. The crucial step in generating a Docker image is creating the Dockerfile, which requires incorporating heterogeneous packages and environments such as Bioconductor and Conda. CREDO stores all required package information and dependencies in a GitHub-compatible format to enhance Docker image reproducibility, allowing images to be easily rebuilt from scratch. Its user-friendly GUI and its ability to generate modular Docker images make CREDO well suited for life scientists who need to create Docker images efficiently. Overall, CREDO is a valuable tool for addressing reproducibility issues in bioinformatics research and promoting open science practices.
Affiliation(s)
- Maria L Ratto
  - Department of Molecular Biotechnology and Health Sciences, University of Torino, Turin, Italy
- Sergio Rabellino
  - Department of Computer Science, University of Torino, Turin, Italy
- Gabriele Piacenti
  - Department of Molecular Biotechnology and Health Sciences, University of Torino, Turin, Italy
- Simone Pernice
  - Department of Computer Science, University of Torino, Turin, Italy
- Marco Beccuti
  - Department of Computer Science, University of Torino, Turin, Italy
- Raffaele A Calogero
  - Department of Molecular Biotechnology and Health Sciences, University of Torino, Turin, Italy
- Luca Alessandri
  - Department of Molecular Biotechnology and Health Sciences, University of Torino, Turin, Italy
  - Department of Pathology, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
5
Petersen C, Mucke L, Corces MR. CHOIR improves significance-based detection of cell types and states from single-cell data. bioRxiv: the preprint server for biology 2024:2024.01.18.576317. [PMID: 38328105 PMCID: PMC10849522 DOI: 10.1101/2024.01.18.576317] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2024]
Abstract
Clustering is a critical step in the analysis of single-cell data, as it enables the discovery and characterization of putative cell types and states. However, most popular clustering tools do not subject clustering results to statistical inference testing, leading to risks of overclustering or underclustering data and often resulting in ineffective identification of cell types with widely differing prevalence. To address these challenges, we present CHOIR (clustering hierarchy optimization by iterative random forests), which applies a framework of random forest classifiers and permutation tests across a hierarchical clustering tree to statistically determine which clusters represent distinct populations. We demonstrate the enhanced performance of CHOIR through extensive benchmarking against 14 existing clustering methods across 100 simulated and 4 real single-cell RNA-seq, ATAC-seq, spatial transcriptomic, and multi-omic datasets. CHOIR can be applied to any single-cell data type and provides a flexible, scalable, and robust solution to the important challenge of identifying biologically relevant cell groupings within heterogeneous single-cell data.
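The abstract describes testing whether candidate clusters are distinct by training classifiers and comparing their accuracy against permutation-based nulls. The stdlib sketch below illustrates only that general idea and deliberately swaps in simpler parts: a leave-one-out nearest-centroid classifier stands in for CHOIR's random forests, and a single pair of clusters is tested rather than a whole hierarchy. All function names and parameters are hypothetical; this is not CHOIR's algorithm.

```python
import random
from typing import List, Sequence

Point = Sequence[float]


def _centroid(points: List[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(p[d] for p in points) / len(points) for d in range(len(points[0]))]


def _sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def loo_nearest_centroid_accuracy(X: List[Point], y: List[int]) -> float:
    """Leave-one-out accuracy of a two-class (labels 0/1) nearest-centroid classifier."""
    correct = 0
    for i in range(len(X)):
        rest = [j for j in range(len(X)) if j != i]
        groups = {lab: [X[j] for j in rest if y[j] == lab] for lab in (0, 1)}
        if not groups[0] or not groups[1]:
            continue  # a class is empty without point i: score it as a miss
        centroids = {lab: _centroid(groups[lab]) for lab in (0, 1)}
        pred = min((0, 1), key=lambda lab: _sq_dist(X[i], centroids[lab]))
        correct += int(pred == y[i])
    return correct / len(X)


def permutation_p_value(X: List[Point], y: List[int], n_perm: int = 200, seed: int = 0) -> float:
    """How often shuffled labels separate the points at least as well as the real ones."""
    rng = random.Random(seed)
    observed = loo_nearest_centroid_accuracy(X, y)
    hits = 0
    for _ in range(n_perm):
        shuffled = list(y)
        rng.shuffle(shuffled)
        if loo_nearest_centroid_accuracy(X, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Two well-separated toy "clusters" on a small grid
    cluster_a = [(0.1 * i, 0.1 * j) for i in range(4) for j in range(4)]
    cluster_b = [(10 + 0.1 * i, 10 + 0.1 * j) for i in range(4) for j in range(4)]
    X = cluster_a + cluster_b
    y = [0] * 16 + [1] * 16
    print(loo_nearest_centroid_accuracy(X, y))  # 1.0 for these toy groups
    print(permutation_p_value(X, y, n_perm=100))
```

In a hierarchical setting, the same test could be applied to sibling clusters, merging those whose p-value does not fall below a chosen threshold. The abstract does not specify CHOIR's accuracy metric, thresholds, or multiple-testing handling, so none of those are reproduced here.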
Affiliation(s)
- Cathrine Petersen
  - Gladstone Institute of Neurological Disease, Gladstone Institutes, San Francisco, CA, USA
  - Neuroscience Graduate Program, University of California, San Francisco, San Francisco, CA 94158, USA
- Lennart Mucke
  - Gladstone Institute of Neurological Disease, Gladstone Institutes, San Francisco, CA, USA
  - Neuroscience Graduate Program, University of California, San Francisco, San Francisco, CA 94158, USA
  - Department of Neurology and Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA
- M. Ryan Corces
  - Gladstone Institute of Neurological Disease, Gladstone Institutes, San Francisco, CA, USA
  - Neuroscience Graduate Program, University of California, San Francisco, San Francisco, CA 94158, USA
  - Department of Neurology and Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA
6
Samuel S, Mietchen D. Computational reproducibility of Jupyter notebooks from biomedical publications. Gigascience 2024; 13:giad113. [PMID: 38206590 PMCID: PMC10783158 DOI: 10.1093/gigascience/giad113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2022] [Revised: 08/09/2023] [Accepted: 12/08/2023] [Indexed: 01/12/2024]
Abstract
BACKGROUND Jupyter notebooks facilitate the bundling of executable code with its documentation and output in one interactive environment, and they are a popular mechanism for documenting and sharing computational workflows, including for research publications. The reproducibility of computational aspects of research is a key component of scientific reproducibility but has not yet been assessed at scale for Jupyter notebooks associated with biomedical publications. APPROACH We address computational reproducibility at 2 levels: (i) Using fully automated workflows, we analyzed the computational reproducibility of Jupyter notebooks associated with publications indexed in the biomedical literature repository PubMed Central. We identified such notebooks by mining the articles' full text, trying to locate them on GitHub, and attempting to rerun them in an environment as close to the original as possible. We documented reproduction success and exceptions and explored relationships between notebook reproducibility and variables related to the notebooks or publications. (ii) This study represents a reproducibility attempt in and of itself, using essentially the same methodology twice on PubMed Central over the course of 2 years, during which the corpus of Jupyter notebooks from articles indexed in PubMed Central grew in a highly dynamic fashion. RESULTS Out of 27,271 Jupyter notebooks from 2,660 GitHub repositories associated with 3,467 publications, 22,578 notebooks were written in Python, including 15,817 that had their dependencies declared in standard requirement files and that we attempted to rerun automatically. For 10,388 of these, all declared dependencies could be installed successfully, and we reran them to assess reproducibility. Of these, 1,203 notebooks ran through without any errors, including 879 that produced results identical to those reported in the original notebook and 324 for which our results differed from the originally reported ones. Running the other notebooks resulted in exceptions. CONCLUSIONS We zoom in on common problems and practices, highlight trends, and discuss potential improvements to Jupyter-related workflows associated with biomedical publications.
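The RESULTS describe a funnel in which each count is a subset of the one before it. The short Python check below uses only the counts reported in the abstract: it recomputes each stage as a percentage of the previous stage and confirms that the identical (879) and differing (324) outcomes add up to the 1,203 error-free notebooks. The percentages are derived here for illustration; the abstract does not report them, and the stage labels are paraphrases.

```python
from typing import List, Tuple

# Counts taken directly from the RESULTS paragraph above
FUNNEL: List[Tuple[str, int]] = [
    ("notebooks found", 27_271),
    ("written in Python", 22_578),
    ("dependencies declared, rerun attempted", 15_817),
    ("all dependencies installed", 10_388),
    ("ran without errors", 1_203),
]
IDENTICAL, DIFFERENT = 879, 324


def stage_shares(funnel: List[Tuple[str, int]]) -> List[Tuple[str, float]]:
    """Return each stage as a percentage of the stage before it."""
    return [(name, 100 * n / prev_n) for (_, prev_n), (name, n) in zip(funnel, funnel[1:])]


# Sanity check: the two outcome groups should account for every error-free notebook
assert IDENTICAL + DIFFERENT == FUNNEL[-1][1]

for name, pct in stage_shares(FUNNEL):
    print(f"{name}: {pct:.1f}% of the previous stage")
print(f"identical to original: {100 * IDENTICAL / FUNNEL[-1][1]:.1f}% of error-free notebooks")
print(f"identical to original: {100 * IDENTICAL / FUNNEL[0][1]:.1f}% of all notebooks found")
```

Framed this way, the steepest drop is between successful dependency installation and an error-free run, which is where the abstract says the remaining notebooks raised exceptions.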
Affiliation(s)
- Sheeba Samuel
  - Heinz-Nixdorf Chair for Distributed Information Systems, Friedrich Schiller University Jena, Jena 07743, Germany
  - Michael Stifel Center Jena, Jena 07743, Germany
- Daniel Mietchen
  - Ronin Institute, Montclair 07043-2314, NJ, United States
  - Institute for Globally Distributed Open Research and Education (IGDORE)
  - FIZ Karlsruhe—Leibniz Institute for Information Infrastructure, Berlin 76344, Germany
7
Johnson AL, Bouvette M, Rangu N, Morley T, Schultz A, Torgerson T, Vassar M. Data-Sharing Across Otolaryngology: Comparing Journal Policies and Their Adherence to the FAIR Principles. Ann Otol Rhinol Laryngol 2024; 133:105-110. [PMID: 37431814 DOI: 10.1177/00034894231185642] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 07/12/2023]
Abstract
OBJECTIVE Data sharing plays an essential role in advancing scientific understanding. Here, we aim to identify the commonalities and differences in data-sharing policies endorsed by otolaryngology journals and to assess their adherence to the FAIR (findable, accessible, interoperable, reusable) principles. METHODS We searched for data-sharing policies among 111 otolaryngology journals listed by the Scimago Journal & Country Rank. Policies extracted from the top biomedical journals, as ranked by Google Scholar Metrics, were used for comparison. The FAIR principles for scientific data management and stewardship served as the extraction framework, and extraction was performed in a blinded, masked, and independent fashion. RESULTS Of the 111 ranked otolaryngology journals, 100 met the inclusion criteria, and 79 of those 100 provided data-sharing policies. There was a clear lack of standardization across policies, along with specific gaps in accessibility and reusability that need to be addressed. Seventy-two policies (of 79; 91%) designated that metadata should have globally unique and persistent identifiers. Seventy-one (of 79; 90%) specified that metadata should clearly include the identifier of the data they describe. Fifty-six (of 79; 71%) outlined that metadata should be richly described with a plurality of accurate and relevant attributes. CONCLUSION Otolaryngology journals have varying data-sharing policies, and their adherence to the FAIR principles appears to be moderate. This calls for increased data transparency, allowing results to be reproduced, confirmed, and debated.
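The percentages in the RESULTS are each computed against the 79 journals that provided a policy. The Python check below recomputes them from the reported counts, rounding to whole numbers as the abstract does, and adds the two inclusion-stage ratios for context. The criterion labels are shortened paraphrases, not the authors' wording.

```python
# Counts reported in the RESULTS paragraph above
JOURNALS_RANKED = 111
JOURNALS_INCLUDED = 100
POLICIES_FOUND = 79
CRITERIA = {
    "globally unique, persistent identifiers": 72,
    "metadata include the data identifier": 71,
    "richly described metadata": 56,
}


def whole_percent(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number, as in the abstract."""
    return round(100 * part / whole)


print(f"included: {whole_percent(JOURNALS_INCLUDED, JOURNALS_RANKED)}% of ranked journals")
print(f"with a policy: {whole_percent(POLICIES_FOUND, JOURNALS_INCLUDED)}% of included journals")
for label, n in CRITERIA.items():
    print(f"{label}: {n}/{POLICIES_FOUND} = {whole_percent(n, POLICIES_FOUND)}%")
```

Because the denominator is the 79 journals with a policy rather than all 100 included journals, the shares of included journals meeting each criterion are lower (72%, 71%, and 56%).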
Affiliation(s)
- Austin L Johnson
  - Department of Otolaryngology, The University of Texas Medical Branch, Galveston, TX, USA
- Max Bouvette
  - University of Oklahoma College of Medicine, Oklahoma, OK, USA
- Nitin Rangu
  - University of Oklahoma College of Medicine, Oklahoma, OK, USA
- Timothy Morley
  - Alabama College of Osteopathic Medicine, Dothan, AL, USA
- Adam Schultz
  - Oklahoma State University Center for Health Sciences, Tulsa, OK, USA
- Trevor Torgerson
  - Department of Head and Neck Surgery & Communication Sciences, Duke University Medical Center, Durham, NC, USA
- Matt Vassar
  - Oklahoma State University Center for Health Sciences, Tulsa, OK, USA
8
Post AR, Ho N, Rasmussen E, Post I, Cho A, Hofer J, Maness AT, Parnell T, Nix DA. Hypermedia-based software architecture enables Test-Driven Development. JAMIA Open 2023; 6:ooad089. [PMID: 37860604 PMCID: PMC10582517 DOI: 10.1093/jamiaopen/ooad089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Revised: 08/12/2023] [Accepted: 10/04/2023] [Indexed: 10/21/2023]
Abstract
Objectives Using agile software development practices, develop and evaluate an architecture and implementation for reliable and user-friendly self-service management of bioinformatic data stored in the cloud. Materials and methods Comprehensive Oncology Research Environment (CORE) Browser is a new open-source web application for cancer researchers to manage sequencing data organized in a flexible format in Amazon Simple Storage Service (S3) buckets. It has a microservices- and hypermedia-based architecture, which we integrated with Test-Driven Development (TDD), the iterative writing of computable specifications for how software should work prior to development. Relying on repeating patterns found in hypermedia-based architectures, we hypothesized that hypermedia would permit developing test "templates" that can be parameterized and executed for each microservice, maximizing code coverage while minimizing effort. Results After one-and-a-half years of development, the CORE Browser backend had 121 test templates and 875 custom tests that were parameterized and executed 3031 times, providing 78% code coverage. Discussion Architecting to permit test reuse through a hypermedia approach was a key success factor for our testing efforts. CORE Browser's application of hypermedia and TDD illustrates one way to integrate software engineering methods into data-intensive networked applications. Separating bioinformatic data management from analysis distinguishes this platform from others in bioinformatics and may provide stable data management while permitting analysis methods to advance more rapidly. Conclusion Software engineering practices are underutilized in informatics. Similar informatics projects will more likely succeed through application of good architecture and automated testing. Our approach is broadly applicable to data management tools involving cloud data storage.
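The abstract credits hypermedia's repeating patterns with making reusable, parameterized test "templates" possible. The stdlib sketch below illustrates that idea under loudly stated assumptions: the hypermedia convention (every resource carries a `links` list with exactly one `self` link pointing back to its own path), the in-memory services, and every name are hypothetical, and CORE Browser's actual API and test suite are not described in the abstract. One template function is executed once per service and resource through `unittest` subtests.

```python
import unittest
from typing import Callable, Dict, List

# Hypothetical in-memory "microservices": each maps a path to a hypermedia-style document
FAKE_SERVICES: Dict[str, Dict[str, dict]] = {
    "projects": {"/projects/1": {"name": "demo", "links": [{"rel": "self", "href": "/projects/1"}]}},
    "files": {"/files/7": {"name": "reads.bam", "links": [{"rel": "self", "href": "/files/7"}]}},
}


def make_fetch(service: str) -> Callable[[str], dict]:
    """Return a fetch function bound to one fake service (stands in for an HTTP client)."""
    return lambda path: FAKE_SERVICES[service][path]


def self_link_template(fetch: Callable[[str], dict], path: str) -> List[str]:
    """Reusable test template: return a list of problems (empty means the check passed)."""
    doc = fetch(path)
    selfs = [link["href"] for link in doc.get("links", []) if link.get("rel") == "self"]
    if len(selfs) != 1:
        return [f"{path}: expected exactly one self link, found {len(selfs)}"]
    if selfs[0] != path:
        return [f"{path}: self link points to {selfs[0]}"]
    return []


class SelfLinkTemplateTest(unittest.TestCase):
    def test_every_service(self) -> None:
        # The same template is parameterized and executed once per service/resource
        for service, resources in FAKE_SERVICES.items():
            for path in resources:
                with self.subTest(service=service, path=path):
                    self.assertEqual(self_link_template(make_fetch(service), path), [])


if __name__ == "__main__":
    # Run without sys.exit so the sketch can be embedded in other scripts
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SelfLinkTemplateTest)
    unittest.TextTestRunner(verbosity=2).run(suite)
```

Adding a new service to `FAKE_SERVICES` automatically adds it to the template's coverage without writing a new test. That is the kind of reuse the abstract attributes to the hypermedia approach.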
Affiliation(s)
- Andrew R Post
  - Huntsman Cancer Institute, University of Utah, Salt Lake City, UT 84112, United States
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT 84112, United States
- Nancy Ho
  - Huntsman Cancer Institute, University of Utah, Salt Lake City, UT 84112, United States
- Erik Rasmussen
  - Huntsman Cancer Institute, University of Utah, Salt Lake City, UT 84112, United States
- Ivan Post
  - Huntsman Cancer Institute, University of Utah, Salt Lake City, UT 84112, United States
- Aika Cho
  - Huntsman Cancer Institute, University of Utah, Salt Lake City, UT 84112, United States
- John Hofer
  - Huntsman Cancer Institute, University of Utah, Salt Lake City, UT 84112, United States
- Arthur T Maness
  - Huntsman Cancer Institute, University of Utah, Salt Lake City, UT 84112, United States
- Timothy Parnell
  - Huntsman Cancer Institute, University of Utah, Salt Lake City, UT 84112, United States
- David A Nix
  - Huntsman Cancer Institute, University of Utah, Salt Lake City, UT 84112, United States
9
Yang J, Liu Y, Shang J, Chen Q, Chen Q, Ren L, Zhang N, Yu Y, Li Z, Song Y, Yang S, Scherer A, Tong W, Hong H, Xiao W, Shi L, Zheng Y. The Quartet Data Portal: integration of community-wide resources for multiomics quality control. Genome Biol 2023; 24:245. [PMID: 37884999 PMCID: PMC10601216 DOI: 10.1186/s13059-023-03091-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/08/2022] [Accepted: 10/17/2023] [Indexed: 10/28/2023]
Abstract
The Quartet Data Portal facilitates community access to well-characterized reference materials, reference datasets, and related resources established from a family of four individuals, including identical twins, from the Quartet Project. Users can request DNA, RNA, protein, and metabolite reference materials, as well as datasets generated across omics types, platforms, labs, protocols, and batches. Reproducible analysis tools allow objective performance assessment of user-submitted data, while interactive visualization tools support rapid exploration of reference datasets. A closed-loop "distribution-collection-evaluation-integration" workflow enables updates and integration of community-contributed multiomics data. Ultimately, this portal helps promote the advancement of reference datasets and multiomics quality control.
Affiliation(s)
- Jingcheng Yang
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
  - Greater Bay Area Institute of Precision Medicine, Guangzhou, Guangdong, China
- Yaqing Liu
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Jun Shang
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Qiaochu Chen
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Qingwang Chen
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Luyao Ren
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Naixin Zhang
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Ying Yu
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Zhihui Li
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Yueqiang Song
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Shengpeng Yang
  - Intelligent Storage, Alibaba Cloud, Alibaba Group, Hangzhou, Zhejiang, China
- Andreas Scherer
  - Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
  - EATRIS ERIC-European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
- Weida Tong
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
- Huixiao Hong
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
- Wenming Xiao
  - Office of Oncological Diseases, Office of New Drugs, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
- Leming Shi
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
  - International Human Phenome Institutes (Shanghai), Shanghai, China
- Yuanting Zheng
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
10
Mezuk B, Zhong C, Firestone M. Integrative approaches to methods training for early-career scientists: Rationale and process evaluation of the first cohort of the Michigan Integrative Well-Being and Inequality Training Program. J Clin Transl Sci 2023; 7:e169. [PMID: 37588674 PMCID: PMC10425869 DOI: 10.1017/cts.2023.595] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2023] [Revised: 06/10/2023] [Accepted: 07/12/2023] [Indexed: 08/18/2023]
Abstract
Background The Michigan Integrative Well-Being and Inequality (MIWI) Training Program aims to provide state-of-the-art, interdisciplinary training to enhance the methodological skills of early-career scientists interested in integrative approaches to understanding health disparities. The goals of this paper are to describe the scientific rationale and core design elements of MIWI, and to conduct a process evaluation of the first cohort of trainees (called "scholars") to complete this program. Methods Mixed methods process evaluation of program components and assessment of trainee skills and network development of the first cohort (n = 15 scholars). Results The program drew 57 applicants from a wide range of disciplines. Of the 15 scholars in the first cohort, 53% (n = 8) identified as an underrepresented minority, 60% (n = 9) were within 2 years of completing their terminal degree, and most (n = 11, 73%) were from a social/behavioral science discipline (e.g., social work, public health). In the post-program evaluation, scholars rated their improvement in a variety of skills on a one (not at all) to five (greatly improved) scale. Areas of greatest growth included being an interdisciplinary researcher (mean = 4.47), developing new research collaborations (mean = 4.53), and designing a research study related to integrative health (mean = 4.27). The qualitative process evaluation indicated that scholars reported a strong sense of community and that the program broadened their research networks. Conclusions These findings have implications for National Institutes of Health (NIH) efforts to train early-career scientists, particularly from underrepresented groups, working at the intersection of multiple disciplines and efforts to support the formation of research networks.
Affiliation(s)
- Briana Mezuk
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Center for Research Group Dynamics, Institute for Social Research, University of Michigan, Ann Arbor, MI, USA
- Chuwen Zhong
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Monica Firestone
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
11
Soriano J, Belmonte-Tebar A, de la Casa-Esperon E. Synaptonemal & CO analyzer: A tool for synaptonemal complex and crossover analysis in immunofluorescence images. Front Cell Dev Biol 2023; 11:1005145. [PMID: 36743415 PMCID: PMC9894712 DOI: 10.3389/fcell.2023.1005145] [Received: 07/28/2022] [Accepted: 01/09/2023]
Abstract
During the formation of ova and sperm, homologous chromosomes get physically attached through the synaptonemal complex and exchange DNA at crossover sites by a process known as meiotic recombination. Chromosomes that do not recombine or have anomalous crossover distributions often separate poorly during the subsequent cell division and end up in abnormal numbers in ova or sperm, which can lead to miscarriage or developmental defects. Crossover numbers and distribution along the synaptonemal complex can be visualized by immunofluorescent microscopy. However, manual analysis of large numbers of cells is very time-consuming and a major bottleneck for recombination studies. Some image analysis tools have been created to overcome this situation, but they are not readily available, do not provide synaptonemal complex data, or do not tackle common experimental difficulties, such as overlapping chromosomes. To overcome these limitations, we have created and validated an open-source ImageJ macro routine that facilitates and speeds up the crossover and synaptonemal complex analyses in mouse chromosome spreads, as well as in other vertebrate species. It is free, easy to use and fulfills the recommendations for enhancing rigor and reproducibility in biomedical studies.
Affiliation(s)
- Joaquim Soriano
- Centro Regional de Investigaciones Biomédicas (CRIB), Universidad de Castilla-La Mancha, Albacete, Spain
- Angela Belmonte-Tebar
- Centro Regional de Investigaciones Biomédicas (CRIB), Universidad de Castilla-La Mancha, Albacete, Spain
- Elena de la Casa-Esperon
- Centro Regional de Investigaciones Biomédicas (CRIB), Universidad de Castilla-La Mancha, Albacete, Spain
- Biology of Cell Growth, Differentiation and Activation Group, Department of Inorganic and Organic Chemistry and Biochemistry, School of Pharmacy, Universidad de Castilla-La Mancha, Albacete, Spain
- Correspondence: Elena de la Casa-Esperon
12
Teixeira da Silva JA. A Synthesis of the Formats for Correcting Erroneous and Fraudulent Academic Literature, and Associated Challenges. Journal for General Philosophy of Science = Zeitschrift für Allgemeine Wissenschaftstheorie 2022; 53:583-599. [PMID: 35669840 PMCID: PMC9159037 DOI: 10.1007/s10838-022-09607-4] [Received: 07/02/2021] [Revised: 11/14/2021] [Accepted: 02/12/2022]
Abstract
Academic publishing is undergoing a highly transformative process, and many established rules and value systems, such as traditional peer review (TPR) and preprints, are facing unprecedented challenges, including as a result of post-publication peer review. The integrity and validity of the academic literature continue to rely naively on blind trust, while TPR and preprints continue to fail to effectively screen out errors, fraud, and misconduct. Imperfect TPR invariably results in imperfect papers that have passed through varying levels of rigor of screening and validation. If errors or misconduct are not detected during TPR's editorial screening but are detected at the post-publication stage, an opportunity is created to correct the academic record. Currently, the most common forms of correcting the academic literature are errata, corrigenda, expressions of concern, and retractions or withdrawals. Some additional measures to correct the literature have emerged, including manuscript versioning, amendments, partial retractions, and retract and replace. Preprints can also be corrected if their version is updated. This paper discusses the risks, benefits and limitations of these forms of correcting the academic literature. Supplementary information: The online version contains supplementary material available at 10.1007/s10838-022-09607-4.
13
Tumescheit C, Firth AE, Brown K. CIAlign: A highly customisable command line tool to clean, interpret and visualise multiple sequence alignments. PeerJ 2022; 10:e12983. [PMID: 35310163 PMCID: PMC8932311 DOI: 10.7717/peerj.12983] [Received: 04/06/2021] [Accepted: 02/01/2022]
Abstract
Background: Throughout biology, multiple sequence alignments (MSAs) form the basis of much investigation into biological features and relationships. These alignments are at the heart of many bioinformatics analyses. However, sequences in MSAs are often incomplete or very divergent, which can lead to poor alignment and large gaps. This slows down computation and can impact conclusions without being biologically relevant. Cleaning the alignment by removing common issues such as gaps, divergent sequences, large insertions and deletions and poorly aligned sequence ends can substantially improve analyses. Manual editing of MSAs is very widespread but is time-consuming and difficult to reproduce. Results: We present a comprehensive, user-friendly MSA trimming tool with multiple visualisation options. Our highly customisable command line tool aims to give intervention power to the user by offering various options, and outputs graphical representations of the alignment before and after processing to give the user a clear overview of what has been removed. The main functionalities of the tool include removing regions of low coverage due to insertions, removing gaps, cropping poorly aligned sequence ends and removing sequences that are too divergent or too short. The thresholds for each function can be specified by the user and parameters can be adjusted to each individual MSA. CIAlign is designed with an emphasis on solving specific and common alignment problems and on providing transparency to the user. Conclusion: CIAlign effectively removes problematic regions and sequences from MSAs and provides novel visualisation options. This tool can be used to fine-tune alignments for further analysis and processing. The tool is aimed at anyone who wishes to automatically clean up parts of an MSA and those requiring a new, accessible way of visualising large MSAs.
Affiliation(s)
- Andrew E. Firth
- Department of Pathology, University of Cambridge, Cambridge, United Kingdom
- Katherine Brown
- Department of Pathology, University of Cambridge, Cambridge, United Kingdom
14
Yang J, Liu Y, Shang J, Huang Y, Yu Y, Li Z, Shi L, Ran Z. BioVisReport: A Markdown-based lightweight website builder for reproducible and interactive visualization of results from peer-reviewed publications. Comput Struct Biotechnol J 2022; 20:3133-3139. [PMID: 35782729 PMCID: PMC9233186 DOI: 10.1016/j.csbj.2022.06.009] [Received: 03/18/2022] [Revised: 06/05/2022] [Accepted: 06/05/2022]
Abstract
Interactive visualization is an effective way to promote the reproducibility of results presented in biomedical publications and to facilitate additional exploration of the reported data. However, there is a lack of convenient tools that balance reproducibility with ease of use. To address this problem, we developed BioVisReport, a lightweight solution for rapidly generating an interactive website from a user-defined Markdown file. Markdown is a text markup language that does not require users to master complex syntax, and the tool allows them to preview the results in real time. Interactive websites generated by the tool can help readers conveniently reproduce research findings and perform further in-depth analyses beyond those reported in the original peer-reviewed publications. Currently, BioVisReport offers 17 basic types of plots for visualizing published data. In addition, BioVisReport is extensible and supports flexible integration of user-developed Python plugins with multiple programming languages. BioVisReport is freely available at https://biovis.report/.
Affiliation(s)
- Jingcheng Yang
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, 2005 Songhu Road, Shanghai 200438, China
- Greater Bay Area Institute of Precision Medicine, 115 Jiaoxi Road, Guangzhou 511458, China
- Yaqing Liu
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, 2005 Songhu Road, Shanghai 200438, China
- Jun Shang
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, 2005 Songhu Road, Shanghai 200438, China
- Yechao Huang
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, 2005 Songhu Road, Shanghai 200438, China
- Ying Yu
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, 2005 Songhu Road, Shanghai 200438, China
- Zhihui Li
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, 2005 Songhu Road, Shanghai 200438, China
- Leming Shi
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, 2005 Songhu Road, Shanghai 200438, China
- Zihan Ran
- Department of Research, Shanghai University of Medicine & Health Sciences Affiliated Zhoupu Hospital, 1500 Zhouyuan Road, Shanghai 201318, China
- Inspection and Quarantine Department, The College of Medical Technology, Shanghai University of Medicine & Health Sciences, 279 Zhouzhu Road, Shanghai 201318, China
- Corresponding author at: Department of Research, Shanghai University of Medicine & Health Sciences Affiliated Zhoupu Hospital, 1500 Zhouyuan Road, Shanghai 201318, China.
15
Marini F, Ludt A, Linke J, Strauch K. GeneTonic: an R/Bioconductor package for streamlining the interpretation of RNA-seq data. BMC Bioinformatics 2021; 22:610. [PMID: 34949163 PMCID: PMC8697502 DOI: 10.1186/s12859-021-04461-5] [Received: 05/25/2021] [Accepted: 10/26/2021]
Abstract
Background: The interpretation of results from transcriptome profiling experiments via RNA sequencing (RNA-seq) can be a complex task, where the essential information is distributed among different tabular and list formats: normalized expression values, results from differential expression analysis, and results from functional enrichment analyses. A number of tools and databases are widely used to identify relevant functional patterns, yet contextualizing these patterns within the data and results at hand is often not straightforward, especially if these analytic components are not combined efficiently. Results: We developed the GeneTonic software package, which serves as a comprehensive toolkit for streamlining the interpretation of functional enrichment analyses by fully leveraging the information of expression values in a differential expression context. GeneTonic is implemented in R and Shiny, leveraging packages that enable HTML-based interactive visualizations for executing drilldown tasks seamlessly, viewing the data at a level of increased detail. GeneTonic is integrated with the core classes of existing Bioconductor workflows, and can accept the output of many widely used tools for pathway analysis, making this approach applicable to a wide range of use cases. Users can effectively navigate interlinked components (otherwise available as flat text or spreadsheet tables), bookmark features of interest during the exploration sessions, and obtain at the end a tailored HTML report, thus combining the benefits of both interactivity and reproducibility. Conclusion: GeneTonic is distributed as an R package in the Bioconductor project (https://bioconductor.org/packages/GeneTonic/) under the MIT license. Offering both bird's-eye views of the components of transcriptome data analysis and the detailed inspection of single genes, individual signatures, and their relationships, GeneTonic aims to simplify the interpretation of complex and compelling RNA-seq datasets for many researchers with different expertise profiles.
Affiliation(s)
- Federico Marini
- Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center of the Johannes Gutenberg University Mainz, Obere Zahlbacher Str. 69, 55131 Mainz, Germany
- Center for Thrombosis and Hemostasis (CTH), University Medical Center of the Johannes Gutenberg University Mainz, Langenbeckstr. 1, 55131 Mainz, Germany
- Annekathrin Ludt
- Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center of the Johannes Gutenberg University Mainz, Obere Zahlbacher Str. 69, 55131 Mainz, Germany
- Jan Linke
- Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center of the Johannes Gutenberg University Mainz, Obere Zahlbacher Str. 69, 55131 Mainz, Germany
- Center for Thrombosis and Hemostasis (CTH), University Medical Center of the Johannes Gutenberg University Mainz, Langenbeckstr. 1, 55131 Mainz, Germany
- Konstantin Strauch
- Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center of the Johannes Gutenberg University Mainz, Obere Zahlbacher Str. 69, 55131 Mainz, Germany
16
Grimes DR, Heathers J. The new normal? Redaction bias in biomedical science. Royal Society Open Science 2021; 8:211308. [PMID: 34966555 PMCID: PMC8633797 DOI: 10.1098/rsos.211308] [Received: 09/02/2021] [Accepted: 11/01/2021]
Abstract
A concerning amount of biomedical research is not reproducible. Unreliable results impede empirical progress in medical science, ultimately putting patients at risk. Many proximal causes of this irreproducibility have been identified, a major one being inappropriate statistical methods and analytical choices by investigators. Within this, we formally quantify the impact of inappropriate redaction beyond a threshold value in biomedical science. This is effectively truncation of a dataset by removing extreme data points, and we elucidate its potential to accidentally or deliberately engineer a spurious result in significance testing. We demonstrate that the removal of a surprisingly small number of data points can be used to dramatically alter a result. It is unknown how often redaction bias occurs in the broader literature, but given the risk of distortion to the literature involved, we suggest that it must be studiously avoided, and mitigated with approaches to counteract any potential malign effects to the research quality of medical science.
Affiliation(s)
- David Robert Grimes
- School of Physical Sciences, Dublin City University, Glasnevin, Dublin 9, Ireland
- Department of Oncology, University of Oxford, Oxford, Oxfordshire OX3 7DQ, UK
17
Chetnik K, Benedetti E, Gomari DP, Schweickart A, Batra R, Buyukozkan M, Wang Z, Arnold M, Zierer J, Suhre K, Krumsiek J. maplet: an extensible R toolbox for modular and reproducible metabolomics pipelines. Bioinformatics 2021; 38:1168-1170. [PMID: 34694386 PMCID: PMC8796365 DOI: 10.1093/bioinformatics/btab741] [Received: 06/09/2021] [Revised: 09/24/2021] [Accepted: 10/22/2021]
Abstract
This article presents maplet, an open-source R package for the creation of highly customizable, fully reproducible statistical pipelines for metabolomics data analysis. It builds on the SummarizedExperiment data structure to create a centralized pipeline framework for storing data, analysis steps, results and visualizations. maplet's key design feature is its modularity, which offers several advantages, such as ensuring code quality through the maintenance of individual functions and promoting collaborative development by removing technical barriers to code contribution. With over 90 functions, the package includes a wide range of functionalities, covering many widely used statistical approaches and data visualization techniques. Availability and implementation: The maplet package is implemented in R and freely available at https://github.com/krumsieklab/maplet.
Affiliation(s)
- Kelsey Chetnik
- Department of Physiology and Biophysics, Institute for Computational Biomedicine, Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY 10021, USA
- Elisa Benedetti
- Department of Physiology and Biophysics, Institute for Computational Biomedicine, Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY 10021, USA
- Daniel P Gomari
- Institute of Computational Biology, Helmholtz Zentrum München—German Research Center for Environmental Health, Neuherberg, Germany
- Annalise Schweickart
- Department of Physiology and Biophysics, Institute for Computational Biomedicine, Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY 10021, USA
- Richa Batra
- Department of Physiology and Biophysics, Institute for Computational Biomedicine, Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY 10021, USA
- Mustafa Buyukozkan
- Department of Physiology and Biophysics, Institute for Computational Biomedicine, Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY 10021, USA
- Zeyu Wang
- Department of Physiology and Biophysics, Institute for Computational Biomedicine, Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY 10021, USA
- Matthias Arnold
- Institute of Computational Biology, Helmholtz Zentrum München—German Research Center for Environmental Health, Neuherberg, Germany
- Karsten Suhre
- Department of Physiology and Biophysics, Weill Cornell Medical College—Qatar Education City, Doha, Qatar
18
Peng K, Huang YN, Sarwal V, Alachkar H, Wong-Beringer A, Mangul S. Integrating big data computational skills in education to facilitate reproducibility and transparency in pharmaceutical sciences. Journal of the American College of Clinical Pharmacy 2021. [DOI: 10.1002/jac5.1519]
Affiliation(s)
- Kerui Peng
- Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, Los Angeles, California 90089, USA
- Yu Ning Huang
- Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, Los Angeles, California 90089, USA
- Varuni Sarwal
- Department of Computer Science, University of California, Los Angeles, California 90095, USA
- Houda Alachkar
- Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, Los Angeles, California 90089, USA
- Annie Wong-Beringer
- Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, Los Angeles, California 90089, USA
- Serghei Mangul
- Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, Los Angeles, California 90089, USA
19
Post AR, Luther J, Loveless JM, Ward M, Hewitt S. Enhancing research informatics core user satisfaction through agile practices. JAMIA Open 2021; 4:ooab103. [PMID: 34927001 PMCID: PMC8672926 DOI: 10.1093/jamiaopen/ooab103] [Received: 06/10/2021] [Revised: 10/06/2021] [Accepted: 11/18/2021]
Abstract
Objective: The Huntsman Cancer Institute Research Informatics Shared Resource (RISR), a software and database development core facility, sought to address a lack of published operational best practices for research informatics cores. It aimed to use those insights to enhance effectiveness after an increase in team size from 20 to 31 full-time equivalents coincided with a reduction in user satisfaction. Materials and Methods: RISR migrated from a water-scrum-fall model of software development to agile software development practices, which emphasize iteration and collaboration. RISR's agile implementation emphasizes the product owner role, which is responsible for user engagement and may be particularly valuable in software development that requires close engagement with users, such as in science. Results: All of RISR's software development teams implemented agile practices in early 2020. All project teams are led by a product owner who serves as the voice of the user on the development team. Annual user survey scores for service quality and turnaround time, recorded 9 months after implementation, increased by 17% and 11%, respectively. Discussion: RISR is illustrative of the increasing size of research informatics cores and the need to identify best practices for maintaining high effectiveness. Agile practices may address concerns about the fit of software engineering practices in science. The study had only one time point after implementing agile practices and only one site, limiting its generalizability. Conclusions: Agile software development may substantially increase a research informatics core facility's effectiveness and should be studied further as a potential best practice for how such cores are operated.
Affiliation(s)
- Andrew R Post
- Research Informatics Shared Resource, Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah, USA
- Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, USA
- Jared Luther
- Research Informatics Shared Resource, Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah, USA
- J Maxwell Loveless
- Research Administration, Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah, USA
- Melanie Ward
- Research Administration, Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah, USA
- Shirleen Hewitt
- Research Informatics Shared Resource, Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah, USA
20
Leipzig J, Nüst D, Hoyt CT, Ram K, Greenberg J. The role of metadata in reproducible computational research. Patterns (N Y) 2021; 2:100322. [PMID: 34553169 PMCID: PMC8441584 DOI: 10.1016/j.patter.2021.100322]
Abstract
Reproducible computational research (RCR) is the keystone of the scientific method for in silico analyses, packaging the transformation of raw data to published results. In addition to its role in research integrity, improving the reproducibility of scientific studies can accelerate evaluation and reuse. This potential and wide support for the FAIR principles have motivated interest in metadata standards supporting reproducibility. Metadata provide context and provenance to raw data and methods and are essential to both discovery and validation. Despite this shared connection with scientific data, few studies have explicitly described how metadata enable reproducible computational research. This review employs a functional content analysis to identify metadata standards that support reproducibility across an analytic stack consisting of input data, tools, notebooks, pipelines, and publications. Our review provides background context, explores gaps, and discovers component trends of embeddedness and methodology weight from which we derive recommendations for future work.
Affiliation(s)
- Jeremy Leipzig
- Metadata Research Center, College of Computing and Informatics, Drexel University, Philadelphia, PA, USA
- Daniel Nüst
- Institute for Geoinformatics, University of Münster, Münster, Germany
- Karthik Ram
- Berkeley Institute for Data Science, University of California, Berkeley, Berkeley, CA, USA
- Jane Greenberg
- Metadata Research Center, College of Computing and Informatics, Drexel University, Philadelphia, PA, USA
21
Cervenka M, Pascual JM, Rho JM, Thiele E, Yellen G, Whittemore V, Hartman AL. Metabolism-based therapies for epilepsy: new directions for future cures. Ann Clin Transl Neurol 2021; 8:1730-1737. [PMID: 34247456 PMCID: PMC8351378 DOI: 10.1002/acn3.51423] [Received: 05/14/2021] [Accepted: 06/28/2021]
Abstract
Objective: Thousands of years after dietary therapy was proposed to treat seizures, how alterations in metabolism relate to epilepsy remains unclear, and metabolism-based therapies are not always effective. Methods: We consider the state of the science in metabolism-based therapies for epilepsy across the research lifecycle, from basic to translational to clinical studies. Results: This analysis creates a conceptual framework for creative, rigorous, and transparent research to benefit people with epilepsy through the understanding and modification of metabolism. Interpretation: Despite intensive past efforts to evaluate metabolism-based therapies for epilepsy, distinct ways of framing a problem offer the chance to engage different mindsets and new (or newly applied) technologies. A comprehensive, creative, and inclusive problem-directed research agenda is needed, with a renewed and stringent adherence to rigor and transparency across all levels of investigation.
Affiliation(s)
- Mackenzie Cervenka
- Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Juan M. Pascual
- Department of Neurology, University of Texas Southwestern, Dallas, Texas, USA
- Jong M. Rho
- Departments of Neurosciences and Pediatrics, University of California, San Diego, California, USA
- Elizabeth Thiele
- Department of Neurology, Harvard Medical School, Boston, Massachusetts, USA
- Gary Yellen
- Department of Neurobiology, Harvard Medical School, Boston, Massachusetts, USA
- Vicky Whittemore
- National Institute of Neurological Disorders and Stroke, National Institutes of Health, Rockville, Maryland, USA
- Adam L. Hartman
- National Institute of Neurological Disorders and Stroke, National Institutes of Health, Rockville, Maryland, USA
22
Hauschild AC, Eick L, Wienbeck J, Heider D. Fostering reproducibility, reusability, and technology transfer in health informatics. iScience 2021; 24:102803. [PMID: 34296072 PMCID: PMC8282945 DOI: 10.1016/j.isci.2021.102803]
Abstract
Computational methods can transform healthcare. In particular, health informatics with artificial intelligence has shown tremendous potential when applied in various fields of medical research and has opened a new era for precision medicine. The development of reusable biomedical software for research or clinical practice is time-consuming and requires rigorous compliance with quality requirements as defined by international standards. However, research projects rarely implement such measures, hindering smooth technology transfer into the research community or manufacturers as well as reproducibility and reusability. Here, we present a guideline for quality management systems (QMS) for academic organizations incorporating the essential components while confining the requirements to an easily manageable effort. It provides a starting point to implement a QMS tailored to specific needs effortlessly and greatly facilitates technology transfer in a controlled manner, thereby supporting reproducibility and reusability. Ultimately, the emerging standardized workflows can pave the way for an accelerated deployment in clinical practice.
Affiliation(s)
- Anne-Christin Hauschild
- Department of Data Science in Biomedicine, Faculty of Mathematics & Computer Science, Philipps University of Marburg, Hans-Meerwein-Strasse 6, Marburg, 35032, Germany
- Lisa Eick
- Department of Data Science in Biomedicine, Faculty of Mathematics & Computer Science, Philipps University of Marburg, Hans-Meerwein-Strasse 6, Marburg, 35032, Germany
- Joachim Wienbeck
- Department of Data Science in Biomedicine, Faculty of Mathematics & Computer Science, Philipps University of Marburg, Hans-Meerwein-Strasse 6, Marburg, 35032, Germany
- Dominik Heider
- Department of Data Science in Biomedicine, Faculty of Mathematics & Computer Science, Philipps University of Marburg, Hans-Meerwein-Strasse 6, Marburg, 35032, Germany
23
Righelli D, Angelini C. Easyreporting simplifies the implementation of Reproducible Research layers in R software. PLoS One 2021; 16:e0244122. [PMID: 33970927 PMCID: PMC8109797 DOI: 10.1371/journal.pone.0244122] [Received: 12/01/2020] [Accepted: 04/20/2021]
Abstract
In recent years, "irreproducibility" has become a general problem in omics data analysis due to the use of sophisticated and poorly described computational procedures. To avoid misleading results, it is necessary to inspect and reproduce the entire data analysis as a unified product. Reproducible Research (RR) provides general guidelines for public access to the analytic data and related analysis code combined with natural language documentation, allowing third parties to reproduce the findings. We developed easyreporting, a novel R/Bioconductor package, to facilitate the implementation of an RR layer inside reports/tools. We describe the main functionalities and illustrate the organization of an analysis report using a typical case study concerning the analysis of RNA-seq data. Then, we show how to use easyreporting in other projects to trace R functions automatically. This latter feature helps developers to implement procedures that automatically keep track of the analysis steps. Easyreporting can be useful in supporting the reproducibility of any data analysis project and shows great advantages for the implementation of R packages and GUIs. It is particularly helpful in bioinformatics, where the complexity of the analyses makes it extremely difficult to trace all the steps and parameters used in the study.
Collapse
Affiliation(s)
- Dario Righelli
- Department of Statistical Sciences, University of Padova, Padua, Italy
- Istituto per le Applicazioni del Calcolo “Mauro Picone”, National Research Council, Naples, Italy
- Claudia Angelini
- Istituto per le Applicazioni del Calcolo “Mauro Picone”, National Research Council, Naples, Italy
24
Samuel S, König-Ries B. Understanding experiments and research practices for reproducibility: an exploratory study. PeerJ 2021; 9:e11140. [PMID: 33976964 PMCID: PMC8067906 DOI: 10.7717/peerj.11140] [Received: 09/09/2020] [Accepted: 03/01/2021] [Indexed: 11/20/2022]
Abstract
Scientific experiments and research practices vary across disciplines. The research practices followed by scientists in each domain play an essential role in the understandability and reproducibility of results. The "Reproducibility Crisis", where researchers find difficulty in reproducing published results, is currently faced by several disciplines. To understand the underlying problem in the context of the reproducibility crisis, it is important to first know the different research practices followed in each domain and the factors that hinder reproducibility. We performed an exploratory study by conducting a survey addressed to researchers representing a range of disciplines to understand scientific experiments and research practices for reproducibility. The survey findings identify a reproducibility crisis and a strong need for sharing data, code, methods, steps, and negative and positive results. Insufficient metadata, lack of publicly available data, and incomplete information in study methods are considered to be the main reasons for poor reproducibility. The survey results also address a wide range of research questions on the reproducibility of scientific results. Based on the results of our exploratory study and supported by the existing published literature, we offer general recommendations that could help the scientific community to understand, reproduce, and reuse experimental data and results in the research data lifecycle.
Collapse
Affiliation(s)
- Sheeba Samuel
- Heinz Nixdorf Chair for Distributed Information Systems, Friedrich Schiller University Jena, Jena, Thuringia, Germany
- Michael Stifel Center Jena, Jena, Thuringia, Germany
- Birgitta König-Ries
- Heinz Nixdorf Chair for Distributed Information Systems, Friedrich Schiller University Jena, Jena, Thuringia, Germany
- Michael Stifel Center Jena, Jena, Thuringia, Germany
25
Rajesh A, Chang Y, Abedalthagafi MS, Wong-Beringer A, Love MI, Mangul S. Improving the completeness of public metadata accompanying omics studies. Genome Biol 2021; 22:106. [PMID: 33858487 PMCID: PMC8048353 DOI: 10.1186/s13059-021-02332-z] [Received: 02/22/2021] [Accepted: 03/29/2021] [Indexed: 12/17/2022]
Affiliation(s)
- Anushka Rajesh
- Department of Pharmacology and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA 90089 USA
- Yutong Chang
- Department of Pharmacology and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA 90089 USA
- Malak S. Abedalthagafi
- Genomics Research Department, King Fahad Medical City and King Abdulaziz City for Science and Technology, Riyadh, Saudi Arabia
- Annie Wong-Beringer
- Department of Clinical Pharmacy, University of Southern California, Los Angeles, CA 90089 USA
- Michael I. Love
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27516 USA
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514 USA
- Serghei Mangul
- Department of Clinical Pharmacy, University of Southern California, Los Angeles, CA 90089 USA
26
Del Prete E, Facchiano A, Profumo A, Angelini C, Romano P. GeenaR: A Web Tool for Reproducible MALDI-TOF Analysis. Front Genet 2021; 12:635814. [PMID: 33854526 PMCID: PMC8039533 DOI: 10.3389/fgene.2021.635814] [Received: 11/30/2020] [Accepted: 03/01/2021] [Indexed: 12/21/2022]
Abstract
Mass spectrometry is a widely applied technology with a strong impact in the proteomics field. MALDI-TOF is a combined technology in mass spectrometry with many applications in characterizing biological samples from different sources, such as the identification of cancer biomarkers, the detection of food frauds, the identification of doping substances in athletes’ fluids, and so on. The massive quantity of data, in the form of mass spectra, is often biased and altered by different sources of noise. Therefore, extracting the most relevant features that characterize the samples is often challenging and requires combining several computational methods. Here, we present GeenaR, a novel web tool that provides a complete workflow for pre-processing, analyzing, visualizing, and comparing MALDI-TOF mass spectra. GeenaR is user-friendly, provides many different functionalities for the analysis of the mass spectra, and supports reproducible research since it produces a human-readable report that contains function parameters, results, and the code used for processing the mass spectra. First, we illustrate the features available in GeenaR. Then, we describe its internal structure. Finally, we demonstrate its capabilities in analyzing oncological datasets by presenting two case studies related to ovarian cancer and colorectal cancer. GeenaR is available at http://proteomics.hsanmartino.it/geenar/.
Collapse
Affiliation(s)
- Eugenio Del Prete
- Institute for Applied Mathematics, National Research Council, Naples, Italy
- Angelo Facchiano
- Institute of Food Sciences, National Research Council, Avellino, Italy
- Aldo Profumo
- Proteomica e Spettrometria di Massa, IRCCS Ospedale Policlinico San Martino IST, Genova, Italy
- Claudia Angelini
- Institute for Applied Mathematics, National Research Council, Naples, Italy
- Paolo Romano
- Proteomica e Spettrometria di Massa, IRCCS Ospedale Policlinico San Martino IST, Genova, Italy
27
Roberts JM, Rich-Edwards JW, McElrath TF, Garmire L, Myatt L. Subtypes of Preeclampsia: Recognition and Determining Clinical Usefulness. Hypertension 2021; 77:1430-1441. [PMID: 33775113 DOI: 10.1161/hypertensionaha.120.14781] [Indexed: 01/05/2023]
Abstract
The concept that preeclampsia is a multisystemic syndrome is appreciated in both research and clinical care. Our understanding of pathophysiology recognizes the role of inflammation, oxidative and endoplasmic reticulum stress, and angiogenic dysfunction. Yet, we have not progressed greatly toward clinically useful prediction nor had substantial success in prevention or treatment. One possibility is that the maternal syndrome may be reached through different pathophysiological pathways, that is, subtypes of preeclampsia, that in their specificity yield more clinical utility. For example, early and late onset preeclampsia are increasingly acknowledged as different pathophysiological processes leading to a common presentation. Other subtypes of preeclampsia are supported by disparate clinical outcomes, long-range prognosis, organ systems involved, and risk factors. These insights have been supplemented by discovery-driven methods, which cluster preeclampsia cases into groups indicating different pathophysiologies. In this presentation, we review likely subtypes based on current knowledge and suggest others. We present a consideration of the requirements for a clinically meaningful preeclampsia subtype. A useful subtype should (1) identify a specific pathophysiological pathway or (2) specifically indicate maternal or fetal outcome, (3) be recognizable in a clinically useful time frame, and (4) these results should be reproducible and generalizable (but at varying frequency) including in low resource settings. We recommend that the default consideration be that preeclampsia includes several subtypes rather than trying to force all cases into a single pathophysiological pathway. The recognition of subtypes and deciphering their different pathophysiologies will provide specific targets for prevention, prediction, and treatment directing personalized care.
Collapse
Affiliation(s)
- James M Roberts
- Magee-Womens Research Institute, Department of Obstetrics Gynecology and Reproductive Sciences, Epidemiology and Clinical and Translational Research, University of Pittsburgh (J.M.R.)
- Janet W Rich-Edwards
- Department of Epidemiology, Harvard T. H. Chan School of Public Health, Harvard University, Boston, MA (J.W.R.-E.); Division of Women's Health, Department of Medicine (J.W.R.-E.), Brigham and Women's Hospital and Harvard Medical School, Boston, MA
- Thomas F McElrath
- Division of Maternal-Fetal Medicine (T.F.M.), Brigham and Women's Hospital and Harvard Medical School, Boston, MA
- Lana Garmire
- Department of Computational Medicine and Bioinformatics, Medical School, University of Michigan (L.G.)
- Leslie Myatt
- Department of Obstetrics and Gynecology, Moore Institute of Nutrition and Wellness, Oregon Health and Science University (L.M.)
28
Haendel MA, Chute CG, Bennett TD, Eichmann DA, Guinney J, Kibbe WA, Payne PRO, Pfaff ER, Robinson PN, Saltz JH, Spratt H, Suver C, Wilbanks J, Wilcox AB, Williams AE, Wu C, Blacketer C, Bradford RL, Cimino JJ, Clark M, Colmenares EW, Francis PA, Gabriel D, Graves A, Hemadri R, Hong SS, Hripcsak G, Jiao D, Klann JG, Kostka K, Lee AM, Lehmann HP, Lingrey L, Miller RT, Morris M, Murphy SN, Natarajan K, Palchuk MB, Sheikh U, Solbrig H, Visweswaran S, Walden A, Walters KM, Weber GM, Zhang XT, Zhu RL, Amor B, Girvin AT, Manna A, Qureshi N, Kurilla MG, Michael SG, Portilla LM, Rutter JL, Austin CP, Gersing KR. The National COVID Cohort Collaborative (N3C): Rationale, design, infrastructure, and deployment. J Am Med Inform Assoc 2021; 28:427-443. [PMID: 32805036 PMCID: PMC7454687 DOI: 10.1093/jamia/ocaa196] [Received: 07/14/2020] [Accepted: 08/14/2020] [Indexed: 01/12/2023]
Abstract
Objective: Coronavirus disease 2019 (COVID-19) poses societal challenges that require expeditious data and knowledge sharing. Though organizational clinical data are abundant, these are largely inaccessible to outside researchers. Statistical, machine learning, and causal analyses are most successful with large-scale data beyond what is available in any given organization. Here, we introduce the National COVID Cohort Collaborative (N3C), an open science community focused on analyzing patient-level data from many centers.
Materials and Methods: The Clinical and Translational Science Award Program and scientific community created N3C to overcome technical, regulatory, policy, and governance barriers to sharing and harmonizing individual-level clinical data. We developed solutions to extract, aggregate, and harmonize data across organizations and data models, and created a secure data enclave to enable efficient, transparent, and reproducible collaborative analytics.
Results: Organized in inclusive workstreams, we created legal agreements and governance for organizations and researchers; data extraction scripts to identify and ingest positive, negative, and possible COVID-19 cases; a data quality assurance and harmonization pipeline to create a single harmonized dataset; population of the secure data enclave with data, machine learning, and statistical analytics tools; dissemination mechanisms; and a synthetic data pilot to democratize data access.
Conclusions: The N3C has demonstrated that a multisite collaborative learning health network can overcome barriers to rapidly build a scalable infrastructure incorporating multiorganizational clinical data for COVID-19 analytics. We expect this effort to save lives by enabling rapid collaboration among clinicians, researchers, and data scientists to identify treatments and specialized care and thereby reduce the immediate and long-term impacts of COVID-19.
Collapse
Affiliation(s)
- Melissa A Haendel
- Oregon Clinical and Translational Research Institute, Oregon Health and Science University, Portland, Oregon, USA; Translational and Integrative Sciences Center, Department of Molecular Toxicology, Oregon State University, Corvallis, Oregon, USA
- Christopher G Chute
- Schools of Medicine, Public Health, and Nursing, Johns Hopkins University, Baltimore, Maryland, USA
- Tellen D Bennett
- Section of Informatics and Data Science, Department of Pediatrics, University of Colorado School of Medicine, University of Colorado, Aurora, Colorado, USA
- David A Eichmann
- School of Library and Information Science, The University of Iowa, Iowa City, Iowa, USA
- Philip R O Payne
- Institute for Informatics, Washington University in St. Louis, Saint Louis, Missouri, USA
- Emily R Pfaff
- North Carolina Translational and Clinical Sciences Institute (NC TraCS), University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Joel H Saltz
- Department of Biomedical Informatics, Stony Brook University, Stony Brook, New York, USA
- Heidi Spratt
- University of Texas Medical Branch, Galveston, Texas, USA
- Andrew E Williams
- Tufts Medical Center Clinical and Translational Science Institute, Tufts Medical Center, Boston, Massachusetts, USA
- Chunlei Wu
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, California, USA
- Clair Blacketer
- Janssen Research and Development, LLC, Raritan, New Jersey, USA
- Robert L Bradford
- North Carolina Translational and Clinical Sciences Institute (NC TraCS), University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- James J Cimino
- University of Alabama-Birmingham, Birmingham, Alabama, USA
- Marshall Clark
- North Carolina Translational and Clinical Sciences Institute (NC TraCS), University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Evan W Colmenares
- Department of Pharmaceutical Outcomes and Policy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Davera Gabriel
- Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Alexis Graves
- University of Iowa Institute for Clinical and Translational Science, The University of Iowa, Iowa City, Iowa, USA
- Raju Hemadri
- National Center for Advancing Translational Sciences, Bethesda, Maryland, USA
- Stephanie S Hong
- Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- George Hripcsak
- Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Dazhi Jiao
- Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Adam M Lee
- University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Harold P Lehmann
- Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Robert T Miller
- Tufts Clinical and Translational Science Institute, Tufts University, Boston, Massachusetts, USA
- Michele Morris
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Usman Sheikh
- National Center for Advancing Translational Sciences, Bethesda, Maryland, USA
- Harold Solbrig
- Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Shyam Visweswaran
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Anita Walden
- Oregon Clinical and Translational Research Institute, Oregon Health and Science University, Portland, Oregon, USA; Sage Bionetworks, Seattle, Washington, USA
- Kellie M Walters
- North Carolina Translational and Clinical Sciences Institute (NC TraCS), University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Griffin M Weber
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Richard L Zhu
- Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Amin Manna
- Palantir Technologies, Palo Alto, California, USA
- Michael G Kurilla
- Division of Clinical Innovation, National Center for Advancing Translational Sciences, Bethesda, Maryland, USA
- Sam G Michael
- National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, Maryland, USA
- Lili M Portilla
- Office of Strategic Alliances, National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, Maryland, USA
- Joni L Rutter
- Office of the Director, National Center for Advancing Translational Sciences, Bethesda, Maryland, USA
- Christopher P Austin
- National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, Maryland, USA
- Ken R Gersing
- National Center for Advancing Translational Sciences, Bethesda, Maryland, USA
29
Ten simple rules for writing a paper about scientific software. PLoS Comput Biol 2020; 16:e1008390. [PMID: 33180774 PMCID: PMC7660560 DOI: 10.1371/journal.pcbi.1008390] [Indexed: 12/03/2022]
Abstract
Papers describing software are an important part of computational fields of scientific research. These “software papers” are unique in a number of ways, and they require special consideration to improve their impact on the scientific community and their efficacy at conveying important information. Here, we discuss 10 specific rules for writing software papers, covering some of the different scenarios and publication types that might be encountered, and important questions that all computational researchers would benefit from asking along the way. Computational researchers have a responsibility to ensure that the software they write stands up to the same scientific scrutiny as traditional research studies. These 10 simple rules make doing so easier by enhancing usability, reproducibility, transparency, and other crucial characteristics that aren’t taught in most computer science or research methods curricula.