1
Chilimoniuk J, Erol A, Rödiger S, Burdukiewicz M. Challenges and opportunities in processing NanoString nCounter data. Comput Struct Biotechnol J 2024; 23:1951-1958. [PMID: 38736697] [PMCID: PMC11087919] [DOI: 10.1016/j.csbj.2024.04.061]
Abstract
NanoString nCounter is a medium-throughput technology used in mRNA and miRNA differential expression studies. It offers several advantages, including the absence of an amplification step and the ability to analyze low-grade samples. Despite its considerable strengths, the popularity of the nCounter platform in experimental research stabilized in 2022 and 2023, and this trend may continue in the upcoming years. Such stagnation could potentially be attributed to the absence of a standardized analytical pipeline or of clear guidance on optimal processing methods for nCounter data. To standardize the description of the nCounter data analysis workflow, we divided it into five distinct steps: data pre-processing, quality control, background correction, normalization and differential expression analysis. We then evaluated eleven R packages dedicated to nCounter data processing, identifying the functionalities that cover each of these steps and commenting on their application to mRNA and miRNA samples.
Affiliation(s)
- Anna Erol
- Clinical Research Centre, Medical University of Białystok, Białystok, Poland
- Stefan Rödiger
- Institute of Biotechnology, Faculty of Environment and Natural Sciences, Brandenburg University of Technology Cottbus - Senftenberg, Senftenberg, Germany
- Michał Burdukiewicz
- Clinical Research Centre, Medical University of Białystok, Białystok, Poland
- Institute of Biotechnology and Biomedicine, Autonomous University of Barcelona, Barcelona, Spain
2
Szczepanik M, Wagner AS, Heunis S, Waite LK, Eickhoff SB, Hanke M. Teaching Research Data Management with DataLad: A Multi-year, Multi-domain Effort. Neuroinformatics 2024. [PMID: 38713426] [DOI: 10.1007/s12021-024-09665-7]
Abstract
Research data management has become an indispensable skill in modern neuroscience. Researchers can benefit from following good practices as well as from proficiency in particular software solutions. But as these domain-agnostic skills are commonly not included in domain-specific graduate education, community efforts increasingly provide early career scientists with opportunities for organised training and materials for self-study. Investing effort in user documentation and interacting with the user base can, in turn, help developers improve the quality of their software. In this work, we detail and evaluate our multi-modal approach to teaching research data management in the DataLad ecosystem, both in general and with concrete software use. Spanning an online and printed handbook, a modular course suitable for in-person and virtual teaching, and a flexible collection of research data management tips in a knowledge base, our free and open source collection of training materials has made research data management and software training available to a variety of stakeholders over the past five years.
Affiliation(s)
- Michał Szczepanik
- Institute of Neuroscience and Medicine, Brain and Behaviour (INM-7), Research Center Jülich, Jülich, Germany.
- Adina S Wagner
- Institute of Neuroscience and Medicine, Brain and Behaviour (INM-7), Research Center Jülich, Jülich, Germany
- Stephan Heunis
- Institute of Neuroscience and Medicine, Brain and Behaviour (INM-7), Research Center Jülich, Jülich, Germany
- Laura K Waite
- Institute of Neuroscience and Medicine, Brain and Behaviour (INM-7), Research Center Jülich, Jülich, Germany
- Simon B Eickhoff
- Institute of Neuroscience and Medicine, Brain and Behaviour (INM-7), Research Center Jülich, Jülich, Germany
- Institute of Systems Neuroscience, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Michael Hanke
- Institute of Neuroscience and Medicine, Brain and Behaviour (INM-7), Research Center Jülich, Jülich, Germany
- Institute of Systems Neuroscience, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
3
Wheeler J, Rosengart A, Jiang Z, Tan K, Treutle N, Ionides EL. Informing policy via dynamic models: Cholera in Haiti. PLoS Comput Biol 2024; 20:e1012032. [PMID: 38683863] [PMCID: PMC11081515] [DOI: 10.1371/journal.pcbi.1012032]
Abstract
Public health decisions must be made about when and how to implement interventions to control an infectious disease epidemic. These decisions should be informed by data on the epidemic as well as current understanding about the transmission dynamics. Such decisions can be posed as statistical questions about scientifically motivated dynamic models. Thus, we encounter the methodological task of building credible, data-informed decisions based on stochastic, partially observed, nonlinear dynamic models. This necessitates addressing the tradeoff between biological fidelity and model simplicity, and the reality of misspecification for models at all levels of complexity. We assess current methodological approaches to these issues via a case study of the 2010-2019 cholera epidemic in Haiti. We consider three dynamic models developed by expert teams to advise on vaccination policies. We evaluate previous methods used for fitting these models, and we demonstrate modified data analysis strategies leading to improved statistical fit. Specifically, we present approaches for diagnosing model misspecification and the consequent development of improved models. Additionally, we demonstrate the utility of recent advances in likelihood maximization for high-dimensional nonlinear dynamic models, enabling likelihood-based inference for spatiotemporal incidence data using this class of models. Our workflow is reproducible and extendable, facilitating future investigations of this disease system.
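Likelihood-based inference for the stochastic, partially observed, nonlinear models discussed above typically rests on the particle filter. The sketch below evaluates the log-likelihood of a toy state-space model (Gaussian random walk on log-incidence, Poisson-reported counts); the model, counts, and function are illustrative assumptions only, not the authors' Haiti models or their software.

```python
import math
import random

def particle_loglik(obs, n_particles=500, step_sd=0.2, seed=1):
    """Bootstrap particle filter log-likelihood for a toy partially
    observed model: latent log-incidence follows a Gaussian random
    walk; reported counts are Poisson with mean exp(latent state)."""
    rng = random.Random(seed)
    particles = [0.0] * n_particles      # latent log-mean, start at log(1)
    loglik = 0.0
    for y in obs:
        # Propagate every particle through the stochastic process model.
        particles = [x + rng.gauss(0.0, step_sd) for x in particles]
        # Weight each particle by the Poisson measurement density.
        weights = []
        for x in particles:
            lam = math.exp(x)
            weights.append(math.exp(-lam + y * x - math.lgamma(y + 1)))
        loglik += math.log(sum(weights) / n_particles)
        # Multinomial resampling proportional to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return loglik

# Invented weekly case counts; a real analysis would use Haiti data.
ll = particle_loglik([2, 3, 1, 4, 2])
```

Because the filter only simulates from the process model and evaluates the measurement density, it is "plug-and-play": swapping in a richer transmission model changes the propagation step but not the algorithm.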
Affiliation(s)
- Jesse Wheeler
- Statistics Department, University of Michigan, Ann Arbor, Michigan, United States of America
- AnnaElaine Rosengart
- Statistics and Data Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Zhuoxun Jiang
- Statistics Department, University of Michigan, Ann Arbor, Michigan, United States of America
- Kevin Tan
- Wharton Statistics and Data Science, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Noah Treutle
- Statistics Department, University of Michigan, Ann Arbor, Michigan, United States of America
- Edward L. Ionides
- Statistics Department, University of Michigan, Ann Arbor, Michigan, United States of America
4
Cook RJ, Lawless JF. Statistical and Scientific Considerations Concerning the Interpretation, Replicability, and Transportability of Research Findings. J Rheumatol 2024; 51:117-129. [PMID: 37967911] [DOI: 10.3899/jrheum.2023-0499]
Abstract
To advance scientific understanding of disease processes and related intervention effects, study results should be free from bias and replicable. More broadly, investigators seek results that are transportable, that is, applicable to a perceived study population as well as in other environments and populations. We review fundamental statistical issues that arise in the analysis of observational data from disease cohorts and other sources and discuss how these issues affect the transportability and replicability of research results. Much of the literature focuses on estimating average exposure or intervention effects at the population level, but we argue for more nuanced analyses of conditional effects that reflect the complexity of disease processes.
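The case for conditional rather than purely average effects can be made concrete with a toy calculation: when an intervention helps one subgroup and harms another, the population-average effect can be exactly zero while both conditional effects are large. The subgroup prevalences and effect sizes below are invented for illustration.

```python
def average_effect(subgroups):
    """Population-average effect as the prevalence-weighted mean of
    subgroup-specific (conditional) effects.
    `subgroups` is a list of (prevalence, conditional_effect) pairs."""
    assert abs(sum(p for p, _ in subgroups) - 1.0) < 1e-9
    return sum(p * e for p, e in subgroups)

# Hypothetical population: the treatment lowers risk by 0.10 in half
# of the population and raises it by 0.10 in the other half.
effects = [(0.5, -0.10), (0.5, +0.10)]
marginal = average_effect(effects)   # heterogeneity cancels to zero
```

An analysis reporting only `marginal` would conclude the treatment does nothing, while a conditional analysis reveals two clinically opposite effects; transporting the marginal estimate to a population with different subgroup prevalences would also give the wrong answer.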
Affiliation(s)
- Richard J Cook
- Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada
- Jerald F Lawless
- Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada
5
Siraji MA, Rahman M. Primer on Reproducible Research in R: Enhancing Transparency and Scientific Rigor. Clocks Sleep 2023; 6:1-10. [PMID: 38534796] [DOI: 10.3390/clockssleep6010001]
Abstract
Achieving research reproducibility is a precarious aspect of scientific practice. However, many studies across disciplines fail to be fully reproduced due to inadequate dissemination methods. Traditional publication practices often fail to provide a comprehensive description of the research context and procedures, hindering reproducibility. To address these challenges, this article presents a tutorial on reproducible research using the R programming language. The tutorial aims to equip researchers, including those with limited coding knowledge, with the necessary skills to enhance reproducibility in their work. It covers three essential components: version control using Git, dynamic document creation using rmarkdown, and managing R package dependencies with renv. The tutorial also provides insights into sharing reproducible research and offers specific considerations for the field of sleep and chronobiology research. By following the tutorial, researchers can adopt practices that enhance the transparency, rigor, and replicability of their work, contributing to a culture of reproducible research and advancing scientific knowledge.
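The dependency-pinning step that the tutorial handles with renv has a direct analogue in any language: record the exact interpreter and package versions alongside the analysis. The stdlib-Python sketch below illustrates that idea only; it is not part of the R tutorial, and the output format is invented.

```python
import json
import platform
from importlib import metadata

def environment_snapshot():
    """Return a lockfile-like record of the interpreter version and
    every installed package version, for archiving with an analysis."""
    packages = sorted(
        (dist.metadata["Name"], dist.version)
        for dist in metadata.distributions()
        if dist.metadata["Name"] is not None   # skip broken metadata
    )
    return {"python": platform.python_version(),
            "packages": dict(packages)}

snap = environment_snapshot()
lockfile = json.dumps(snap, indent=2, sort_keys=True)
```

Committing such a snapshot next to the analysis script serves the same purpose as `renv.lock`: a later reader can detect, and ideally restore, the software environment the results depended on.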
Affiliation(s)
- Mushfiqul Anwar Siraji
- Department of Psychology, Jeffery Cheah School of Medicine and Health Science, Monash University Malaysia, Jalan Lagoon Selatan, Bandar Sunway, Selangor Darul Ehsan 47500, Malaysia
- Department of History and Psychology, School of Humanities and Social Sciences, North South University, Dhaka 1229, Bangladesh
- Munia Rahman
- Department of Psychology, University of Dhaka, Dhaka 1000, Bangladesh
6
Xiong X, Cribben I. The state of play of reproducibility in Statistics: an empirical analysis. Am Stat 2022. [DOI: 10.1080/00031305.2022.2131625]
Affiliation(s)
- Xin Xiong
- Department of Biostatistics, Harvard T. H. Chan School of Public Health
- Ivor Cribben
- Department of Accounting and Business Analytics, Alberta School of Business, University of Alberta
7
Pouwels XGLV, Sampson CJ, Arnold RJG. Opportunities and Barriers to the Development and Use of Open Source Health Economic Models: A Survey. Value Health 2022; 25:473-479. [PMID: 35365297] [DOI: 10.1016/j.jval.2021.10.001]
Abstract
OBJECTIVES Health economic (HE) models are routinely used to support health policy and resource allocation decisions but are often considered "black boxes" that may be prone to error and bias. Open source models (OSMs) have been advocated to increase the transparency, credibility, and reuse of HE models. Previous studies have demonstrated interest in OSMs among the health economics and outcomes research community, but the number of OSMs remains low. METHODS We conducted an online survey of ISPOR (the leading professional society for health economics and outcomes research) members' perspectives on the usefulness of OSMs and barriers to their development and implementation. RESULTS Respondents (N = 230) included academics (27%), pharmaceutical (or related) industry representatives (23%), health research or consulting representatives (21%), governmental or nonprofit agency representatives (10%), and others (19%). Respondents were generally not familiar with barriers to the development and adoption of OSMs. Most agreed that OSMs would improve transparency (92%), efficiency (76%), and HE model reuse (86%) and promote confidence in using HE models (75%). The use of OSMs by health technology assessment authorities was considered a very important indicator of the usefulness of OSMs by 49% of respondents. Three-quarters of respondents perceived legal concerns and the ability to transfer data as important barriers to the development and use of OSMs. CONCLUSIONS Respondents believe that OSMs could increase the transparency, efficiency, and credibility of HE models, but that several barriers hamper their widespread adoption. Our results suggest that fundamental changes may be needed across the health economics and outcomes research community if OSMs are to become widely adopted.
Affiliation(s)
- Xavier G L V Pouwels
- Department of Health Technology and Services Research, Faculty of Behavioural, Management, and Social Sciences, University of Twente, Enschede, The Netherlands
- Renée J G Arnold
- National Institutes of Health/National Heart, Lung, and Blood Institute, Bethesda, MD, USA; Master of Public Health Program, Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Arnold Consultancy & Technology, LLC, New York, NY, USA
8
Peer L, Biniossek C, Betz D, Christian TM. Reproducible Research Publication Workflow: A Canonical Workflow Framework and FAIR Digital Object Approach to Quality Research Output. Data Intelligence 2022. [DOI: 10.1162/dint_a_00133]
Abstract
In this paper we present the Reproducible Research Publication Workflow (RRPW) as an example of how generic canonical workflows can be applied to a specific context. The RRPW includes essential steps between submission and final publication of the manuscript and the research artefacts (i.e., data, code, etc.) that underlie the scholarly claims in the manuscript. A key aspect of the RRPW is the inclusion of artefact review and metadata creation as part of the publication workflow. The paper discusses a formalized technical structure around a set of canonical steps which helps codify and standardize the process for researchers, curators, and publishers. The proposed application of canonical workflows can help achieve the goals of improved transparency and reproducibility, increase FAIR compliance of all research artefacts at all steps, and facilitate better exchange of annotated and machine-readable metadata.
Affiliation(s)
- Limor Peer
- Institution for Social and Policy Studies, Yale University, Connecticut 06520, USA
- Claudia Biniossek
- Center for Empirical Research in Economics and Behavioral Sciences (CEREB), University of Erfurt, Thüringen 99089, Germany
- Dirk Betz
- Center for Empirical Research in Economics and Behavioral Sciences (CEREB), University of Erfurt, Thüringen 99089, Germany
- Thu-Mai Christian
- Odum Institute for Research in Social Science, University of North Carolina System, North Carolina 27514-3916, USA
9
Open-Source, Adaptable, All-in-One Smartphone-Based System for Quantitative Analysis of Point-of-Care Diagnostics. Diagnostics (Basel) 2022; 12:589. [PMID: 35328142] [PMCID: PMC8947044] [DOI: 10.3390/diagnostics12030589]
Abstract
Point-of-care (POC) diagnostics, in particular lateral flow assays (LFA), represent a great opportunity for rapid, precise, low-cost and accessible diagnosis of disease. Especially with the ongoing coronavirus disease 2019 (COVID-19) pandemic, rapid point-of-care tests are becoming everyday tools for identification and prevention. Using smartphones as biosensors can enhance POC devices as portable, low-cost platforms for healthcare and medicine, food and environmental monitoring, improving diagnosis and documentation in remote, low-resource locations. We present an open-source, all-in-one smartphone-based system for quantitative analysis of LFAs. It consists of a 3D-printed photo box, a smartphone for image acquisition, and an R Shiny software package with modular, customizable analysis workflow for image editing, analysis, data extraction, calibration and quantification of the assays. This system is less expensive than commonly used hardware and software, so it could prove very beneficial for diagnostic testing in the context of pandemics, as well as in low-resource countries.
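Quantifying an LFA from a photo ultimately reduces to measuring band intensity along the strip. The sketch below shows that core step on a one-dimensional intensity profile: estimate background from the flanks of a band window, then integrate the background-subtracted signal. The profile values and window positions are invented; the published system wraps a full image-editing and calibration workflow around this idea in R Shiny.

```python
def band_signal(profile, band, flank=3):
    """Integrate background-corrected intensity over a band window.
    `profile` is a 1-D strip intensity profile (inverted grayscale, so
    darker bands give larger values); `band` is a (start, stop) pair."""
    start, stop = band
    # Background estimated from pixels flanking the band window.
    left = profile[max(0, start - flank):start]
    right = profile[stop:stop + flank]
    background = sum(left + right) / len(left + right)
    # Sum only the signal rising above background.
    return sum(max(0.0, v - background) for v in profile[start:stop])

# Hypothetical inverted-grayscale profile with one test band.
profile = [10, 10, 11, 10, 40, 80, 42, 10, 11, 10]
signal = band_signal(profile, (4, 7))
```

A calibration curve built from known analyte concentrations then maps `signal` to a quantitative result; a flat profile with no band yields a signal of zero.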
10
Auer S, Haeltermann NA, Weissgerber TL, Erlich JC, Susilaradeya D, Julkowska M, Gazda MA, Schwessinger B, Jadavji NM. A community-led initiative for training in reproducible research. eLife 2021; 10:e64719. [PMID: 34151774] [PMCID: PMC8282331] [DOI: 10.7554/elife.64719]
Abstract
Open and reproducible research practices increase the reusability and impact of scientific research. The reproducibility of research results is influenced by many factors, most of which can be addressed by improved education and training. Here we describe how workshops developed by the Reproducibility for Everyone (R4E) initiative can be customized to provide researchers at all career stages and across most disciplines with education and training in reproducible research practices. The R4E initiative, which is led by volunteers, has reached more than 3000 researchers worldwide to date, and all workshop materials, including accompanying resources, are available under a CC-BY 4.0 license at https://www.repro4everyone.org/.
Affiliation(s)
- Susann Auer
- Department of Plant Physiology, Institute of Botany, Faculty of Biology, Technische Universität Dresden, Dresden, Germany
- Nele A Haeltermann
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, United States
- Tracey L Weissgerber
- QUEST Center, Berlin Institute of Health, Charité Universitätsmedizin Berlin, Berlin, Germany
- Jeffrey C Erlich
- Shanghai Key Laboratory of Brain Functional Genomics, East China Normal University, Shanghai, China
- Damar Susilaradeya
- Medical Technology Cluster, Indonesian Medical Education and Research Institute, Faculty of Medicine, Universitas Indonesia, Jakarta, Indonesia
- Małgorzata Anna Gazda
- CIBIO/InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Campus Agrário de Vairão, Porto, Portugal; Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Porto, Portugal
- Nafisa M Jadavji
- Department of Biomedical Science, Midwestern University, Glendale, United States; Department of Neuroscience, Carleton University, Ottawa, Canada
- Reproducibility for Everyone, New York, United States
11
Krafczyk MS, Shi A, Bhaskar A, Marinov D, Stodden V. Learning from reproducing computational results: introducing three principles and the Reproduction Package. Philos Trans A Math Phys Eng Sci 2021; 379:20200069. [PMID: 33775145] [PMCID: PMC8059663] [DOI: 10.1098/rsta.2020.0069]
Abstract
We carry out efforts to reproduce computational results for seven published articles and identify barriers to computational reproducibility. We then derive three principles to guide the practice and dissemination of reproducible computational research: (i) Provide transparency regarding how computational results are produced; (ii) When writing and releasing research software, aim for ease of (re-)executability; (iii) Make any code upon which the results rely as deterministic as possible. We then exemplify these three principles with 12 specific guidelines for their implementation in practice. We illustrate the three principles of reproducible research with a series of vignettes from our experimental reproducibility work. We define a novel Reproduction Package, a formalism that specifies a structured way to share computational research artifacts that implements the guidelines generated from our reproduction efforts to allow others to build, reproduce and extend computational science. We make our reproduction efforts in this paper publicly available as exemplar Reproduction Packages. This article is part of the theme issue 'Reliability and reproducibility in computational science: implementing verification, validation and uncertainty quantification in silico'.
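A Reproduction Package bundles code, data, and a record of how results are produced. One mechanical ingredient of any such bundle is a checksum manifest that lets others verify they hold byte-identical artifacts before re-running anything. The sketch below shows only that ingredient, with invented file names; it is not the paper's formal specification.

```python
import hashlib
import tempfile
from pathlib import Path

def build_manifest(package_dir):
    """Map every file in a reproduction package to its SHA-256 digest,
    so a later re-run can verify it starts from identical artifacts."""
    root = Path(package_dir)
    return {str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
            for p in sorted(root.rglob("*")) if p.is_file()}

def verify(package_dir, manifest):
    """Return the names of files whose digests no longer match."""
    current = build_manifest(package_dir)
    return sorted(name for name in manifest
                  if current.get(name) != manifest[name])

# Hypothetical package: one analysis script written to a temp directory.
pkg = Path(tempfile.mkdtemp()) / "reproduction_package"
pkg.mkdir()
(pkg / "analysis.py").write_text("print('result')\n")
manifest = build_manifest(pkg)
```

Shipping `manifest` inside the package supports the first principle above (transparency about how results are produced) by making silent modification of inputs detectable.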
Affiliation(s)
- M. S. Krafczyk
- University of Illinois at Urbana-Champaign, Urbana, IL, USA
- A. Shi
- University of Illinois at Urbana-Champaign, Urbana, IL, USA
- A. Bhaskar
- University of Illinois at Urbana-Champaign, Urbana, IL, USA
- D. Marinov
- University of Illinois at Urbana-Champaign, Urbana, IL, USA
- V. Stodden
- University of Illinois at Urbana-Champaign, Urbana, IL, USA
12
Abstract
A systematic and reproducible “workflow”—the process that moves a scientific investigation from raw data to coherent research question to insightful contribution—should be a fundamental part of academic data-intensive research practice. In this paper, we elaborate basic principles of a reproducible data analysis workflow by defining 3 phases: the Explore, Refine, and Produce Phases. Each phase is roughly centered around the audience to whom research decisions, methodologies, and results are being immediately communicated. Importantly, each phase can also give rise to a number of research products beyond traditional academic publications. Where relevant, we draw analogies between design principles and established practice in software development. The guidance provided here is not intended to be a strict rulebook; rather, the suggestions for practices and tools to advance reproducible, sound data-intensive analysis may furnish support for both students new to research and current researchers who are new to data-intensive work.
13
Abstract
Advances in computing technology have spurred two extraordinary phenomena in science: large-scale and high-throughput data collection coupled with the creation and implementation of complex statistical algorithms for data analysis. These two phenomena have brought about tremendous advances in scientific discovery but have raised two serious concerns. The complexity of modern data analyses raises questions about the reproducibility of the analyses, meaning the ability of independent analysts to recreate the results claimed by the original authors using the original data and analysis techniques. Reproducibility is typically thwarted by a lack of availability of the original data and computer code. A more general concern is the replicability of scientific findings, which concerns the frequency with which scientific claims are confirmed by completely independent investigations. Although reproducibility and replicability are related, they focus on different aspects of scientific progress. In this review, we discuss the origins of reproducible research, characterize the current status of reproducibility in public health research, and connect reproducibility to current concerns about the replicability of scientific findings. Finally, we describe a path forward for improving both the reproducibility and replicability of public health research in the future.
Affiliation(s)
- Roger D Peng
- Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland 21205, USA
- Stephanie C Hicks
- Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland 21205, USA
14
Beruski GC, Del Ponte EM, Pereira AB, Gleason ML, Câmara GMS, Araújo Junior IP, Sentelhas PC. Performance and Profitability of Rain-Based Thresholds for Timing Fungicide Applications in Soybean Rust Control. Plant Dis 2020; 104:2704-2712. [PMID: 32716274] [DOI: 10.1094/pdis-01-20-0210-re]
Abstract
Soybean rust (SBR), caused by the fungus Phakopsora pachyrhizi, is the most damaging disease of soybean in Brazil. Effective management is achieved by means of calendar-timed sprays of fungicide mixtures, which do not explicitly consider weather-associated disease risk. Two rain-based action thresholds of disease severity values (DSV50 and DSV80) were proposed and compared with two leaf wetness duration-temperature thresholds of daily values of infection probability (DVIP6 and DVIP9) and with a calendar program, with regard to performance and profitability. An unsprayed check treatment plot was included for calculating relative control. Disease severity and yield data were obtained from 29 experiments conducted at six sites across four states in Brazil during the 2012-13, 2014-15, and 2015-16 growing seasons, which represented different growing regions and climatic conditions. The less conservative rainfall action threshold (DSV80) resulted in fewer fungicide sprays compared with the other treatments, and the more conservative one (DSV50) resulted in fewer sprays than the DVIP thresholds. Yield was generally higher with the increase in spray number, but the economic analysis showed no significant differences in the risk of not offsetting the costs of fungicide sprays regardless of the system. Therefore, based on the simplicity and the profitability of the rain-based model, the system is a good candidate for incorporating into the management of SBR in soybean production fields in Brazil.
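The action-threshold logic compared in the study can be sketched generically: accumulate a daily disease severity value and trigger a spray when the running total crosses the threshold, resetting afterwards. The one-point-per-rainy-day rule and all numbers below are simplified illustrations, not the calibrated DSV50/DSV80 or DVIP systems.

```python
def spray_days(daily_rain_mm, threshold, rain_cutoff=2.5):
    """Return the day indices on which a spray is triggered.
    Each day with rain above `rain_cutoff` adds one severity point;
    a spray fires when the running total reaches `threshold`, after
    which the accumulator resets."""
    total, sprays = 0, []
    for day, rain in enumerate(daily_rain_mm):
        if rain > rain_cutoff:
            total += 1
        if total >= threshold:
            sprays.append(day)
            total = 0
    return sprays

# Hypothetical week of rainfall (mm per day).
rain = [5.0, 8.0, 0.0, 3.0, 6.0, 0.0, 4.0]
conservative = spray_days(rain, threshold=2)   # sprays sooner, more often
relaxed = spray_days(rain, threshold=4)        # fewer sprays per season
```

The comparison mirrors the study's finding in miniature: raising the threshold (DSV80-like) reduces the number of sprays relative to a more conservative setting (DSV50-like) over the same weather record.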
Affiliation(s)
- Gustavo C Beruski
- Departamento de Engenharia de Biossistemas, ESALQ - Universidade de São Paulo, Piracicaba, São Paulo State, 13418-900, Brazil
- Emerson M Del Ponte
- Departamento de Fitopatologia, Universidade Federal de Viçosa, Viçosa, Minas Gerais State, 36570-000, Brazil
- André B Pereira
- Departamento de Ciências do Solo e Engenharia Agrícola, Universidade Estadual de Ponta Grossa, Ponta Grossa, Paraná State, 84010-330, Brazil
- Mark L Gleason
- Department of Plant Pathology and Microbiology, Iowa State University, Ames, IA 50011-1101, U.S.A
- Gil M S Câmara
- Departamento de Produção Vegetal, ESALQ - Universidade de São Paulo, Piracicaba, São Paulo State, 13418-900, Brazil
- Ivan P Araújo Junior
- Departamento de Fitopatologia, Fundação Mato Grosso, Rondonópolis, Mato Grosso State, 78750-000, Brazil
- Paulo C Sentelhas
- Departamento de Engenharia de Biossistemas, ESALQ - Universidade de São Paulo, Piracicaba, São Paulo State, 13418-900, Brazil
15
Konkol M, Nüst D, Goulier L. Publishing computational research - a review of infrastructures for reproducible and transparent scholarly communication. Res Integr Peer Rev 2020; 5:10. [PMID: 32685199] [PMCID: PMC7359270] [DOI: 10.1186/s41073-020-00095-y]
Abstract
BACKGROUND The trend toward open science increases the pressure on authors to provide access to the source code and data they used to compute the results reported in their scientific papers. Since sharing materials reproducibly is challenging, several projects have developed solutions to support the release of executable analyses alongside articles. METHODS We reviewed 11 applications that can assist researchers in adhering to reproducibility principles. The applications were found through a literature search and interactions with the reproducible research community. An application was included in our analysis if it (i) was actively maintained at the time the data for this paper was collected, (ii) supports the publication of executable code and data, (iii) is connected to the scholarly publication process. By investigating the software documentation and published articles, we compared the applications across 19 criteria, such as deployment options and features that support authors in creating and readers in studying executable papers. RESULTS From the 11 applications, eight allow publishers to self-host the system for free, whereas three provide paid services. Authors can submit an executable analysis using Jupyter Notebooks or R Markdown documents (10 applications support these formats). All approaches provide features to assist readers in studying the materials, e.g., one-click reproducible results or tools for manipulating the analysis parameters. Six applications allow for modifying materials after publication. CONCLUSIONS The applications support authors to publish reproducible research predominantly with literate programming. Concerning readers, most applications provide user interfaces to inspect and manipulate the computational analysis. The next step is to investigate the gaps identified in this review, such as the costs publishers have to expect when hosting an application, the consideration of sensitive data, and impacts on the review process.
Affiliation(s)
- Markus Konkol
- Institute for Geoinformatics, University of Münster, Münster, Germany
- Daniel Nüst
- Institute for Geoinformatics, University of Münster, Münster, Germany
- Laura Goulier
- Institute for Geoinformatics, University of Münster, Münster, Germany
16
Kurilov R, Haibe-Kains B, Brors B. Assessment of modelling strategies for drug response prediction in cell lines and xenografts. Sci Rep 2020; 10:2849. [PMID: 32071383] [PMCID: PMC7028927] [DOI: 10.1038/s41598-020-59656-2]
Abstract
Data from several large high-throughput drug response screens have become available to the scientific community recently. Although many efforts have been made to use this information to predict drug sensitivity, our ability to accurately predict drug response based on genetic data remains limited. In order to systematically examine how different aspects of modelling affect the resulting prediction accuracy, we built a range of models for seven drugs (erlotinib, paclitaxel, lapatinib, PLX4720, sorafenib, nutlin-3 and nilotinib) using data from the largest available cell line and xenograft drug sensitivity screens. We found that the drug response metric, the choice of the molecular data type and the number of training samples have a substantial impact on prediction accuracy. We also compared the tasks of drug response prediction with tissue type prediction and found that, unlike for drug response, tissue type can be predicted with high accuracy. Furthermore, we assessed our ability to predict drug response in four xenograft cohorts (treated either with erlotinib, gemcitabine or paclitaxel) using models trained on cell line data. We could predict response in an erlotinib-treated cohort with moderate accuracy (correlation ≈ 0.5), but were unable to correctly predict responses in cohorts treated with gemcitabine or paclitaxel.
Affiliation(s)
- Roman Kurilov
- Division of Applied Bioinformatics, German Cancer Research Center, Heidelberg, Germany
- Faculty of Biosciences, Heidelberg University, Heidelberg, Germany
- Benjamin Haibe-Kains
- Princess Margaret Cancer Centre, Toronto, Ontario, M5G 1L7, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, M5G 1L7, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, M5T 3A1, Canada
- Ontario Institute for Cancer Research, Toronto, Ontario, M5G 1L7, Canada
- Benedikt Brors
- Division of Applied Bioinformatics, German Cancer Research Center, Heidelberg, Germany
- National Center for Tumor Diseases (NCT), Heidelberg, Germany
- German Cancer Consortium (DKTK), Core Center Heidelberg, Heidelberg, Germany
17
Abstract
Making scientific analyses reproducible, well documented, and easily shareable is crucial to maximizing their impact and ensuring that others can build on them. However, accomplishing these goals is not easy, requiring careful attention to organization, workflow, and familiarity with tools that are not a regular part of every scientist's toolbox. We have developed an R package, workflowr, to help all scientists, regardless of background, overcome these challenges. Workflowr aims to instill a particular "workflow" - a sequence of steps to be repeated and integrated into research practice - that helps make projects more reproducible and accessible. This workflow integrates four key elements: (1) version control (via Git); (2) literate programming (via R Markdown); (3) automatic checks and safeguards that improve code reproducibility; and (4) sharing code and results via a browsable website. These features exploit powerful existing tools, whose mastery would take considerable study. However, the workflowr interface is simple enough that novice users can quickly enjoy its many benefits. By simply following the workflowr "workflow", R users can create projects whose results, figures, and development history are easily accessible on a static website - thereby conveniently shareable with collaborators by sending them a URL - and accompanied by source code and reproducibility safeguards. The workflowr R package is open source and available on CRAN, with full documentation and source code available at https://github.com/jdblischak/workflowr.
Affiliation(s)
- John D. Blischak
- Department of Human Genetics, University of Chicago, Chicago, IL, 60637, USA
- Peter Carbonetto
- Department of Human Genetics, University of Chicago, Chicago, IL, 60637, USA
- Research Computing Center, University of Chicago, Chicago, IL, 60637, USA
- Matthew Stephens
- Department of Human Genetics, University of Chicago, Chicago, IL, 60637, USA
- Department of Statistics, University of Chicago, Chicago, IL, 60637, USA
18
Liu DM, Salganik MJ. Successes and Struggles with Computational Reproducibility: Lessons from the Fragile Families Challenge. Socius: Sociological Research for a Dynamic World 2019; 5:10.1177/2378023119849803. [PMID: 37309413 PMCID: PMC10260256 DOI: 10.1177/2378023119849803]
Abstract
Reproducibility is fundamental to science, and an important component of reproducibility is computational reproducibility: the ability of a researcher to recreate the results of a published study using the original author's raw data and code. Although most people agree that computational reproducibility is important, it is still difficult to achieve in practice. In this article, the authors describe their approach to enabling computational reproducibility for the 12 articles in this special issue of Socius about the Fragile Families Challenge. The approach draws on two tools commonly used by professional software engineers but not widely used by academic researchers: software containers (e.g., Docker) and cloud computing (e.g., Amazon Web Services). These tools made it possible to standardize the computing environment around each submission, which will ease computational reproducibility both today and in the future. Drawing on their successes and struggles, the authors conclude with recommendations to researchers and journals.
19
Vision dominates in perceptual language: English sensory vocabulary is optimized for usage. Cognition 2018; 179:213-220. [DOI: 10.1016/j.cognition.2018.05.008]
20
Nüst D, Granell C, Hofer B, Konkol M, Ostermann FO, Sileryte R, Cerutti V. Reproducible research and GIScience: an evaluation using AGILE conference papers. PeerJ 2018; 6:e5072. [PMID: 30013826 PMCID: PMC6047504 DOI: 10.7717/peerj.5072]
Abstract
The demand for reproducible research is on the rise in disciplines concerned with data analysis and computational methods. Therefore, we reviewed current recommendations for reproducible research and translated them into criteria for assessing the reproducibility of articles in the field of geographic information science (GIScience). Using these criteria, we assessed a sample of GIScience studies from the Association of Geographic Information Laboratories in Europe (AGILE) conference series, and we collected feedback about the assessment from the study authors. Results from the author feedback indicate that although authors support the concept of performing reproducible research, the incentives for doing this in practice are too small. Therefore, we propose concrete actions for individual researchers and the GIScience conference series to improve transparency and reproducibility. For example, to support researchers in producing reproducible work, the GIScience conference series could offer awards and paper badges, provide author guidelines for computational research, and publish articles in Open Access formats.
Affiliation(s)
- Daniel Nüst
- Institute for Geoinformatics, University of Münster, Münster, Germany
- Carlos Granell
- Institute of New Imaging Technologies, Universitat Jaume I de Castellón, Castellón, Spain
- Barbara Hofer
- Interfaculty Department of Geoinformatics - Z_GIS, University of Salzburg, Salzburg, Austria
- Markus Konkol
- Institute for Geoinformatics, University of Münster, Münster, Germany
- Frank O. Ostermann
- Faculty of Geo-Information Science and Earth Observation (ITC), University of Twente, Enschede, The Netherlands
- Rusne Sileryte
- Faculty of Architecture and the Built Environment, Delft University of Technology, Delft, The Netherlands
- Valentina Cerutti
- Faculty of Geo-Information Science and Earth Observation (ITC), University of Twente, Enschede, The Netherlands
21
Gelfond J, Goros M, Hernandez B, Bokov A. A System for an Accountable Data Analysis Process in R. The R Journal 2018; 10:6-21. [PMID: 30505573 PMCID: PMC6261481]
Abstract
Efficiently producing transparent analyses may be difficult for beginners or tedious for the experienced. This implies a need for computing systems and environments that can efficiently satisfy reproducibility and accountability standards. To this end, we have developed a system, R package, and R Shiny application called adapr (Accountable Data Analysis Process in R) that is built on the principle of accountable units. An accountable unit is a data file (statistic, table or graphic) that can be associated with a provenance, meaning how it was created, when it was created and who created it, and this is similar to the 'verifiable computational results' (VCR) concept proposed by Gavish and Donoho. Both accountable units and VCRs are version controlled, sharable, and can be incorporated into a collaborative project. However, accountable units use file hashes and do not involve watermarking or public repositories like VCRs. Reproducing collaborative work may be highly complex, requiring repeating computations on multiple systems from multiple authors; however, determining the provenance of each unit is simpler, requiring only a search using file hashes and version control systems.
Affiliation(s)
- Jonathan Gelfond
- Department of Epidemiology and Biostatistics, UT Health San Antonio, TX, USA
- Martin Goros
- Department of Epidemiology and Biostatistics, UT Health San Antonio, TX, USA
- Brian Hernandez
- Department of Epidemiology and Biostatistics, UT Health San Antonio, TX, USA
- Alex Bokov
- Department of Epidemiology and Biostatistics, UT Health San Antonio, TX, USA
22
Rasmussen CH, Smith MK, Ito K, Sundararajan V, Magnusson MO, Niclas Jonsson E, Fostvedt L, Burger P, McFadyen L, Tensfeldt TG, Nicholas T. PharmTeX: a LaTeX-Based Open-Source Platform for Automated Reporting Workflow. AAPS J 2018; 20:52. [PMID: 29549459 DOI: 10.1208/s12248-018-0202-0]
Abstract
Every year, the pharmaceutical industry generates a large number of scientific reports related to drug research, development, and regulatory submissions. Many of these reports are created using text processing tools such as Microsoft Word. Given the large number of figures, tables, references, and other elements, this is often a tedious task involving hours of copying and pasting and substantial efforts in quality control (QC). In the present article, we present the LaTeX-based open-source reporting platform, PharmTeX, a community-based effort to make reporting simple, reproducible, and user-friendly. The PharmTeX creators put a substantial effort into simplifying the sometimes complex elements of LaTeX into user-friendly functions that rely on advanced LaTeX and Perl code running in the background. Using this setup makes LaTeX much more accessible for users with no prior LaTeX experience. A software collection was compiled for users not wanting to manually install the required software components. The PharmTeX templates allow for inclusion of tables directly from mathematical software output, as well as figures in several formats. Code listings can be included directly from source. No previous experience and only a few hours of training are required to start writing reports using PharmTeX. PharmTeX significantly reduces the time required for creating a scientific report fully compliant with regulatory and industry expectations. QC is made much simpler, since there is a direct link between analysis output and report input. PharmTeX makes available to report authors the strengths of LaTeX document processing without the need for extensive training.
Affiliation(s)
- Kaori Ito
- Applied Pharmacometrics and Research, 6 School Street, Mystic, Connecticut, 06355, USA
- Luke Fostvedt
- Pfizer, 10646 Science Center Dr, San Diego, California, 92121, USA
- Paula Burger
- Pfizer, 445 Eastern Point Rd, Groton, Connecticut, 06340, USA
23
Stodden V, Seiler J, Ma Z. An empirical analysis of journal policy effectiveness for computational reproducibility. Proc Natl Acad Sci U S A 2018; 115:2584-2589. [PMID: 29531050 PMCID: PMC5856507 DOI: 10.1073/pnas.1708290115]
Abstract
A key component of scientific communication is sufficient information for other researchers in the field to reproduce published findings. For computational and data-enabled research, this has often been interpreted to mean making available the raw data from which results were generated, the computer code that generated the findings, and any additional information needed such as workflows and input parameters. Many journals are revising author guidelines to include data and code availability. This work evaluates the effectiveness of journal policy that requires the data and code necessary for reproducibility be made available postpublication by the authors upon request. We assess the effectiveness of such a policy by (i) requesting data and code from authors and (ii) attempting replication of the published findings. We chose a random sample of 204 scientific papers published in the journal Science after the implementation of their policy in February 2011. We found that we were able to obtain artifacts from 44% of our sample and were able to reproduce the findings for 26%. We find this policy, author remission of data and code postpublication upon request, an improvement over no policy, but currently insufficient for reproducibility.
Affiliation(s)
- Victoria Stodden
- School of Information Sciences, University of Illinois at Urbana-Champaign, Champaign, IL 61820
- Jennifer Seiler
- Department of Statistics, Columbia University, New York, NY 10027
- Zhaokun Ma
- Department of Statistics, Columbia University, New York, NY 10027
24
Almugbel R, Hung LH, Hu J, Almutairy A, Ortogero N, Tamta Y, Yeung KY. Reproducible Bioconductor workflows using browser-based interactive notebooks and containers. J Am Med Inform Assoc 2018; 25:4-12. [PMID: 29092073 PMCID: PMC6381817 DOI: 10.1093/jamia/ocx120]
Abstract
Objective Bioinformatics publications typically include complex software workflows that are difficult to describe in a manuscript. We describe and demonstrate the use of interactive software notebooks to document and distribute bioinformatics research. We provide a user-friendly tool, BiocImageBuilder, that allows users to easily distribute their bioinformatics protocols through interactive notebooks uploaded to either a GitHub repository or a private server. Materials and methods We present four different interactive Jupyter notebooks using R and Bioconductor workflows to infer differential gene expression, analyze cross-platform datasets, process RNA-seq data and KinomeScan data. These interactive notebooks are available on GitHub. The analytical results can be viewed in a browser. Most importantly, the software contents can be executed and modified. This is accomplished using Binder, which runs the notebook inside software containers, thus avoiding the need to install any software and ensuring reproducibility. All the notebooks were produced using custom files generated by BiocImageBuilder. Results BiocImageBuilder facilitates the publication of workflows with a point-and-click user interface. We demonstrate that interactive notebooks can be used to disseminate a wide range of bioinformatics analyses. The use of software containers to mirror the original software environment ensures reproducibility of results. Parameters and code can be dynamically modified, allowing for robust verification of published results and encouraging rapid adoption of new methods. Conclusion Given the increasing complexity of bioinformatics workflows, we anticipate that these interactive software notebooks will become as necessary for documenting software methods as traditional laboratory notebooks have been for documenting bench protocols, and as ubiquitous.
Affiliation(s)
- Reem Almugbel
- Institute of Technology, University of Washington, Tacoma, WA, USA
- Ling-Hong Hung
- Institute of Technology, University of Washington, Tacoma, WA, USA
- Jiaming Hu
- Institute of Technology, University of Washington, Tacoma, WA, USA
- Abeer Almutairy
- Institute of Technology, University of Washington, Tacoma, WA, USA
- Nicole Ortogero
- Department of Clinical Investigation, Madigan Army Medical Center, Tacoma, WA, USA
- Yashaswi Tamta
- Institute of Technology, University of Washington, Tacoma, WA, USA
- Ka Yee Yeung
- Institute of Technology, University of Washington, Tacoma, WA, USA
25
Abstract
Secondary ion mass spectrometry (SIMS) has become an increasingly utilized tool in biologically relevant studies. Of these, high lateral resolution methodologies using the NanoSIMS 50/50L have been especially powerful within many biological fields over the past decade. Here, the authors provide a review of this technology, sample preparation and analysis considerations, examples of recent biological studies, data analyses, and current outlooks. Specifically, the authors offer an overview of SIMS and development of the NanoSIMS. The authors describe the major experimental factors that should be considered prior to NanoSIMS analysis and then provide information on best practices for data analysis and image generation, which includes an in-depth discussion of appropriate colormaps. Additionally, the authors provide an open-source method for data representation that allows simultaneous visualization of secondary electron and ion information within a single image. Finally, the authors present a perspective on the future of this technology and where they think it will have the greatest impact in near future.
26
Stodden V, McNutt M, Bailey DH, Deelman E, Gil Y, Hanson B, Heroux MA, Ioannidis JPA, Taufer M. Enhancing reproducibility for computational methods. Science 2016; 354:1240-1241. [PMID: 27940837 DOI: 10.1126/science.aah6168]
Affiliation(s)
- Victoria Stodden
- University of Illinois at Urbana-Champaign, Champaign, IL 61801, USA
- Marcia McNutt
- National Academy of Sciences, Washington, DC 20418, USA
- Ewa Deelman
- University of Southern California, Los Angeles, CA 90007, USA
- Yolanda Gil
- University of Southern California, Los Angeles, CA 90007, USA
- Brooks Hanson
- American Geophysical Union, Washington, DC 20009, USA
27
Eglen SJ, Marwick B, Halchenko YO, Hanke M, Sufi S, Gleeson P, Silver RA, Davison AP, Lanyon L, Abrams M, Wachtler T, Willshaw DJ, Pouzat C, Poline JB. Toward standard practices for sharing computer code and programs in neuroscience. Nat Neurosci 2017; 20:770-773. [PMID: 28542156 PMCID: PMC6386137 DOI: 10.1038/nn.4550]
Abstract
Computational techniques are central in many areas of neuroscience, and are relatively easy to share. This paper describes why computer programs underlying scientific publications should be shared, and lists simple steps for sharing. Together with ongoing efforts in data sharing, this should aid reproducibility of research.
Affiliation(s)
- Stephen J. Eglen
- Cambridge Computational Biology Institute, Department of Applied Mathematics and Theoretical Physics, University of Cambridge, UK
- Ben Marwick
- Department of Anthropology, University of Washington, Seattle, WA 98195-3100, USA
- Yaroslav O. Halchenko
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH 03755, USA
- Michael Hanke
- Institute of Psychology II, Otto-von-Guericke-University Magdeburg, 39106 Magdeburg, Germany
- Center for Behavioral Brain Sciences, 39106 Magdeburg, Germany
- Shoaib Sufi
- Software Sustainability Institute, University of Manchester, UK
- Padraig Gleeson
- Department of Neuroscience, Physiology and Pharmacology, University College London, UK
- R. Angus Silver
- Department of Neuroscience, Physiology and Pharmacology, University College London, UK
- Andrew P. Davison
- Unité de Neurosciences, Information et Complexité, CNRS, Gif sur Yvette, France
- Linda Lanyon
- International Neuroinformatics Coordinating Facility, Karolinska Institutet, Stockholm, Sweden
- Mathew Abrams
- International Neuroinformatics Coordinating Facility, Karolinska Institutet, Stockholm, Sweden
- Thomas Wachtler
- Department of Biology II, Ludwig-Maximilians-Universität München, Germany
- David J. Willshaw
- Institute for Adaptive and Neural Computation, School of Informatics, University of Edinburgh, UK
- Christophe Pouzat
- MAP5, Paris-Descartes University and CNRS UMR 8145, 75006 Paris, France
- Jean-Baptiste Poline
- Henry H. Wheeler, Jr. Brain Imaging Center, Helen Wills Neuroscience Institute, University of California, Berkeley, USA
28
Reproducibility and Practical Adoption of GEOBIA with Open-Source Software in Docker Containers. Remote Sensing 2017. [DOI: 10.3390/rs9030290]
29
Madeyski L, Kitchenham B. Would wider adoption of reproducible research be beneficial for empirical software engineering research? Journal of Intelligent & Fuzzy Systems 2017. [DOI: 10.3233/jifs-169146]
Affiliation(s)
- Lech Madeyski
- Faculty of Computer Science and Management, Wroclaw University of Science and Technology, Wroclaw, Poland
30
May S, McKnight B. Graphics and statistics for cardiology: survival analysis. Heart 2017; 103:335-340. [DOI: 10.1136/heartjnl-2015-308229]
31
Denaxas S, Direk K, Gonzalez-Izquierdo A, Pikoula M, Cakiroglu A, Moore J, Hemingway H, Smeeth L. Methods for enhancing the reproducibility of biomedical research findings using electronic health records. BioData Min 2017; 10:31. [PMID: 28912836 PMCID: PMC5594436 DOI: 10.1186/s13040-017-0151-7]
Abstract
BACKGROUND The ability of external investigators to reproduce published scientific findings is critical for the evaluation and validation of biomedical research by the wider community. However, a substantial proportion of health research using electronic health records (EHR), data collected and generated during clinical care, is potentially not reproducible mainly due to the fact that the implementation details of most data preprocessing, cleaning, phenotyping and analysis approaches are not systematically made available or shared. With the complexity, volume and variety of electronic health record data sources made available for research steadily increasing, it is critical to ensure that scientific findings from EHR data are reproducible and replicable by researchers. Reporting guidelines, such as RECORD and STROBE, have set a solid foundation by recommending a series of items for researchers to include in their research outputs. Researchers however often lack the technical tools and methodological approaches to actuate such recommendations in an efficient and sustainable manner. RESULTS In this paper, we review and propose a series of methods and tools utilized in adjunct scientific disciplines that can be used to enhance the reproducibility of research using electronic health records and enable researchers to report analytical approaches in a transparent manner. Specifically, we discuss the adoption of scientific software engineering principles and best-practices such as test-driven development, source code revision control systems, literate programming and the standardization and re-use of common data management and analytical approaches. CONCLUSION The adoption of such approaches will enable scientists to systematically document and share EHR analytical workflows and increase the reproducibility of biomedical research using such complex data sources.
Affiliation(s)
- Spiros Denaxas
- Institute of Health Informatics, University College London, 222 Euston Road, London, NW1 2DA, UK
- Farr Institute of Health Informatics Research, 222 Euston Road, London, UK
- Kenan Direk
- Institute of Health Informatics, University College London, 222 Euston Road, London, NW1 2DA, UK
- Farr Institute of Health Informatics Research, 222 Euston Road, London, UK
- Arturo Gonzalez-Izquierdo
- Institute of Health Informatics, University College London, 222 Euston Road, London, NW1 2DA, UK
- Farr Institute of Health Informatics Research, 222 Euston Road, London, UK
- Maria Pikoula
- Institute of Health Informatics, University College London, 222 Euston Road, London, NW1 2DA, UK
- Farr Institute of Health Informatics Research, 222 Euston Road, London, UK
- Aylin Cakiroglu
- The Francis Crick Institute, 1 Midland Road, London, NW1 1AT, UK
- Jason Moore
- Institute of Biomedical Informatics, University of Pennsylvania, Richards Medical Research Laboratories, 3700 Hamilton Walk, Philadelphia, 19104, USA
- Harry Hemingway
- Institute of Health Informatics, University College London, 222 Euston Road, London, NW1 2DA, UK
- Farr Institute of Health Informatics Research, 222 Euston Road, London, UK
- Liam Smeeth
- EHR Research Group, Department of Non-communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
32
Mair P. Thou Shalt Be Reproducible! A Technology Perspective. Front Psychol 2016; 7:1079. [PMID: 27471486 PMCID: PMC4943952 DOI: 10.3389/fpsyg.2016.01079]
Abstract
This article elaborates on reproducibility in psychology from a technological viewpoint. Modern open-source computational environments that foster reproducibility throughout the whole research life cycle, and to which emerging psychology researchers should be sensitized, are shown and explained. First, data archiving platforms that make datasets publicly available are presented. Second, R is advocated as the data-analytic lingua franca in psychology for achieving reproducible statistical analysis. Third, dynamic report generation environments for writing reproducible manuscripts that integrate text, data analysis, and statistical outputs such as figures and tables in a single document are described. Supplementary materials are provided in order to get the reader started with these technologies.
Affiliation(s)
- Patrick Mair
- Department of Psychology, Harvard University, Cambridge, MA, USA
33
Reassessing and Revising Commuting Zones for 2010: History, Assessment, and Updates for U.S. 'Labor-Sheds' 1990-2010. Population Research and Policy Review 2016. [DOI: 10.1007/s11113-016-9386-0]
34
O'Neill K, Brinkman RR. Publishing code is essential for reproducible flow cytometry bioinformatics. Cytometry A 2016; 89:10-11. [DOI: 10.1002/cyto.a.22805]
Affiliation(s)
- Kieran O'Neill
- Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia, Canada
- Department of Pathology, University of British Columbia, Vancouver, British Columbia, Canada
- Ryan R. Brinkman
- Terry Fox Laboratory, British Columbia Cancer Agency, Vancouver, British Columbia, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
35
Kadlec J, StClair B, Ames DP, Gill RA. WaterML R package for managing ecological experiment data on a CUAHSI HydroServer. Ecol Inform 2015. [DOI: 10.1016/j.ecoinf.2015.05.002]
36
Fowler CS. Segregation as a multi-scalar phenomenon and its implications for neighborhood-scale research: the case of South Seattle 1990-2010. Urban Geography 2015; 37:1-25. [PMID: 27041785 PMCID: PMC4811614 DOI: 10.1080/02723638.2015.1043775]
Abstract
Neighborhoods and neighborhood change are often at least implicitly understood in relation to processes taking place at scales both smaller than and larger than the neighborhood itself. Until recently our capacity to represent these multi-scalar processes with quantitative measures has been limited. Recent work on "segregation profiles" by Reardon and collaborators (Reardon et al., 2008, 2009) expands our capacity to explore the relationship between population measures and scale. With the methodological tools now available, we need a conceptual shift in how we view population measures in order to bring our theories and measures of neighborhoods into alignment. I argue that segregation can be beneficially viewed as multi-scalar; not a value calculable at some 'correct' scale, but a continuous function with respect to scale. This shift requires new ways of thinking about and analyzing segregation with respect to scale that engage with the complexity of the multi-scalar measure. Using block level data for eight neighborhoods in Seattle, Washington I explore the implications of a multi-scalar segregation measure for understanding neighborhoods and neighborhood change from 1990 to 2010.
Collapse
Affiliation(s)
- Christopher S. Fowler
- Assistant Professor of Geography, The Pennsylvania State University, 314 Walker Building, University Park, PA 16802-5011
| |
Collapse
|
37
|
Dye TS. Structure and growth of the leeward Kohala field system: an analysis with directed graphs. PLoS One 2014; 9:e102431. [PMID: 25058167 PMCID: PMC4109926 DOI: 10.1371/journal.pone.0102431] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/24/2014] [Accepted: 06/17/2014] [Indexed: 12/02/2022] Open
Abstract
This study illustrates how the theory of directed graphs can be used to investigate the structure and growth of the leeward Kohala field system, a traditional Hawaiian archaeological site that presents an unparalleled opportunity to investigate relative chronology. The relative chronological relationships of agricultural walls and trails in two detailed study areas are represented as directed graphs and then investigated using graph theoretic concepts including cycle, level, and connectedness. The structural properties of the directed graphs reveal structure in the field system at several spatial scales. A process of deduction yields a history of construction in each detailed study area that is different than the history produced by an earlier investigation. These results indicate that it is now possible to study the structure and growth of the entire field system remnant using computer software implementations of graph theoretic concepts applied to observations of agricultural wall and trail intersections made on aerial imagery and/or during fieldwork. A relative chronology of field system development with a resolution of one generation is a possible result.
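The graph-theoretic idea can be sketched in miniature: relative chronology as a directed graph whose edges mean "built before", with construction strata read off as longest-path levels. The feature labels below are invented placeholders, not data from the Kohala study.

```python
from functools import lru_cache

# edge (u, v): feature u was observed to predate feature v;
# the graph must be acyclic -- a cycle would signal contradictory
# field observations and make level() recurse without end
edges = [("wall_A", "wall_B"), ("wall_A", "trail_1"),
         ("wall_B", "wall_C"), ("trail_1", "wall_C")]
nodes = {n for edge in edges for n in edge}

@lru_cache(maxsize=None)
def level(n):
    """Longest predecessor chain ending at n (0 = earliest stratum)."""
    preds = [u for u, v in edges if v == n]
    return 0 if not preds else 1 + max(level(u) for u in preds)

levels = {n: level(n) for n in sorted(nodes)}
```

Features sharing a level cannot be ordered relative to one another by the observed intersections alone, which is the kind of structural information the study extracts at field-system scale.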
Collapse
Affiliation(s)
- Thomas S. Dye
- Department of Anthropology, University of Hawai'i, Honolulu, Hawaii, United States of America

| |
Collapse
|
38
|
Liu Z, Pounds S. An R package that automatically collects and archives details for reproducible computing. BMC Bioinformatics 2014; 15:138. [PMID: 24886202 PMCID: PMC4026591 DOI: 10.1186/1471-2105-15-138] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2013] [Accepted: 04/24/2014] [Indexed: 01/03/2023] Open
Abstract
Background: It is scientifically and ethically imperative that the results of statistical analysis of biomedical research data be computationally reproducible in the sense that the reported results can be easily recapitulated from the study data. Some statistical analyses are computationally a function of many data files, program files, and other details that are updated or corrected over time. In many applications, it is infeasible to manually maintain an accurate and complete record of all these details about a particular analysis. Results: Therefore, we developed the rctrack package that automatically collects and archives read-only copies of program files, data files, and other details needed to computationally reproduce an analysis. Conclusions: The rctrack package uses the trace function to temporarily embed detail collection procedures into functions that read files, write files, or generate random numbers, so that no special modifications of the primary R program are necessary. At the conclusion of the analysis, rctrack uses these details to automatically generate a read-only archive of data files, program files, result files, and other details needed to recapitulate the analysis results. Information about this archive may be included as an appendix of a report generated by Sweave or knitR. Here, we describe the usage, implementation, and other features of the rctrack package. The rctrack package is freely available from http://www.stjuderesearch.org/site/depts/biostats/rctrack under the GPL license.
Collapse
Affiliation(s)
| | - Stan Pounds
- Department of Biostatistics, St Jude Children's Research Hospital, Memphis TN 38105, USA.
| |
Collapse
|
39
|
Horton NJ. I Hear, I Forget. I Do, I Understand: A Modified Moore-Method Mathematical Statistics Course. AM STAT 2013. [DOI: 10.1080/00031305.2013.849207] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/26/2022]
|
40
|
Aghaeepour N, Finak G, Hoos H, Mosmann TR, Brinkman R, Gottardo R, Scheuermann RH. Critical assessment of automated flow cytometry data analysis techniques. Nat Methods 2013; 10:228-38. [PMID: 23396282 PMCID: PMC3906045 DOI: 10.1038/nmeth.2365] [Citation(s) in RCA: 350] [Impact Index Per Article: 31.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2012] [Accepted: 01/14/2013] [Indexed: 12/14/2022]
Abstract
In this analysis, the authors directly compared the performance of flow cytometry data processing algorithms to manual gating approaches. The results offer information of practical utility about the performance of the algorithms as applied to different data sets and challenges. Traditional methods for flow cytometry (FCM) data processing rely on subjective manual gating. Recently, several groups have developed computational methods for identifying cell populations in multidimensional FCM data. The Flow Cytometry: Critical Assessment of Population Identification Methods (FlowCAP) challenges were established to compare the performance of these methods on two tasks: (i) mammalian cell population identification, to determine whether automated algorithms can reproduce expert manual gating and (ii) sample classification, to determine whether analysis pipelines can identify characteristics that correlate with external variables (such as clinical outcome). This analysis presents the results of the first FlowCAP challenges. Several methods performed well as compared to manual gating or external variables using statistical performance measures, which suggests that automated methods have reached a sufficient level of maturity and accuracy for reliable use in FCM data analysis.
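FlowCAP scored algorithms against manual gates; one statistic commonly used for such comparisons is an F-measure between each manual population and its best-matching automated cluster. The simplified, unweighted sketch below uses invented event labels, not real cytometry data or the challenge's exact scoring rule.

```python
def f_measure(manual, automated):
    """Mean, over manual populations, of the best F1 against any cluster."""
    scores = []
    for pop in set(manual):
        truth = {i for i, m in enumerate(manual) if m == pop}
        best = 0.0
        for cl in set(automated):
            found = {i for i, a in enumerate(automated) if a == cl}
            tp = len(truth & found)
            if tp == 0:
                continue
            precision = tp / len(found)
            recall = tp / len(truth)
            best = max(best, 2 * precision * recall / (precision + recall))
        scores.append(best)
    return sum(scores) / len(scores)

# six events: manual gate labels vs. cluster IDs from an algorithm
manual = ["T", "T", "T", "B", "B", "NK"]
automated = [1, 1, 2, 2, 2, 3]
score = f_measure(manual, automated)
```

A score of 1.0 means every manual population is exactly recovered by some cluster; the mismatched third T event above pulls the score below that.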
Collapse
Affiliation(s)
- Nima Aghaeepour
- Terry Fox Laboratory, British Columbia Cancer Agency, Vancouver, British Columbia, Canada
| | | | | | | | | | | | | | | | | |
Collapse
|
41
|
Abstract
With the development of novel assay technologies, biomedical experiments and analyses have gone through substantial evolution. Today, a typical experiment can simultaneously measure hundreds to thousands of individual features (e.g. genes) in dozens of biological conditions, resulting in gigabytes of data that need to be processed and analyzed. Because of the multiple steps involved in the data generation and analysis and the lack of details provided, it can be difficult for independent researchers to try to reproduce a published study. With the recent outrage following the halt of a cancer clinical trial due to the lack of reproducibility of the published study, researchers are now facing heavy pressure to ensure that their results are reproducible. Despite the global demand, too many published studies remain non-reproducible mainly due to the lack of availability of experimental protocol, data and/or computer code. Scientific discovery is an iterative process, where a published study generates new knowledge and data, resulting in new follow-up studies or clinical trials based on these results. As such, it is important for the results of a study to be quickly confirmed or discarded to avoid wasting time and money on novel projects. The availability of high-quality, reproducible data will also lead to more powerful analyses (or meta-analyses) where multiple data sets are combined to generate new knowledge. In this article, we review some of the recent developments regarding biomedical reproducibility and comparability and discuss some of the areas where the overall field could be improved.
Collapse
Affiliation(s)
- Yunda Huang
- Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue North, Mailstop M2-C200, Seattle, WA 98109-1024, USA
| | | |
Collapse
|
42
|
Goodman AF. Analysis, biomedicine, collaboration, and determinism challenges and guidance: wish list for biopharmaceuticals on the interface of computing and statistics. J Biopharm Stat 2012; 21:1140-57. [PMID: 22023682 DOI: 10.1080/10543406.2011.613361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/16/2022]
Abstract
I have personally witnessed processing advance from desk calculators and mainframes, through timesharing and PCs, to supercomputers and cloud computing. I have also witnessed resources grow from too little data into almost too much data, and from theory dominating data into data beginning to dominate theory while needing new theory. Finally, I have witnessed problems advance from simple in a lone discipline into becoming almost too complex in multiple disciplines, as well as approaches evolve from analysis driving solutions into solutions by data mining beginning to drive the analysis itself. How we do all of this has transitioned from competition overcoming collaboration into collaboration starting to overcome competition, as well as what is done being more important than how it is done has transitioned into how it is done becoming as important as what is done. In addition, what or how we do it being more important than what or how we should actually do it has shifted into what or how we should do it becoming just as important as what or how we do it, if not more so. Although we have come a long way in both our methodology and technology, are they sufficient for our current or future complex and multidisciplinary problems with their massive databases? Since the apparent answer is not a resounding yes, we are presented with tremendous challenges and opportunities. This personal perspective adapts my background and experience to be appropriate for biopharmaceuticals. In these times of exploding change, informed perspectives on what challenges should be explored with accompanying guidance may be even more valuable than the far more typical literature reviews in conferences and journals of what has already been accomplished without challenges or guidance. Would we believe that an architect who designs a skyscraper determines the skyscraper's exact exterior, interior and furnishings or only general characteristics? 
Why not increase dependability of conclusions in genetics and translational medicine by enriching genetic determinism with uncertainty? Uncertainty is our friend if exploited or potential enemy if ignored. Genes design proteins, but they cannot operationally determine all protein characteristics: they begin a long chain of complex events occurring many times via intricate feedbacks plus interactions which are not all determined. Genes influence proteins and diseases by just determining their probability distributions, not by determining them. From any sample of diseased people, we may more successfully infer gene probability distributions than genes themselves, and it poses an issue to resolve. My position is supported by 2-3 articles a week in ScienceDaily, 2011.
Collapse
Affiliation(s)
- Arnold F Goodman
- Collaborative Data Solutions, Villa Park, California 92861-1227, USA.
| |
Collapse
|
43
|
Evidence for a Late Pliocene faunal transition based on a new rodent assemblage from Oldowan locality Hadar A.L. 894, Afar Region, Ethiopia. J Hum Evol 2012; 62:328-37. [DOI: 10.1016/j.jhevol.2011.02.013] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/24/2010] [Revised: 02/18/2011] [Accepted: 02/23/2011] [Indexed: 11/16/2022]
|
44
|
Crane JD, Ogborn DI, Cupido C, Melov S, Hubbard A, Bourgeois JM, Tarnopolsky MA. Massage Therapy Attenuates Inflammatory Signaling After Exercise-Induced Muscle Damage. Sci Transl Med 2012; 4:119ra13. [DOI: 10.1126/scitranslmed.3002882] [Citation(s) in RCA: 178] [Impact Index Per Article: 14.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/05/2023]
|
45
|
Delescluse M, Franconville R, Joucla S, Lieury T, Pouzat C. Making neurophysiological data analysis reproducible: why and how? ACTA ACUST UNITED AC 2011; 106:159-70. [PMID: 21986476 DOI: 10.1016/j.jphysparis.2011.09.011] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/09/2011] [Revised: 08/29/2011] [Accepted: 09/22/2011] [Indexed: 10/16/2022]
Abstract
Reproducible data analysis is an approach aiming at complementing classical printed scientific articles with everything required to independently reproduce the results they present. "Everything" here covers the data, the computer code and a precise description of how the code was applied to the data. A brief history of this approach is presented first, starting with what economists have been calling replication since the early eighties and ending with what is now called reproducible research in computational data analysis oriented fields like statistics and signal processing. Since efficient tools are instrumental for a routine implementation of these approaches, a description of some of the available ones is presented next. A toy example then demonstrates the use of two open source software programs for reproducible data analysis: the "Sweave family" and the org-mode of Emacs. The former is bound to R, while the latter can be used with R, Matlab, Python and many more "generalist" data processing programs. Both solutions can be used with Unix-like, Windows and Mac families of operating systems. It is argued that neuroscientists could communicate their results much more efficiently by adopting the reproducible research paradigm from their lab books all the way to their articles, theses and books.
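Sweave and org-mode both interleave prose with executable chunks and replace each chunk with its computed output when the document is "woven". The toy Python weaver below illustrates only the mechanism; the chunk syntax (<<>>= ... @) loosely mimics Sweave and is not a real parser.

```python
import io
import re
from contextlib import redirect_stdout

def weave(document):
    """Run each <<>>= ... @ chunk and append its captured output."""
    def run(match):
        code = match.group(1)
        buf = io.StringIO()
        with redirect_stdout(buf):
            exec(code, {})  # chunk runs in a fresh namespace
        return code.strip() + "\n# output:\n" + buf.getvalue()
    return re.sub(r"<<>>=\n(.*?)^@$", run, document, flags=re.S | re.M)

report = weave("Mean spike count:\n"
               "<<>>=\nprint(sum([2, 3, 7]) / 3)\n@\n"
               "End.\n")
```

Because the printed numbers are regenerated from the code on every weave, the article text can never silently drift out of sync with the analysis that produced it.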
Collapse
Affiliation(s)
- Matthieu Delescluse
- Laboratoire de physiologie cérébrale, CNRS UMR 8118, UFR biomédicale, Université Paris-Descartes, 45 rue des Saints-Pères, 75006 Paris, France.
| | | | | | | | | |
Collapse
|
46
|
Abstract
Reproducible research is a concept of providing access to data and software along with published scientific findings. By means of some case studies from different disciplines, we will illustrate reasons why readers should be given the possibility to look at the data and software independently from the authors of the original publication. We report results of a survey comprising 100 papers recently published in Bioinformatics. The main finding is that authors of this journal share a culture of making data available. However, the number of papers where source code for simulation studies or analyses is available is still rather limited.
Collapse
|
47
|
|
48
|
|
49
|
Affiliation(s)
- Jill P Mesirov
- Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, MA 02142, USA.
| |
Collapse
|
50
|
Baggerly KA, Coombes KR. Deriving chemosensitivity from cell lines: Forensic bioinformatics and reproducible research in high-throughput biology. Ann Appl Stat 2009. [DOI: 10.1214/09-aoas291] [Citation(s) in RCA: 213] [Impact Index Per Article: 14.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
|