1. Martín del Pico E, Gelpí JL, Capella-Gutierrez S. FAIRsoft: a practical implementation of FAIR principles for research software. Bioinformatics 2024; 40:btae464. PMID: 39037960; PMCID: PMC11330317; DOI: 10.1093/bioinformatics/btae464.
Abstract
Motivation: Software plays a crucial and growing role in research. Unfortunately, the computational component of Life Sciences research is often challenging to reproduce and verify: it may be undocumented, opaque, contain unknown errors that affect the outcome, or be simply unavailable and impossible for others to use. These issues are detrimental to the overall quality of scientific research. One step toward addressing this problem is the formulation of principles that research software in the domain should meet to ensure its quality and sustainability, resembling the FAIR (findable, accessible, interoperable, and reusable) data principles. Results: We present here a comprehensive series of quantitative indicators based on a pragmatic interpretation of the FAIR principles and their implementation in OpenEBench, ELIXIR's open platform providing both support for scientific benchmarking and an active observatory of quality-related features for Life Sciences research software. The results serve to understand current practices around quality-related features of research software and provide objective indications for improving them. Availability and implementation: Software metadata from 11 different sources, collected, integrated, and analysed in the context of this manuscript, are available at https://doi.org/10.5281/zenodo.7311067. Code used for software metadata retrieval and processing is available in the following repository: https://gitlab.bsc.es/inb/elixir/software-observatory/FAIRsoft_ETL.
Affiliation(s)
- Josep Lluís Gelpí
- Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain
- Biochemistry and Molecular Biomedicine Department, University of Barcelona, 08028 Barcelona, Spain
2. Moreau D, Wiebels K. Nine quick tips for open meta-analyses. PLoS Comput Biol 2024; 20:e1012252. PMID: 39052540; PMCID: PMC11271959; DOI: 10.1371/journal.pcbi.1012252.
Abstract
Open science principles are revolutionizing the transparency, reproducibility, and accessibility of research. Meta-analysis has become a key technique for synthesizing data across studies in a principled way; however, its impact is contingent on adherence to open science practices. Here, we outline 9 quick tips for open meta-analyses, aimed at guiding researchers to maximize the reach and utility of their findings. We advocate for preregistering clear protocols, opting for open tools and software, and using version control systems to ensure transparency and facilitate collaboration. We further emphasize the importance of reproducibility, for example, by sharing search syntax and analysis scripts, and discuss the benefits of planning for dynamic updating to enable living meta-analyses. We also recommend open data, open code, and open-access publication. We close by encouraging active promotion of research findings to bridge the gap between complex syntheses and public discourse, and provide a detailed submission checklist to equip researchers, reviewers, and journal editors with a structured approach to conducting and reporting open meta-analyses.
Affiliation(s)
- David Moreau
- School of Psychology and Centre for Brain Research, University of Auckland, Auckland, New Zealand
- Kristina Wiebels
- School of Psychology and Centre for Brain Research, University of Auckland, Auckland, New Zealand
3. Herre C, Ho A, Eisenbraun B, Vincent J, Nicholson T, Boutsioukis G, Meyer PA, Ottaviano M, Krause KL, Key J, Sliz P. Introduction of the Capsules environment to support further growth of the SBGrid structural biology software collection. Acta Crystallogr D Struct Biol 2024; 80:439-450. PMID: 38832828; PMCID: PMC11154594; DOI: 10.1107/S2059798324004881.
Abstract
The expansive scientific software ecosystem, characterized by millions of titles across various platforms and formats, poses significant challenges in maintaining reproducibility and provenance in scientific research. The diversity of independently developed applications, evolving versions and heterogeneous components highlights the need for rigorous methodologies to navigate these complexities. In response to these challenges, the SBGrid team builds, installs and configures over 530 specialized software applications for use in the on-premises and cloud-based computing environments of SBGrid Consortium members. To address the intricacies of supporting this diverse application collection, the team has developed the Capsule Software Execution Environment, generally referred to as Capsules. Capsules rely on a collection of programmatically generated bash scripts that work together to isolate the runtime environment of one application from all other applications, thereby providing a transparent cross-platform solution without requiring specialized tools or elevated account privileges for researchers. Capsules facilitate modular, secure software distribution while maintaining a centralized, conflict-free environment. The SBGrid platform, which combines Capsules with the SBGrid collection of structural biology applications, aligns with FAIR goals by enhancing the findability, accessibility, interoperability and reusability of scientific software, ensuring seamless functionality across diverse computing environments. Its adaptability enables application beyond structural biology into other scientific fields.
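The isolation mechanism this abstract describes, wrapper scripts that rebuild a minimal per-application environment, can be sketched roughly as follows. This is an illustrative sketch only: the directory layout (`CAPSULE_ROOT`) and the demo application `hello` are hypothetical and not SBGrid's actual implementation.

```shell
#!/bin/sh
# Illustrative capsule-style wrapper: launch an application inside its own
# minimal environment so its PATH and libraries cannot conflict with any
# other application's. Layout and names are hypothetical, not SBGrid's.

run_in_capsule() {
    app="$1"; shift
    app_home="$CAPSULE_ROOT/$app"
    # 'env -i' starts from an empty environment, so nothing from the
    # caller's shell (conflicting PATH entries, library paths) leaks in.
    env -i \
        HOME="$HOME" \
        PATH="$app_home/bin:/usr/bin:/bin" \
        LD_LIBRARY_PATH="$app_home/lib" \
        "$app_home/bin/$app" "$@"
}

# Demo: build a throwaway "capsule" holding one dummy application.
CAPSULE_ROOT="$(mktemp -d)"
mkdir -p "$CAPSULE_ROOT/hello/bin"
printf '#!/bin/sh\necho "PATH=$PATH"\n' > "$CAPSULE_ROOT/hello/bin/hello"
chmod +x "$CAPSULE_ROOT/hello/bin/hello"

run_in_capsule hello   # prints the PATH visible inside the capsule only
```

Because each wrapper composes the environment from scratch, capsules need no containers, elevated privileges, or special tooling, which matches the cross-platform, unprivileged design the abstract emphasizes.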
Affiliation(s)
- Carol Herre
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, Massachusetts, USA
- Alex Ho
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, Massachusetts, USA
- Ben Eisenbraun
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, Massachusetts, USA
- James Vincent
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, Massachusetts, USA
- Thomas Nicholson
- Department of Biochemistry, University of Otago, Dunedin, New Zealand
- Peter A. Meyer
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, Massachusetts, USA
- Michelle Ottaviano
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, Massachusetts, USA
- Kurt L. Krause
- Department of Biochemistry, University of Otago, Dunedin, New Zealand
- Jason Key
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, Massachusetts, USA
- Piotr Sliz
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, Massachusetts, USA
- Department of Pediatrics, Boston Children’s Hospital, Boston, Massachusetts, USA
4. Drocco JA, Halliday K, Stewart BJ, Sandholtz SH, Morrison MD, Thissen JB, Be NA, Zwilling CE, Wilcox RR, Culpepper SA, Barbey AK, Jaing CJ. Efforts to enhance reproducibility in a human performance research project. F1000Res 2023; 12:1430. PMID: 39291139; PMCID: PMC11406130; DOI: 10.12688/f1000research.140735.1.
Abstract
Background: Ensuring the validity of results from funded programs is a critical concern for agencies that sponsor biological research. In recent years, the open science movement has sought to promote reproducibility by encouraging sharing not only of finished manuscripts but also of the data and code supporting their findings. While these innovations have lent support to third-party efforts to replicate calculations underlying key results in the scientific literature, fields of inquiry where privacy considerations or other sensitivities preclude the broad distribution of raw data or analysis may require a more targeted approach to promote the quality of research output. Methods: We describe efforts oriented toward this goal that were implemented in one human performance research program, Measuring Biological Aptitude, organized by the Defense Advanced Research Projects Agency's Biological Technologies Office. Our team implemented a four-pronged independent verification and validation (IV&V) strategy comprising 1) a centralized data storage and exchange platform, 2) quality assurance and quality control (QA/QC) of data collection, 3) test and evaluation of performer models, and 4) an archival software and data repository. Results: Our IV&V plan was carried out with assistance from both the funding agency and participating teams of researchers. QA/QC of data acquisition aided in process improvement and the flagging of experimental errors. Holdout validation set tests provided an independent gauge of model performance. Conclusions: In circumstances that do not support a fully open approach to scientific criticism, standing up independent teams to cross-check and validate the results generated by primary investigators can be an important tool to promote reproducibility of results.
Affiliation(s)
- Jeffrey A Drocco
- Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, California, 94550, USA
- Kyle Halliday
- Computing Directorate, Lawrence Livermore National Laboratory, Livermore, California, 94550, USA
- Benjamin J Stewart
- Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, California, 94550, USA
- Sarah H Sandholtz
- Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, California, 94550, USA
- Michael D Morrison
- Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, California, 94550, USA
- James B Thissen
- Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, California, 94550, USA
- Nicholas A Be
- Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, California, 94550, USA
- Christopher E Zwilling
- Beckman Institute for Advanced Science and Technology and Department of Psychology, University of Illinois Urbana-Champaign, Urbana, Illinois, 61801, USA
- Ramsey R Wilcox
- Beckman Institute for Advanced Science and Technology and Department of Psychology, University of Illinois Urbana-Champaign, Urbana, Illinois, 61801, USA
- Steven A Culpepper
- Department of Statistics, University of Illinois Urbana-Champaign, Champaign, Illinois, 61820, USA
- Aron K Barbey
- Beckman Institute for Advanced Science and Technology and Department of Psychology, University of Illinois Urbana-Champaign, Urbana, Illinois, 61801, USA
- Crystal J Jaing
- Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, California, 94550, USA
5. Kütt J, Margus G, Kask L, Rätsepso T, Soodla K, Bernasconi R, Birkedal R, Järv P, Laasmaa M, Vendelin M. Simple analysis of gel images with IOCBIO Gel. BMC Biol 2023; 21:225. PMID: 37864184; PMCID: PMC10589977; DOI: 10.1186/s12915-023-01734-8.
Abstract
Background: Current solutions for the analysis of Western blot images either lack transparency and reproducibility or are tedious to use when one has to ensure that the analysis is reproducible. Results: Here, we present an open-source gel image analysis program, IOCBIO Gel. It is designed to simplify image analysis and to link the analysis results with the metadata describing the measurements. The software runs on all major desktop operating systems. It can be used either in a single-researcher environment with local storage of the data or in a multiple-researcher environment with a central database that facilitates data sharing within the research team and beyond. By recording the original image and all operations performed on it, such as image cropping, subtraction of background, sample lane selection, and integration boundaries, the software ensures the reproducibility of the analysis and simplifies making corrections at any stage of the research. The analysis results are available either through direct access to the database used to store them or through export of the relevant data. Conclusions: The software is not limited to Western blot image analysis; it can be used to analyse images obtained with many other widely used biochemical techniques, such as isoelectric focusing. By recording the original data and all the analysis steps, the program improves reproducibility of the analysis and contributes to the implementation of FAIR principles in the related fields.
Affiliation(s)
- Jaak Kütt
- Laboratory of Systems Biology, Department of Cybernetics, Tallinn University of Technology, Tallinn, Estonia
- Georg Margus
- Laboratory of Systems Biology, Department of Cybernetics, Tallinn University of Technology, Tallinn, Estonia
- Lauri Kask
- Laboratory of Systems Biology, Department of Cybernetics, Tallinn University of Technology, Tallinn, Estonia
- Triinu Rätsepso
- Laboratory of Systems Biology, Department of Cybernetics, Tallinn University of Technology, Tallinn, Estonia
- Kärol Soodla
- Laboratory of Systems Biology, Department of Cybernetics, Tallinn University of Technology, Tallinn, Estonia
- Romain Bernasconi
- Laboratory of Systems Biology, Department of Cybernetics, Tallinn University of Technology, Tallinn, Estonia
- Rikke Birkedal
- Laboratory of Systems Biology, Department of Cybernetics, Tallinn University of Technology, Tallinn, Estonia
- Priit Järv
- Applied Artificial Intelligence Group, Department of Software Science, Tallinn University of Technology, Akadeemia 21, Tallinn, 12618, Estonia
- Martin Laasmaa
- Laboratory of Systems Biology, Department of Cybernetics, Tallinn University of Technology, Tallinn, Estonia
- Marko Vendelin
- Laboratory of Systems Biology, Department of Cybernetics, Tallinn University of Technology, Tallinn, Estonia
7. Garijo D, Ménager H, Hwang L, Trisovic A, Hucka M, Morrell T, Allen A. Nine best practices for research software registries and repositories. PeerJ Comput Sci 2022; 8:e1023. PMID: 36092012; PMCID: PMC9455149; DOI: 10.7717/peerj-cs.1023.
Abstract
Scientific software registries and repositories improve software findability and research transparency, provide information for software citations, and foster preservation of computational methods in a wide range of disciplines. Registries and repositories play a critical role by supporting research reproducibility and replicability, but developing them takes effort, and few guidelines are available to help prospective creators of these resources. To address this need, the FORCE11 Software Citation Implementation Working Group convened a Task Force to distill the experiences of the managers of existing resources in setting expectations for all stakeholders. In this article, we describe the resultant best practices, which include defining the scope, policies, and rules that govern individual registries and repositories, along with the background, examples, and collaborative work that went into their development. We believe that establishing specific policies such as those presented here will help other scientific software registries and repositories better serve their users and their disciplines.
Affiliation(s)
- Hervé Ménager
- Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, Paris, France
- Lorraine Hwang
- University of California, Davis, Davis, California, United States
- Ana Trisovic
- Harvard University, Boston, Massachusetts, United States
- Michael Hucka
- California Institute of Technology, Pasadena, California, United States
- Thomas Morrell
- California Institute of Technology, Pasadena, California, United States
- Alice Allen
- University of Maryland, College Park, MD, United States
8. Fienen MN, Corson-Dosch NT, White JT, Leaf AT, Hunt RJ. Risk-Based Wellhead Protection Decision Support: A Repeatable Workflow Approach. Ground Water 2022; 60:71-86. PMID: 34463959; DOI: 10.1111/gwat.13129.
Abstract
Environmental water management often benefits from a risk-based approach in which information on the area of interest is characterized, assembled, and incorporated into a decision model that considers uncertainty. This includes prior information from the literature, field measurements, professional interpretation, and data assimilation, resulting in a decision tool with a posterior uncertainty assessment that accounts for prior understanding and for what is learned through model development and data assimilation. Model construction and data assimilation are time consuming and prone to errors, which motivates a repeatable workflow in which revisions resulting from new interpretations or the discovery of errors can be addressed and the analyses repeated efficiently and rigorously. In this work, motivated by the real-world application of delineating risk-based (probabilistic) sources of water to supply wells in a humid temperate climate, a scripted workflow was generated for groundwater model construction, data assimilation, particle tracking, and post-processing. The workflow leverages existing datasets describing hydrogeology, hydrography, water use, recharge, and lateral boundaries. These specific data are available in the United States, but the tools can be applied to similar datasets worldwide. The workflow builds the model, performs ensemble-based history matching, and uses a posterior Monte Carlo approach to provide probabilistic capture zones describing source water to wells in a risk-based framework. Water managers can then select areas of varying levels of protection based on their tolerance for the risk that the underlying models are wrong. All the tools in this workflow are open source and free, which facilitates extending this repeatable and transparent approach to other environmental problems.
Affiliation(s)
- Andrew T Leaf
- U.S. Geological Survey, Upper Midwest Water Science Center, Middleton, WI, USA
- Randall J Hunt
- U.S. Geological Survey, Upper Midwest Water Science Center, Middleton, WI, USA
9. Sawchuk SL, Khair S. Computational Reproducibility: A Practical Framework for Data Curators. Journal of eScience Librarianship 2021. DOI: 10.7191/jeslib.2021.1206.
Abstract
Introduction: This paper presents concrete and actionable steps to guide researchers, data curators, and data managers in improving their understanding and practice of computational reproducibility.
Objectives: Focusing on incremental progress rather than prescriptive rules, researchers and curators can build their knowledge and skills as the need arises. This paper presents a framework of incremental curation for reproducibility to support open science objectives.
Methods: A computational reproducibility framework developed for the Canadian Data Curation Forum serves as the model for this approach. This framework combines learning about reproducibility with recommended steps to improving reproducibility.
Conclusion: Computational reproducibility leads to more transparent and accurate research. The authors warn that fear of a crisis and focus on perfection should not prevent curation that may be ‘good enough.’
10. Kohane IS, Aronow BJ, Avillach P, Beaulieu-Jones BK, Bellazzi R, Bradford RL, Brat GA, Cannataro M, Cimino JJ, García-Barrio N, Gehlenborg N, Ghassemi M, Gutiérrez-Sacristán A, Hanauer DA, Holmes JH, Hong C, Klann JG, Loh NHW, Luo Y, Mandl KD, Daniar M, Moore JH, Murphy SN, Neuraz A, Ngiam KY, Omenn GS, Palmer N, Patel LP, Pedrera-Jiménez M, Sliz P, South AM, Tan ALM, Taylor DM, Taylor BW, Torti C, Vallejos AK, Wagholikar KB, Weber GM, Cai T. What Every Reader Should Know About Studies Using Electronic Health Record Data but May Be Afraid to Ask. J Med Internet Res 2021; 23:e22219. PMID: 33600347; PMCID: PMC7927948; DOI: 10.2196/22219.
Abstract
Coincident with the tsunami of COVID-19-related publications, there has been a surge of studies using real-world data, including those obtained from the electronic health record (EHR). Unfortunately, several of these high-profile publications were retracted because of concerns regarding the soundness and quality of the studies and of the EHR data they purported to analyze. These retractions highlight that although a small community of EHR informatics experts can readily identify strengths and flaws in EHR-derived studies, many medical editorial teams and otherwise sophisticated medical readers lack the framework to critically appraise these studies in full. In addition, conventional statistical analyses cannot overcome the need for an understanding of the opportunities and limitations of EHR-derived studies. We distill here from the broader informatics literature six key considerations that are crucial for appraising studies utilizing EHR data: data completeness, data collection and handling (eg, transformation), data type (ie, codified, textual), robustness of methods against EHR variability (within and across institutions, countries, and time), transparency of data and analytic code, and a multidisciplinary approach. These considerations will inform researchers, clinicians, and other stakeholders as to the recommended best practices in reviewing manuscripts, grants, and other outputs of EHR-derived studies, thereby fostering the rigor, quality, and reliability of this rapidly growing field.
Affiliation(s)
- Isaac S Kohane
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
- Bruce J Aronow
- Biomedical Informatics, Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, OH, United States
- Paul Avillach
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
- Riccardo Bellazzi
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
- ICS Maugeri, Pavia, Italy
- Robert L Bradford
- North Carolina Translational and Clinical Sciences Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Gabriel A Brat
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
- Mario Cannataro
- Data Analytics Research Center, University Magna Graecia of Catanzaro, Catanzaro, Italy
- Department of Medical and Surgical Sciences, University Magna Graecia of Catanzaro, Catanzaro, Italy
- James J Cimino
- Informatics Institute, University of Alabama at Birmingham, Birmingham, AL, United States
- Nils Gehlenborg
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
- Marzyeh Ghassemi
- Department of Computer Science and Medicine, University of Toronto, Toronto, ON, Canada
- David A Hanauer
- Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, MI, United States
- John H Holmes
- Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Chuan Hong
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
- Jeffrey G Klann
- Department of Medicine, Harvard Medical School, Boston, MA, United States
- Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA, United States
- Yuan Luo
- Department of Preventive Medicine, Northwestern University, Chicago, IL, United States
- Kenneth D Mandl
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States
- Mohamad Daniar
- Clinical Research Informatics, Boston Children's Hospital, Boston, MA, United States
- Jason H Moore
- Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, United States
- Shawn N Murphy
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
- Antoine Neuraz
- Department of Biomedical Informatics, Necker-Enfants Malades Hospital, Assistance Publique - Hôpitaux de Paris, Paris, France
- Centre de Recherche des Cordeliers, INSERM UMRS 1138 Team 22, Université de Paris, Paris, France
- Kee Yuan Ngiam
- National University Health Systems, Singapore, Singapore
- Gilbert S Omenn
- Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI, United States
- Nathan Palmer
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
- Lav P Patel
- Department of Internal Medicine, Division of Medical Informatics, University of Kansas Medical Center, Kansas City, KS, United States
- Piotr Sliz
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States
- Andrew M South
- Section of Nephrology, Department of Pediatrics, Brenner Children's Hospital, Wake Forest School of Medicine, Winston Salem, NC, United States
- Amelia Li Min Tan
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
- Department of Biomedical Informatics, National University of Singapore, Singapore, Singapore
- Deanne M Taylor
- Department of Biomedical and Health Informatics, The Children's Hospital of Philadelphia, Philadelphia, PA, United States
- Department of Pediatrics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA, United States
- Bradley W Taylor
- Clinical and Translational Science Institute, Medical College of Wisconsin, Milwaukee, WI, United States
- Carlo Torti
- Department of Medical and Surgical Sciences, University Magna Graecia of Catanzaro, Catanzaro, Italy
- Andrew K Vallejos
- Clinical and Translational Science Institute, Medical College of Wisconsin, Milwaukee, WI, United States
- Kavishwar B Wagholikar
- Department of Medicine, Harvard Medical School, Boston, MA, United States
- Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA, United States
- Griffin M Weber
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
- Tianxi Cai
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
|
11. Feygin SA, Lazarus JR, Forscher EH, Golfier-Vetterli V, Lee JW, Gupta A, Waraich RA, Sheppard CJR, Bayen AM. BISTRO. ACM Trans Intell Syst Technol 2020. DOI: 10.1145/3384344.
Abstract
The current trend toward urbanization and adoption of flexible and innovative mobility technologies will have complex and difficult-to-predict effects on urban transportation systems. Comprehensive methodological frameworks that account for the increasingly uncertain future state of the urban mobility landscape do not yet exist. Furthermore, few approaches have enabled the massive ingestion of urban data in planning tools capable of offering the flexibility of scenario-based design.
This article introduces the Berkeley Integrated System for Transportation Optimization (BISTRO), a new open-source transportation planning decision support system that uses an agent-based simulation and optimization approach to anticipate and develop adaptive plans for possible technological disruptions and growth scenarios. The new framework was evaluated in the context of a machine learning competition hosted within Uber Technologies, Inc., in which over 400 engineers and data scientists participated. For the purposes of this competition, a benchmark model based on the city of Sioux Falls, South Dakota, was adapted to the BISTRO framework. An important finding of this study was that, in spite of rigorous analysis and testing done prior to the competition, the two top-scoring teams discovered an unbounded region of the search space, rendering their solutions largely uninterpretable for the purposes of decision support. On the other hand, a follow-on study, aimed at fixing the objective function, served to demonstrate BISTRO's utility as a human-in-the-loop cyber-physical system: one that uses scenario-based optimization algorithms as a feedback mechanism to assist urban planners with iteratively refining the objective function and the constraints specified on intervention strategies. The portfolio of transportation intervention strategy alternatives eventually chosen achieves high-level regional planning goals developed through participatory stakeholder engagement practices.
Affiliation(s)
- Alexandre M. Bayen
- Electrical Engineering and Computer Science, Berkeley; Institute of Transportation Studies
12. Zawadzki NK, Hay JW. Characterizing the Validity and Real-World Utility of Health Technology Assessments in Healthcare: Future Directions. Comment on "Problems and Promises of Health Technologies: The Role of Early Health Economic Modelling". Int J Health Policy Manag 2020; 9:352-355. PMID: 32613807; PMCID: PMC7500389; DOI: 10.15171/ijhpm.2019.132.
Abstract
With their article, Grutters et al raise an important question: What do successful health technology assessments (HTAs) look like, and what is their real-world utility in decision-making? While many HTAs are published in peer-reviewed journals, many are considered proprietary and their attributes remain confidential, limiting researchers' ability to answer these questions. Models for economic evaluations like cost-effectiveness analyses (CEAs) synthesize a wide range of evidence, are often statistically and mathematically sophisticated, and require untestable assumptions. As such, there is nearly universal agreement among researchers that enhancing transparency is an important issue in health economic modeling. However, the definition of transparency and guidelines for its implementation vary. Model registration combined with a linked database of model-based economic evaluations has been proposed as a solution, whereby registered models and their accompanying technical and nontechnical documentation are sourced into a single publicly available repository, ideally in a standardized format to ensure consistent and complete representation of features, code, data sources, results, validation exercises, and policy recommendations. When such a repository is ultimately created, modelers will not have to reinvent the wheel for every new drug launched or new treatment pathway. These more open and transparent approaches will have substantial implications for model accuracy, reliability, and validity, improving trust and acceptance by healthcare decision-makers.
Affiliation(s)
- Nadine K Zawadzki
- Schaeffer Center for Health Policy and Economics, Department of Pharmaceutical and Health Economics, School of Pharmacy, University of Southern California, Los Angeles, CA, USA
- Joel W Hay
- USC Clinical Economics Research and Education Program (CEREP), Los Angeles, CA, USA; Schaeffer Center for Health Policy and Economics, Department of Pharmaceutical and Health Economics, School of Pharmacy, University of Southern California, Los Angeles, CA, USA

13
Desai B, Mattingly TJ, van den Broek RWM, Pham N, Frailer M, Yang J, Perfetto EM. Peer Review and Transparency in Evidence-Source Selection in Value and Health Technology Assessment. Value Health 2020; 23:689-696. PMID: 32540225; DOI: 10.1016/j.jval.2020.01.014.
Abstract
OBJECTIVES Value and health technology assessment (V/HTA) is often used in clinical, access, and reimbursement decisions. V/HTA data-source selection may not be transparent, yet transparency is necessary for stakeholder understanding and trust and for fostering accountability among decision makers. Peer review is considered one mechanism for judging data trustworthiness. Our objectives were (1) to use publicly available documentation of V/HTA methods to identify requirements for inclusion of peer-reviewed evidence sources, (2) to compare and contrast US and non-US approaches, and (3) to assess evidence sources used in published V/HTA reports. METHODS Publicly available methods documentation from 11 V/HTA organizations in North America and Europe was manually searched and abstracted for descriptions of requirements and recommendations regarding search strategy and evidence-source selection. The bibliographies of a subset of V/HTA reports published in 2018 were manually abstracted for the evidence-source types used in each. RESULTS Heterogeneity in evidence-source retrieval and selection was observed across all V/HTA organizations, with more pronounced differences between US and non-US organizations. Not all organizations' methods documentation addresses the evidence-source selection process (7 of 11), and few explicitly reference peer-reviewed sources (3 of 11). Documentation of the evidence-source selection strategy was inconsistent across reports (6 of 13), and the level of detail provided varied across organizations. Some information on evidence-source selection was often included in confidential documentation and was not publicly available. CONCLUSIONS Disparities exist among V/HTA organizations in requirements and guidance regarding evidence-source selection. Standardization of evidence-source selection strategies and documentation could help improve V/HTA transparency and has implications for decision making based on report findings.
Affiliation(s)
- Bansri Desai
- University of Maryland, School of Pharmacy, Baltimore, MD, USA
- Ngan Pham
- University of Maryland, School of Pharmacy, Baltimore, MD, USA
- Megan Frailer
- University of Maryland, School of Pharmacy, Baltimore, MD, USA
- Joseph Yang
- University of Maryland, School of Pharmacy, Baltimore, MD, USA
- Eleanor M Perfetto
- University of Maryland, School of Pharmacy, Baltimore, MD, USA; National Health Council, Washington, DC, USA

14
Affiliation(s)
- Stephan Druskat
- German Aerospace Center (DLR); Humboldt-Universität zu Berlin; Friedrich Schiller University Jena

15
Elmenreich W, Moll P, Theuermann S, Lux M. Making simulation results reproducible-Survey, guidelines, and examples based on Gradle and Docker. PeerJ Comput Sci 2019; 5:e240. PMID: 33816893; PMCID: PMC7924710; DOI: 10.7717/peerj-cs.240.
Abstract
This article addresses two research questions related to reproducibility in computer science research. First, a survey on reproducibility, addressed to researchers in the academic and private sectors, is described and evaluated. The survey indicates a strong need for open and easily accessible results; in particular, reproducing an experiment should not require too much effort. The results of the survey are then used to formulate guidelines for making research results reproducible. In addition, the article explores four approaches based on software tools that could advance the reproducibility of research results. After a general analysis of the tools, three examples based on actual research projects are investigated in more depth and used to evaluate the previously introduced tools. Results indicate that the evaluated tools contribute well to making simulation results reproducible, but, owing to conflicting requirements, none of the presented solutions fulfills all intended goals perfectly.
16
Emerson J, Bacon R, Kent A, Neumann PJ, Cohen JT. Publication of Decision Model Source Code: Attitudes of Health Economics Authors. Pharmacoeconomics 2019; 37:1409-1410. PMID: 31065916; PMCID: PMC6860460; DOI: 10.1007/s40273-019-00796-3.
Affiliation(s)
- Joanna Emerson
- Center for the Evaluation of Value and Risk in Health, Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, MA, 02111, USA
- Rachel Bacon
- Center for the Evaluation of Value and Risk in Health, Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, MA, 02111, USA
- Alma Kent
- Center for the Evaluation of Value and Risk in Health, Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, MA, 02111, USA
- Peter J Neumann
- Center for the Evaluation of Value and Risk in Health, Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, MA, 02111, USA
- Joshua T Cohen
- Center for the Evaluation of Value and Risk in Health, Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, MA, 02111, USA

17
McManus E, Turner D, Sach T. Can You Repeat That? Exploring the Definition of a Successful Model Replication in Health Economics. Pharmacoeconomics 2019; 37:1371-1381. PMID: 31531833; DOI: 10.1007/s40273-019-00836-y.
Abstract
The International Society for Pharmacoeconomics and Outcomes Research (ISPOR) modelling taskforce suggests decision models should be thoroughly reported and transparent. However, the level of transparency required, and indeed how transparency should be assessed, are yet to be defined. One way may be to attempt to replicate the model and its outputs: the ability to replicate a decision model could demonstrate adequate reporting transparency. This review aims to explore published definitions of replication success across all scientific disciplines and to consider how such a definition should be tailored for use with health economic models. A literature review was conducted to identify published definitions of a 'successful replication'. Using these as a foundation, several definitions of replication success were constructed to be applicable to replications of economic decision models, and the associated strengths and weaknesses of each definition are discussed. A substantial body of literature discussing replicability was found; however, relatively few studies (ten) explicitly defined a successful replication. These definitions varied from subjective assessments to expecting exactly the same results to be reproduced. Whilst the definitions found may help to construct a definition specific to health economics, none completely encompassed the unique requirements of decision models. Replication is widely discussed in other scientific disciplines; however, as yet, there is no consensus on how replicable models should be within health economics or what constitutes a successful replication. Replication studies can demonstrate how transparently a model is reported, identify potential calculation errors, and inform future reporting practices. Replication may therefore be a useful adjunct to other transparency or quality measures.
Affiliation(s)
- Emma McManus
- Norwich Medical School, University of East Anglia, Norwich, NR4 7TJ, UK
- David Turner
- Norwich Medical School, University of East Anglia, Norwich, NR4 7TJ, UK
- Tracey Sach
- Norwich Medical School, University of East Anglia, Norwich, NR4 7TJ, UK

18
Abstract
Making scientific analyses reproducible, well documented, and easily shareable is crucial to maximizing their impact and ensuring that others can build on them. However, accomplishing these goals is not easy, requiring careful attention to organization, workflow, and familiarity with tools that are not a regular part of every scientist's toolbox. We have developed an R package, workflowr, to help all scientists, regardless of background, overcome these challenges. Workflowr aims to instill a particular "workflow" - a sequence of steps to be repeated and integrated into research practice - that helps make projects more reproducible and accessible. This workflow integrates four key elements: (1) version control (via Git); (2) literate programming (via R Markdown); (3) automatic checks and safeguards that improve code reproducibility; and (4) sharing code and results via a browsable website. These features exploit powerful existing tools, whose mastery would take considerable study. However, the workflowr interface is simple enough that novice users can quickly enjoy its many benefits. By simply following the workflowr "workflow", R users can create projects whose results, figures, and development history are easily accessible on a static website - thereby conveniently shareable with collaborators by sending them a URL - and accompanied by source code and reproducibility safeguards. The workflowr R package is open source and available on CRAN, with full documentation and source code available at https://github.com/jdblischak/workflowr.
Affiliation(s)
- John D. Blischak
- Department of Human Genetics, University of Chicago, Chicago, IL, 60637, USA
- Peter Carbonetto
- Department of Human Genetics, University of Chicago, Chicago, IL, 60637, USA
- Research Computing Center, University of Chicago, Chicago, IL, 60637, USA
- Matthew Stephens
- Department of Human Genetics, University of Chicago, Chicago, IL, 60637, USA
- Department of Statistics, University of Chicago, Chicago, IL, 60637, USA

19
McManus E, Turner D, Gray E, Khawar H, Okoli T, Sach T. Barriers and Facilitators to Model Replication Within Health Economics. Value Health 2019; 22:1018-1025. PMID: 31511178; DOI: 10.1016/j.jval.2019.04.1928.
Abstract
BACKGROUND Model replication is important because it enables researchers to check research integrity and transparency and, potentially, to inform the model conceptualization process when developing a new or updated model. OBJECTIVE The aim of this study was to evaluate the replicability of published decision analytic models and to identify the barriers and facilitators to replication. METHODS Replication attempts of 5 published economic modeling studies were made. The replications were conducted using only publicly available information within the manuscripts and supplementary materials. The replicator attempted to reproduce the key results detailed in each paper: for example, the reported total costs, total outcomes, and, if applicable, the incremental cost-effectiveness ratio. Although a replication attempt was not explicitly defined as a success or failure, the replicated results were compared with the original results in terms of percentage difference. RESULTS In conducting the replication attempts, common barriers and facilitators emerged. For most case studies, the replicator needed to make additional assumptions when recreating the model. This was often exacerbated by conflicting information being presented in the text and the tables. Across the case studies, the variation between original and replicated results ranged from -4.54% to 108.00% for costs and from -3.81% to 0.40% for outcomes. CONCLUSION This study demonstrates that although models may appear to be comprehensively reported, such reporting is often not enough to facilitate a precise replication. Further work is needed to understand how to improve model transparency and in turn increase the chances of replication, thus ensuring future usability.
Affiliation(s)
- Emma McManus
- Norwich Medical School, University of East Anglia, Norwich, England, UK
- David Turner
- Norwich Medical School, University of East Anglia, Norwich, England, UK
- Ewan Gray
- Division of Population Health, Health Services Research & Primary Care, The University of Manchester, Manchester, England, UK
- Haseeb Khawar
- Norwich Medical School, University of East Anglia, Norwich, England, UK
- Toochukwu Okoli
- Norwich Medical School, University of East Anglia, Norwich, England, UK
- Tracey Sach
- Norwich Medical School, University of East Anglia, Norwich, England, UK

20
Madduri R, Chard K, D’Arcy M, Jung SC, Rodriguez A, Sulakhe D, Deutsch E, Funk C, Heavner B, Richards M, Shannon P, Glusman G, Price N, Kesselman C, Foster I. Reproducible big data science: A case study in continuous FAIRness. PLoS One 2019; 14:e0213013. PMID: 30973881; PMCID: PMC6459504; DOI: 10.1371/journal.pone.0213013.
Abstract
Big biomedical data create exciting opportunities for discovery, but make it difficult to capture analyses and outputs in forms that are findable, accessible, interoperable, and reusable (FAIR). In response, we describe tools that make it easy to capture, and assign identifiers to, data and code throughout the data lifecycle. We illustrate the use of these tools via a case study involving a multi-step analysis that creates an atlas of putative transcription factor binding sites from terabytes of ENCODE DNase I hypersensitive sites sequencing data. We show how the tools automate routine but complex tasks, capture analysis algorithms in understandable and reusable forms, and harness fast networks and powerful cloud computers to process data rapidly, all without sacrificing usability or reproducibility, thus ensuring that big data are not hard-to-(re)use data. We evaluate our approach via a user study, and show that 91% of participants were able to replicate a complex analysis involving considerable data volumes.
Affiliation(s)
- Ravi Madduri
- Globus, University of Chicago, Chicago, Illinois, United States of America
- Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois, United States of America
- Kyle Chard
- Globus, University of Chicago, Chicago, Illinois, United States of America
- Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois, United States of America
- Mike D’Arcy
- Information Sciences Institute, University of Southern California, Los Angeles, California, United States of America
- Segun C. Jung
- Globus, University of Chicago, Chicago, Illinois, United States of America
- Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois, United States of America
- Alexis Rodriguez
- Globus, University of Chicago, Chicago, Illinois, United States of America
- Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois, United States of America
- Dinanath Sulakhe
- Globus, University of Chicago, Chicago, Illinois, United States of America
- Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois, United States of America
- Eric Deutsch
- Institute for Systems Biology, Seattle, Washington, United States of America
- Cory Funk
- Institute for Systems Biology, Seattle, Washington, United States of America
- Ben Heavner
- Department of Biostatistics, School of Public Health, University of Washington, Seattle, Washington, United States of America
- Matthew Richards
- Institute for Systems Biology, Seattle, Washington, United States of America
- Paul Shannon
- Institute for Systems Biology, Seattle, Washington, United States of America
- Gustavo Glusman
- Institute for Systems Biology, Seattle, Washington, United States of America
- Nathan Price
- Institute for Systems Biology, Seattle, Washington, United States of America
- Carl Kesselman
- Information Sciences Institute, University of Southern California, Los Angeles, California, United States of America
- Ian Foster
- Globus, University of Chicago, Chicago, Illinois, United States of America
- Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois, United States of America
- Department of Computer Science, University of Chicago, Chicago, Illinois, United States of America

21
Stevens SLR, Kuzak M, Martinez C, Moser A, Bleeker P, Galland M. Building a local community of practice in scientific programming for life scientists. PLoS Biol 2018; 16:e2005561. PMID: 30485260; PMCID: PMC6287879; DOI: 10.1371/journal.pbio.2005561.
Abstract
In this paper, we describe why and how to build a local community of practice in scientific programming for life scientists who use computers and programming in their research. A community of practice is a small group of scientists who meet regularly to help each other and promote good practices in scientific programming. While most life scientists are well trained in the laboratory to conduct experiments, good practices with (big) data sets and their analysis are often missing. We propose a model on how to build such a community of practice at a local academic institution, present two real-life examples, and introduce challenges and implemented solutions. We believe that the current data deluge that life scientists face can benefit from the implementation of these small communities. Good practices spread among experimental scientists will foster open, transparent, and sound scientific results beneficial to society.
Affiliation(s)
- Sarah L. R. Stevens
- Department of Bacteriology, University of Wisconsin–Madison, Madison, Wisconsin, United States of America
- Mateusz Kuzak
- Dutch Techcentre for Life Sciences, Utrecht, Netherlands
- Aurelia Moser
- Mozilla Foundation, Mountain View, California, United States of America
- Petra Bleeker
- Department of Plant Physiology, Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, Netherlands
- Marc Galland
- Department of Plant Physiology, Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, Netherlands

22
AlNoamany Y, Borghi JA. Towards computational reproducibility: researcher perspectives on the use and sharing of software. PeerJ Comput Sci 2018; 4:e163. PMID: 33816816; PMCID: PMC7924683; DOI: 10.7717/peerj-cs.163.
Abstract
Research software, which includes both source code and executables used as part of the research process, presents a significant challenge for efforts aimed at ensuring reproducibility. In order to inform such efforts, we conducted a survey to better understand the characteristics of research software as well as how it is created, used, and shared by researchers. Based on the responses of 215 participants, representing a range of research disciplines, we found that researchers create, use, and share software in a wide variety of forms for a wide variety of purposes, including data collection, data analysis, data visualization, data cleaning and organization, and automation. More participants indicated that they use open source software than commercial software. While a relatively small number of programming languages (e.g., Python, R, JavaScript, C++, MATLAB) are used by a large number, there is a long tail of languages used by relatively few. Between-group comparisons revealed that significantly more participants from computer science write source code and create executables than participants from other disciplines. Differences between researchers from computer science and other disciplines related to the knowledge of best practices of software creation and sharing were not statistically significant. While many participants indicated that they draw a distinction between the sharing and preservation of software, related practices and perceptions were often not aligned with those of the broader scholarly communications community.
Affiliation(s)
- Yasmin AlNoamany
- University of California, Berkeley, CA, United States of America
- John A. Borghi
- California Digital Library, Oakland, CA, United States of America

23
Toelch U, Ostwald D. Digital open science-Teaching digital tools for reproducible and transparent research. PLoS Biol 2018; 16:e2006022. PMID: 30048447; PMCID: PMC6095603; DOI: 10.1371/journal.pbio.2006022.
Abstract
An important hallmark of science is the transparency and reproducibility of scientific results. Over the last few years, internet-based technologies have emerged that allow for a representation of the scientific process that goes far beyond traditional methods and analysis descriptions. Using these often freely available tools requires a suite of skills that is not necessarily part of a curriculum in the life sciences. However, funders, journals, and policy makers increasingly require researchers to ensure complete reproducibility of their methods and analyses. To close this gap, we designed an introductory course that guides students towards a reproducible science workflow. Here, we outline the course content and possible extensions, report encountered challenges, and discuss how to integrate such a course in existing curricula.
Affiliation(s)
- Ulf Toelch
- QUEST Center for Transforming Biomedical Research, Berlin Institute of Health, Berlin, Germany
- Biological Psychology and Cognitive Neuroscience, Freie Universität Berlin, Berlin, Germany
- Dirk Ostwald
- Computational Cognitive Neuroscience, Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany
- Center for Cognitive Neuroscience Berlin, Freie Universität Berlin, Berlin, Germany
- Center for Adaptive Rationality, Max Planck Institute for Human Development, Berlin, Germany

24
Klein O, Hardwicke TE, Aust F, Breuer J, Danielsson H, Mohr AH, IJzerman H, Nilsonne G, Vanpaemel W, Frank MC. A Practical Guide for Transparency in Psychological Science. Collabra: Psychology 2018. DOI: 10.1525/collabra.158.
Abstract
The credibility of scientific claims depends upon the transparency of the research products upon which they are based (e.g., study protocols, data, materials, and analysis scripts). As psychology navigates a period of unprecedented introspection, user-friendly tools and services that support open science have flourished. However, the plethora of decisions and choices involved can be bewildering. Here we provide a practical guide to help researchers navigate the process of preparing and sharing the products of their research (e.g., choosing a repository, preparing their research products for sharing, structuring folders, etc.). Being an open scientist means adopting a few straightforward research management practices, which lead to less error-prone, reproducible research workflows. Further, this adoption can be piecemeal – each incremental step towards complete transparency adds positive value. Transparent research practices not only improve the efficiency of individual researchers, they enhance the credibility of the knowledge generated by the scientific community.
Affiliation(s)
- Henrik Danielsson
- Linköping University, SE
- The Swedish Institute for Disability Research, SE
- Gustav Nilsonne
- Stanford University, US
- Karolinska Institutet and Stockholm University, SE

25
McQueen RB, Padula WV, Campbell JD. A Call for Open-Source Cost-Effectiveness Analysis. Ann Intern Med 2018; 168:528-529. PMID: 29610902; DOI: 10.7326/l17-0694.
Affiliation(s)
- R Brett McQueen
- University of Colorado Skaggs School of Pharmacy, Aurora, Colorado (R.B.M., J.D.C.)
- William V Padula
- Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland (W.V.P.)
- Jonathan D Campbell
- University of Colorado Skaggs School of Pharmacy, Aurora, Colorado (R.B.M., J.D.C.)

26
Lipo CP, DiNapoli RJ, Hunt TL. Commentary: Rain, Sun, Soil, and Sweat: A Consideration of Population Limits on Rapa Nui (Easter Island) before European Contact. Front Ecol Evol 2018. DOI: 10.3389/fevo.2018.00025.
27
Schlagbauer B, Rausch M, Zehetleitner M, Müller HJ, Geyer T. Contextual cueing of visual search is associated with greater subjective experience of the search display configuration. Neurosci Conscious 2018; 2018:niy001. PMID: 30042854; PMCID: PMC6007139; DOI: 10.1093/nc/niy001.
Abstract
Visual search is facilitated when display configurations are repeated over time, showing that memory of spatio-configural context can cue the location of the target. The present study investigates whether memory of the search target in relation to the configuration of distractors alters subjective experience of the visual search target and/or the subjective experience of the display configuration. Observers performed a masked localization task for targets embedded in repeated vs. non-repeated (baseline) arrays of distractor items. After the localization response, observers reported their subjective experience of either the target or the display configuration. Bayesian analysis revealed that repeated displays resulted in a stronger visual experience of both targets and display configurations. However, subsequent analysis showed that repeated search displays increased the correlation between the experience of the display configuration and localization accuracy, but there was no such effect on experience of the target stimulus. We suggest that memory of visual context enhances the representation of the current visual search display. This representation improves visual search and at the same time increases observers' subjective experience of the display configuration.
Collapse
Affiliation(s)
- Bernhard Schlagbauer
- Department Psychologie, Ludwig-Maximilians-Universität München, Munich, Germany
- Graduate School of Systemic Neurosciences, Großhaderner Str. 2, 82152 Planegg-Martinsried, Germany
- Manuel Rausch
- Department Psychologie, Ludwig-Maximilians-Universität München, Munich, Germany
- Graduate School of Systemic Neurosciences, Großhaderner Str. 2, 82152 Planegg-Martinsried, Germany
- Fakultät für Psychologie und Pädagogik, Fachgebiet Psychologie II, Katholische Universität Eichstätt-Ingolstadt, Ostenstraße 25, 85072 Eichstätt, Germany
- Michael Zehetleitner
- Department Psychologie, Ludwig-Maximilians-Universität München, Munich, Germany
- Fakultät für Psychologie und Pädagogik, Fachgebiet Psychologie II, Katholische Universität Eichstätt-Ingolstadt, Ostenstraße 25, 85072 Eichstätt, Germany
- Hermann J Müller
- Department Psychologie, Ludwig-Maximilians-Universität München, Munich, Germany
- Department of Psychological Sciences, Birkbeck, University of London, Malet Street, London WC1E 7HX, UK
- Thomas Geyer
- Department Psychologie, Ludwig-Maximilians-Universität München, Munich, Germany
28
Smith AM, Niemeyer KE, Katz DS, Barba LA, Githinji G, Gymrek M, Huff KD, Madan CR, Cabunoc Mayes A, Moerman KM, Prins P, Ram K, Rokem A, Teal TK, Valls Guimera R, Vanderplas JT. Journal of Open Source Software (JOSS): design and first-year review. PeerJ Comput Sci 2018; 4:e147. [PMID: 32704456 PMCID: PMC7340488 DOI: 10.7717/peerj-cs.147] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/06/2017] [Accepted: 01/24/2018] [Indexed: 06/01/2023]
Abstract
This article describes the motivation, design, and progress of the Journal of Open Source Software (JOSS). JOSS is a free and open-access journal that publishes articles describing research software. It has the dual goals of improving the quality of the software submitted and providing a mechanism for research software developers to receive credit. While designed to work within the current merit system of science, JOSS addresses the dearth of rewards for key contributions to science made in the form of software. JOSS publishes articles that encapsulate scholarship contained in the software itself, and its rigorous peer review targets the software components: functionality, documentation, tests, continuous integration, and the license. A JOSS article contains an abstract describing the purpose and functionality of the software, references, and a link to the software archive. The article is the entry point of a JOSS submission, which encompasses the full set of software artifacts. Submission and review proceed in the open, on GitHub. Editors, reviewers, and authors work collaboratively and openly. Unlike other journals, JOSS does not reject articles requiring major revision; while not yet accepted, articles remain visible and under review until the authors make adequate changes (or withdraw, if unable to meet requirements). Once an article is accepted, JOSS gives it a digital object identifier (DOI), deposits its metadata in Crossref, and the article can begin collecting citations on indexers like Google Scholar and other services. Authors retain copyright of their JOSS article, releasing it under a Creative Commons Attribution 4.0 International License. In its first year, starting in May 2016, JOSS published 111 articles, with more than 40 additional articles under review. JOSS is a sponsored project of the nonprofit organization NumFOCUS and is an affiliate of the Open Source Initiative (OSI).
Affiliation(s)
- Arfon M. Smith
- Data Science Mission Office, Space Telescope Science Institute, Baltimore, MD, United States of America
- Kyle E. Niemeyer
- School of Mechanical, Industrial, and Manufacturing Engineering, Oregon State University, Corvallis, OR, United States of America
- Daniel S. Katz
- National Center for Supercomputing Applications & Department of Computer Science & Department of Electrical and Computer Engineering & School of Information Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, United States of America
- Lorena A. Barba
- Department of Mechanical & Aerospace Engineering, The George Washington University, Washington, D.C., United States of America
- Melissa Gymrek
- Departments of Medicine & Computer Science and Engineering, University of California, San Diego, La Jolla, CA, United States of America
- Kathryn D. Huff
- Department of Nuclear, Plasma, and Radiological Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, United States of America
- Kevin M. Moerman
- MIT Media Lab, Massachusetts Institute of Technology, Cambridge, MA, United States of America
- Trinity Centre for Bioengineering, Trinity College, The University of Dublin, Dublin, Ireland
- Pjotr Prins
- University of Tennessee Health Science Center, Memphis, TN, United States of America
- University Medical Centre Utrecht, Utrecht, The Netherlands
- Karthik Ram
- Berkeley Institute for Data Science, University of California, Berkeley, CA, United States of America
- Ariel Rokem
- eScience Institute, University of Washington, Seattle, WA, United States of America
- Roman Valls Guimera
- University of Melbourne Centre for Cancer Research, University of Melbourne, Melbourne, Australia
- Jacob T. Vanderplas
- eScience Institute, University of Washington, Seattle, WA, United States of America
29
Can Economic Model Transparency Improve Provider Interpretation of Cost-Effectiveness Analysis? A Response. Med Care 2017; 55:912-914. [PMID: 29028754 DOI: 10.1097/mlr.0000000000000811] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]
Abstract
To enhance the credibility and the value of health economic analyses, we argue that the computer model source code underlying these analyses should be made publicly available. Only with open publication is it possible for others to assess whether alternative assumptions, beyond those examined by the model authors, alter the model's findings. Because reproducibility is critical for scientific acceptance and because computation increasingly permeates scientific inquiry, other fields have moved toward open publication of computer models, and health economics should avoid falling behind. Making source code available shines a light on these otherwise black boxes and facilitates their complete evaluation and understandability. The preceding commentary makes 2 arguments against open publication. It claims first that open publication would undermine intellectual property rights and discourage work in this field. We respond that the impact on intellectual property would be minimal, and that open publication could even increase model value. The second argument against open publication is the possibility of model misuse. If anything, however, open publication would reduce this risk by making the model implementation completely transparent. We argue finally that open publication of models would have ancillary benefits by making the research more amenable for adaptation and innovation. Moving toward open publication will present challenges, but we believe that the benefits of increased scientific credibility and utility, particularly for health policy and clinical practice decisions, will certainly outweigh the harms.
30
Finding Resolution for the Responsible Transparency of Economic Models in Health and Medicine. Med Care 2017; 55:915-917. [PMID: 29028755 DOI: 10.1097/mlr.0000000000000813] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
Abstract
The Second Panel on Cost-Effectiveness in Health and Medicine's recommendations for the conduct, methodological practices, and reporting of cost-effectiveness analyses leave a number of questions unanswered with respect to implementing a transparent, open source code interface for economic models. Making economic model source code openly available could be positive and progressive for the field; however, several unintended consequences of such a system should be considered before it is fully implemented. First, there is the concern regarding the intellectual property rights that modelers have to their analyses. Second, open source code could make analyses more accessible to inexperienced modelers, leading to inaccurate or misinterpreted results. We propose several resolutions to these concerns. The field should establish a licensing system for open source code such that the model originators maintain control of the code's use and grant permissions to other investigators who wish to use it. The field should also be more forthcoming about teaching cost-effectiveness analysis in medical and health services education, so that providers and other professionals are familiar with economic modeling and able to conduct analyses with open source code. These unintended consequences need to be fully considered before the field can move forward into an era of model transparency with open source code.
31
Confidence in masked orientation judgments is informed by both evidence and visibility. Atten Percept Psychophys 2017; 80:134-154. [DOI: 10.3758/s13414-017-1431-5] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/02/2023]
32
Rocchini D, Petras V, Petrasova A, Horning N, Furtkevicova L, Neteler M, Leutner B, Wegmann M. Open data and open source for remote sensing training in ecology. ECOL INFORM 2017. [DOI: 10.1016/j.ecoinf.2017.05.004] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
33
The TimeStudio Project: An open source scientific workflow system for the behavioral and brain sciences. Behav Res Methods 2017; 48:542-52. [PMID: 26170051 PMCID: PMC4891379 DOI: 10.3758/s13428-015-0616-x] [Citation(s) in RCA: 52] [Impact Index Per Article: 7.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
Abstract
This article describes a new open source scientific workflow system, the TimeStudio Project, dedicated to the behavioral and brain sciences. The program is written in MATLAB and features a graphical user interface for the dynamic pipelining of computer algorithms developed as TimeStudio plugins. TimeStudio includes both a set of general plugins (for reading data files, modifying data structures, visualizing data structures, etc.) and a set of plugins specifically developed for the analysis of event-related eyetracking data as a proof of concept. It is possible to create custom plugins to integrate new or existing MATLAB code anywhere in a workflow, making TimeStudio a flexible workbench for organizing and performing a wide range of analyses. The system also features an integrated sharing and archiving tool for TimeStudio workflows, which can be used to share workflows both during the data analysis phase and after scientific publication. TimeStudio thus facilitates the reproduction and replication of scientific studies, increases the transparency of analyses, and reduces individual researchers’ analysis workload. The project website (http://timestudioproject.com) contains the latest releases of TimeStudio, together with documentation and user forums.
34
Eglen SJ, Marwick B, Halchenko YO, Hanke M, Sufi S, Gleeson P, Silver RA, Davison AP, Lanyon L, Abrams M, Wachtler T, Willshaw DJ, Pouzat C, Poline JB. Toward standard practices for sharing computer code and programs in neuroscience. Nat Neurosci 2017; 20:770-773. [PMID: 28542156 PMCID: PMC6386137 DOI: 10.1038/nn.4550] [Citation(s) in RCA: 57] [Impact Index Per Article: 8.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/26/2022]
Abstract
Computational techniques are central in many areas of neuroscience, and are relatively easy to share. This paper describes why computer programs underlying scientific publications should be shared, and lists simple steps for sharing. Together with ongoing efforts in data sharing, this should aid reproducibility of research.
Affiliation(s)
- Stephen J. Eglen
- Cambridge Computational Biology Institute, Department of Applied Mathematics and Theoretical Physics, University of Cambridge, UK
- Ben Marwick
- Department of Anthropology, University of Washington, Seattle, WA 98195-3100, USA
- Yaroslav O. Halchenko
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH 03755, USA
- Michael Hanke
- Institute of Psychology II, Otto-von-Guericke-University Magdeburg, 39106 Magdeburg, Germany
- Center for Behavioral Brain Sciences, 39106 Magdeburg, Germany
- Shoaib Sufi
- Software Sustainability Institute, University of Manchester, UK
- Padraig Gleeson
- Department of Neuroscience, Physiology and Pharmacology, University College London, UK
- R. Angus Silver
- Department of Neuroscience, Physiology and Pharmacology, University College London, UK
- Andrew P. Davison
- Unité de Neurosciences, Information et Complexité, CNRS, Gif sur Yvette, France
- Linda Lanyon
- International Neuroinformatics Coordinating Facility, Karolinska Institutet, Stockholm, Sweden
- Mathew Abrams
- International Neuroinformatics Coordinating Facility, Karolinska Institutet, Stockholm, Sweden
- Thomas Wachtler
- Department of Biology II, Ludwig-Maximilians-Universität München, Germany
- David J. Willshaw
- Institute for Adaptive and Neural Computation, School of Informatics, University of Edinburgh, UK
- Christophe Pouzat
- MAP5, Paris-Descartes University and CNRS UMR 8145, 75006 Paris, France
- Jean-Baptiste Poline
- Henry H. Wheeler, Jr. Brain Imaging Center, Helen Wills Neuroscience Institute, University of California, Berkeley, USA
35
Rausch M, Zehetleitner M. Should metacognition be measured by logistic regression? Conscious Cogn 2017; 49:291-312. [PMID: 28236748 DOI: 10.1016/j.concog.2017.02.007] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2016] [Revised: 02/05/2017] [Accepted: 02/06/2017] [Indexed: 11/30/2022]
Abstract
Are logistic regression slopes suitable to quantify metacognitive sensitivity, i.e. the efficiency with which subjective reports differentiate between correct and incorrect task responses? We analytically show that logistic regression slopes are independent from rating criteria in one specific model of metacognition, which assumes (i) that rating decisions are based on sensory evidence generated independently of the sensory evidence used for primary task responses and (ii) that the distributions of evidence are logistic. Given a hierarchical model of metacognition, logistic regression slopes depend on rating criteria. According to all considered models, regression slopes depend on the primary task criterion. A reanalysis of previous data revealed that massive numbers of trials are required to distinguish between hierarchical and independent models with tolerable accuracy. It is argued that researchers who wish to use logistic regression as a measure of metacognitive sensitivity need to control the primary task criterion and rating criteria.
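As a rough illustration of the measure discussed in this abstract (a sketch on simulated data, not code from the paper), the logistic regression slope relating confidence ratings to response accuracy can be estimated by maximizing the Bernoulli log-likelihood; the confidence distributions below are illustrative assumptions:

```python
import math
import random

def fit_logistic(x, y, lr=0.3, epochs=3000):
    """Fit P(correct) = sigmoid(b0 + b1 * confidence) by gradient ascent
    on the mean Bernoulli log-likelihood; returns (intercept, slope)."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            g0 += yi - p          # gradient w.r.t. intercept
            g1 += (yi - p) * xi   # gradient w.r.t. slope
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

random.seed(1)
# Simulated trials: correct responses tend to carry higher confidence.
conf_correct = [random.gauss(3.0, 1.0) for _ in range(300)]
conf_error = [random.gauss(2.0, 1.0) for _ in range(300)]
x = conf_correct + conf_error
y = [1] * 300 + [0] * 300

intercept, slope = fit_logistic(x, y)
print(f"logistic regression slope: {slope:.2f}")
```

A positive slope indicates that confidence discriminates correct from incorrect trials; the paper's point is that this number also moves with the rating and primary-task criteria, so it is not a pure sensitivity measure.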
Affiliation(s)
- Manuel Rausch
- Katholische Universität Eichstätt-Ingolstadt, Eichstätt, Germany; Ludwig-Maximilians-Universität München, Munich, Germany
- Michael Zehetleitner
- Katholische Universität Eichstätt-Ingolstadt, Eichstätt, Germany; Ludwig-Maximilians-Universität München, Munich, Germany
36
Li K, Lin X, Greenberg J. Software citation, reuse and metadata considerations: An exploratory study examining LAMMPS. Proc Assoc Inf Sci Technol 2016. [DOI: 10.1002/pra2.2016.14505301072] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]
Affiliation(s)
- Kai Li
- College of Computing and Informatics, Drexel University, 3141 Chestnut Street, Philadelphia, PA 19104
- Xia Lin
- College of Computing and Informatics, Drexel University, 3141 Chestnut Street, Philadelphia, PA 19104
- Jane Greenberg
- College of Computing and Informatics, Drexel University, 3141 Chestnut Street, Philadelphia, PA 19104
37
Affiliation(s)
- David M. Nichols
- Department of Computer Science, University of Waikato, Hamilton 3240, New Zealand
- Michael B. Twidale
- Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign, Champaign, IL 61820-6211
38
Niemeyer KE, Smith AM, Katz DS. The Challenge and Promise of Software Citation for Credit, Identification, Discovery, and Reuse. ACM JOURNAL OF DATA AND INFORMATION QUALITY 2016. [DOI: 10.1145/2968452] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022]
39
Abstract
When reporting research findings, scientists document the steps they followed so that others can verify and build upon the research. When those steps have been described in sufficient detail that others can retrace the steps and obtain similar results, the research is said to be reproducible. Computers play a vital role in many research disciplines and present both opportunities and challenges for reproducibility. Computers can be programmed to execute analysis tasks, and those programs can be repeated and shared with others. The deterministic nature of most computer programs means that the same analysis tasks, applied to the same data, will often produce the same outputs. However, in practice, computational findings often cannot be reproduced because of complexities in how software is packaged, installed, and executed-and because of limitations associated with how scientists document analysis steps. Many tools and techniques are available to help overcome these challenges; here we describe seven such strategies. With a broad scientific audience in mind, we describe the strengths and limitations of each approach, as well as the circumstances under which each might be applied. No single strategy is sufficient for every scenario; thus we emphasize that it is often useful to combine approaches.
Affiliation(s)
- Stephen R Piccolo
- Department of Biology, Brigham Young University, Provo, UT 84602, USA
- Michael B Frampton
- Department of Computer Science, Brigham Young University, Provo, UT, USA
40
Abstract
Computing has revolutionized the biological sciences over the past several decades, such that virtually all contemporary research in molecular biology, biochemistry, and other biosciences utilizes computer programs. The computational advances have come on many fronts, spurred by fundamental developments in hardware, software, and algorithms. These advances have influenced, and even engendered, a phenomenal array of bioscience fields, including molecular evolution and bioinformatics; genome-, proteome-, transcriptome- and metabolome-wide experimental studies; structural genomics; and atomistic simulations of cellular-scale molecular assemblies as large as ribosomes and intact viruses. In short, much of post-genomic biology is increasingly becoming a form of computational biology. The ability to design and write computer programs is among the most indispensable skills that a modern researcher can cultivate. Python has become a popular programming language in the biosciences, largely because (i) its straightforward semantics and clean syntax make it a readily accessible first language; (ii) it is expressive and well-suited to object-oriented programming, as well as other modern paradigms; and (iii) the many available libraries and third-party toolkits extend the functionality of the core language into virtually every biological domain (sequence and structure analyses, phylogenomics, workflow management systems, etc.). This primer offers a basic introduction to coding, via Python, and it includes concrete examples and exercises to illustrate the language’s usage and capabilities; the main text culminates with a final project in structural bioinformatics. A suite of Supplemental Chapters is also provided. Starting with basic concepts, such as that of a “variable,” the Chapters methodically advance the reader to the point of writing a graphical user interface to compute the Hamming distance between two DNA sequences. 
Contemporary biology has largely become computational biology, whether it involves applying physical principles to simulate the motion of each atom in a piece of DNA, or using machine learning algorithms to integrate and mine “omics” data across whole cells (or even entire ecosystems). The ability to design algorithms and program computers, even at a novice level, may be the most indispensable skill that a modern researcher can cultivate. As with human languages, computational fluency is developed actively, not passively. This self-contained text, structured as a hybrid primer/tutorial, introduces any biologist—from college freshman to established senior scientist—to basic computing principles (control-flow, recursion, regular expressions, etc.) and the practicalities of programming and software design. We use the Python language because it now pervades virtually every domain of the biosciences, from sequence-based bioinformatics and molecular evolution to phylogenomics, systems biology, structural biology, and beyond. To introduce both coding (in general) and Python (in particular), we guide the reader via concrete examples and exercises. We also supply, as Supplemental Chapters, a few thousand lines of heavily-annotated, freely distributed source code for personal study.
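The primer's final worked example computes the Hamming distance between two DNA sequences; a minimal stand-alone version of that computation (a sketch, without the graphical user interface described in the text) might look like:

```python
def hamming_distance(seq1, seq2):
    """Count mismatched positions between two equal-length DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be the same length")
    return sum(a != b for a, b in zip(seq1, seq2))

# Positions 3 (T vs C) and 6 (C vs T, 1-indexed) differ.
print(hamming_distance("GATTACA", "GACTATA"))  # -> 2
```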
Affiliation(s)
- Berk Ekmekci
- Department of Chemistry, University of Virginia, Charlottesville, Virginia, United States of America
- Charles E. McAnany
- Department of Chemistry, University of Virginia, Charlottesville, Virginia, United States of America
- Cameron Mura
- Department of Chemistry, University of Virginia, Charlottesville, Virginia, United States of America
41
Rausch M, Zehetleitner M. Visibility Is Not Equivalent to Confidence in a Low Contrast Orientation Discrimination Task. Front Psychol 2016; 7:591. [PMID: 27242566 PMCID: PMC4874366 DOI: 10.3389/fpsyg.2016.00591] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2015] [Accepted: 04/11/2016] [Indexed: 11/13/2022] Open
Abstract
In several visual tasks, participants report that they feel confident about discrimination responses at a level of stimulation at which they would report not seeing the stimulus. How general and reliable is this effect? We compared subjective reports of discrimination confidence and subjective reports of visibility in an orientation discrimination task with varying stimulus contrast. Participants applied more liberal criteria for subjective reports of discrimination confidence than for visibility. While reports of discrimination confidence were more efficient in predicting trial accuracy than reports of visibility, only reports of visibility but not confidence were associated with stimulus contrast in incorrect trials. It is argued that the distinction between discrimination confidence and visibility can be reconciled with both the partial awareness hypothesis and higher order thought theory. We suggest that consciousness research would benefit from differentiating between subjective reports of visibility and confidence.
Affiliation(s)
- Manuel Rausch
- Psychologie II, Catholic University of Eichstätt-Ingolstadt, Eichstätt, Germany; Graduate School of Systemic Neurosciences, Ludwig-Maximilians-Universität München, Munich, Germany; General and Experimental Psychology, Ludwig-Maximilians-Universität München, Munich, Germany
- Michael Zehetleitner
- Psychologie II, Catholic University of Eichstätt-Ingolstadt, Eichstätt, Germany; General and Experimental Psychology, Ludwig-Maximilians-Universität München, Munich, Germany
42
Morin A, Sliz P. Structural biology computing: Lessons for the biomedical research sciences. Biopolymers 2016; 99:809-16. [PMID: 23828134 DOI: 10.1002/bip.22343] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/06/2013] [Accepted: 06/26/2013] [Indexed: 12/18/2022]
Abstract
The field of structural biology, whose aim is to elucidate the molecular and atomic structures of biological macromolecules, has long been at the forefront of biomedical sciences in adopting and developing computational research methods. Operating at the intersection between biophysics, biochemistry, and molecular biology, structural biology's growth into a foundational framework on which many concepts and findings of molecular biology are interpreted has depended largely on parallel advancements in computational tools and techniques. Without these computing advances, modern structural biology would likely have remained an exclusive pursuit practiced by few, and not become the widely practiced, foundational field it is today. As other areas of biomedical research increasingly embrace research computing techniques, the successes, failures and lessons of structural biology computing can serve as a useful guide to progress in other biomedically related research fields.
Affiliation(s)
- Andrew Morin
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115
43
Socias SM, Morin A, Timony MA, Sliz P. AppCiter: A Web Application for Increasing Rates and Accuracy of Scientific Software Citation. Structure 2016; 23:807-808. [PMID: 25955101 DOI: 10.1016/j.str.2015.04.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/30/2015] [Revised: 03/13/2015] [Accepted: 03/14/2015] [Indexed: 11/15/2022]
Affiliation(s)
- Stephanie M Socias
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA
- Andrew Morin
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA
- Michael A Timony
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA
- Piotr Sliz
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA
44
Integrating Free and Open Source Solutions into Geospatial Science Education. ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION 2015. [DOI: 10.3390/ijgi4020942] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
45
Lawrence TJ, Kauffman KT, Amrine KCH, Carper DL, Lee RS, Becich PJ, Canales CJ, Ardell DH. FAST: FAST Analysis of Sequences Toolbox. Front Genet 2015; 6:172. [PMID: 26042145 PMCID: PMC4437040 DOI: 10.3389/fgene.2015.00172] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2015] [Accepted: 04/20/2015] [Indexed: 11/13/2022] Open
Abstract
FAST (FAST Analysis of Sequences Toolbox) provides simple, powerful open source command-line tools to filter, transform, annotate and analyze biological sequence data. Modeled after the GNU (GNU's Not Unix) Textutils such as grep, cut, and tr, FAST tools such as fasgrep, fascut, and fastr make it easy to rapidly prototype expressive bioinformatic workflows in a compact and generic command vocabulary. Compact combinatorial encoding of data workflows with FAST commands can simplify the documentation and reproducibility of bioinformatic protocols, supporting better transparency in biological data science. Interface self-consistency and conformity with conventions of GNU, Matlab, Perl, BioPerl, R, and GenBank help make FAST easy and rewarding to learn. FAST automates numerical, taxonomic, and text-based sorting, selection and transformation of sequence records and alignment sites based on content, index ranges, descriptive tags, annotated features, and in-line calculated analytics, including composition and codon usage. Automated content- and feature-based extraction of sites and support for molecular population genetic statistics make FAST useful for molecular evolutionary analysis. FAST is portable, easy to install and secure thanks to the relative maturity of its Perl and BioPerl foundations, with stable releases posted to CPAN. Development as well as a publicly accessible Cookbook and Wiki are available on the FAST GitHub repository at https://github.com/tlawrence3/FAST. The default data exchange format in FAST is Multi-FastA (specifically, a restriction of BioPerl FastA format). Sanger and Illumina 1.8+ FastQ formatted files are also supported. FAST makes it easier for non-programmer biologists to interactively investigate and control biological data at the speed of thought.
Affiliation(s)
- Travis J Lawrence
- Quantitative and Systems Biology Program, University of California, Merced, Merced, CA, USA
- Kyle T Kauffman
- Molecular Cell Biology Unit, School of Natural Sciences, University of California, Merced, Merced, CA, USA
- Katherine C H Amrine
- Quantitative and Systems Biology Program, University of California, Merced, Merced, CA, USA; Department of Viticulture and Enology, University of California, Davis, Davis, CA, USA
- Dana L Carper
- Quantitative and Systems Biology Program, University of California, Merced, Merced, CA, USA
- Raymond S Lee
- School of Engineering, University of California, Merced, Merced, CA, USA
- Peter J Becich
- Molecular Cell Biology Unit, School of Natural Sciences, University of California, Merced, Merced, CA, USA
- Claudia J Canales
- School of Engineering, University of California, Merced, Merced, CA, USA
- David H Ardell
- Quantitative and Systems Biology Program, University of California, Merced, Merced, CA, USA; Molecular Cell Biology Unit, School of Natural Sciences, University of California, Merced, Merced, CA, USA
46
Zehetleitner M, Ratko-Dehnert E, Müller HJ. Modeling violations of the race model inequality in bimodal paradigms: co-activation from decision and non-decision components. Front Hum Neurosci 2015; 9:119. [PMID: 25805987 PMCID: PMC4353255 DOI: 10.3389/fnhum.2015.00119] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2014] [Accepted: 02/17/2015] [Indexed: 11/13/2022] Open
Abstract
The redundant-signals paradigm (RSP) is designed to investigate response behavior in perceptual tasks in which response-relevant targets are defined by either one or two features, or modalities. The common finding is that responses are speeded for redundantly compared to singly defined targets. This redundant-signals effect (RSE) can be accounted for by race models if the response times do not violate the race model inequality (RMI). When there are violations of the RMI, race models are effectively excluded as a viable account of the RSE. The common alternative is provided by co-activation accounts, which assume that redundant target signals are integrated at some processing stage. However, "co-activation" has mostly been only indirectly inferred and the accounts have only rarely been explicitly modeled; if they were modeled, the RSE has typically been assumed to have a decisional locus. Yet, there are also indications in the literature that the RSE might originate, at least in part, at a non-decisional or motor stage. In the present study, using a distribution analysis of sequential-sampling models (ex-Wald and Ratcliff Diffusion model), the locus of the RSE was investigated for two bimodal (audio-visual) detection tasks that strongly violated the RMI, indicative of substantial co-activation. Three model variants assuming different loci of the RSE were fitted to the quantile reaction time proportions: a decision, a non-decision, and a combined variant both to vincentized group as well as individual data. The results suggest that for the two bimodal detection tasks, co-activation has a shared decisional and non-decisional locus. These findings point to the possibility that the mechanisms underlying the RSE depend on the specifics (task, stimulus, conditions, etc.) of the experimental paradigm.
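The race model inequality tested in this abstract states that, for every time t, the cumulative RT distribution for redundant targets must satisfy F_red(t) ≤ F_A(t) + F_B(t); a violation at any t excludes race models. A minimal sketch of the check on simulated RTs (illustrative parameters, not the paper's data or model fits):

```python
import random

def ecdf(sample, t):
    """Empirical cumulative distribution: P(RT <= t)."""
    return sum(rt <= t for rt in sample) / len(sample)

def rmi_violations(rt_redundant, rt_a, rt_b, times):
    """Time points at which the race model inequality
    F_red(t) <= F_a(t) + F_b(t) is violated."""
    return [t for t in times
            if ecdf(rt_redundant, t) > ecdf(rt_a, t) + ecdf(rt_b, t)]

random.seed(0)
rt_a = [random.gauss(400, 50) for _ in range(500)]   # unimodal target A
rt_b = [random.gauss(420, 50) for _ in range(500)]   # unimodal target B
# Strong co-activation: redundant-target RTs faster than any race predicts.
rt_red = [random.gauss(330, 40) for _ in range(500)]

grid = range(200, 601, 25)
violated = rmi_violations(rt_red, rt_a, rt_b, grid)
print("RMI violated at t =", violated)
```

In practice the inequality is evaluated on matched quantiles of the three distributions rather than a fixed time grid, but the comparison is the same.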
Collapse
Affiliation(s)
- Michael Zehetleitner
- Department Psychologie, Institut für Allgemeine und Experimentelle Psychologie, Ludwig-Maximilians-Universität München, Munich, Germany
- Emil Ratko-Dehnert
- Department Psychologie, Institut für Allgemeine und Experimentelle Psychologie, Ludwig-Maximilians-Universität München, Munich, Germany
- Hermann J Müller
- Department Psychologie, Institut für Allgemeine und Experimentelle Psychologie, Ludwig-Maximilians-Universität München, Munich, Germany; Department of Psychological Sciences, Birkbeck College, University of London, London, UK
Collapse
|
47
|
Blankenberg D, Taylor J, Nekrutenko A. Online resources for genomic analysis using high-throughput sequencing. Cold Spring Harb Protoc 2015; 2015:324-35. [PMID: 25655493 DOI: 10.1101/pdb.top083667] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
Abstract
The availability of high-throughput sequencing has created enormous possibilities for scientific discovery. However, the massive amount of data being generated has resulted in a severe informatics bottleneck. A large number of tools exist for analyzing next-generation sequencing (NGS) data, yet often there remains a disconnect between these research tools and the ability of many researchers to use them. As a consequence, several online resources and communities have been developed to assist researchers with both the management and the analysis of sequencing data sets. Here we describe the use and applications of common file formats for coding and storing genomic data, consider several web-accessible open-source resources for the visualization and analysis of NGS data, and provide examples of typical analyses with links to further detailed exercises.
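The common file formats mentioned in the abstract can be made concrete with BED, one of the simplest tab-delimited conventions for genomic intervals (zero-based, half-open coordinates). A small Python sketch, where the file content and helper name are illustrative rather than taken from the article:

```python
# Hypothetical BED-formatted lines: chrom, start, end, name, score, strand
bed_lines = [
    "chr1\t1000\t5000\tfeatureA\t960\t+",
    "chr2\t3000\t3500\tfeatureB\t200\t-",
]

def parse_bed(lines):
    """Yield (chrom, start, end, name) tuples from BED-formatted lines."""
    for line in lines:
        # Skip comments and UCSC browser/track directives
        if line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else "."
        yield chrom, start, end, name

intervals = list(parse_bed(bed_lines))
# Because coordinates are half-open, interval length is simply end - start
lengths = [end - start for _, start, end, _ in intervals]
```

The zero-based, half-open convention is what makes length arithmetic and interval splitting free of off-by-one adjustments, which is why many NGS tools adopt it internally.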
Collapse
Affiliation(s)
- Daniel Blankenberg
- Department of Biochemistry and Molecular Biology, Penn State University, University Park, Pennsylvania 16802
- James Taylor
- Departments of Biology and Computer Science, Johns Hopkins University, Baltimore, Maryland 21211
- Anton Nekrutenko
- Department of Biochemistry and Molecular Biology, Penn State University, University Park, Pennsylvania 16802
Collapse
|
48
|
Stodden V, Miguez S, Seiler J. ResearchCompendia.org: Cyberinfrastructure for Reproducibility and Collaboration in Computational Science. Comput Sci Eng 2015. [DOI: 10.1109/mcse.2015.18] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
|
49
|
Soergel DAW. Rampant software errors may undermine scientific results. F1000Res 2014; 3:303.
Abstract
The opportunities for both subtle and profound errors in software and data management are boundless, yet they remain surprisingly underappreciated. Here I estimate that any reported scientific result could very well be wrong if data have passed through a computer, and that these errors may remain largely undetected. It is therefore necessary to greatly expand our efforts to validate scientific software and computed results.
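The abstract's call to validate computed results can be illustrated with a classic silent numerical bug: two algebraically equivalent variance formulas that diverge on real data. This sketch is not from the article; the function names are illustrative:

```python
import math

def welford_variance(xs):
    """Numerically stable one-pass sample variance (Welford's algorithm)."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return m2 / (n - 1)

def naive_variance(xs):
    """Textbook sum-of-squares formula; loses precision for offset data."""
    n = len(xs)
    s = sum(xs)
    s2 = sum(x * x for x in xs)
    return (s2 - s * s / n) / (n - 1)

data = [1.0, 2.0, 3.0, 4.0]
# On well-conditioned data the two formulas agree (true variance is 5/3) ...
assert math.isclose(welford_variance(data), naive_variance(data))

# ... but shifting by a large constant, which leaves the true variance
# unchanged, exposes catastrophic cancellation in the naive formula.
# Welford's incremental update stays accurate; the naive result typically
# drifts far from 5/3 without raising any error, the kind of silent
# computational fault the abstract warns about.
shifted = [x + 1e9 for x in data]
stable = welford_variance(shifted)
```

A regression test pinning the result on a fixed input is the cheapest defense: it turns this silent wrong answer into a loud test failure.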
Collapse
Affiliation(s)
- David A W Soergel
- Department of Computer Science, University of Massachusetts Amherst, Amherst, USA; Current address: Google, Inc., Mountain View, CA, USA
Collapse
|
50
|
Nikolaienko TY, Bulavin LA, Hovorun DM. JANPA: An open source cross-platform implementation of the Natural Population Analysis on the Java platform. COMPUT THEOR CHEM 2014. [DOI: 10.1016/j.comptc.2014.10.002] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022]
|