1. Carver JC, Weber N, Ram K, Gesing S, Katz DS. A survey of the state of the practice for research software in the United States. PeerJ Comput Sci 2022; 8:e963. PMID: 35634111; PMCID: PMC9138129; DOI: 10.7717/peerj-cs.963.
Abstract
Research software is a critical component of contemporary scholarship. Yet, most research software is developed and managed in ways that are at odds with its long-term sustainability. This paper presents findings from a survey of 1,149 researchers, primarily from the United States, about sustainability challenges they face in developing and using research software. Some of our key findings include a repeated need for more opportunities and time for developers of research software to receive training. These training needs cross the software lifecycle and various types of tools. We also identified the recurring need for better models of funding research software and for providing credit to those who develop the software so they can advance in their careers. The results of this survey will help inform future infrastructure and service support for software developers and users, as well as national research policy aimed at increasing the sustainability of research software.
Affiliation(s)
- Jeffrey C. Carver
- Computer Science, University of Alabama, Tuscaloosa, AL, United States of America
- Nic Weber
- Information School, University of Washington, Seattle, WA, United States of America
- Karthik Ram
- Berkeley Institute for Data Science, University of California, Berkeley, Berkeley, CA, United States of America
- Sandra Gesing
- Discovery Partners Institute, Chicago, IL, United States of America
- Daniel S. Katz
- NCSA & CS & ECE & iSchool, University of Illinois at Urbana-Champaign, Urbana, IL, United States of America
2. Hunter-Zinck H, de Siqueira AF, Vásquez VN, Barnes R, Martinez CC. Ten simple rules on writing clean and reliable open-source scientific software. PLoS Comput Biol 2021; 17:e1009481. PMID: 34762641; PMCID: PMC8584773; DOI: 10.1371/journal.pcbi.1009481.
Abstract
Functional, usable, and maintainable open-source software is increasingly essential to scientific research, but there is a large variation in formal training for software development and maintainability. Here, we propose 10 “rules” centered on 2 best practice components: clean code and testing. These 2 areas are relatively straightforward and provide substantial utility relative to the learning investment. Adopting clean code practices helps to standardize and organize software code in order to enhance readability and reduce cognitive load for both the initial developer and subsequent contributors; this allows developers to concentrate on core functionality and reduce errors. Clean coding styles make software code more amenable to testing, including unit tests that work best with modular and consistent software code. Unit tests interrogate specific and isolated coding behavior to reduce coding errors and ensure intended functionality, especially as code increases in complexity; unit tests also implicitly provide example usages of code. Other forms of testing are geared to discover erroneous behavior arising from unexpected inputs or emerging from the interaction of complex codebases. Although conforming to coding styles and designing tests can add time to the software development project in the short term, these foundational tools can help to improve the correctness, quality, usability, and maintainability of open-source scientific software code. They also advance the principal point of scientific research: producing accurate results in a reproducible way. In addition to suggesting several tips for getting started with clean code and testing practices, we recommend numerous tools for the popular open-source scientific software languages Python, R, and Julia.
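A minimal sketch of the two practices these rules center on, clean code plus unit testing, using Python's standard-library unittest module. The `normalize_counts` function and its tests are hypothetical illustrations, not code from the paper:

```python
import unittest


def normalize_counts(counts):
    """Scale a list of non-negative counts so they sum to 1.

    Small, single-purpose, descriptively named functions like this
    are easy to read, reuse, and test in isolation.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("counts must sum to a positive value")
    return [c / total for c in counts]


class TestNormalizeCounts(unittest.TestCase):
    """Unit tests document intended behavior, including edge cases,
    and double as usage examples of the function under test."""

    def test_sums_to_one(self):
        result = normalize_counts([2, 3, 5])
        self.assertAlmostEqual(sum(result), 1.0)
        self.assertEqual(result, [0.2, 0.3, 0.5])

    def test_rejects_all_zero_input(self):
        with self.assertRaises(ValueError):
            normalize_counts([0, 0])
```

Run with `python -m unittest <file>`; the second test pins down behavior on unexpected input, the kind of erroneous-input testing the abstract mentions.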
Affiliation(s)
- Haley Hunter-Zinck
- Berkeley Institute for Data Science, University of California, Berkeley, Berkeley, California, United States of America
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, California, United States of America
- VA Boston Healthcare System, Boston, Massachusetts, United States of America
- VA St. Louis Health Care System, St. Louis, Missouri, United States of America
- Váleri N. Vásquez
- Berkeley Institute for Data Science, University of California, Berkeley, Berkeley, California, United States of America
- Energy and Resources Group, Rausser College of Natural Resources, University of California, Berkeley, Berkeley, California, United States of America
- Richard Barnes
- Berkeley Institute for Data Science, University of California, Berkeley, Berkeley, California, United States of America
- Energy and Resources Group, Rausser College of Natural Resources, University of California, Berkeley, Berkeley, California, United States of America
- Ciera C. Martinez
- Berkeley Institute for Data Science, University of California, Berkeley, Berkeley, California, United States of America
3. In-code citation practices in open research software libraries. J Informetr 2021. DOI: 10.1016/j.joi.2021.101139.
4. Samuel S, König-Ries B. Understanding experiments and research practices for reproducibility: an exploratory study. PeerJ 2021; 9:e11140. PMID: 33976964; PMCID: PMC8067906; DOI: 10.7717/peerj.11140.
Abstract
Scientific experiments and research practices vary across disciplines. The research practices followed by scientists in each domain play an essential role in the understandability and reproducibility of results. The "Reproducibility Crisis", where researchers find difficulty in reproducing published results, is currently faced by several disciplines. To understand the underlying problem in the context of the reproducibility crisis, it is important to first know the different research practices followed in each domain and the factors that hinder reproducibility. We performed an exploratory study by conducting a survey addressed to researchers representing a range of disciplines to understand scientific experiments and research practices for reproducibility. The survey findings identify a reproducibility crisis and a strong need for sharing data, code, methods, steps, and negative and positive results. Insufficient metadata, lack of publicly available data, and incomplete information in study methods are considered to be the main reasons for poor reproducibility. The survey results also address a wide range of research questions on the reproducibility of scientific results. Based on the results of our exploratory study and supported by the existing published literature, we offer general recommendations that could help the scientific community to understand, reproduce, and reuse experimental data and results in the research data lifecycle.
Affiliation(s)
- Sheeba Samuel
- Heinz Nixdorf Chair for Distributed Information Systems, Friedrich Schiller University Jena, Jena, Thuringia, Germany
- Michael Stifel Center Jena, Jena, Thuringia, Germany
- Birgitta König-Ries
- Heinz Nixdorf Chair for Distributed Information Systems, Friedrich Schiller University Jena, Jena, Thuringia, Germany
- Michael Stifel Center Jena, Jena, Thuringia, Germany
5. Butt BH, Rafi M, Sabih M. A systematic metadata harvesting workflow for analysing scientific networks. PeerJ Comput Sci 2021; 7:e421. PMID: 33817056; PMCID: PMC7959659; DOI: 10.7717/peerj-cs.421.
Abstract
One of the disciplines behind the science of science is the study of scientific networks. This work treats scientific networks as social networks with different kinds of nodes and connections: nodes can represent authors, articles, or journals, while connections represent citation, co-citation, or co-authorship. One of the challenges in creating scientific networks is the lack of a publicly available, comprehensive data set, which limits the variety of analyses that can be performed on the same set of nodes across different scientific networks. To supplement such analyses, we have worked with publicly available citation metadata from Crossref and OpenCitations. Using these data, we developed a workflow to create scientific networks. Analysis of these networks gives insights into academic research and scholarship. Different techniques of social network analysis have been applied in the literature to study such networks, including centrality analysis, community detection, and the clustering coefficient. As a case study, we use metadata from the journal Scientometrics to present our workflow, and we performed a sample run of the proposed workflow to identify prominent authors using centrality analysis. This work is not a bibliometric study of any particular field; rather, it presents replicable Python scripts for performing network analysis. With the increasing popularity of open access and open metadata, we hypothesise that this workflow will provide an avenue for understanding scientific scholarship in multiple dimensions.
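The core of such a workflow, building a citation network from metadata pairs and ranking nodes by centrality, can be sketched with the Python standard library alone. The toy article identifiers and the choice of in-degree centrality below are illustrative assumptions, not the authors' actual scripts or the Crossref/OpenCitations APIs:

```python
from collections import defaultdict


def build_citation_graph(pairs):
    """Build a directed citation graph from (citing, cited) metadata pairs."""
    graph = defaultdict(set)
    nodes = set()
    for citing, cited in pairs:
        graph[citing].add(cited)
        nodes.update((citing, cited))
    return graph, nodes


def in_degree_centrality(graph, nodes):
    """In-degree centrality: the fraction of other nodes citing each node."""
    indeg = {n: 0 for n in nodes}
    for citing, cited_set in graph.items():
        for cited in cited_set:
            indeg[cited] += 1
    n = len(nodes)
    return {node: d / (n - 1) for node, d in indeg.items()}


# Toy metadata: each tuple is (citing article, cited article).
pairs = [("A", "C"), ("B", "C"), ("D", "C"), ("B", "A"), ("D", "A")]
graph, nodes = build_citation_graph(pairs)
centrality = in_degree_centrality(graph, nodes)
most_cited = max(centrality, key=centrality.get)
print(most_cited)  # "C": cited by all three other articles
```

In a real run, the pairs would come from harvested Crossref/OpenCitations metadata, and the same graph could feed community detection or clustering-coefficient analysis.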
Affiliation(s)
- Bilal H. Butt
- Department of Computer Science, D.H.A. Suffa University, Karachi, Pakistan
- Muhammad Rafi
- Department of Computer Science, National University of Computer and Emerging Sciences, Karachi, Pakistan
- Muhammad Sabih
- Department of Electrical Engineering, D.H.A. Suffa University, Karachi, Pakistan
6. Cadwallader L, Papin JA, Mac Gabhann F, Kirk R. Collaborating with our community to increase code sharing. PLoS Comput Biol 2021; 17:e1008867. PMID: 33784294; PMCID: PMC8009435; DOI: 10.1371/journal.pcbi.1008867.
Affiliation(s)
- Jason A. Papin
- Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia, United States of America
- Feilim Mac Gabhann
- Department of Biomedical Engineering and Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland, United States of America
7.
Abstract
With increasing demand for training in data science, extracurricular or "ad hoc" education efforts have emerged to help individuals acquire relevant skills and expertise. Although extracurricular efforts already exist for many computationally intensive disciplines, their support of data science education has significantly helped in coping with the speed of innovation in data science practice and formal curricula. While the proliferation of ad hoc efforts is an indication of their popularity, less has been documented about the needs that they are designed to meet, the limitations that they face, and practical suggestions for holding successful efforts. To holistically understand the role of different ad hoc formats for data science, we surveyed organizers of ad hoc data science education efforts to understand how organizers perceived the events to have gone, including areas of strength and areas requiring growth. We also gathered recommendations from these past events for future organizers. Our results suggest that the perceived benefits of ad hoc efforts go beyond developing technical skills and may provide continued benefit in conjunction with formal curricula, which warrants further investigation. As increasing numbers of researchers from computational fields with a history of complex data become involved with ad hoc efforts to share their skills, the lessons learned that we extract from the surveys will provide concrete suggestions for the practitioner-leaders interested in creating, improving, and sustaining future efforts.
Affiliation(s)
- Orianna DeMasi
- Department of Computer Science, University of California, Davis, California, United States of America
- Alexandra Paxton
- Department of Psychological Sciences, University of Connecticut, Storrs, Connecticut, United States of America
- Center for the Ecological Study of Perception and Action, University of Connecticut, Storrs, Connecticut, United States of America
- Kevin Koy
- IDEO, San Francisco, California, United States of America
8.
Affiliation(s)
- Stephan Druskat
- German Aerospace Center (DLR); Humboldt-Universität zu Berlin; Friedrich Schiller University Jena
9. Pecht T, Aschenbrenner AC, Ulas T, Succurro A. Modeling population heterogeneity from microbial communities to immune response in cells. Cell Mol Life Sci 2020; 77:415-432. PMID: 31768606; PMCID: PMC7010691; DOI: 10.1007/s00018-019-03378-w.
Abstract
Heterogeneity is universally observed in all natural systems and across multiple scales. Understanding population heterogeneity is an intriguing and attractive topic of research in different disciplines, including microbiology and immunology. Microbes and mammalian immune cells present obviously rather different system-specific biological features. Nevertheless, as typically occurs in science, similar methods can be used to study both types of cells. This is particularly true for mathematical modeling, in which key features of a system are translated into algorithms to challenge our mechanistic understanding of the underlying biology. In this review, we first present a broad overview of the experimental developments that allowed observing heterogeneity at the single cell level. We then highlight how this "data revolution" requires the parallel advancement of algorithms and computing infrastructure for data processing and analysis, and finally present representative examples of computational models of population heterogeneity, from microbial communities to immune response in cells.
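As a toy illustration of why such models must resolve heterogeneity rather than average over it (the growth rates below are invented for illustration, not taken from the review): two populations with the same mean growth rate diverge once per-cell rates differ, because fast-growing subpopulations come to dominate the ensemble average.

```python
import math


def mean_population(rates, t):
    """Average population size at time t for subpopulations growing
    exponentially at individual rates, each starting from size 1."""
    return sum(math.exp(r * t) for r in rates) / len(rates)


# Homogeneous population: every subpopulation shares the mean rate 0.5.
homogeneous = [0.5] * 4
# Heterogeneous population with the same mean rate but spread around it.
heterogeneous = [0.2, 0.4, 0.6, 0.8]

t = 10.0
# By Jensen's inequality, the heterogeneous ensemble average exceeds the
# homogeneous one at any t > 0, even though the mean rates are identical.
print(mean_population(homogeneous, t) < mean_population(heterogeneous, t))
```

A population-level measurement fitted to a single exponential would therefore misestimate the behavior of every subpopulation, which is the motivation for the single-cell-resolved models the review surveys.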
Affiliation(s)
- Tal Pecht
- Genomics and Immunoregulation, Life and Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany
- Anna C Aschenbrenner
- Genomics and Immunoregulation, Life and Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany
- Department of Internal Medicine and Radboud Center for Infectious Diseases (RCI), Radboud University Medical Center, 6525 Nijmegen, The Netherlands
- Thomas Ulas
- Genomics and Immunoregulation, Life and Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany
- Antonella Succurro
- Genomics and Immunoregulation, Life and Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany
- West German Genome Center (WGGC), University of Bonn, Bonn, Germany