1
Martín del Pico E, Gelpí JL, Capella-Gutierrez S. FAIRsoft - a practical implementation of FAIR principles for research software. Bioinformatics 2024; 40:btae464. [PMID: 39037960 PMCID: PMC11330317 DOI: 10.1093/bioinformatics/btae464] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2023] [Revised: 05/26/2024] [Accepted: 07/20/2024] [Indexed: 07/24/2024]
Abstract
MOTIVATION: Software plays a crucial and growing role in research. Unfortunately, the computational component in Life Sciences research is often challenging to reproduce and verify. It could be undocumented, opaque, contain unknown errors that affect the outcome, or be directly unavailable and impossible to use for others. These issues are detrimental to the overall quality of scientific research. One step to address this problem is the formulation of principles that research software in the domain should meet to ensure its quality and sustainability, resembling the FAIR (findable, accessible, interoperable, and reusable) data principles.
RESULTS: We present here a comprehensive series of quantitative indicators based on a pragmatic interpretation of the FAIR Principles and their implementation on OpenEBench, ELIXIR's open platform providing both support for scientific benchmarking and an active observatory of quality-related features for Life Sciences research software. The results serve to understand the current practices around research software quality-related features and provide objective indications for improving them.
AVAILABILITY AND IMPLEMENTATION: Software metadata, from 11 different sources, collected, integrated, and analysed in the context of this manuscript are available at https://doi.org/10.5281/zenodo.7311067. Code used for software metadata retrieval and processing is available in the following repository: https://gitlab.bsc.es/inb/elixir/software-observatory/FAIRsoft_ETL.
Affiliation(s)
- Josep Lluís Gelpí
  - Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain
  - Biochemistry and Molecular Biomedicine Department, University of Barcelona, 08028 Barcelona, Spain
2
Waltemath D, Beyan O, Crameri K, Dedié A, Gierend K, Gröber P, Inau ET, Michaelis L, Reinecke I, Sedlmayr M, Thun S, Krefting D. [FAIR health data in the national and international data space]. Bundesgesundheitsblatt Gesundheitsforschung Gesundheitsschutz 2024; 67:710-720. [PMID: 38750239 PMCID: PMC11166787 DOI: 10.1007/s00103-024-03884-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Accepted: 04/19/2024] [Indexed: 06/12/2024]
Abstract
Health data are extremely important in today's data-driven world. Through automation, healthcare processes can be optimized, and clinical decisions can be supported. For any reuse of data, the quality, validity, and trustworthiness of data are essential, and it is the only way to guarantee that data can be reused sensibly. Specific requirements for the description and coding of reusable data are defined in the FAIR guiding principles for data stewardship. Various national research associations and infrastructure projects in the German healthcare sector have already clearly positioned themselves on the FAIR principles: both the infrastructures of the Medical Informatics Initiative and the University Medicine Network operate explicitly on the basis of the FAIR principles, as do the National Research Data Infrastructure for Personal Health Data and the German Center for Diabetes Research.
To ensure that a resource complies with the FAIR principles, the degree of FAIRness should first be determined (so-called FAIR assessment), followed by the prioritization for improvement steps (so-called FAIRification). Since 2016, a set of tools and guidelines have been developed for both steps, based on the different, domain-specific interpretations of the FAIR principles.
Neighboring European countries have also invested in the development of a national framework for semantic interoperability in the context of the FAIR (Findable, Accessible, Interoperable, Reusable) principles. Concepts for comprehensive data enrichment were developed to simplify data analysis, for example, in the European Health Data Space or via the Observational Health Data Sciences and Informatics network. With the support of the European Open Science Cloud, among others, structured FAIRification measures have already been taken for German health datasets.
Affiliation(s)
- Dagmar Waltemath
  - Abteilung Medizininformatik, Institut für Community Medicine, Walther-Rathenau-Straße 48, 17475, Greifswald, Germany
- Oya Beyan
  - Medizinische Fakultät und Uniklinik Köln, Institut für Biomedizininformatik, Universität zu Köln, Köln, Germany
- Katrin Crameri
  - Schweizerisches Institut für Bioinformatik, Personalisierte Gesundheitsinformatik, Basel, Switzerland
- Angela Dedié
  - Deutsches Zentrum für Diabetesforschung (DZD), Geschäftsstelle am Helmholtz Zentrum München, München, Germany
- Kerstin Gierend
  - Abteilung für Biomedizinische Informatik am Zentrum für Präventivmedizin und Digitale Gesundheit (CPD), Medizinische Fakultät Mannheim der Universität Heidelberg, Mannheim, Germany
- Petra Gröber
  - Datenintegrationszentrum Universitätsmedizin Rostock, Rostock, Germany
- Esther Thea Inau
  - Abteilung Medizininformatik, Institut für Community Medicine, Walther-Rathenau-Straße 48, 17475, Greifswald, Germany
- Lea Michaelis
  - Abteilung Medizininformatik, Institut für Community Medicine, Walther-Rathenau-Straße 48, 17475, Greifswald, Germany
- Ines Reinecke
  - Datenintegrationszentrum, Zentrum für Medizinische Informatik, Universitätsklinikum Carl Gustav Carus Dresden, Dresden, Germany
- Martin Sedlmayr
  - Institut für Medizinische Informatik und Biometrie, Med. Fakultät Carl Gustav Carus, TU Dresden, Dresden, Germany
- Sylvia Thun
  - Berliner Institut für Gesundheitsforschung in der Charité - Universitätsmedizin Berlin, Berlin, Germany
- Dagmar Krefting
  - Institut für Medizinische Informatik, Universitätsmedizin Göttingen und Deutsches Zentrum für Herz-Kreislauf-Forschung, Partner Site Göttingen, Göttingen, Germany
3
Katz DS, Chue Hong NP. Special issue on software citation, indexing, and discoverability. PeerJ Comput Sci 2024; 10:e1951. [PMID: 38660149 PMCID: PMC11042024 DOI: 10.7717/peerj-cs.1951] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Accepted: 02/29/2024] [Indexed: 04/26/2024]
Abstract
Software plays a fundamental role in research as a tool, an output, or even as an object of study. This special issue on software citation, indexing, and discoverability brings together five papers examining different aspects of how the use of software is recorded and made available to others. It describes new work on datasets that enable large-scale analysis of the evolution of software usage and citation, that presents evidence of increased citation rates when software artifacts are released, that provides guidance for registries and repositories to support software citation and findability, and that shows there are still barriers to improving and formalising software citation and publication practice. As the use of software increases further, driven by modern research methods, addressing the barriers to software citation and discoverability will encourage greater sharing and reuse of software, in turn enabling research progress.
Affiliation(s)
- Daniel S. Katz
  - National Center for Supercomputing Applications, Department of Computer Science, Department of Electrical and Computer Engineering, School of Information Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Neil P. Chue Hong
  - Edinburgh Parallel Computing Centre, University of Edinburgh, Edinburgh, United Kingdom
  - Software Sustainability Institute, University of Edinburgh, Edinburgh, United Kingdom
4
Afiaz A, Ivanov AA, Chamberlin J, Hanauer D, Savonen CL, Goldman MJ, Morgan M, Reich M, Getka A, Holmes A, Pati S, Knight D, Boutros PC, Bakas S, Caporaso JG, Del Fiol G, Hochheiser H, Haas B, Schloss PD, Eddy JA, Albrecht J, Fedorov A, Waldron L, Hoffman AM, Bradshaw RL, Leek JT, Wright C. Evaluation of software impact designed for biomedical research: Are we measuring what's meaningful? arXiv 2023:arXiv:2306.03255v1. [PMID: 37332562 PMCID: PMC10274942] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/20/2023]
Abstract
Software is vital for the advancement of biology and medicine. Through analysis of usage and impact metrics of software, developers can help determine user and community engagement. These metrics can be used to justify additional funding, encourage additional use, and identify unanticipated use cases. Such analyses can help define improvement areas and assist with managing project resources. However, there are challenges associated with assessing usage and impact, many of which vary widely depending on the type of software being evaluated. These challenges involve issues of distorted, exaggerated, understated, or misleading metrics, as well as ethical and security concerns. More attention to the nuances, challenges, and considerations involved in capturing impact across the diverse spectrum of biological software is needed. Furthermore, some tools may be especially beneficial to a small audience, yet may not have comparatively compelling metrics of high usage. Although some principles are generally applicable, there is not a single perfect metric or approach to effectively evaluate a software tool's impact, as this depends on aspects unique to each tool, how it is used, and how one wishes to evaluate engagement. We propose more broadly applicable guidelines (such as infrastructure that supports the usage of software and the collection of metrics about usage), as well as strategies for various types of software and resources. We also highlight outstanding issues in the field regarding how communities measure or evaluate software impact. To gain a deeper understanding of the issues hindering software evaluations, as well as to determine what appears to be helpful, we performed a survey of participants involved with scientific software projects for the Informatics Technology for Cancer Research (ITCR) program funded by the National Cancer Institute (NCI). 
We also investigated software among this scientific community and others to assess how often infrastructure that supports such evaluations is implemented and how this impacts rates of papers describing usage of the software. We find that although developers recognize the utility of analyzing data related to the impact or usage of their software, they struggle to find the time or funding to support such analyses. We also find that infrastructure such as social media presence, more in-depth documentation, the presence of software health metrics, and clear information on how to contact developers seem to be associated with increased usage rates. Our findings can help scientific software developers make the most out of the evaluations of their software so that they can more fully benefit from such assessments.
Affiliation(s)
- Awan Afiaz
  - Department of Biostatistics, University of Washington, Seattle, WA
  - Biostatistics Program, Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA
- Andrey A. Ivanov
  - Department of Pharmacology and Chemical Biology, Emory University School of Medicine, Emory University, Atlanta, GA
- John Chamberlin
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT
- David Hanauer
  - Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, MI
- Candace L. Savonen
  - Biostatistics Program, Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA
- Martin Morgan
  - Roswell Park Comprehensive Cancer Center, Buffalo, NY
- Aaron Holmes
  - Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA
  - Institute for Precision Health, University of California, Los Angeles, CA
  - Department of Human Genetics, University of California, Los Angeles, CA
  - Department of Urology, University of California, Los Angeles, CA
- Dan Knight
  - Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA
  - Institute for Precision Health, University of California, Los Angeles, CA
  - Department of Human Genetics, University of California, Los Angeles, CA
  - Department of Urology, University of California, Los Angeles, CA
- Paul C. Boutros
  - Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA
  - Institute for Precision Health, University of California, Los Angeles, CA
  - Department of Human Genetics, University of California, Los Angeles, CA
  - Department of Urology, University of California, Los Angeles, CA
- J. Gregory Caporaso
  - Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ
- Guilherme Del Fiol
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT
- Harry Hochheiser
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA
- Brian Haas
  - Methods Development Laboratory, Broad Institute, Cambridge, MA
- Patrick D. Schloss
  - Department of Microbiology and Immunology, University of Michigan, Ann Arbor, MI
- Andrey Fedorov
  - Department of Radiology, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA
- Levi Waldron
  - Department of Epidemiology and Biostatistics, City University of New York Graduate School of Public Health and Health Policy, New York, NY
- Ava M. Hoffman
  - Biostatistics Program, Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA
- Richard L. Bradshaw
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT
- Jeffrey T. Leek
  - Biostatistics Program, Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA
- Carrie Wright
  - Biostatistics Program, Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA
5
Abu-Salih B, AL-Qurishi M, Alweshah M, AL-Smadi M, Alfayez R, Saadeh H. Healthcare knowledge graph construction: A systematic review of the state-of-the-art, open issues, and opportunities. Journal of Big Data 2023; 10:81. [PMID: 37274445 PMCID: PMC10225120 DOI: 10.1186/s40537-023-00774-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/28/2022] [Accepted: 05/17/2023] [Indexed: 06/06/2023]
Abstract
The incorporation of data analytics in the healthcare industry has made significant progress, driven by the demand for efficient and effective big data analytics solutions. Knowledge graphs (KGs) have proven utility in this arena and are rooted in a number of healthcare applications to furnish better data representation and knowledge inference. However, in conjunction with a lack of a representative KG construction taxonomy, several existing approaches in this designated domain are inadequate and inferior. This paper is the first to provide a comprehensive taxonomy and a bird's eye view of healthcare KG construction. Additionally, a thorough examination of the current state-of-the-art techniques drawn from academic works relevant to various healthcare contexts is carried out. These techniques are critically evaluated in terms of methods used for knowledge extraction, types of the knowledge base and sources, and the incorporated evaluation protocols. Finally, several research findings and existing issues in the literature are reported and discussed, opening horizons for future research in this vibrant area.
Affiliation(s)
- Mohammad AL-Smadi
  - Jordan University of Science and Technology, Irbid, Jordan
  - Qatar University, Doha, Qatar
6
Liu S, Zhou G, Xia Y, Wu H, Li Z. A data-centric way to improve entity linking in knowledge-based question answering. PeerJ Comput Sci 2023; 9:e1233. [PMID: 37346650 PMCID: PMC10280402 DOI: 10.7717/peerj-cs.1233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2022] [Accepted: 01/11/2023] [Indexed: 06/23/2023]
Abstract
Entity linking in knowledge-based question answering (KBQA) is intended to construct a mapping relation between a mention in a natural language question and an entity in the knowledge base. Most research in entity linking focuses on long text, but entity linking in open domain KBQA is more concerned with short text. Many recent models have tried to extract the features of raw data by adjusting the neural network structure. However, the models only perform well with several datasets. We therefore concentrate on the data rather than the model itself and created a model DME (Domain information Mining and Explicit expressing) to extract domain information from short text and append it to the data. The entity linking model will be enhanced by training with DME-processed data. Besides, we also developed a novel negative sampling approach to make the model more robust. We conducted experiments using the large Chinese open source benchmark KgCLUE to assess model performance with DME-processed data. The experiments showed that our approach can improve entity linking in the baseline models without the need to change their structure and our approach is demonstrably transferable to other datasets.
Affiliation(s)
- Shuo Liu
  - Information Engineering University, Zhengzhou, Henan, China
- Gang Zhou
  - Information Engineering University, Zhengzhou, Henan, China
- Yi Xia
  - Information Engineering University, Zhengzhou, Henan, China
- Hao Wu
  - Information Engineering University, Zhengzhou, Henan, China
- Zhufeng Li
  - Information Engineering University, Zhengzhou, Henan, China
7
Hutson M. Hunting for the best bioscience software tool? Check this database. Nature 2023:10.1038/d41586-023-00053-w. [PMID: 36635507 DOI: 10.1038/d41586-023-00053-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/14/2023]
8
Du C, Cohoon J, Lopez P, Howison J. Understanding progress in software citation: a study of software citation in the CORD-19 corpus. PeerJ Comput Sci 2022; 8:e1022. [PMID: 36091992 PMCID: PMC9454791 DOI: 10.7717/peerj-cs.1022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2021] [Accepted: 06/07/2022] [Indexed: 06/15/2023]
Abstract
In this paper, we investigate progress toward improved software citation by examining current software citation practices. We first introduce our machine learning based data pipeline that extracts software mentions from the CORD-19 corpus, a regularly updated collection of more than 280,000 scholarly articles on COVID-19 and related historical coronaviruses. We then closely examine a stratified sample of extracted software mentions from recent CORD-19 publications to understand the status of software citation. We also searched online for the mentioned software projects and their citation requests. We evaluate both practices of referencing software in publications and making software citable in comparison with earlier findings and recent advocacy recommendations. We found increased mentions of software versions, increased open source practices, and improved software accessibility. Yet, we also found a continuation of high numbers of informal mentions that did not sufficiently credit software authors. Existing software citation requests were diverse but did not match with software citation advocacy recommendations nor were they frequently followed by researchers authoring papers. Finally, we discuss implications for software citation advocacy and standard making efforts seeking to improve the situation. Our results show the diversity of software citation practices and how they differ from advocacy recommendations, provide a baseline for assessing the progress of software citation implementation, and enrich the understanding of existing challenges.
Affiliation(s)
- Caifan Du
  - The University of Texas at Austin, Austin, TX, United States of America
- Johanna Cohoon
  - The University of Texas at Austin, Austin, TX, United States of America
- James Howison
  - The University of Texas at Austin, Austin, TX, United States of America