1. Alser M, Lawlor B, Abdill RJ, Waymost S, Ayyala R, Rajkumar N, LaPierre N, Brito J, Ribeiro-Dos-Santos AM, Almadhoun N, Sarwal V, Firtina C, Osinski T, Eskin E, Hu Q, Strong D, Kim BDBD, Abedalthagafi MS, Mutlu O, Mangul S. Packaging and containerization of computational methods. Nat Protoc 2024:10.1038/s41596-024-00986-0. [PMID: 38565959 DOI: 10.1038/s41596-024-00986-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2022] [Accepted: 02/12/2024] [Indexed: 04/04/2024]
Abstract
Methods for analyzing the full complement of a biomolecule type, e.g., proteomics or metabolomics, generate large amounts of complex data. The software tools used to analyze omics data have reshaped the landscape of modern biology and become an essential component of biomedical research. These tools are themselves quite complex and often require the installation of other supporting software, libraries and/or databases. A researcher may also be using multiple different tools that require different versions of the same supporting materials. The increasing dependence of biomedical scientists on these powerful tools creates a need for easier installation and greater usability. Packaging and containerization are different approaches to satisfy this need by delivering omics tools already wrapped in additional software that makes the tools easier to install and use. In this systematic review, we describe and compare the features of prominent packaging and containerization platforms. We outline the challenges, advantages and limitations of each approach and some of the most widely used platforms from the perspectives of users, software developers and system administrators. We also propose principles to make the distribution of omics software more sustainable and robust to increase the reproducibility of biomedical and life science research.
Affiliation(s)
- Mohammed Alser
  - Department of Information Technology and Electrical Engineering, ETH Zürich, Zurich, Switzerland
- Brendan Lawlor
  - Department of Computer Science, Munster Technological University, Cork, Ireland
  - Department of Biological Sciences, Munster Technological University, Cork, Ireland
- Richard J Abdill
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
- Sharon Waymost
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
- Ram Ayyala
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
  - Titus Family Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA, USA
- Neha Rajkumar
  - Department of Bioengineering, University of California, Los Angeles, Los Angeles, CA, USA
- Nathan LaPierre
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Jaqueline Brito
  - Titus Family Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA, USA
- Nour Almadhoun
  - Department of Information Technology and Electrical Engineering, ETH Zürich, Zurich, Switzerland
- Varuni Sarwal
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
- Can Firtina
  - Department of Information Technology and Electrical Engineering, ETH Zürich, Zurich, Switzerland
- Tomasz Osinski
  - Center for Advanced Research Computing, University of Southern California, Los Angeles, CA, USA
- Eleazar Eskin
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
  - Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA, USA
  - Department of Human Genetics, University of California, Los Angeles, CA, USA
- Qiyang Hu
  - Office of Advanced Research Computing, University of California, Los Angeles, CA, USA
- Derek Strong
  - Center for Advanced Research Computing, University of Southern California, Los Angeles, CA, USA
- Byoung-Do B D Kim
  - Center for Advanced Research Computing, University of Southern California, Los Angeles, CA, USA
- Malak S Abedalthagafi
  - Department of Pathology & Laboratory Medicine, Emory University Hospital, Atlanta, GA, USA
  - King Salman Center for Disability Research, Riyadh, Saudi Arabia
- Onur Mutlu
  - Department of Information Technology and Electrical Engineering, ETH Zürich, Zurich, Switzerland
- Serghei Mangul
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA.
  - Titus Family Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA, USA.

2. Lasko TA, Strobl EV, Stead WW. Why do probabilistic clinical models fail to transport between sites. NPJ Digit Med 2024; 7:53. [PMID: 38429353 PMCID: PMC10907678 DOI: 10.1038/s41746-024-01037-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Accepted: 02/14/2024] [Indexed: 03/03/2024] [Open Access]
Abstract
The rising popularity of artificial intelligence in healthcare is highlighting the problem that a computational model achieving super-human clinical performance at its training sites may perform substantially worse at new sites. In this perspective, we argue that we should typically expect this failure to transport, and we present common sources for it, divided into those under the control of the experimenter and those inherent to the clinical data-generating process. Of the inherent sources we look a little deeper into site-specific clinical practices that can affect the data distribution, and propose a potential solution intended to isolate the imprint of those practices on the data from the patterns of disease cause and effect that are the usual target of probabilistic clinical models.
Affiliation(s)
- Thomas A Lasko
  - Vanderbilt University Medical Center, Nashville, TN, USA.
- Eric V Strobl
  - Vanderbilt University Medical Center, Nashville, TN, USA

3. Dong H, Lin J, Tao Y, Jia Y, Sun L, Li WJ, Sun H. AI-enhanced biomedical micro/nanorobots in microfluidics. Lab Chip 2024; 24:1419-1440. [PMID: 38174821 DOI: 10.1039/d3lc00909b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2024]
Abstract
Human beings encompass sophisticated microcirculation and microenvironments, incorporating a broad spectrum of microfluidic systems that adopt fundamental roles in orchestrating physiological mechanisms. In vitro recapitulation of human microenvironments based on lab-on-a-chip technology represents a critical paradigm to better understand the intricate mechanisms. Moreover, the advent of micro/nanorobotics provides brand new perspectives and dynamic tools for elucidating the complex process in microfluidics. Currently, artificial intelligence (AI) has endowed micro/nanorobots (MNRs) with unprecedented benefits, such as material synthesis, optimal design, fabrication, and swarm behavior. Using advanced AI algorithms, the motion control, environment perception, and swarm intelligence of MNRs in microfluidics are significantly enhanced. This emerging interdisciplinary research trend holds great potential to propel biomedical research to the forefront and make valuable contributions to human health. Herein, we initially introduce the AI algorithms integral to the development of MNRs. We briefly revisit the components, designs, and fabrication techniques adopted by robots in microfluidics with an emphasis on the application of AI. Then, we review the latest research pertinent to AI-enhanced MNRs, focusing on their motion control, sensing abilities, and intricate collective behavior in microfluidics. Furthermore, we spotlight biomedical domains that are already witnessing or will undergo game-changing evolution based on AI-enhanced MNRs. Finally, we identify the current challenges that hinder the practical use of the pioneering interdisciplinary technology.
Affiliation(s)
- Hui Dong
  - School of Mechanical Engineering and Automation, Fuzhou University, Fuzhou, China.
  - School of Mechatronics Engineering, Harbin Institute of Technology, Harbin, China
  - State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin, China
- Jiawen Lin
  - School of Mechanical Engineering and Automation, Fuzhou University, Fuzhou, China.
- Yihui Tao
  - Department of Automation Control and System Engineering, University of Sheffield, Sheffield, UK
- Yuan Jia
  - Sino-German College of Intelligent Manufacturing, Shenzhen Technology University, Shenzhen, China
- Lining Sun
  - School of Mechatronics Engineering, Harbin Institute of Technology, Harbin, China
  - State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin, China
- Wen Jung Li
  - Department of Mechanical Engineering, City University of Hong Kong, Hong Kong, China
- Hao Sun
  - School of Mechanical Engineering and Automation, Fuzhou University, Fuzhou, China.
  - School of Mechatronics Engineering, Harbin Institute of Technology, Harbin, China
  - Research Center of Aerospace Mechanism and Control, Harbin Institute of Technology, Harbin, China

4. Barberis A, Aerts HJWL, Buffa FM. Robustness and reproducibility for AI learning in biomedical sciences: RENOIR. Sci Rep 2024; 14:1933. [PMID: 38253545 PMCID: PMC10810363 DOI: 10.1038/s41598-024-51381-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Accepted: 01/04/2024] [Indexed: 01/24/2024] [Open Access]
Abstract
Artificial intelligence (AI) techniques are increasingly applied across various domains, favoured by the growing acquisition and public availability of large, complex datasets. Despite this trend, AI publications often suffer from lack of reproducibility and poor generalisation of findings, undermining scientific value and contributing to global research waste. To address these issues and focusing on the learning aspect of the AI field, we present RENOIR (REpeated random sampliNg fOr machIne leaRning), a modular open-source platform for robust and reproducible machine learning (ML) analysis. RENOIR adopts standardised pipelines for model training and testing, introducing elements of novelty, such as the dependence of the performance of the algorithm on the sample size. Additionally, RENOIR offers automated generation of transparent and usable reports, aiming to enhance the quality and reproducibility of AI studies. To demonstrate the versatility of our tool, we applied it to benchmark datasets from health, computer science, and STEM (Science, Technology, Engineering, and Mathematics) domains. Furthermore, we showcase RENOIR's successful application in recently published studies, where it identified classifiers for SET2D and TP53 mutation status in cancer. Finally, we present a use case where RENOIR was employed to address a significant pharmacological challenge: predicting drug efficacy. RENOIR is freely available at https://github.com/alebarberis/renoir.
Affiliation(s)
- Alessandro Barberis
  - Nuffield Department of Surgical Sciences, Medical Sciences Division, University of Oxford, Old Road Campus Research Building, Roosevelt Drive, Oxford, OX3 7DQ, UK.
  - Computational Biology and Integrative Genomics Lab, Department of Oncology, Medical Sciences Division, University of Oxford, Oxford, OX3 7DQ, UK.
- Hugo J W L Aerts
  - Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA
  - Radiation Oncology and Radiology, Dana-Farber Cancer Institute, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Radiology and Nuclear Medicine, GROW & CARIM, Maastricht University, Maastricht, The Netherlands
  - Cardiovascular Imaging Research Center, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Francesca M Buffa
  - Computational Biology and Integrative Genomics Lab, Department of Oncology, Medical Sciences Division, University of Oxford, Oxford, OX3 7DQ, UK.
  - AI and Systems Biology, IFOM ETS, 20139, Milan, Italy.
  - Department of Computing Sciences and Bocconi Institute for Data Science and Analytics (BIDSA), Bocconi University, 20100, Milan, Italy.

5. Ciobanu-Caraus O, Aicher A, Kernbach JM, Regli L, Serra C, Staartjes VE. A critical moment in machine learning in medicine: on reproducible and interpretable learning. Acta Neurochir (Wien) 2024; 166:14. [PMID: 38227273 DOI: 10.1007/s00701-024-05892-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Accepted: 12/14/2023] [Indexed: 01/17/2024]
Abstract
Over the past two decades, advances in computational power and data availability combined with increased accessibility to pre-trained models have led to an exponential rise in machine learning (ML) publications. While ML may have the potential to transform healthcare, this sharp increase in ML research output without focus on methodological rigor and standard reporting guidelines has fueled a reproducibility crisis. In addition, the rapidly growing complexity of these models compromises their interpretability, which currently impedes their successful and widespread clinical adoption. In medicine, where failure of such models may have severe implications for patients' health, the high requirements for accuracy, robustness, and interpretability confront ML researchers with a unique set of challenges. In this review, we discuss the semantics of reproducibility and interpretability, as well as related issues and challenges, and outline possible solutions to counteracting the "black box". To foster reproducibility, standard reporting guidelines need to be further developed and data or code sharing encouraged. Editors and reviewers may equally play a critical role by establishing high methodological standards and thus preventing the dissemination of low-quality ML publications. To foster interpretable learning, the use of simpler models more suitable for medical data can inform the clinician how results are generated based on input data. Model-agnostic explanation tools, sensitivity analysis, and hidden layer representations constitute further promising approaches to increase interpretability. Balancing model performance and interpretability are important to ensure clinical applicability. We have now reached a critical moment for ML in medicine, where addressing these issues and implementing appropriate solutions will be vital for the future evolution of the field.
Affiliation(s)
- Olga Ciobanu-Caraus
  - Machine Intelligence in Clinical Neuroscience & Microsurgical Neuroanatomy (MICN) Laboratory, Department of Neurosurgery, Clinical Neuroscience Center, University Hospital Zurich, University of Zurich, Zurich, Switzerland
- Anatol Aicher
  - Machine Intelligence in Clinical Neuroscience & Microsurgical Neuroanatomy (MICN) Laboratory, Department of Neurosurgery, Clinical Neuroscience Center, University Hospital Zurich, University of Zurich, Zurich, Switzerland
- Julius M Kernbach
  - Department of Neuroradiology, University Hospital Heidelberg, Heidelberg, Germany
- Luca Regli
  - Machine Intelligence in Clinical Neuroscience & Microsurgical Neuroanatomy (MICN) Laboratory, Department of Neurosurgery, Clinical Neuroscience Center, University Hospital Zurich, University of Zurich, Zurich, Switzerland
- Carlo Serra
  - Machine Intelligence in Clinical Neuroscience & Microsurgical Neuroanatomy (MICN) Laboratory, Department of Neurosurgery, Clinical Neuroscience Center, University Hospital Zurich, University of Zurich, Zurich, Switzerland
- Victor E Staartjes
  - Machine Intelligence in Clinical Neuroscience & Microsurgical Neuroanatomy (MICN) Laboratory, Department of Neurosurgery, Clinical Neuroscience Center, University Hospital Zurich, University of Zurich, Zurich, Switzerland.

6. Jacobs PG, Herrero P, Facchinetti A, Vehi J, Kovatchev B, Breton MD, Cinar A, Nikita KS, Doyle FJ, Bondia J, Battelino T, Castle JR, Zarkogianni K, Narayan R, Mosquera-Lopez C. Artificial Intelligence and Machine Learning for Improving Glycemic Control in Diabetes: Best Practices, Pitfalls, and Opportunities. IEEE Rev Biomed Eng 2024; 17:19-41. [PMID: 37943654 DOI: 10.1109/rbme.2023.3331297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2023]
Abstract
OBJECTIVE: Artificial intelligence and machine learning are transforming many fields including medicine. In diabetes, robust biosensing technologies and automated insulin delivery therapies have created a substantial opportunity to improve health. While the number of manuscripts addressing the topic of applying machine learning to diabetes has grown in recent years, there has been a lack of consistency in the methods, metrics, and data used to train and evaluate these algorithms. This manuscript provides consensus guidelines for machine learning practitioners in the field of diabetes, including best practice recommended approaches and warnings about pitfalls to avoid.
METHODS: Algorithmic approaches are reviewed and benefits of different algorithms are discussed including importance of clinical accuracy, explainability, interpretability, and personalization. We review the most common features used in machine learning applications in diabetes glucose control and provide an open-source library of functions for calculating features, as well as a framework for specifying data sets using data sheets. A review of current data sets available for training algorithms is provided as well as an online repository of data sources.
SIGNIFICANCE: These consensus guidelines are designed to improve performance and translatability of new machine learning algorithms developed in the field of diabetes for engineers and data scientists.
7. Akinci D'Antonoli T, Cuocolo R, Baessler B, Pinto Dos Santos D. Towards reproducible radiomics research: introduction of a database for radiomics studies. Eur Radiol 2024; 34:436-443. [PMID: 37572188 DOI: 10.1007/s00330-023-10095-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 02/04/2023] [Revised: 07/07/2023] [Accepted: 07/12/2023] [Indexed: 08/14/2023]
Abstract
OBJECTIVES: To investigate the model-, code-, and data-sharing practices in the current radiomics research landscape and to introduce a radiomics research database.
METHODS: A total of 1254 articles published between January 1, 2021, and December 31, 2022, in leading radiology journals (European Radiology, European Journal of Radiology, Radiology, Radiology: Artificial Intelligence, Radiology: Cardiothoracic Imaging, Radiology: Imaging Cancer) were retrospectively screened, and 257 original research articles were included in this study. The categorical variables were compared using Fisher's exact tests or chi-square test and numerical variables using Student's t test with relation to the year of publication.
RESULTS: Half of the articles (128 of 257) shared the model by either including the final model formula or reporting the coefficients of selected radiomics features. A total of 73 (28%) models were validated on an external independent dataset. Only 16 (6%) articles shared the data or used publicly available open datasets. Similarly, only 20 (7%) of the articles shared the code. A total of 7 (3%) articles both shared code and data. All collected data in this study is presented in a radiomics research database (RadBase) and could be accessed at https://github.com/EuSoMII/RadBase.
CONCLUSION: According to the results of this study, the majority of published radiomics models were not technically reproducible since they shared neither model nor code and data. There is still room for improvement in carrying out reproducible and open research in the field of radiomics.
CLINICAL RELEVANCE STATEMENT: To date, the reproducibility of radiomics research and open science practices within the radiomics research community are still very low. Ensuring reproducible radiomics research with model-, code-, and data-sharing practices will facilitate faster clinical translation.
KEY POINTS:
- There is a discrepancy between the number of published radiomics papers and the clinical implementation of these published radiomics models.
- The main obstacle to clinical implementation is the lack of model-, code-, and data-sharing practices.
- In order to translate radiomics research into clinical practice, the radiomics research community should adopt open science practices.
Affiliation(s)
- Tugba Akinci D'Antonoli
  - Institute of Radiology and Nuclear Medicine, Cantonal Hospital Baselland, Liestal, Switzerland.
- Renato Cuocolo
  - Department of Medicine, Surgery and Dentistry, University of Salerno, Baronissi, Italy
- Bettina Baessler
  - Department of Diagnostic and Interventional Radiology, University Hospital Würzburg, Würzburg, Germany
- Daniel Pinto Dos Santos
  - Department of Radiology, University Hospital of Cologne, Cologne, Germany
  - Department of Radiology, University Hospital of Frankfurt, Frankfurt, Germany

8. Chae A, Yao MS, Sagreiya H, Goldberg AD, Chatterjee N, MacLean MT, Duda J, Elahi A, Borthakur A, Ritchie MD, Rader D, Kahn CE, Witschey WR, Gee JC. Strategies for Implementing Machine Learning Algorithms in the Clinical Practice of Radiology. Radiology 2024; 310:e223170. [PMID: 38259208 PMCID: PMC10831483 DOI: 10.1148/radiol.223170] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2022] [Revised: 08/24/2023] [Accepted: 08/29/2023] [Indexed: 01/24/2024]
Abstract
Despite recent advancements in machine learning (ML) applications in health care, there have been few benefits and improvements to clinical medicine in the hospital setting. To facilitate clinical adaptation of methods in ML, this review proposes a standardized framework for the step-by-step implementation of artificial intelligence into the clinical practice of radiology that focuses on three key components: problem identification, stakeholder alignment, and pipeline integration. A review of the recent literature and empirical evidence in radiologic imaging applications justifies this approach and offers a discussion on structuring implementation efforts to help other hospital practices leverage ML to improve patient care. Clinical trial registration no. 04242667. © RSNA, 2024. Supplemental material is available for this article.
Affiliation(s)
- Hersh Sagreiya, Ari D. Goldberg, Neil Chatterjee, Matthew T. MacLean, Jeffrey Duda, Ameena Elahi, Arijitt Borthakur, Marylyn D. Ritchie, Daniel Rader, Charles E. Kahn
  - From the Departments of Bioengineering (M.S.Y.), Radiology (H.S., N.C., M.T.M., J.D., A.B., C.E.K., W.R.W., J.C.G.), Genetics (M.D.R.), and Medicine (D.R.), Perelman School of Medicine (A.C., M.S.Y., H.S., A.B., C.E.K., W.R.W., J.C.G.), University of Pennsylvania, 3400 Civic Center Blvd, Philadelphia, PA 19104; Department of Radiology, Loyola University Medical Center, Maywood, Ill (A.D.G.); Department of Information Services, University of Pennsylvania, Philadelphia, Pa (A.E.); and Leonard Davis Institute of Health Economics, University of Pennsylvania, Philadelphia, Pa (A.B.)

9. Murray JD, Lange JJ, Bennett-Lenane H, Holm R, Kuentz M, O'Dwyer PJ, Griffin BT. Advancing algorithmic drug product development: Recommendations for machine learning approaches in drug formulation. Eur J Pharm Sci 2023; 191:106562. [PMID: 37562550 DOI: 10.1016/j.ejps.2023.106562] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Revised: 07/09/2023] [Accepted: 08/07/2023] [Indexed: 08/12/2023]
Abstract
Artificial intelligence is a rapidly expanding area of research, with the disruptive potential to transform traditional approaches in the pharmaceutical industry, from drug discovery and development to clinical practice. Machine learning, a subfield of artificial intelligence, has fundamentally transformed in silico modelling and has the capacity to streamline clinical translation. This paper reviews data-driven modelling methodologies with a focus on drug formulation development. Despite recent advances, there is limited modelling guidance specific to drug product development and a trend towards suboptimal modelling practices, resulting in models that may not give reliable predictions in practice. There is an overwhelming focus on benchtop experimental outcomes obtained for a specific modelling aim, leaving the capabilities of data scraping or the use of combined modelling approaches yet to be fully explored. Moreover, the preference for high accuracy can lead to a reliance on black box methods over interpretable models. This further limits the widespread adoption of machine learning as black boxes yield models that cannot be easily understood for the purposes of enhancing product performance. In this review, recommendations for conducting machine learning research for drug product development to ensure trustworthiness, transparency, and reliability of the models produced are presented. Finally, possible future directions on how research in this area might develop are discussed to aim for models that provide useful and robust guidance to formulators.
Affiliation(s)
- Jack D Murray
  - School of Pharmacy, University College Cork, Cork, Ireland
- Justus J Lange
  - School of Pharmacy, University College Cork, Cork, Ireland; Roche Pharmaceutical Research & Early Development, Pre-Clinical CMC, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Grenzacherstrasse 124, Basel, Switzerland
- René Holm
  - Department of Physics, Chemistry and Pharmacy, University of Southern Denmark, Campusvej 55, Odense 5230, Denmark
- Martin Kuentz
  - School of Life Sciences, University of Applied Sciences and Arts Northwestern Switzerland, Muttenz CH 4132, Switzerland
|
10
|
|
11
|
Pylvänäinen JW, Gómez-de-Mariscal E, Henriques R, Jacquemet G. Live-cell imaging in the deep learning era. Curr Opin Cell Biol 2023; 85:102271. [PMID: 37897927 DOI: 10.1016/j.ceb.2023.102271] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/10/2023] [Revised: 09/29/2023] [Accepted: 10/02/2023] [Indexed: 10/30/2023]
Abstract
Live imaging is a powerful tool, enabling scientists to observe living organisms in real time. In particular, when combined with fluorescence microscopy, live imaging allows the monitoring of cellular components with high sensitivity and specificity. Yet, due to critical challenges (i.e., drift, phototoxicity, dataset size), implementing live imaging and analyzing the resulting datasets is rarely straightforward. Over the past years, the development of bioimage analysis tools, including deep learning, is changing how we perform live imaging. Here we briefly cover important computational methods aiding live imaging and carrying out key tasks such as drift correction, denoising, super-resolution imaging, artificial labeling, tracking, and time series analysis. We also cover recent advances in self-driving microscopy.
Affiliation(s)
- Joanna W Pylvänäinen
  - Faculty of Science and Engineering, Cell Biology, Åbo Akademi University, 20520 Turku, Finland
- Ricardo Henriques
  - Instituto Gulbenkian de Ciência, Oeiras 2780-156, Portugal; University College London, London WC1E 6BT, United Kingdom
- Guillaume Jacquemet
  - Faculty of Science and Engineering, Cell Biology, Åbo Akademi University, 20520 Turku, Finland; Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520 Turku, Finland; InFLAMES Research Flagship Center, University of Turku and Åbo Akademi University, 20520 Turku, Finland; Turku Bioimaging, University of Turku and Åbo Akademi University, FI-20520 Turku, Finland
|
12
|
Imker HJ, Schackart KE, Istrate AM, Cook CE. A machine learning-enabled open biodata resource inventory from the scientific literature. PLoS One 2023; 18:e0294812. [PMID: 38015968 PMCID: PMC10684096 DOI: 10.1371/journal.pone.0294812] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2023] [Accepted: 11/07/2023] [Indexed: 11/30/2023] Open
Abstract
Modern biological research depends on data resources. These resources archive difficult-to-reproduce data and provide added-value aggregation, curation, and analyses. Collectively, they constitute a global infrastructure of biodata resources. While the organic proliferation of biodata resources has enabled incredible research, sustained support for the individual resources that make up this distributed infrastructure is a challenge. The Global Biodata Coalition (GBC) was established by research funders in part to aid in developing sustainable funding strategies for biodata resources. An important component of this work is understanding the scope of the resource infrastructure: how many biodata resources there are, where they are, and how they are supported. Existing registries require self-registration and/or extensive curation, and we sought to develop a method for assembling a global inventory of biodata resources that could be periodically updated with minimal human intervention. The approach we developed identifies biodata resources using open data from the scientific literature. Specifically, we used a machine learning-enabled natural language processing approach to identify biodata resources from titles and abstracts of life sciences publications contained in Europe PMC. Pretrained BERT (Bidirectional Encoder Representations from Transformers) models were fine-tuned to classify publications as describing a biodata resource or not and to predict the resource name using named entity recognition. To improve the quality of the resulting inventory, low-confidence predictions and potential duplicates were manually reviewed. Further information about the resources was then obtained using article metadata, such as funder and geolocation information. These efforts yielded an inventory of 3112 unique biodata resources based on articles published from 2011 to 2021. The code was developed to facilitate reuse and includes automated pipelines. All products of this effort are released under permissive licensing, including the biodata resource inventory itself (CC0) and all associated code (BSD/MIT).
Affiliation(s)
- Heidi J. Imker
  - Global Biodata Coalition, Strasbourg, France
  - University Library, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Kenneth E. Schackart
  - Global Biodata Coalition, Strasbourg, France
  - Department of Biosystems Engineering, The University of Arizona, Tucson, Arizona, United States of America
- Ana-Maria Istrate
  - Chan Zuckerberg Initiative, Redwood City, California, United States of America
|
13
|
Hsiao YC, Kuo CY, Lin FJ, Wu YW, Lin TH, Yeh HI, Chen JW, Wu CC. Machine Learning Models for ASCVD Risk Prediction in an Asian Population - How to Validate the Model is Important. Acta Cardiol Sin 2023; 39:901-912. [PMID: 38022427 PMCID: PMC10646597 DOI: 10.6515/acs.202311_39(6).20230528a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2022] [Accepted: 05/28/2023] [Indexed: 12/01/2023]
Abstract
Introduction: Atherosclerotic cardiovascular disease (ASCVD) is prevalent worldwide, including in Taiwan; however, widely accepted tools to assess the risk of ASCVD are lacking in Taiwan. Machine learning models are potentially useful for risk evaluation. In this study we used two cohorts to test the feasibility of machine learning with transfer learning for developing an ASCVD risk prediction model in Taiwan. Methods: Two multi-center observational registry cohorts, T-SPARCLE and T-PPARCLE, were used in this study. The variables selected were based on European, U.S. and Asian guidelines. Both registries recorded the ASCVD outcomes of the patients. Ten-fold validation and temporal validation methods were used to evaluate the performance of the binary classification analysis [prediction of major adverse cardiovascular (CV) events in one year]. Time-to-event analyses were also performed. Results: In the binary classification analysis, eXtreme Gradient Boosting (XGBoost) and random forest had the best performance, with areas under the receiver operating characteristic curve (AUC-ROC) of 0.72 (0.68-0.76) and 0.73 (0.69-0.77), respectively, although their performance was not significantly better than that of other models. Temporal validation was also performed, and the data showed significant differences in the distribution of various features and event rate. The AUC-ROC of XGBoost dropped to 0.66 (0.59-0.73), while that of random forest dropped to 0.69 (0.62-0.76) in the temporal validation method, and the performance also became numerically worse than that of the logistic regression model. In the time-to-event analysis, most models had a concordance index of around 0.70. Conclusions: Machine learning models with appropriate transfer learning may be a useful tool for the development of CV risk prediction models and may help improve patient care in the future.
Affiliation(s)
- Yu-Chung Hsiao
  - Department of Internal Medicine, National Taiwan University Hospital
- Chen-Yuan Kuo
  - Center for Healthy Longevity and Aging Sciences, National Yang Ming Chiao Tung University
- Fang-Ju Lin
  - Graduate Institute of Clinical Pharmacy & School of Pharmacy, College of Medicine, National Taiwan University
  - Department of Pharmacy, National Taiwan University Hospital, Taipei
- Yen-Wen Wu
  - Division of Cardiology, Cardiovascular Medical Center, Far Eastern Memorial Hospital, New Taipei City
  - School of Medicine, National Yang Ming Chiao Tung University, School of Medicine, Taipei
  - Graduate Institute of Medicine, Yuan Ze University, Taoyuan
- Tsung-Hsien Lin
  - Division of Cardiology, Department of Internal Medicine, Kaohsiung Medical University Hospital
  - Faculty of Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung
- Hung-I Yeh
  - MacKay Memorial Hospital, MacKay Medical College
- Jaw-Wen Chen
  - Department of Medical Research and Education, Taipei Veterans General Hospital
- Chau-Chung Wu
  - Department of Internal Medicine, National Taiwan University Hospital
  - Graduate Institute of Medical Education & Bioethics, College of Medicine, National Taiwan University, Taipei, Taiwan
|
14
|
Musyaffa FA, Rapp K, Gohlke H. LISTER: Semiautomatic Metadata Extraction from Annotated Experiment Documentation in eLabFTW. J Chem Inf Model 2023; 63:6224-6238. [PMID: 37773594 DOI: 10.1021/acs.jcim.3c00744] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/01/2023]
Abstract
The availability of scientific methods, code, and data is key for reproducing an experiment. Research data should be made available following the FAIR principle (findable, accessible, interoperable, and reusable). For that, the annotation of research data with metadata is central. However, existing research data management workflows often require that metadata be created by the corresponding researchers, which takes effort and time. Here, we developed LISTER as a methodological and algorithmic solution to create and extract metadata from annotated, template-based experimental documentation using minimum effort. We focused on tailoring the integration between existing platforms by using eLabFTW as the electronic lab notebook and adopting the ISA (investigation, study, assay) model as the abstract data model framework. LISTER consists of four components: annotation language to support metadata extraction; customized eLabFTW entries using specific hierarchies, templates, and tags to structure reusable scientific documentation; a "container" concept in eLabFTW, making metadata of a particular container content extractable along with its underlying, related experiments via a single click; a Python-based app to enable easy-to-use, semiautomated metadata extraction from eLabFTW entries. LISTER outputs metadata in machine-readable .json and human-readable .xlsx formats, and Material and Methods (MM) descriptions in .docx format that could be used in a thesis or manuscript. The metadata can be used as a basis to create or extend ontologies, which, when applied to the published research data, will significantly enhance its value. DSpace is used as a data cataloging platform for hosting the extracted metadata and research data. We applied LISTER to computational biophysical chemistry, protein biochemistry, and molecular biology, and our concept should be extendable to other life science areas.
Affiliation(s)
- Fathoni A Musyaffa
  - Institute for Pharmaceutical and Medicinal Chemistry, Heinrich Heine University Düsseldorf, 40225 Düsseldorf, Germany
- Kirsten Rapp
  - Institute for Pharmaceutical and Medicinal Chemistry, Heinrich Heine University Düsseldorf, 40225 Düsseldorf, Germany
- Holger Gohlke
  - Institute for Pharmaceutical and Medicinal Chemistry, Heinrich Heine University Düsseldorf, 40225 Düsseldorf, Germany
  - Institute of Bio- and Geosciences (IBG-4: Bioinformatics), Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
|
15
|
Kaczmarzyk JR, Gupta R, Kurc TM, Abousamra S, Saltz JH, Koo PK. ChampKit: A framework for rapid evaluation of deep neural networks for patch-based histopathology classification. Comput Methods Programs Biomed 2023; 239:107631. [PMID: 37271050 PMCID: PMC11093625 DOI: 10.1016/j.cmpb.2023.107631] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2023] [Revised: 04/23/2023] [Accepted: 05/28/2023] [Indexed: 06/06/2023]
Abstract
BACKGROUND AND OBJECTIVE: Histopathology is the gold standard for diagnosis of many cancers. Recent advances in computer vision, specifically deep learning, have facilitated the analysis of histopathology images for many tasks, including the detection of immune cells and microsatellite instability. However, it remains difficult to identify optimal models and training configurations for different histopathology classification tasks due to the abundance of available architectures and the lack of systematic evaluations. Our objective in this work is to present a software tool that addresses this need and enables robust, systematic evaluation of neural network models for patch classification in histology in a light-weight, easy-to-use package for both algorithm developers and biomedical researchers. METHODS: Here we present ChampKit (Comprehensive Histopathology Assessment of Model Predictions toolKit): an extensible, fully reproducible evaluation toolkit that is a one-stop-shop to train and evaluate deep neural networks for patch classification. ChampKit curates a broad range of public datasets. It enables training and evaluation of models supported by timm directly from the command line, without the need for users to write any code. External models are enabled through a straightforward API and minimal coding. As a result, ChampKit facilitates the evaluation of existing and new models and deep learning architectures on pathology datasets, making it more accessible to the broader scientific community. To demonstrate the utility of ChampKit, we establish baseline performance for a subset of possible models that could be employed with ChampKit, focusing on several popular deep learning models, namely ResNet18, ResNet50, and R26-ViT, a hybrid vision transformer. In addition, we compare each model trained either from random weight initialization or with transfer learning from ImageNet pretrained models. For ResNet18, we also consider transfer learning from a self-supervised pretrained model. RESULTS: The main result of this paper is the ChampKit software. Using ChampKit, we were able to systematically evaluate multiple neural networks across six datasets. We observed mixed results when evaluating the benefits of pretraining versus random initialization, with no clear benefit except in the low data regime, where transfer learning was found to be beneficial. Surprisingly, we found that transfer learning from self-supervised weights rarely improved performance, which is counter to other areas of computer vision. CONCLUSIONS: Choosing the right model for a given digital pathology dataset is nontrivial. ChampKit provides a valuable tool to fill this gap by enabling the evaluation of hundreds of existing (or user-defined) deep learning models across a variety of pathology tasks. Source code and data for the tool are freely accessible at https://github.com/SBU-BMI/champkit.
Affiliation(s)
- Jakub R Kaczmarzyk
  - Department of Biomedical Informatics, Stony Brook Medicine, 101 Nicolls Rd, Stony Brook, 11794, NY, USA; Simons Center for Quantitative Biology, 1 Bungtown Rd, Cold Spring Harbor, 11724, NY, USA
- Rajarsi Gupta
  - Department of Biomedical Informatics, Stony Brook Medicine, 101 Nicolls Rd, Stony Brook, 11794, NY, USA
- Tahsin M Kurc
  - Department of Biomedical Informatics, Stony Brook Medicine, 101 Nicolls Rd, Stony Brook, 11794, NY, USA
- Shahira Abousamra
  - Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
- Joel H Saltz
  - Department of Biomedical Informatics, Stony Brook Medicine, 101 Nicolls Rd, Stony Brook, 11794, NY, USA
- Peter K Koo
  - Simons Center for Quantitative Biology, 1 Bungtown Rd, Cold Spring Harbor, 11724, NY, USA
|
16
|
Shin J, Porubsky V, Carothers J, Sauro HM. Standards, dissemination, and best practices in systems biology. Curr Opin Biotechnol 2023; 81:102922. [PMID: 37004298 PMCID: PMC10435326 DOI: 10.1016/j.copbio.2023.102922] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/16/2021] [Revised: 02/14/2023] [Accepted: 02/24/2023] [Indexed: 04/03/2023]
Abstract
The reproducibility of scientific research is crucial to the success of the scientific method. Here, we review the current best practices when publishing mechanistic models in systems biology. We recommend, where possible, to use software engineering strategies such as testing, verification, validation, documentation, versioning, iterative development, and continuous integration. In addition, adhering to the Findable, Accessible, Interoperable, and Reusable modeling principles allows other scientists to collaborate and build off of each other's work. Existing standards such as Systems Biology Markup Language, CellML, or Simulation Experiment Description Markup Language can greatly improve the likelihood that a published model is reproducible, especially if such models are deposited in well-established model repositories. Where models are published in executable programming languages, the source code and their data should be published as open-source in public code repositories together with any documentation and testing code. For complex models, we recommend container-based solutions where any software dependencies and the run-time context can be easily replicated.
Affiliation(s)
- Janis Shin
  - Molecular Engineering & Sciences Institute, University of Washington, Seattle, WA, USA
- Veronica Porubsky
  - Department of Bioengineering, University of Washington, Seattle, WA, USA
- James Carothers
  - Molecular Engineering & Sciences Institute, University of Washington, Seattle, WA, USA
- Herbert M Sauro
  - Department of Bioengineering, University of Washington, Seattle, WA, USA
|
17
|
Fraser AG, Biasin E, Bijnens B, Bruining N, Caiani EG, Cobbaert K, Davies RH, Gilbert SH, Hovestadt L, Kamenjasevic E, Kwade Z, McGauran G, O'Connor G, Vasey B, Rademakers FE. Artificial intelligence in medical device software and high-risk medical devices - a review of definitions, expert recommendations and regulatory initiatives. Expert Rev Med Devices 2023; 20:467-491. [PMID: 37157833 DOI: 10.1080/17434440.2023.2184685] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 05/10/2023]
Abstract
INTRODUCTION: Artificial intelligence (AI) encompasses a wide range of algorithms with risks when used to support decisions about diagnosis or treatment, so professional and regulatory bodies are recommending how they should be managed. AREAS COVERED: AI systems may qualify as standalone medical device software (MDSW) or be embedded within a medical device. Within the European Union (EU) AI software must undergo a conformity assessment procedure to be approved as a medical device. The draft EU Regulation on AI proposes rules that will apply across industry sectors, while for devices the Medical Device Regulation also applies. In the CORE-MD project (Coordinating Research and Evidence for Medical Devices), we have surveyed definitions and summarize initiatives made by professional consensus groups, regulators, and standardization bodies. EXPERT OPINION: The level of clinical evidence required should be determined according to each application and to legal and methodological factors that contribute to risk, including accountability, transparency, and interpretability. EU guidance for MDSW based on international recommendations does not yet describe the clinical evidence needed for medical AI software. Regulators, notified bodies, manufacturers, clinicians and patients would all benefit from common standards for the clinical evaluation of high-risk AI applications and transparency of their evidence and performance.
Affiliation(s)
- Alan G Fraser
  - University Hospital of Wales, School of Medicine, Cardiff University, Heath Park, Cardiff, U.K.
  - KU Leuven, Leuven, Belgium
- Bart Bijnens
  - Engineering Sciences, Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain
- Nico Bruining
  - Department of Clinical and Experimental Information processing (Digital Cardiology), Erasmus Medical Center, Thoraxcenter, Rotterdam, the Netherlands
- Enrico G Caiani
  - Department of Electronics, Information and Biomedical Engineering, Politecnico di Milano, Milan, Italy
- Rhodri H Davies
  - Institute of Cardiovascular Science, University College London, London, U.K.
- Stephen H Gilbert
  - Technische Universität Dresden, Else Kröner Fresenius Center for Digital Health, Dresden, Germany
- Baptiste Vasey
  - Nuffield Department of Surgical Sciences, University of Oxford, Oxford, UK
|
18
|
Ahlquist KD, Sugden LA, Ramachandran S. Enabling interpretable machine learning for biological data with reliability scores. PLoS Comput Biol 2023; 19:e1011175. [PMID: 37235578 PMCID: PMC10249903 DOI: 10.1371/journal.pcbi.1011175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2022] [Revised: 06/08/2023] [Accepted: 05/10/2023] [Indexed: 05/28/2023] Open
Abstract
Machine learning tools have proven useful across biological disciplines, allowing researchers to draw conclusions from large datasets, and opening up new opportunities for interpreting complex and heterogeneous biological data. Alongside the rapid growth of machine learning, there have also been growing pains: some models that appear to perform well have later been revealed to rely on features of the data that are artifactual or biased; this feeds into the general criticism that machine learning models are designed to optimize model performance over the creation of new biological insights. A natural question arises: how do we develop machine learning models that are inherently interpretable or explainable? In this manuscript, we describe the SWIF(r) reliability score (SRS), a method building on the SWIF(r) generative framework that reflects the trustworthiness of the classification of a specific instance. The concept of the reliability score has the potential to generalize to other machine learning methods. We demonstrate the utility of the SRS when faced with common challenges in machine learning including: 1) an unknown class present in testing data that was not present in training data, 2) systemic mismatch between training and testing data, and 3) instances of testing data that have missing values for some attributes. We explore these applications of the SRS using a range of biological datasets, from agricultural data on seed morphology, to 22 quantitative traits in the UK Biobank, and population genetic simulations and 1000 Genomes Project data. With each of these examples, we demonstrate how the SRS can allow researchers to interrogate their data and training approach thoroughly, and to pair their domain-specific knowledge with powerful machine-learning frameworks. We also compare the SRS to related tools for outlier and novelty detection, and find that it has comparable performance, with the advantage of being able to operate when some data are missing. The SRS, and the broader discussion of interpretable scientific machine learning, will aid researchers in the biological machine learning space as they seek to harness the power of machine learning without sacrificing rigor and biological insight.
Affiliation(s)
- K. D. Ahlquist
  - Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America
  - Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, Rhode Island, United States of America
- Lauren A. Sugden
  - Department of Mathematics and Computer Science, Duquesne University, Pittsburgh, Pennsylvania, United States of America
- Sohini Ramachandran
  - Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America
  - Department of Ecology, Evolution and Organismal Biology, Brown University, Providence, Rhode Island, United States of America
  - Data Science Initiative, Brown University, Providence, Rhode Island, United States of America
|
19
|
Patterson A, Elbasir A, Tian B, Auslander N. Computational Methods Summarizing Mutational Patterns in Cancer: Promise and Limitations for Clinical Applications. Cancers (Basel) 2023; 15:1958. [PMID: 37046619 PMCID: PMC10093138 DOI: 10.3390/cancers15071958] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Revised: 02/24/2023] [Accepted: 03/09/2023] [Indexed: 03/29/2023] Open
Abstract
Since the rise of next-generation sequencing technologies, the catalogue of mutations in cancer has been continuously expanding. To address the complexity of the cancer-genomic landscape and extract meaningful insights, numerous computational approaches have been developed over the last two decades. In this review, we survey the current leading computational methods to derive intricate mutational patterns in the context of clinical relevance. We begin with mutation signatures, explaining first how mutation signatures were developed and then examining the utility of studies using mutation signatures to correlate environmental effects on the cancer genome. Next, we examine current clinical research that employs mutation signatures and discuss the potential use cases and challenges of mutation signatures in clinical decision-making. We then examine computational studies developing tools to investigate complex patterns of mutations beyond the context of mutational signatures. We survey methods to identify cancer-driver genes, from single-driver studies to pathway and network analyses. In addition, we review methods inferring complex combinations of mutations for clinical tasks and using mutations integrated with multi-omics data to better predict cancer phenotypes. We examine the use of these tools for either discovery or prediction, including prediction of tumor origin, treatment outcomes, prognosis, and cancer typing. We further discuss the main limitations preventing widespread clinical integration of computational tools for the diagnosis and treatment of cancer. We end by proposing solutions to address these challenges using recent advances in machine learning.
Affiliation(s)
- Andrew Patterson
  - Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
  - The Wistar Institute, Philadelphia, PA 19104, USA
- Bin Tian
  - The Wistar Institute, Philadelphia, PA 19104, USA
- Noam Auslander
  - The Wistar Institute, Philadelphia, PA 19104, USA
  - Department of Cancer Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
|
20
|
Sanders LM, Scott RT, Yang JH, Qutub AA, Garcia Martin H, Berrios DC, Hastings JJA, Rask J, Mackintosh G, Hoarfrost AL, Chalk S, Kalantari J, Khezeli K, Antonsen EL, Babdor J, Barker R, Baranzini SE, Beheshti A, Delgado-Aparicio GM, Glicksberg BS, Greene CS, Haendel M, Hamid AA, Heller P, Jamieson D, Jarvis KJ, Komarova SV, Komorowski M, Kothiyal P, Mahabal A, Manor U, Mason CE, Matar M, Mias GI, Miller J, Myers JG, Nelson C, Oribello J, Park SM, Parsons-Wingerter P, Prabhu RK, Reynolds RJ, Saravia-Butler A, Saria S, Sawyer A, Singh NK, Snyder M, Soboczenski F, Soman K, Theriot CA, Van Valen D, Venkateswaran K, Warren L, Worthey L, Zitnik M, Costes SV. Biological research and self-driving labs in deep space supported by artificial intelligence. Nat Mach Intell 2023. [DOI: 10.1038/s42256-023-00618-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/28/2023]
|
21
|
Marques E, de Gendt S, Pourtois G, van Setten MJ. Improving Accuracy and Transferability of Machine Learning Chemical Activation Energies by Adding Electronic Structure Information. J Chem Inf Model 2023; 63:1454-1461. [PMID: 36864757 DOI: 10.1021/acs.jcim.2c01502] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/04/2023]
Abstract
Predicting chemical activation energies is one of the longstanding and important challenges in computational chemistry. Recent advances have shown that machine learning can be used to create tools to predict them. Such tools can significantly decrease the computational cost for these predictions compared to traditional methods, which require an optimal path search along a high-dimensional potential energy surface. To enable this new route, we need both large and accurate datasets and a compact yet complete description of the reactions. Although data for chemical reactions is becoming increasingly available, the key step of encoding the reaction as an efficient descriptor remains a big challenge. In this paper, we demonstrate that including electronic energy levels in the description of the reaction significantly improves the prediction accuracy and transferability. Feature importance analysis further demonstrates that electronic energy levels have a higher importance than some structural information and typically require less space in the reaction encoding vector. In general, we observe that the results of the feature importance analysis relate well to the domain knowledge of fundamental chemical principles. This work can help to build better chemical reaction encodings for machine learning and thus improve the predictions of machine learning models for reaction activation energies. These models could ultimately be used to recognize reaction limiting steps in large reaction systems, allowing to account for bottlenecks at the design stage.
Affiliation(s)
- Esteban Marques
- Department of Chemistry, KU Leuven (University of Leuven), Celestijnenlaan 200 F, Heverlee 3001, Belgium.,IMEC, Kapeldreef 75, Leuven 3001, Belgium
| | - Stefan de Gendt
- Department of Chemistry, KU Leuven (University of Leuven), Celestijnenlaan 200 F, Heverlee 3001, Belgium.,IMEC, Kapeldreef 75, Leuven 3001, Belgium
| | - Geoffrey Pourtois
- IMEC, Kapeldreef 75, Leuven 3001, Belgium.,Department of Chemistry, University of Antwerp, Campus Drie Eiken, Universiteitsplein 1, Wilrijk 2610, Belgium
| | - Michiel J van Setten
- IMEC, Kapeldreef 75, Leuven 3001, Belgium.,ETSF European Theoretical Spectroscopy Facility, Institut de Physique, Université de Liège, Allée du 6 août 17, Liège 4000, Belgium
| |
|
22
|
Tsakiroglou M, Evans A, Pirmohamed M. Leveraging transcriptomics for precision diagnosis: Lessons learned from cancer and sepsis. Front Genet 2023; 14:1100352. [PMID: 36968610 PMCID: PMC10036914 DOI: 10.3389/fgene.2023.1100352] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/16/2022] [Accepted: 02/20/2023] [Indexed: 03/12/2023] Open
Abstract
Diagnostics require precision and predictive ability to be clinically useful. Integration of multi-omic with clinical data is crucial to our understanding of disease pathogenesis and diagnosis. However, interpretation of overwhelming amounts of information at the individual level requires sophisticated computational tools for extraction of clinically meaningful outputs. Moreover, evolution of technical and analytical methods often outpaces standardisation strategies. RNA is the most dynamic component of all -omics technologies, carrying an abundance of regulatory information that is least harnessed for use in clinical diagnostics. Gene expression-based tests capture genetic and non-genetic heterogeneity and have been implemented in certain diseases. For example, patients with early breast cancer are spared toxic, unnecessary treatments with scores based on the expression of a set of genes (e.g., Oncotype DX). The ability of transcriptomics to portray the transcriptional status at a moment in time has also been used in diagnosis of dynamic diseases such as sepsis. Gene expression profiles identify endotypes in sepsis patients with prognostic value and a potential to discriminate between viral and bacterial infection. The application of transcriptomics for patient stratification in clinical environments and clinical trials thus holds promise. In this review, we discuss the current clinical application in the fields of cancer and infection. We use these paradigms to highlight the impediments in identifying useful diagnostic and prognostic biomarkers and propose approaches to overcome them and aid efforts towards clinical implementation.
Affiliation(s)
- Maria Tsakiroglou
- Department of Pharmacology and Therapeutics, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, United Kingdom
- *Correspondence: Maria Tsakiroglou,
| | - Anthony Evans
- Computational Biology Facility, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, United Kingdom
| | - Munir Pirmohamed
- Department of Pharmacology and Therapeutics, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, United Kingdom
| |
|
23
|
Vaquero-Garcia J, Aicher JK, Jewell S, Gazzara MR, Radens CM, Jha A, Norton SS, Lahens NF, Grant GR, Barash Y. RNA splicing analysis using heterogeneous and large RNA-seq datasets. Nat Commun 2023; 14:1230. [PMID: 36869033 PMCID: PMC9984406 DOI: 10.1038/s41467-023-36585-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 02/22/2022] [Accepted: 02/06/2023] [Indexed: 03/05/2023] Open
Abstract
The ubiquity of RNA-seq has led to many methods that use RNA-seq data to analyze variations in RNA splicing. However, available methods are not well suited for handling heterogeneous and large datasets. Such datasets scale to thousands of samples across dozens of experimental conditions, exhibit increased variability compared to biological replicates, and involve thousands of unannotated splice variants resulting in increased transcriptome complexity. We describe here a suite of algorithms and tools implemented in the MAJIQ v2 package to address challenges in detection, quantification, and visualization of splicing variations from such datasets. Using both large-scale synthetic data and GTEx v8 as benchmark datasets, we assess the advantages of MAJIQ v2 compared to existing methods. We then apply the MAJIQ v2 package to analyze differential splicing across 2,335 samples from 13 brain subregions, demonstrating its ability to offer insights into brain subregion-specific splicing regulation.
Affiliation(s)
| | - Joseph K Aicher
- Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA.,Division of Human Genetics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - San Jewell
- Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA
| | - Matthew R Gazzara
- Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA
| | - Caleb M Radens
- Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA
| | - Anupama Jha
- Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, USA
| | - Scott S Norton
- Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA
| | - Nicholas F Lahens
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA, USA
| | - Gregory R Grant
- Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA.,Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA, USA
| | - Yoseph Barash
- Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA.,Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, USA.
| |
|
24
|
Heil BJ, Crawford J, Greene CS. The effect of non-linear signal in classification problems using gene expression. PLoS Comput Biol 2023; 19:e1010984. [PMID: 36972227 PMCID: PMC10079219 DOI: 10.1371/journal.pcbi.1010984] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2022] [Revised: 04/06/2023] [Accepted: 02/28/2023] [Indexed: 03/29/2023] Open
Abstract
Those building predictive models from transcriptomic data are faced with two conflicting perspectives. The first, based on the inherent high dimensionality of biological systems, supposes that complex non-linear models such as neural networks will better match complex biological systems. The second, imagining that complex systems will still be well predicted by simple dividing lines, prefers linear models that are easier to interpret. We compare multi-layer neural networks and logistic regression across multiple prediction tasks on GTEx and Recount3 datasets and find evidence in favor of both possibilities. We verified the presence of non-linear signal when predicting tissue and metadata sex labels from expression data by removing the predictive linear signal with Limma, and showed that the removal ablated the performance of linear methods but not non-linear ones. However, we also found that the presence of non-linear signal was not necessarily sufficient for neural networks to outperform logistic regression. Our results demonstrate that while multi-layer neural networks may be useful for making predictions from gene expression data, including a linear baseline model is critical because, while biological systems are high-dimensional, effective dividing lines for predictive models may not be.
Affiliation(s)
- Benjamin J. Heil
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Pennsylvania, United States of America
| | - Jake Crawford
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Pennsylvania, United States of America
| | - Casey S. Greene
- Department of Pharmacology, University of Colorado School of Medicine, Colorado, United States of America
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Colorado, United States of America
| |
|
25
|
Broderick T, Gelman A, Meager R, Smith AL, Zheng T. Toward a taxonomy of trust for probabilistic machine learning. Sci Adv 2023; 9:eabn3999. [PMID: 36791188 PMCID: PMC9931201 DOI: 10.1126/sciadv.abn3999] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2021] [Accepted: 01/13/2023] [Indexed: 06/18/2023]
Abstract
Probabilistic machine learning increasingly informs critical decisions in medicine, economics, politics, and beyond. To aid the development of trust in these decisions, we develop a taxonomy delineating where trust in an analysis can break down: (i) in the translation of real-world goals to goals on a particular set of training data, (ii) in the translation of abstract goals on the training data to a concrete mathematical problem, (iii) in the use of an algorithm to solve the stated mathematical problem, and (iv) in the use of a particular code implementation of the chosen algorithm. We detail how trust can fail at each step and illustrate our taxonomy with two case studies. Finally, we describe a wide variety of methods that can be used to increase trust at each step of our taxonomy. The use of our taxonomy highlights not only steps where existing research work on trust tends to concentrate but also steps where building trust is particularly challenging.
Affiliation(s)
- Tamara Broderick
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Andrew Gelman
- Department of Statistics, Columbia University, New York, NY, USA
- Department of Political Science, Columbia University, New York, NY, USA
| | - Rachael Meager
- Department of Economics, London School of Economics and Political Science, London, UK
| | - Anna L. Smith
- Department of Statistics, University of Kentucky, Lexington, KY, USA
| | - Tian Zheng
- Department of Statistics, Columbia University, New York, NY, USA
| |
|
26
|
Du X, Dastmalchi F, Ye H, Garrett TJ, Diller MA, Liu M, Hogan WR, Brochhausen M, Lemas DJ. Evaluating LC-HRMS metabolomics data processing software using FAIR principles for research software. Metabolomics 2023; 19:11. [PMID: 36745241 DOI: 10.1007/s11306-023-01974-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/08/2022] [Accepted: 01/20/2023] [Indexed: 02/07/2023]
Abstract
BACKGROUND: Liquid chromatography-high resolution mass spectrometry (LC-HRMS) is a popular approach for metabolomics data acquisition and requires many data processing software tools. The FAIR Principles (Findability, Accessibility, Interoperability, and Reusability) were proposed to promote open science and reusable data management, and to maximize the benefit obtained from contemporary and formal scholarly digital publishing. More recently, the FAIR principles were extended to include Research Software (FAIR4RS). AIM OF REVIEW: This study facilitates open science in metabolomics by providing an implementation solution for adopting FAIR4RS in LC-HRMS metabolomics data processing software. We believe our evaluation guidelines and results can help improve the FAIRness of research software. KEY SCIENTIFIC CONCEPTS OF REVIEW: We evaluated 124 LC-HRMS metabolomics data processing software tools obtained from a systematic review and selected 61 for detailed evaluation using FAIR4RS-related criteria, which were extracted from the literature along with internal discussions. We assigned each criterion one or more FAIR4RS categories through discussion. The minimum, median, and maximum percentages of criteria fulfillment were 21.6%, 47.7%, and 71.8%, respectively. Statistical analysis revealed no significant improvement in FAIRness over time. We identified four criteria that covered multiple FAIR4RS categories but had low fulfillment: (1) no software had semantic annotation of key information; (2) only 6.3% of evaluated software were registered to Zenodo and received DOIs; (3) only 14.5% of selected software had official software containerization or a virtual machine; (4) only 16.7% of evaluated software had fully documented functions in code. Based on the results, we discussed improvement strategies and future directions.
Affiliation(s)
- Xinsong Du
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, USA
| | - Farhad Dastmalchi
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, USA
| | - Hao Ye
- Health Science Center Libraries, University of Florida, Florida, USA
| | - Timothy J Garrett
- Department of Pathology, Immunology and Laboratory Medicine, College of Medicine, University of Florida, Florida, USA
| | - Matthew A Diller
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, USA
| | - Mei Liu
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, USA
| | - William R Hogan
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, USA
| | - Mathias Brochhausen
- Department of Biomedical Informatics, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, USA
| | - Dominick J Lemas
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, USA.
- Department of Obstetrics and Gynecology, University of Florida College of Medicine, Florida, Gainesville, United States.
- Center for Perinatal Outcomes Research, University of Florida College of Medicine, Gainesville, United States.
| |
|
27
|
Kertesz-Farkas A, Nii Adoquaye Acquaye FL, Bhimani K, Eng JK, Fondrie WE, Grant C, Hoopmann MR, Lin A, Lu YY, Moritz RL, MacCoss MJ, Noble WS. The Crux Toolkit for Analysis of Bottom-Up Tandem Mass Spectrometry Proteomics Data. J Proteome Res 2023; 22:561-569. [PMID: 36598107 DOI: 10.1021/acs.jproteome.2c00615] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 01/05/2023]
Abstract
The Crux tandem mass spectrometry data analysis toolkit provides a collection of algorithms for analyzing bottom-up proteomics tandem mass spectrometry data. Many publications have described various individual components of Crux, but a comprehensive summary has not been published since 2014. The goal of this work is to summarize the functionality of Crux, focusing on developments since 2014. We begin with empirical results demonstrating our recently implemented speedups to the Tide search engine. Other new features include a new score function in Tide, two new confidence estimation procedures, as well as three new tools: Param-medic for estimating search parameters directly from mass spectrometry data, Kojak for searching cross-linked mass spectra, and DIAmeter for searching data independent acquisition data against a sequence database.
Affiliation(s)
- Attila Kertesz-Farkas
- Department of Data Analysis and Artificial Intelligence and Laboratory on AI for Computational Biology, Faculty of Computer Science, HSE University, 20 Myasnitskaya ulitsa, Moscow 101000, Russia
| | - Frank Lawrence Nii Adoquaye Acquaye
- Department of Data Analysis and Artificial Intelligence and Laboratory on AI for Computational Biology, Faculty of Computer Science, HSE University, 20 Myasnitskaya ulitsa, Moscow 101000, Russia
| | - Kishankumar Bhimani
- Department of Data Analysis and Artificial Intelligence and Laboratory on AI for Computational Biology, Faculty of Computer Science, HSE University, 20 Myasnitskaya ulitsa, Moscow 101000, Russia
| | - Jimmy K Eng
- Proteomics Resource, University of Washington, 850 Republican Street, Seattle, Washington 98109-4725, United States
| | - William E Fondrie
- Talus Bioscience, 550 17th Avenue, Seattle, Washington 98122, United States
| | - Charles Grant
- Department of Genome Sciences, University of Washington, 3720 15th Avenue NE, Seattle, Washington 98195, United States
| | - Michael R Hoopmann
- Institute for Systems Biology, 401 Terry Avenue N, Seattle, Washington 98109, United States
| | - Andy Lin
- Department of Genome Sciences, University of Washington, 3720 15th Avenue NE, Seattle, Washington 98195, United States
| | - Yang Y Lu
- Department of Genome Sciences, University of Washington, 3720 15th Avenue NE, Seattle, Washington 98195, United States
| | - Robert L Moritz
- Institute for Systems Biology, 401 Terry Avenue N, Seattle, Washington 98109, United States
| | - Michael J MacCoss
- Department of Genome Sciences, University of Washington, 3720 15th Avenue NE, Seattle, Washington 98195, United States
| | - William Stafford Noble
- Department of Genome Sciences, University of Washington, 3720 15th Avenue NE, Seattle, Washington 98195, United States.,Paul G. Allen School of Computer Science and Engineering, University of Washington, 185 E Stevens Way NE, Seattle, Washington 98195-2350, United States
| |
|
28
|
Wang X, Pennello G, deSouza NM, Huang EP, Buckler AJ, Barnhart HX, Delfino JG, Raunig DL, Wang L, Guimaraes AR, Hall TJ, Obuchowski NA. Multiparametric Data-driven Imaging Markers: Guidelines for Development, Application and Reporting of Model Outputs in Radiomics. Acad Radiol 2023; 30:215-229. [PMID: 36411153 PMCID: PMC9825652 DOI: 10.1016/j.acra.2022.10.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/02/2022] [Revised: 09/21/2022] [Accepted: 10/01/2022] [Indexed: 11/19/2022]
Abstract
This paper is the fifth in a five-part series on statistical methodology for performance assessment of multi-parametric quantitative imaging biomarkers (mpQIBs) for radiomic analysis. Radiomics is the process of extracting visually imperceptible features from radiographic medical images using data-driven algorithms. We refer to the radiomic features as data-driven imaging markers (DIMs), which are quantitative measures discovered under a data-driven framework from images beyond visual recognition but evident as patterns of disease processes irrespective of whether or not ground truth exists for the true value of the DIM. This paper aims to set guidelines on how to build machine learning models using DIMs in radiomics and to apply and report them appropriately. We provide a list of recommendations, named RANDAM (an abbreviation of "Radiomic ANalysis and DAta Modeling"), for analysis, modeling, and reporting in a radiomic study to make machine learning analyses in radiomics more reproducible. RANDAM contains five main components to use in reporting radiomics studies: design, data preparation, data analysis and modeling, reporting, and material availability. Real case studies in lung cancer research are presented along with simulation studies to compare different feature selection methods and several validation strategies.
Affiliation(s)
- Xiaofeng Wang
- Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic, 9500 Euclid Ave/JJN3, Cleveland, OH 44195.
| | - Gene Pennello
- Center for Devices and Radiological Health, US Food and Drug Administration Division of Imaging, Diagnostic and Software Reliability, Office of Science and Engineering Laboratories, Center for Devices and Radiological Health, US Food and Drug Administration, Silver Spring, Maryland
| | - Nandita M deSouza
- Division of Radiotherapy and Imaging, The Institute of Cancer Research and Royal Marsden Hospital, London, United Kingdom; European Imaging Biomarkers Alliance, European Society of Radiology, London, UK
| | - Erich P Huang
- Division of Cancer Treatment and Diagnosis, National Cancer Institute, National Institutes of Health, Bethesda, Maryland
| | | | - Huiman X Barnhart
- Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina
| | - Jana G Delfino
- Center for Devices and Radiological Health, US Food and Drug Administration, Silver Spring, Maryland
| | - David L Raunig
- Data Science Institute, Statistical and Quantitative Sciences, Takeda Pharmaceuticals America Inc, Lexington, Massachusetts
| | - Lu Wang
- Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic, 9500 Euclid Ave/JJN3, Cleveland, OH 44195
| | - Alexander R Guimaraes
- Department of Diagnostic Radiology, Oregon Health & Sciences University, Portland, Oregon
| | - Timothy J Hall
- Department of Medical Physics, University of Wisconsin, Madison, Wisconsin
| | - Nancy A Obuchowski
- Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic, 9500 Euclid Ave/JJN3, Cleveland, OH 44195
| |
|
29
|
Novakovsky G, Dexter N, Libbrecht MW, Wasserman WW, Mostafavi S. Obtaining genetics insights from deep learning via explainable artificial intelligence. Nat Rev Genet 2023; 24:125-137. [PMID: 36192604 DOI: 10.1038/s41576-022-00532-2] [Citation(s) in RCA: 49] [Impact Index Per Article: 49.0] [Accepted: 08/31/2022] [Indexed: 01/24/2023]
Abstract
Artificial intelligence (AI) models based on deep learning now represent the state of the art for making functional predictions in genomics research. However, the underlying basis on which predictive models make such predictions is often unknown. For genomics researchers, this missing explanatory information would frequently be of greater value than the predictions themselves, as it can enable new insights into genetic processes. We review progress in the emerging area of explainable AI (xAI), a field with the potential to empower life science researchers to gain mechanistic insights into complex deep learning models. We discuss and categorize approaches for model interpretation, including an intuitive understanding of how each approach works and their underlying assumptions and limitations in the context of typical high-throughput biological datasets.
Affiliation(s)
- Gherman Novakovsky
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, British Columbia, Canada.,Bioinformatics Graduate Program, University of British Columbia, Vancouver, British Columbia, Canada
| | - Nick Dexter
- Department of Mathematics, Simon Fraser University, Burnaby, British Columbia, Canada.,School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada
| | - Maxwell W Libbrecht
- School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada.
| | - Wyeth W Wasserman
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, British Columbia, Canada.
| | - Sara Mostafavi
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA.,Canadian Institute for Advanced Research, Toronto, Ontario, Canada.
| |
|
30
|
Heil BJ, Greene CS. The Field-Dependent Nature of PageRank Values in Citation Networks. bioRxiv 2023:2023.01.05.522943. [PMID: 36711900 PMCID: PMC9881996 DOI: 10.1101/2023.01.05.522943] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2023]
Abstract
The value of scientific research can be easier to assess at the collective level than at the level of individual contributions. Several journal-level and article-level metrics aim to measure the importance of journals or individual manuscripts. However, many are citation-based and citation practices vary between fields. To account for these differences, scientists have devised normalization schemes to make metrics more comparable across fields. We use PageRank as an example metric and examine the extent to which field-specific citation norms drive estimated importance differences. In doing so, we recapitulate differences in journal and article PageRanks between fields. We also find that manuscripts shared between fields have different PageRanks depending on which field's citation network the metric is calculated in. We implement a degree-preserving graph shuffling algorithm to generate a null distribution of similar networks and find differences more likely attributed to field-specific preferences than citation norms. Our results suggest that while differences exist between fields' metric distributions, applying metrics in a field-aware manner rather than using normalized global metrics avoids losing important information about article preferences. They also imply that assigning a single importance value to a manuscript may not be a useful construct, as the importance of each manuscript varies by the reader's field.
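As an illustration of the degree-preserving graph shuffling mentioned in this abstract, the following is a minimal sketch only, not the authors' implementation. It assumes a directed citation network stored as a list of unique (citing, cited) pairs without self-citations; the function names and the toy edge list are hypothetical. Each step swaps the targets of two randomly chosen edges, which leaves every node's in-degree and out-degree unchanged.

```python
import random
from collections import Counter


def degree_preserving_shuffle(edges, n_swaps, seed=0):
    """Randomize a directed edge list while keeping each node's in- and out-degree.

    Assumes `edges` holds unique (source, target) pairs with no self-loops.
    Each attempt picks two edges (a -> b) and (c -> d) and rewires them to
    (a -> d) and (c -> b); attempts that would create a self-loop or a
    duplicate edge are skipped.
    """
    rng = random.Random(seed)
    edges = list(edges)
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if a == d or c == b:
            continue  # swap would create a self-citation
        if (a, d) in present or (c, b) in present:
            continue  # swap would duplicate an existing edge
        present.discard((a, b))
        present.discard((c, d))
        present.add((a, d))
        present.add((c, b))
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def degrees(edges):
    """Return (out-degree, in-degree) counters for a directed edge list."""
    out_deg = Counter(src for src, _ in edges)
    in_deg = Counter(dst for _, dst in edges)
    return out_deg, in_deg


# Hypothetical toy citation network: (citing paper, cited paper).
toy_citations = [(1, 2), (1, 3), (2, 3), (4, 1), (4, 3), (5, 2), (5, 4)]
shuffled = degree_preserving_shuffle(toy_citations, n_swaps=100, seed=42)
```

Repeating the shuffle with different seeds and recomputing a metric such as PageRank on each result would give one possible null distribution against which the observed values could be compared.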
Affiliation(s)
- Benjamin J. Heil
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania
| | - Casey S. Greene
- Department of Pharmacology, University of Colorado School of Medicine; Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine
| |
|
31
|
Oza VH, Whitlock JH, Wilk EJ, Uno-Antonison A, Wilk B, Gajapathy M, Howton TC, Trull A, Ianov L, Worthey EA, Lasseigne BN. Ten simple rules for using public biological data for your research. PLoS Comput Biol 2023; 19:e1010749. [PMID: 36602970 DOI: 10.1371/journal.pcbi.1010749] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/06/2023] Open
Abstract
With an increasing amount of biological data available publicly, there is a need for a guide on how to successfully download and use this data. The 10 simple rules for using public biological data are: (1) use public data purposefully in your research; (2) evaluate data for your use case; (3) check data reuse requirements and embargoes; (4) be aware of ethics for data reuse; (5) plan for data storage and compute requirements; (6) know what you are downloading; (7) download programmatically and verify integrity; (8) properly cite data; (9) make reprocessed data and models Findable, Accessible, Interoperable, and Reusable (FAIR) and share; and (10) make pipelines and code FAIR and share. These rules are intended as a guide for researchers wanting to make use of available data and to increase data reuse and reproducibility.
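Rule (7), verifying integrity after a programmatic download, could look like the sketch below. It is an illustrative assumption rather than anything prescribed by the article: the function names are hypothetical, SHA-256 is just one common checksum choice, and the expected digest would normally come from the data provider's published checksum file.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat even for very large files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    """Return True if the file's digest matches the published checksum."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

A mismatch usually means a truncated or corrupted transfer, in which case the file should be downloaded again before any analysis is run on it.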
Affiliation(s)
- Vishal H Oza
- Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
| | - Jordan H Whitlock
- Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
| | - Elizabeth J Wilk
- Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
| | - Angelina Uno-Antonison
- Center for Computational Genomics and Data Sciences, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Department of Pediatrics, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Department of Pathology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
| | - Brandon Wilk
- Center for Computational Genomics and Data Sciences, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Department of Pediatrics, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Department of Pathology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
| | - Manavalan Gajapathy
- Center for Computational Genomics and Data Sciences, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Department of Pediatrics, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Department of Pathology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
| | - Timothy C Howton
- Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
| | - Austyn Trull
- Center for Computational Genomics and Data Sciences, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Department of Pediatrics, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Department of Pathology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
| | - Lara Ianov
- Civitan International Research Center, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
| | - Elizabeth A Worthey
- Center for Computational Genomics and Data Sciences, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Department of Pediatrics, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Department of Pathology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
| | - Brittany N Lasseigne
- Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
| |
|
32
|
Belfield SJ, Cronin MTD, Enoch SJ, Firman JW. Guidance for good practice in the application of machine learning in development of toxicological quantitative structure-activity relationships (QSARs). PLoS One 2023; 18:e0282924. [PMID: 37163504 PMCID: PMC10171609 DOI: 10.1371/journal.pone.0282924] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/11/2022] [Accepted: 02/26/2023] [Indexed: 05/12/2023] Open
Abstract
Recent years have seen a substantial growth in the adoption of machine learning approaches for the purposes of quantitative structure-activity relationship (QSAR) development. Such a trend has coincided with a desire to see a shift in the focus of methodology employed within chemical safety assessment: away from traditional reliance upon animal-intensive in vivo protocols, and towards increased application of in silico (or computational) predictive toxicology. With QSAR central amongst techniques applied in this area, the emergence of algorithms trained through machine learning with the objective of toxicity estimation has, quite naturally, arisen. On account of the pattern-recognition capabilities of the underlying methods, the statistical power of the ensuing models is potentially considerable, and appropriate for the handling even of vast, heterogeneous datasets. However, such potency comes at a price, manifesting as the general practical deficits observed with respect to the reproducibility, interpretability and generalisability of the resulting tools. Unsurprisingly, these elements have served to hinder broader uptake (most notably within a regulatory setting). Areas of uncertainty liable to accompany (and hence detract from applicability of) toxicological QSAR have previously been highlighted, accompanied by the forwarding of suggestions for "best practice" aimed at mitigation of their influence. However, the scope of such exercises has remained limited to "classical" QSAR, that is, QSAR conducted through use of linear regression and related techniques, with the adoption of comparatively few features or descriptors. Accordingly, the intention of this study has been to extend the remit of best practice guidance, so as to address concerns specific to employment of machine learning within the field. In doing so, the impact of strategies aimed at enhancing the transparency (feature importance, feature reduction), generalisability (cross-validation) and predictive power (hyperparameter optimisation) of algorithms, trained upon real toxicity data through six common learning approaches, is evaluated.
Affiliation(s)
- Samuel J Belfield
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, United Kingdom
| | - Mark T D Cronin
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, United Kingdom
| | - Steven J Enoch
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, United Kingdom
| | - James W Firman
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, United Kingdom
| |
|
33
|
Dulyan L, Talozzi L, Pacella V, Corbetta M, Forkel SJ, Thiebaut de Schotten M. Longitudinal prediction of motor dysfunction after stroke: a disconnectome study. Brain Struct Funct 2022; 227:3085-3098. [PMID: 36334132 PMCID: PMC9653357 DOI: 10.1007/s00429-022-02589-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 12/08/2021] [Accepted: 10/20/2022] [Indexed: 06/01/2023]
Abstract
Motricity is the most commonly affected ability after a stroke. While many clinical studies attempt to predict motor symptoms at different chronic time points after a stroke, longitudinal acute-to-chronic studies remain scarce. Taking advantage of recent advances in mapping brain disconnections, we predict motor outcomes in 62 patients assessed longitudinally two weeks, three months, and one year after their stroke. Results indicate that brain disconnection patterns accurately predict motor impairments. However, disconnection patterns leading to impairment differ between the three time points and between left and right motor impairments. These results were cross-validated using resampling techniques. In sum, we demonstrated that while some neuroplasticity mechanisms exist that change the structure-function relationship, disconnection patterns prevail when predicting motor impairment at different time points after stroke.
Affiliation(s)
- Lilit Dulyan
- Groupe d'Imagerie Neurofonctionnelle, Institut Des Maladies Neurodégénératives-UMR 5293, CNRS, CEA, University of Bordeaux, Bordeaux, France
- Brain Connectivity and Behaviour Laboratory, Sorbonne University, Paris, France
- Donders Centre for Brain Cognition and Behaviour, Radboud University, Nijmegen, the Netherlands
| | - Lia Talozzi
- Groupe d'Imagerie Neurofonctionnelle, Institut Des Maladies Neurodégénératives-UMR 5293, CNRS, CEA, University of Bordeaux, Bordeaux, France
- Brain Connectivity and Behaviour Laboratory, Sorbonne University, Paris, France
| | - Valentina Pacella
- Groupe d'Imagerie Neurofonctionnelle, Institut Des Maladies Neurodégénératives-UMR 5293, CNRS, CEA, University of Bordeaux, Bordeaux, France
- Brain Connectivity and Behaviour Laboratory, Sorbonne University, Paris, France
| | - Maurizio Corbetta
- Clinica Neurologica, Department of Neuroscience, University of Padova, Padua, Italy
- Padova Neuroscience Center (PNC), University of Padova, Padua, Italy
- Venetian Institute of Molecular Medicine, VIMM, Padua, Italy
| | - Stephanie J Forkel
- Groupe d'Imagerie Neurofonctionnelle, Institut Des Maladies Neurodégénératives-UMR 5293, CNRS, CEA, University of Bordeaux, Bordeaux, France
- Brain Connectivity and Behaviour Laboratory, Sorbonne University, Paris, France
- Centre for Neuroimaging Sciences, Department of Neuroimaging, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Donders Centre for Brain Cognition and Behaviour, Radboud University, Nijmegen, the Netherlands
- Department of Neurosurgery, School of Medicine, Technical University of Munich, Munich, Germany
| | - Michel Thiebaut de Schotten
- Groupe d'Imagerie Neurofonctionnelle, Institut Des Maladies Neurodégénératives-UMR 5293, CNRS, CEA, University of Bordeaux, Bordeaux, France
- Brain Connectivity and Behaviour Laboratory, Sorbonne University, Paris, France
| |
|
34
|
Mincu D, Roy S. Developing robust benchmarks for driving forward AI innovation in healthcare. Nat Mach Intell 2022. [DOI: 10.1038/s42256-022-00559-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
|
35
|
Goedmakers C, Pereboom L, Schoones J, de Leeuw den Bouter M, Remis R, Staring M, Vleggeert-Lankamp C. Machine learning for image analysis in the cervical spine: Systematic review of the available models and methods. Brain Spine 2022; 2:101666. [PMID: 36506292 PMCID: PMC9729832 DOI: 10.1016/j.bas.2022.101666] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2022] [Revised: 09/12/2022] [Accepted: 10/28/2022] [Indexed: 11/16/2022]
Abstract
• Neural network approaches show the most potential for automated image analysis of the cervical spine.
• Fully automatic convolutional neural network (CNN) models are promising deep learning methods for segmentation.
• In cervical spine analysis, the biomechanical features are most often studied using finite element models.
• The application of artificial neural networks and support vector machine models looks promising for classification purposes.
• This article provides an overview of the methods for research on computer aided imaging diagnostics of the cervical spine.
Affiliation(s)
- C.M.W. Goedmakers
- Department of Neurosurgery, Leiden University Medical Center, Leiden, the Netherlands
- Computational Neuroscience Outcomes Center, Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Corresponding author. Department of Neurosurgery, Albinusdreef 2, 2300 RC, Leiden, the Netherlands.
| | - L.M. Pereboom
- Faculty of Mechanical, Maritime and Materials Engineering (3mE), Delft University of Technology, Delft, the Netherlands
| | - J.W. Schoones
- Walaeus Library, Leiden University Medical Center, Leiden, the Netherlands
| | - M.L. de Leeuw den Bouter
- Delft Institute of Applied Mathematics, Department of Numerical Analysis, Delft University of Technology, Delft, the Netherlands
| | - R.F. Remis
- Circuits and Systems Group, Microelectronics Department, Delft University of Technology, Delft, the Netherlands
| | - M. Staring
- Department of Radiology, Leiden University Medical Center, Leiden, the Netherlands
- Intelligent Systems Department, Delft University of Technology, Delft, the Netherlands
| | - C.L.A. Vleggeert-Lankamp
- Department of Neurosurgery, Leiden University Medical Center, Leiden, the Netherlands
- Department of Neurosurgery Haaglanden Medical Centre and HAGA Teaching Hospitals, The Hague, the Netherlands
- Department of Neurosurgery, Spaarne Gasthuis Haarlem/Hoofddorp, the Netherlands
| |
|
36
|
Thanapalasingam T, van Berkel L, Bloem P, Groth P. Relational graph convolutional networks: a closer look. PeerJ Comput Sci 2022; 8:e1073. [PMID: 36426239 PMCID: PMC9680895 DOI: 10.7717/peerj-cs.1073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2022] [Accepted: 07/28/2022] [Indexed: 06/16/2023]
Abstract
In this article, we describe a reproduction of the Relational Graph Convolutional Network (RGCN). Using our reproduction, we explain the intuition behind the model. Our reproduction results empirically validate the correctness of our implementations using benchmark Knowledge Graph datasets on node classification and link prediction tasks. Our explanation provides a friendly understanding of the different components of the RGCN for both users and researchers extending the RGCN approach. Furthermore, we introduce two new configurations of the RGCN that are more parameter efficient. The code and datasets are available at https://github.com/thiviyanT/torch-rgcn.
Affiliation(s)
- Thiviyan Thanapalasingam
- University of Amsterdam, Amsterdam, Noord Holland, Netherlands
- VU University Amsterdam, Amsterdam, Noord Holland, Netherlands
| | | | - Peter Bloem
- VU University Amsterdam, Amsterdam, Noord Holland, Netherlands
| | - Paul Groth
- University of Amsterdam, Amsterdam, Noord Holland, Netherlands
| |
|
37
|
Robitaille MC, Byers JM, Christodoulides JA, Raphael MP. Self-supervised machine learning for live cell imagery segmentation. Commun Biol 2022; 5:1162. [PMID: 36323790 DOI: 10.1038/s42003-022-04117-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/26/2022] [Accepted: 10/14/2022] [Indexed: 11/05/2022] Open
Abstract
Segmenting single cells is a necessary process for extracting quantitative data from biological microscopy imagery. The past decade has seen the advent of machine learning (ML) methods to aid in this process, the overwhelming majority of which fall under supervised learning (SL), which requires vast libraries of pre-processed, human-annotated labels to train the ML algorithms. Such SL pre-processing is labor intensive, can introduce bias, varies between end-users, and has yet to be shown capable of producing robust models that can be effectively utilized throughout the greater cell biology community. Here, to address this pre-processing problem, we offer a self-supervised learning (SSL) approach that utilizes cellular motion between consecutive images to self-train an ML classifier, enabling cell and background segmentation without the need for adjustable parameters or curated imagery. By leveraging motion, we achieve accurate segmentation that trains itself directly on end-user data, is independent of optical modality, outperforms contemporary SL methods, and does so in a completely automated fashion, thus eliminating end-user variability and bias. To the best of our knowledge, this SSL algorithm represents a first-of-its-kind effort and has appealing features that make it an ideal segmentation tool candidate for the broader cell biology research community. A self-supervised learning approach uses cellular motion between consecutive images to self-train a machine learning classifier for cell segmentation.
|
38
|
Bárcenas O, Pintado-Grima C, Sidorczuk K, Teufel F, Nielsen H, Ventura S, Burdukiewicz M. The dynamic landscape of peptide activity prediction. Comput Struct Biotechnol J 2022. [DOI: 10.1016/j.csbj.2022.11.043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2022] [Revised: 11/21/2022] [Accepted: 11/21/2022] [Indexed: 11/27/2022] Open
|
39
|
Qureshi AS, Roos T. Transfer Learning with Ensembles of Deep Neural Networks for Skin Cancer Detection in Imbalanced Data Sets. Neural Process Lett 2022. [DOI: 10.1007/s11063-022-11049-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/31/2022]
Abstract
Early diagnosis plays a key role in prevention and treatment of skin cancer. Several machine learning techniques for accurate detection of skin cancer from medical images have been reported. Many of these techniques are based on pre-trained convolutional neural networks (CNNs), which enable training the models based on limited amounts of training data. However, the classification accuracy of these models still tends to be severely limited by the scarcity of representative images from malignant tumours. We propose a novel ensemble-based convolutional neural network (CNN) architecture where multiple CNN models, some of which are pre-trained and some are trained only on the data at hand, along with auxiliary data in the form of metadata associated with the input images, are combined using a meta-learner. The proposed approach improves the model’s ability to handle limited and imbalanced data. We demonstrate the benefits of the proposed technique using a dataset with 33,126 dermoscopic images from 2056 patients. We evaluate the performance of the proposed technique in terms of the F1-measure, area under the ROC curve (AUC-ROC), and area under the PR-curve (AUC-PR), and compare it with that of seven different benchmark methods, including two recent CNN-based techniques. The proposed technique compares favourably in terms of all the evaluation metrics.
|
40
|
Bernau CR, Knödler M, Emonts J, Jäpel RC, Buyel JF. The use of predictive models to develop chromatography-based purification processes. Front Bioeng Biotechnol 2022; 10:1009102. [PMID: 36312533 PMCID: PMC9605695 DOI: 10.3389/fbioe.2022.1009102] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 08/01/2022] [Accepted: 09/23/2022] [Indexed: 11/13/2022] Open
Abstract
Chromatography is the workhorse of biopharmaceutical downstream processing because it can selectively enrich a target product while removing impurities from complex feed streams. This is achieved by exploiting differences in molecular properties, such as size, charge and hydrophobicity (alone or in different combinations). Accordingly, many parameters must be tested during process development in order to maximize product purity and recovery, including resin and ligand types, conductivity, pH, gradient profiles, and the sequence of separation operations. The number of possible experimental conditions quickly becomes unmanageable. Although the range of suitable conditions can be narrowed based on experience, the time and cost of the work remain high even when using high-throughput laboratory automation. In contrast, chromatography modeling using inexpensive, parallelized computer hardware can provide expert knowledge, predicting conditions that achieve high purity and efficient recovery. The prediction of suitable conditions in silico reduces the number of empirical tests required and provides in-depth process understanding, which is recommended by regulatory authorities. In this article, we discuss the benefits and specific challenges of chromatography modeling. We describe the experimental characterization of chromatography devices and settings prior to modeling, such as the determination of column porosity. We also consider the challenges that must be overcome when models are set up and calibrated, including the cross-validation and verification of data-driven and hybrid (combined data-driven and mechanistic) models. This review will therefore support researchers intending to establish a chromatography modeling workflow in their laboratory.
Affiliation(s)
- C. R. Bernau
- Fraunhofer Institute for Molecular Biology and Applied Ecology IME, Aachen, Germany
| | - M. Knödler
- Fraunhofer Institute for Molecular Biology and Applied Ecology IME, Aachen, Germany
- Institute for Molecular Biotechnology, RWTH Aachen University, Aachen, Germany
| | - J. Emonts
- Fraunhofer Institute for Molecular Biology and Applied Ecology IME, Aachen, Germany
| | - R. C. Jäpel
- Fraunhofer Institute for Molecular Biology and Applied Ecology IME, Aachen, Germany
- Institute for Molecular Biotechnology, RWTH Aachen University, Aachen, Germany
| | - J. F. Buyel
- Fraunhofer Institute for Molecular Biology and Applied Ecology IME, Aachen, Germany
- Institute for Molecular Biotechnology, RWTH Aachen University, Aachen, Germany
- University of Natural Resources and Life Sciences, Vienna (BOKU), Department of Biotechnology (DBT), Institute of Bioprocess Science and Engineering (IBSE), Vienna, Austria
- *Correspondence: J. F. Buyel
| |
|
41
|
Gerard D. Comment on three papers about Hardy–Weinberg equilibrium tests in autopolyploids. Front Genet 2022; 13:1027209. [PMID: 36267399 PMCID: PMC9576855 DOI: 10.3389/fgene.2022.1027209] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2022] [Accepted: 09/12/2022] [Indexed: 12/04/2022] Open
|
42
|
Chen BH. Minimum standards for evaluating machine-learned models of high-dimensional data. Front Aging 2022; 3:901841. [PMID: 36176975 PMCID: PMC9513121 DOI: 10.3389/fragi.2022.901841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2022] [Accepted: 08/05/2022] [Indexed: 11/13/2022]
Abstract
The maturation of machine learning and of technologies that generate high-dimensional data has led to growth in the number of predictive models, such as the “epigenetic clock”. While powerful, machine learning algorithms run a high risk of overfitting, particularly when training data is limited, as is often the case with high-dimensional data (“large p, small n”). Making independent validation a requirement of “algorithmic biomarker” development would bring greater clarity to the field by more efficiently identifying prediction or classification models to prioritize for further validation and characterization. Reproducibility has been a mainstay in science, but has only recently received attention in defining its various aspects and how to apply these principles to machine learning models. The goal of this paper is merely to serve as a call-to-arms for greater rigor and attention paid to newly developed models for prediction or classification.
Affiliation(s)
- Brian H. Chen
- FOXO Technologies Inc, Minneapolis, MN, United States
- The Herbert Wertheim School of Public Health and Human Longevity Science, University of California San Diego, La Jolla, CA, United States
- *Correspondence: Brian H. Chen
| |
|
43
|
Babier A, Mahmood R, Zhang B, Alves VGL, Barragán-Montero AM, Beaudry J, Cardenas CE, Chang Y, Chen Z, Chun J, Diaz K, Eraso HD, Faustmann E, Gaj S, Gay S, Gronberg M, Guo B, He J, Heilemann G, Hira S, Huang Y, Ji F, Jiang D, Giraldo JCJ, Lee H, Lian J, Liu S, Liu KC, Marrugo J, Miki K, Nakamura K, Netherton T, Nguyen D, Nourzadeh H, Osman AFI, Peng Z, Muñoz JDQ, Ramsl C, Rhee DJ, Rodriguez JD, Shan H, Siebers JV, Soomro MH, Sun K, Hoyos AU, Valderrama C, Verbeek R, Wang E, Willems S, Wu Q, Xu X, Yang S, Yuan L, Zhu S, Zimmermann L, Moore KL, Purdie TG, McNiven AL, Chan TCY. OpenKBP-Opt: an international and reproducible evaluation of 76 knowledge-based planning pipelines. Phys Med Biol 2022; 67:10.1088/1361-6560/ac8044. [PMID: 36093921 PMCID: PMC10696540 DOI: 10.1088/1361-6560/ac8044] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/26/2022] [Accepted: 07/11/2022] [Indexed: 11/12/2022]
Abstract
Objective. To establish an open framework for developing plan optimization models for knowledge-based planning (KBP). Approach. Our framework includes radiotherapy treatment data (i.e. reference plans) for 100 patients with head-and-neck cancer who were treated with intensity-modulated radiotherapy. That data also includes high-quality dose predictions from 19 KBP models that were developed by different research groups using out-of-sample data during the OpenKBP Grand Challenge. The dose predictions were input to four fluence-based dose mimicking models to form 76 unique KBP pipelines that generated 7600 plans (76 pipelines × 100 patients). The predictions and KBP-generated plans were compared to the reference plans via: the dose score, which is the average mean absolute voxel-by-voxel difference in dose; the deviation in dose-volume histogram (DVH) points; and the frequency of clinical planning criteria satisfaction. We also performed a theoretical investigation to justify our dose mimicking models. Main results. The range in rank order correlation of the dose score between predictions and their KBP pipelines was 0.50-0.62, which indicates that the quality of the predictions was generally positively correlated with the quality of the plans. Additionally, compared to the input predictions, the KBP-generated plans performed significantly better (P < 0.05; one-sided Wilcoxon test) on 18 of 23 DVH points. Similarly, each optimization model generated plans that satisfied a higher percentage of criteria than the reference plans, which satisfied 3.5% more criteria than the set of all dose predictions. Lastly, our theoretical investigation demonstrated that the dose mimicking models generated plans that are also optimal for an inverse planning model. Significance. This was the largest international effort to date for evaluating the combination of KBP prediction and optimization models. We found that the best performing models significantly outperformed the reference dose and dose predictions. In the interest of reproducibility, our data and code are freely available.
Affiliation(s)
- Aaron Babier
- Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, ON, Canada
- Vector Institute, Toronto, ON, Canada
| | - Rafid Mahmood
- Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, ON, Canada
| | - Binghao Zhang
- Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, ON, Canada
| | - Victor G L Alves
- Department of Radiation Oncology, University of Virginia Health System, Charlottesville, VA, United States of America
| | | | - Joel Beaudry
- Department of Radiation Oncology, Memorial Sloan Kettering Cancer Center, New York, NY, United States of America
| | - Carlos E Cardenas
- Department of Radiation Oncology, The University of Alabama at Birmingham, Birmingham, AL, United States of America
| | - Yankui Chang
- Department of Engineering and Applied Physics, University of Science and Technology of China, Hefei, People’s Republic of China
| | - Zijie Chen
- Shenying Medical Technology Co., Ltd., Shenzhen, Guangdong, People’s Republic of China
| | - Jaehee Chun
- Department of Radiation Oncology, Yonsei University College of Medicine, Seoul, Republic of Korea
| | - Kelly Diaz
- Department of Physics, National University of Colombia, Medellín, Colombia
| | - Harold David Eraso
- Department of Physics, National University of Colombia, Medellín, Colombia
| | - Erik Faustmann
- Atominstitut, Vienna University of Technology, Vienna, Austria
| | - Sibaji Gaj
- Department of Biomedical Engineering, Cleveland Clinic, Cleveland, OH, United States of America
| | - Skylar Gay
- Department of Radiation Physics, The University of Texas MD Anderson Cancer Center, Houston, TX, United States of America
| | - Mary Gronberg
- Department of Radiation Physics, The University of Texas MD Anderson Cancer Center, Houston, TX, United States of America
| | - Bingqi Guo
- Department of Radiation Oncology, Cleveland Clinic, Cleveland, OH, United States of America
| | - Junjun He
- Department of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, People’s Republic of China
| | - Gerd Heilemann
- Department of Radiation Oncology, Medical University of Vienna, Vienna, Austria
| | - Sanchit Hira
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, United States of America
| | - Yuliang Huang
- Department of Radiation Oncology, Peking University Cancer Hospital and Institute, Beijing, People’s Republic of China
| | - Fuxin Ji
- Department of Electrical Engineering and Automation, Anhui University, Hefei, People’s Republic of China
| | - Dashan Jiang
- Department of Electrical Engineering and Automation, Anhui University, Hefei, People’s Republic of China
| | | | - Hoyeon Lee
- Department of Radiation Oncology, Massachusetts General Hospital, Boston, MA, United States of America
| | - Jun Lian
- Department of Radiation Oncology, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States of America
| | - Shuolin Liu
- Department of Electrical Engineering and Automation, Anhui University, Hefei, People’s Republic of China
| | - Keng-Chi Liu
- Department of Medical Imaging, Taiwan AI Labs, Taipei, Taiwan
| | - José Marrugo
- Department of Physics, National University of Colombia, Medellín, Colombia
| | - Kentaro Miki
- Department of Biomedical and Health Sciences, Hiroshima University, Hiroshima, Japan
| | - Kunio Nakamura
- Department of Biomedical Engineering, Cleveland Clinic, Cleveland, OH, United States of America
| | - Tucker Netherton
- Department of Radiation Physics, The University of Texas MD Anderson Cancer Center, Houston, TX, United States of America
| | - Dan Nguyen
- Medical Artificial Intelligence and Automation (MAIA) Laboratory, Department of Radiation Oncology, The University of Texas Southwestern Medical Center, Dallas, TX, United States of America
| | - Hamidreza Nourzadeh
- Department of Radiation Oncology, Thomas Jefferson University, Philadelphia, PA, United States of America
| | | | - Zhao Peng
- Department of Engineering and Applied Physics, University of Science and Technology of China, Hefei, People’s Republic of China
| | | | - Christian Ramsl
- Atominstitut, Vienna University of Technology, Vienna, Austria
| | - Dong Joo Rhee
- Department of Radiation Physics, The University of Texas MD Anderson Cancer Center, Houston, TX, United States of America
| | | | - Hongming Shan
- Institute of Science and Technology for Brain-inspired Intelligence, Fudan University, Shanghai, People’s Republic of China
| | - Jeffrey V Siebers
- Department of Radiation Oncology, University of Virginia Health System, Charlottesville, VA, United States of America
| | - Mumtaz H Soomro
- Department of Radiation Oncology, University of Virginia Health System, Charlottesville, VA, United States of America
| | - Kay Sun
- Studio Vodels, Atlanta, GA, United States of America
| | - Andrés Usuga Hoyos
- Department of Physics, National University of Colombia, Medellín, Colombia
| | - Carlos Valderrama
- Department of Physics, National University of Colombia, Medellín, Colombia
| | - Rob Verbeek
- Department of Computer Science, Aalto University, Espoo, Finland
| | - Enpei Wang
- Shenying Medical Technology Co., Ltd., Shenzhen, Guangdong, People’s Republic of China
| | - Siri Willems
- Department of Electrical Engineering, KU Leuven, Leuven, Belgium
| | - Qi Wu
- Department of Electrical Engineering and Automation, Anhui University, Hefei, People’s Republic of China
| | - Xuanang Xu
- Department of Biomedical Engineering, Rensselaer Polytechnic Institute, Troy, NY, United States of America
| | - Sen Yang
- Tencent AI Lab, Shenzhen, Guangdong, People’s Republic of China
| | - Lulin Yuan
- Department of Radiation Oncology, Virginia Commonwealth University Medical Center, Richmond, VA, United States of America
| | - Simeng Zhu
- Department of Radiation Oncology, Henry Ford Health System, Detroit, MI, United States of America
| | - Lukas Zimmermann
- Faculty of Health, University of Applied Sciences Wiener Neustadt, Wiener Neustadt, Austria
- Competence Center for Preclinical Imaging and Biomedical Engineering, University of Applied Sciences Wiener Neustadt, Wiener Neustadt, Austria
| | - Kevin L Moore
- Department of Radiation Oncology, University of California, San Diego, La Jolla, CA, United States of America
| | - Thomas G Purdie
- Radiation Medicine Program, UHN Princess Margaret Cancer Centre, Toronto, ON, Canada
- Department of Radiation Oncology, University of Toronto, Toronto, ON, Canada
- Techna Institute for the Advancement of Technology for Health, Toronto, ON, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
| | - Andrea L McNiven
- Radiation Medicine Program, UHN Princess Margaret Cancer Centre, Toronto, ON, Canada
- Department of Radiation Oncology, University of Toronto, Toronto, ON, Canada
| | - Timothy C Y Chan
- Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, ON, Canada
- Vector Institute, Toronto, ON, Canada
- Techna Institute for the Advancement of Technology for Health, Toronto, ON, Canada
| |
|
44
|
Couckuyt A, Seurinck R, Emmaneel A, Quintelier K, Novak D, Van Gassen S, Saeys Y. Challenges in translational machine learning. Hum Genet 2022; 141:1451-1466. [PMID: 35246744 PMCID: PMC8896412 DOI: 10.1007/s00439-022-02439-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2021] [Accepted: 02/08/2022] [Indexed: 11/25/2022]
Abstract
Machine learning (ML) algorithms are increasingly being used to help implement clinical decision support systems. In this new field, which we define as "translational machine learning", joint efforts and strong communication between data scientists and clinicians help to span the gap between ML and its adoption in the clinic. These collaborations also improve interpretability and trust in translational ML methods and ultimately aim to result in generalizable and reproducible models. To help clinicians and bioinformaticians refine their translational ML pipelines, we review the steps from model building to the use of ML in the clinic. We discuss experimental setup, computational analysis, interpretability and reproducibility, and emphasize the challenges involved. We highly advise collaboration and data sharing between consortia and institutes to build multi-centric cohorts that facilitate ML methodologies that generalize across centers. In the end, we hope that this review provides a way to streamline translational ML and helps to tackle the challenges that come with it.
Affiliation(s)
- Artuur Couckuyt
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium
- Data Mining and Modeling for Biomedicine, VIB-UGent Center for Inflammation Research, Gent, Belgium
| | - Ruth Seurinck
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium
- Data Mining and Modeling for Biomedicine, VIB-UGent Center for Inflammation Research, Gent, Belgium
| | - Annelies Emmaneel
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium
- Data Mining and Modeling for Biomedicine, VIB-UGent Center for Inflammation Research, Gent, Belgium
| | - Katrien Quintelier
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium
- Data Mining and Modeling for Biomedicine, VIB-UGent Center for Inflammation Research, Gent, Belgium
- Department of Pulmonary Diseases, Erasmus MC, Rotterdam, The Netherlands
| | - David Novak
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium
- Data Mining and Modeling for Biomedicine, VIB-UGent Center for Inflammation Research, Gent, Belgium
| | - Sofie Van Gassen
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium
- Data Mining and Modeling for Biomedicine, VIB-UGent Center for Inflammation Research, Gent, Belgium
| | - Yvan Saeys
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium.
- Data Mining and Modeling for Biomedicine, VIB-UGent Center for Inflammation Research, Gent, Belgium.
| |
|
45
|
Hohlbein J, Diederich B, Marsikova B, Reynaud EG, Holden S, Jahr W, Haase R, Prakash K. Open microscopy in the life sciences: quo vadis? Nat Methods 2022; 19:1020-1025. [PMID: 36008630 DOI: 10.1038/s41592-022-01602-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 12/20/2022]
Affiliation(s)
- Johannes Hohlbein
- Laboratory of Biophysics, Wageningen University & Research, Wageningen, The Netherlands
- Microspectroscopy Research Facility, Wageningen University & Research, Wageningen, The Netherlands
| | - Benedict Diederich
- Leibniz Institute for Photonic Technology, Jena, Germany
- Institute for Physical Chemistry, Friedrich-Schiller University, Jena, Germany
| | | | - Emmanuel G Reynaud
- School of Biomolecular and Biomedical Sciences, University College Dublin, Dublin, Ireland
| | - Séamus Holden
- School of Life Sciences, The University of Warwick, Coventry, UK
| | - Wiebke Jahr
- In-Vision Technologies AG, Guntramsdorf, Austria
| | - Robert Haase
- DFG Cluster of Excellence Physics of Life, TU Dresden, Dresden, Germany
| | - Kirti Prakash
- National Physical Laboratory, Teddington, UK
- Integrated Pathology Unit, Centre for Molecular Pathology, The Royal Marsden Trust and Institute of Cancer Research, Sutton, UK
| |
|
46
|
Sidorczuk K, Gagat P, Pietluch F, Kała J, Rafacz D, Bąkała L, Słowik J, Kolenda R, Rödiger S, Fingerhut LCHW, Cooke IR, Mackiewicz P, Burdukiewicz M. Benchmarks in antimicrobial peptide prediction are biased due to the selection of negative data. Brief Bioinform 2022; 23:6672903. [PMID: 35988923 PMCID: PMC9487607 DOI: 10.1093/bib/bbac343] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/30/2022] [Revised: 07/07/2022] [Accepted: 07/25/2022] [Indexed: 12/29/2022] Open
Abstract
Antimicrobial peptides (AMPs) are a heterogeneous group of short polypeptides that target not only microorganisms but also viruses and cancer cells. Due to their lower selection for resistance compared with traditional antibiotics, AMPs have been attracting ever-growing attention from researchers, including bioinformaticians. Machine learning represents the most cost-effective method for novel AMP discovery and consequently many computational tools for AMP prediction have been recently developed. In this article, we investigate the impact of negative data sampling on model performance and benchmarking. We generated 660 predictive models using 12 machine learning architectures, a single positive data set and 11 negative data sampling methods; the architectures and methods were defined on the basis of published AMP prediction software. Our results clearly indicate that similar training and benchmark data sets, i.e., those produced by the same or a similar negative data sampling method, positively affect model performance. Consequently, all the benchmark analyses that have been performed for AMP prediction models are significantly biased and, moreover, we do not know which model is the most accurate. To provide researchers with reliable information about the performance of AMP predictors, we also created a web server AMPBenchmark for fair model benchmarking. AMPBenchmark is available at http://BioGenies.info/AMPBenchmark.
Affiliation(s)
| | | | | | - Jakub Kała
- Warsaw University of Technology, Faculty of Mathematics and Information Science, Poland
| | - Dominik Rafacz
- Warsaw University of Technology, Faculty of Mathematics and Information Science, Poland
| | - Laura Bąkała
- Warsaw University of Technology, Faculty of Mathematics and Information Science, Poland
| | - Jadwiga Słowik
- Warsaw University of Technology, Faculty of Mathematics and Information Science, Poland
| | - Rafał Kolenda
- Quadram Institute Biosciences, Norwich Research Park, Norwich, United Kingdom
- Wrocław University of Environmental and Life Sciences, Faculty of Veterinary Medicine, Poland
| | - Stefan Rödiger
- Brandenburg University of Technology Cottbus-Senftenberg, Faculty of Natural Sciences, Germany
| | - Legana C H W Fingerhut
- Department of Molecular and Cell Biology, Centre for Tropical Bioinformatics and Molecular Biology, James Cook University, Australia
| | - Ira R Cooke
- Department of Molecular and Cell Biology, Centre for Tropical Bioinformatics and Molecular Biology, James Cook University, Australia
| | | | | |
|
47
|
Li R, Sharma V, Thangamani S, Yakimovich A. Open-Source Biomedical Image Analysis Models: A Meta-Analysis and Continuous Survey. Front Bioinform 2022; 2:912809. [PMID: 36304285 PMCID: PMC9580903 DOI: 10.3389/fbinf.2022.912809] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2022] [Accepted: 06/13/2022] [Indexed: 12/05/2022] Open
Abstract
Open-source research software has proven indispensable in modern biomedical image analysis. A multitude of open-source platforms drive image analysis pipelines and help disseminate novel analytical approaches and algorithms. Recent advances in machine learning allow for unprecedented improvement in these approaches. However, these novel algorithms come with new requirements in order to remain open source. To understand how these requirements are met, we have collected 50 biomedical image analysis models and performed a meta-analysis of their respective papers, source code, dataset, and trained model parameters. We concluded that while there are many positive trends in openness, only a fraction of all publications makes all necessary elements available to the research community.
Affiliation(s)
- Rui Li
- Center for Advanced Systems Understanding (CASUS), Helmholtz-Zentrum Dresden-Rossendorf e. V. (HZDR), Görlitz, Germany
| | - Vaibhav Sharma
- Center for Advanced Systems Understanding (CASUS), Helmholtz-Zentrum Dresden-Rossendorf e. V. (HZDR), Görlitz, Germany
| | - Subasini Thangamani
- Center for Advanced Systems Understanding (CASUS), Helmholtz-Zentrum Dresden-Rossendorf e. V. (HZDR), Görlitz, Germany
| | - Artur Yakimovich
- Center for Advanced Systems Understanding (CASUS), Helmholtz-Zentrum Dresden-Rossendorf e. V. (HZDR), Görlitz, Germany
- Bladder Infection and Immunity Group (BIIG), Department of Renal Medicine, Division of Medicine, University College London, Royal Free Hospital Campus, London, United Kingdom
- Artificial Intelligence for Life Sciences CIC, Dorset, United Kingdom
- Roche Pharma International Informatics, Roche Diagnostics GmbH, Mannheim, Germany
- *Correspondence: Artur Yakimovich
| |
|
48
|
Muñoz-Tamayo R, Nielsen BL, Gagaoua M, Gondret F, Krause ET, Morgavi DP, Olsson IAS, Pastell M, Taghipoor M, Tedeschi L, Veissier I, Nawroth C. Seven steps to enhance Open Science practices in animal science. PNAS Nexus 2022; 1:pgac106. [PMID: 36741429 PMCID: PMC9896936 DOI: 10.1093/pnasnexus/pgac106] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 03/16/2022] [Accepted: 06/30/2022] [Indexed: 04/14/2023]
Abstract
The Open Science movement aims at ensuring accessibility, reproducibility, and transparency of research. The adoption of Open Science practices in animal science, however, is still at an early stage. To move ahead as a field, we here provide seven practical steps to embrace Open Science in animal science. We hope that this paper contributes to the shift in research practices of animal scientists towards open, reproducible, and transparent science, enabling the field to gain additional public trust and deal with future challenges to guarantee reliable research. Although the paper targets primarily animal science researchers, the steps discussed here are also applicable to other research domains.
Affiliation(s)
- Rafael Muñoz-Tamayo
- INRAE, AgroParisTech, Université Paris-Saclay, UMR Modélisation Systémique Appliquée aux Ruminants, 75005 Paris, France
| | - Birte L Nielsen
- Universities Federation for Animal Welfare (UFAW), The Old School, Brewhouse Hill, Wheathampstead, Hertfordshire AL4 8AN, UK
| | | | | | - E Tobias Krause
- Institute of Animal Welfare and Animal Husbandry, Friedrich-Loeffler-Institut, Dörnbergstr. 25/27, 29223 Celle, Germany
| | - Diego P Morgavi
- Université Clermont Auvergne, INRAE, VetAgro Sup, UMR Herbivores, F-63122 Saint-Genes-Champanelle, France
| | - I Anna S Olsson
- i3S—Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Rua Alfredo Allen 208, 4200-180 Porto, Portugal
| | - Matti Pastell
- Natural Resources Institute Finland (Luke), Production Systems, Latokartanonkaari 9, FI-00790 Helsinki, Finland
| | - Masoomeh Taghipoor
- INRAE, AgroParisTech, Université Paris-Saclay, UMR Modélisation Systémique Appliquée aux Ruminants, 75005 Paris, France
| | - Luis Tedeschi
- Department of Animal Science, Texas A&M University, College Station, TX 77843-2471, USA
| | - Isabelle Veissier
- Université Clermont Auvergne, INRAE, VetAgro Sup, UMR Herbivores, F-63122 Saint-Genes-Champanelle, France
| | - Christian Nawroth
- Institute of Behavioural Physiology, Research Institute for Farm Animal Biology (FBN), Wilhelm-Stahl-Allee 2, 18196 Dummerstorf, Germany
| |
|
49
|
Abstract
The field of health services research studies the health care system by examining outcomes relevant to patients and clinicians but also health economists and policy makers. Such outcomes often include health care spending and utilization of care services. Building accurate prediction models using reproducible research practices for health services research is important for evidence-based decision making. Several systematic reviews have summarized prediction models for outcomes relevant to health services research, but these systematic reviews do not present a thorough assessment of reproducibility and research quality of the prediction modelling studies. In the present commentary, we discuss how recent advances in prediction modelling in other medical fields can be applied to health services research. We also describe the current status of prediction modelling in health services research, and we summarize available methodological guidance for the development, update, external validation and systematic appraisal of prediction models.
Affiliation(s)
- Lazaros Belbasis
- Meta-Research Innovation Center Berlin, QUEST Center, Berlin Institute of Health, Charité - Universitätsmedizin Berlin, Berlin, Germany.
| | - Orestis A Panagiotou
- Center for Evidence Synthesis in Health, School of Public Health, Brown University, Providence, RI, USA
- Department of Health Services, Policy and Practice, School of Public Health, Brown University, Providence, RI, USA
- Department of Epidemiology, School of Public Health, Brown University, Providence, RI, USA
| |
|
50
|
Abstract
Summary: Li and colleagues present REFLECT, a computational approach to precision oncology that nominates effective drug combinations by utilizing a diverse compendium of publicly available preclinical and clinical genomic, transcriptomic, and proteomic data. The preliminary validation of the REFLECT system in preclinical and clinical trial settings showcases potential for clinical implementation, although challenges remain. See related article by Li et al., p. 1542 (4).
Affiliation(s)
- Trevor J Pugh
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
| | - Benjamin Haibe-Kains
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Vector Institute for Artificial Intelligence, Toronto, Ontario
| |
|