1
Munroe ES, Spicer A, Castellvi-Font A, Zalucky A, Dianti J, Graham Linck E, Talisa V, Urner M, Angus DC, Baedorf-Kassis E, Blette B, Bos LD, Buell KG, Casey JD, Calfee CS, Del Sorbo L, Estenssoro E, Ferguson ND, Giblon R, Granholm A, Harhay MO, Heath A, Hodgson C, Houle T, Jiang C, Kramer L, Lawler PR, Leligdowicz A, Li F, Liu K, Maiga A, Maslove D, McArthur C, McAuley DF, Serpa Neto A, Oosthuysen C, Perner A, Prescott HC, Rochwerg B, Sahetya S, Samoilenko M, Schnitzer ME, Seitz KP, Shah F, Shankar-Hari M, Sinha P, Slutsky AS, Qian ET, Webb SA, Young PJ, Zampieri FG, Zarychanski R, Fan E, Semler MW, Churpek M, Goligher EC, Platform of Randomized Adaptive Clinical Trials in Critical Illness (PRACTICAL) investigators, Evidence-based Individualized Treatment Effects (EvITE) Group. Evidence-based personalised medicine in critical care: a framework for quantifying and applying individualised treatment effects in patients who are critically ill. THE LANCET. RESPIRATORY MEDICINE 2025; 13:556-568. [PMID: 40250459 DOI: 10.1016/s2213-2600(25)00054-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2024] [Revised: 01/22/2025] [Accepted: 02/11/2025] [Indexed: 04/20/2025]
Abstract
Clinicians aim to provide treatments that will result in the best outcome for each patient. Ideally, treatment decisions are based on evidence from randomised clinical trials. Randomised trials conventionally report an aggregated difference in outcomes between patients in each group, known as an average treatment effect. However, the actual effect of treatment on outcomes (treatment response) can vary considerably between individuals, and can differ substantially from the average treatment effect. This variation in response to treatment between patients, known as heterogeneity of treatment effect, is particularly important in critical care because common critical care syndromes (eg, sepsis and acute respiratory distress syndrome) are clinically and biologically heterogeneous. Statistical approaches have been developed to analyse heterogeneity of treatment effect and predict individualised treatment effects for each patient. In this Review, we outline a framework for deriving and validating individualised treatment effects and identify challenges to applying individualised treatment effect estimates to inform treatment decisions in clinical care.
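The gap between an average treatment effect and subgroup-level effects described in this abstract can be illustrated with a small worked example. The sketch below is not taken from the Review: the two subgroups, their risks, and the function names are hypothetical values chosen only to show how opposite effects can nearly cancel in the average.

```python
# Hypothetical mortality risks for two illustrative subgroups.
# All numbers are invented for illustration; none come from the Review.
patients = [
    # (subgroup, risk_if_untreated, risk_if_treated, n_patients)
    ("A", 0.40, 0.25, 500),  # treatment lowers risk by 15 percentage points
    ("B", 0.20, 0.30, 500),  # treatment raises risk by 10 percentage points
]


def average_treatment_effect(groups):
    """Size-weighted mean of (risk if treated - risk if untreated)."""
    total = sum(n for _, _, _, n in groups)
    weighted = sum((treated - untreated) * n
                   for _, untreated, treated, n in groups)
    return weighted / total


def subgroup_effects(groups):
    """Risk difference within each subgroup."""
    return {name: treated - untreated
            for name, untreated, treated, _ in groups}


ate = average_treatment_effect(patients)
effects = subgroup_effects(patients)
print(f"average treatment effect: {ate:+.3f}")
for name, effect in effects.items():
    print(f"subgroup {name}: {effect:+.3f}")
```

In this toy setting the average risk difference is small and favourable, while one subgroup benefits clearly and the other is harmed, which is the situation that motivates estimating individualised effects.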
Affiliation(s)
- Elizabeth S Munroe
- Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
| | - Alexandra Spicer
- Division of Pulmonary and Critical Care, Department of Medicine, University of Wisconsin-Madison, Madison, WI, USA
| | - Andrea Castellvi-Font
- Department of Critical Care, Hospital del Mar, and Critical Illness Research Group (GREPAC), Hospital del Mar Research Institute (IMIM), Barcelona, Spain; Division of Respirology, Department of Medicine, University Health Network, Toronto, ON, Canada; Toronto General Hospital Research Institute, Toronto, ON, Canada
| | - Ann Zalucky
- Department of Critical Care Medicine, Snyder Institute for Chronic Diseases, Cumming School of Medicine, University of Calgary and Alberta Health Services, Foothills Medical Center, Calgary, AB, Canada; Division of Pulmonary, Critical Care, Allergy and Sleep Medicine, University of California San Francisco, San Francisco, CA, USA
| | - Jose Dianti
- Division of Respirology, Department of Medicine, University Health Network, Toronto, ON, Canada; Interdepartmental Division of Critical Care Medicine, University of Toronto, Toronto, ON, Canada
| | - Emma Graham Linck
- School of Medicine and Public Health, University of Wisconsin-Madison, Madison, WI, USA
| | - Victor Talisa
- Center for Research, Investigation, and Systems Modeling of Acute Illness, Department of Critical Care Medicine, University of Pittsburgh, PA, USA
| | - Martin Urner
- Toronto General Hospital Research Institute, Toronto, ON, Canada; Interdepartmental Division of Critical Care Medicine, University of Toronto, Toronto, ON, Canada; Department of Anesthesiology & Pain Medicine, University of Toronto, Toronto, ON, Canada
| | - Derek C Angus
- Center for Research, Investigation, and Systems Modeling of Acute Illness, Department of Critical Care Medicine, University of Pittsburgh, PA, USA
| | | | - Bryan Blette
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Lieuwe D Bos
- Department of Intensive Care and Laboratory of Experimental Intensive Care and Anesthesiology, Amsterdam, Netherlands
| | - Kevin G Buell
- Division of Pulmonary and Critical Care, Department of Medicine, University of Chicago, Chicago, IL, USA
| | - Jonathan D Casey
- Division of Allergy, Pulmonary, and Critical Care Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Carolyn S Calfee
- Division of Pulmonary, Critical Care, Allergy and Sleep Medicine, University of California San Francisco, San Francisco, CA, USA
| | - Lorenzo Del Sorbo
- Division of Respirology, Department of Medicine, University Health Network, Toronto, ON, Canada
| | - Elisa Estenssoro
- Hospital Interzonal San Martin de La Plata, Buenos Aires, Argentina
| | - Niall D Ferguson
- Division of Respirology, Department of Medicine, University Health Network, Toronto, ON, Canada; Toronto General Hospital Research Institute, Toronto, ON, Canada; Interdepartmental Division of Critical Care Medicine, University of Toronto, Toronto, ON, Canada
| | - Rachel Giblon
- Division of Biostatistics, University of Toronto, Toronto, ON, Canada; Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
| | - Anders Granholm
- Department of Intensive Care, Copenhagen University Hospital-Rigshospitalet, Copenhagen, Denmark
| | - Michael O Harhay
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Anna Heath
- Division of Biostatistics, University of Toronto, Toronto, ON, Canada; Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
| | - Carol Hodgson
- Australian and New Zealand Intensive Care Research Centre, Monash University, Melbourne, VIC, Australia
| | - Timothy Houle
- Department of Anesthesia, Critical Care, and Pain Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
| | - Cong Jiang
- Faculté de Pharmacie, Université de Montréal, Montreal, QC, Canada
| | - Lina Kramer
- Department of Intensive Care and Laboratory of Experimental Intensive Care and Anesthesiology, Amsterdam, Netherlands
| | - Patrick R Lawler
- Interdepartmental Division of Critical Care Medicine, University of Toronto, Toronto, ON, Canada; Division of Cardiology, Department of Medicine, McGill University Health Centre, Montreal, QC, Canada
| | - Aleksandra Leligdowicz
- Division of Critical Care Medicine, Department of Medicine, Schulich School of Medicine and Dentistry, Western University, London, ON, Canada
| | - Fan Li
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
| | - Kuan Liu
- Institute of Health Policy, Management, and Evaluation, University of Toronto, Toronto, ON, Canada
| | - Amelia Maiga
- Division of Acute Care Surgery, Department of Surgery, Vanderbilt University Medical Center, Nashville, TN, USA
| | - David Maslove
- Department of Critical Care Medicine, Queen's University, Kingston, ON, Canada
| | - Colin McArthur
- Department of Critical Care Medicine, Te Toka Tumai Auckland City Hospital, Auckland, New Zealand
| | - Daniel F McAuley
- Wellcome-Wolfson Institute for Experimental Medicine, Queen's University Belfast, Belfast, Northern Ireland
| | - Ary Serpa Neto
- Department of Intensive Care, Austin Hospital, Melbourne, VIC, Australia; Department of Critical Care Medicine, Hospital Israelita Albert Einstein, São Paulo, Brazil; Australian and New Zealand Intensive Care Research Centre, Monash University, Melbourne, VIC, Australia
| | - Charissa Oosthuysen
- Division of Respirology, Department of Medicine, University Health Network, Toronto, ON, Canada
| | - Anders Perner
- Department of Intensive Care, Copenhagen University Hospital-Rigshospitalet, Copenhagen, Denmark
| | - Hallie C Prescott
- Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA; VA Center for Clinical Management Research, Ann Arbor, MI, USA
| | - Bram Rochwerg
- Department of Medicine, McMaster University, Hamilton, ON, Canada
| | - Sarina Sahetya
- Division of Pulmonary and Critical Care Medicine, School of Medicine, Johns Hopkins University, Baltimore, MD, USA
| | | | - Mireille E Schnitzer
- Faculté de Pharmacie, Université de Montréal, Montreal, QC, Canada; Department of Social and Preventive Medicine, School of Public Health, Université de Montréal, Montreal, QC, Canada
| | - Kevin P Seitz
- Division of Allergy, Pulmonary, and Critical Care Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Faraaz Shah
- Division of Pulmonary, Allergy, Critical Care, and Sleep Medicine, University of Pittsburgh, Pittsburgh, PA, USA
| | - Manu Shankar-Hari
- Centre for Inflammation Research, Institute for Regeneration and Repair, The University of Edinburgh, Edinburgh, UK
| | - Pratik Sinha
- Department of Anesthesiology, Washington University School of Medicine in St Louis, St Louis, MO, USA
| | - Arthur S Slutsky
- Interdepartmental Division of Critical Care Medicine, University of Toronto, Toronto, ON, Canada
| | - Edward T Qian
- Division of Allergy, Pulmonary, and Critical Care Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Steve A Webb
- Australian and New Zealand Intensive Care Research Centre, Monash University, Melbourne, VIC, Australia
| | - Paul J Young
- Australian and New Zealand Intensive Care Research Centre, Monash University, Melbourne, VIC, Australia; Intensive Care Unit, Wellington Hospital, Wellington, New Zealand; Medical Research Institute of New Zealand, Wellington, New Zealand; Department of Critical Care, University of Melbourne, Melbourne, VIC, Australia
| | - Fernando G Zampieri
- Department of Critical Care Medicine, Faculty of Medicine and Dentistry, University of Alberta and Alberta Health Services, Edmonton, AB, Canada
| | - Ryan Zarychanski
- Department of Internal Medicine, University of Manitoba, Winnipeg, MB, Canada
| | - Eddy Fan
- Division of Respirology, Department of Medicine, University Health Network, Toronto, ON, Canada; Toronto General Hospital Research Institute, Toronto, ON, Canada; Interdepartmental Division of Critical Care Medicine, University of Toronto, Toronto, ON, Canada
| | - Matthew W Semler
- Division of Allergy, Pulmonary, and Critical Care Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Matthew Churpek
- Division of Pulmonary and Critical Care, Department of Medicine, University of Wisconsin-Madison, Madison, WI, USA; School of Medicine and Public Health, University of Wisconsin-Madison, Madison, WI, USA
| | - Ewan C Goligher
- Division of Respirology, Department of Medicine, University Health Network, Toronto, ON, Canada; Toronto General Hospital Research Institute, Toronto, ON, Canada; Interdepartmental Division of Critical Care Medicine, University of Toronto, Toronto, ON, Canada.
2
Chen H, Liu J, Tang G, Hao G, Yang G. Bioinformatic Resources for Exploring Human-virus Protein-protein Interactions Based on Binding Modes. GENOMICS, PROTEOMICS & BIOINFORMATICS 2024; 22:qzae075. [PMID: 39404802 PMCID: PMC11658832 DOI: 10.1093/gpbjnl/qzae075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2023] [Revised: 10/05/2024] [Accepted: 10/11/2024] [Indexed: 12/21/2024]
Abstract
Historically, there have been many outbreaks of viral diseases that have continued to claim millions of lives. Research on human-virus protein-protein interactions (PPIs) is vital to understanding the principles of human-virus relationships, providing an essential foundation for developing virus control strategies to combat diseases. The rapidly accumulating data on human-virus PPIs offer unprecedented opportunities for bioinformatics research around human-virus PPIs. However, available detailed analyses and summaries to help use these resources systematically and efficiently are lacking. Here, we comprehensively review the bioinformatic resources used in human-virus PPI research, and discuss and compare their functions, performance, and limitations. This review aims to provide researchers with a bioinformatic toolbox that will hopefully better facilitate the exploration of human-virus PPIs based on binding modes.
Affiliation(s)
- Huimin Chen
- State Key Laboratory of Green Pesticide, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan 430079, China
| | - Jiaxin Liu
- State Key Laboratory of Green Pesticide, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan 430079, China
| | - Gege Tang
- State Key Laboratory of Green Pesticide, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan 430079, China
| | - Gefei Hao
- State Key Laboratory of Green Pesticide, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan 430079, China
- State Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, China
| | - Guangfu Yang
- State Key Laboratory of Green Pesticide, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan 430079, China
3
Pazmiño-Betancourth M, Casas Gómez-Uribarri I, Mondragon-Shem K, Babayan SA, Baldini F, Rafuse Haines L. Advancing age grading techniques for Glossina morsitans morsitans, vectors of African trypanosomiasis, through mid-infrared spectroscopy and machine learning. Biol Methods Protoc 2024; 9:bpae058. [PMID: 39290986 PMCID: PMC11407438 DOI: 10.1093/biomethods/bpae058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2024] [Revised: 08/12/2024] [Accepted: 08/15/2024] [Indexed: 09/19/2024] Open
Abstract
Tsetse are the insects responsible for transmitting African trypanosomes, which cause sleeping sickness in humans and animal trypanosomiasis in wildlife and livestock. Knowing the age of these flies is important when assessing the effectiveness of vector control programs and modelling disease risk. Current methods to assess fly age are, however, labour-intensive, slow, and often inaccurate as skilled personnel are in short supply. Mid-infrared spectroscopy (MIRS), a fast and cost-effective tool to accurately estimate several biological traits of insects, offers a promising alternative. This is achieved by characterising the biochemical composition of the insect cuticle using infrared light coupled with machine-learning (ML) algorithms to estimate the traits of interest. We tested the performance of MIRS in estimating tsetse sex and age for the first time using spectra obtained from their cuticle. We used 541 insectary-reared Glossina m. morsitans of two different age groups for males (5 and 7 weeks) and three age groups for females (3 days, 5 weeks, and 7 weeks). Spectra were collected from the head, thorax, and abdomen of each sample. ML models differentiated between male and female flies with 96% accuracy and predicted the age group with 94% and 87% accuracy for males and females, respectively. The key infrared regions important for sex and age classification were characteristic of lipid and protein content. Our results support the use of MIRS as a rapid and accurate way to identify tsetse sex and age with minimal pre-processing. Further validation using wild-caught tsetse could pave the way for this technique to be implemented as a routine surveillance tool in vector control programmes.
Affiliation(s)
- Mauro Pazmiño-Betancourth
- School of Biodiversity, One Health and Veterinary Medicine, University of Glasgow, G12 8QQ, Glasgow, United Kingdom
| | - Ivan Casas Gómez-Uribarri
- School of Biodiversity, One Health and Veterinary Medicine, University of Glasgow, G12 8QQ, Glasgow, United Kingdom
| | - Karina Mondragon-Shem
- Department of Vector Biology, Liverpool School of Tropical Medicine, L3 5QA, Liverpool, United Kingdom
| | - Simon A Babayan
- School of Biodiversity, One Health and Veterinary Medicine, University of Glasgow, G12 8QQ, Glasgow, United Kingdom
| | - Francesco Baldini
- School of Biodiversity, One Health and Veterinary Medicine, University of Glasgow, G12 8QQ, Glasgow, United Kingdom
- Environmental Health, and Ecological Sciences Department, Ifakara Health Institute, Morogoro, Ifakara, P.O. Box 53, United Republic of Tanzania
| | - Lee Rafuse Haines
- Department of Vector Biology, Liverpool School of Tropical Medicine, L3 5QA, Liverpool, United Kingdom
- Department of Biological Sciences, University of Notre Dame, 46556, Notre Dame, United States
4
Schwabe D, Becker K, Seyferth M, Klaß A, Schaeffter T. The METRIC-framework for assessing data quality for trustworthy AI in medicine: a systematic review. NPJ Digit Med 2024; 7:203. [PMID: 39097662 PMCID: PMC11297942 DOI: 10.1038/s41746-024-01196-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2024] [Accepted: 07/12/2024] [Indexed: 08/05/2024] Open
Abstract
The adoption of machine learning (ML) and, more specifically, deep learning (DL) applications into all major areas of our lives is underway. The development of trustworthy AI is especially important in medicine due to the large implications for patients' lives. While trustworthiness concerns various aspects including ethical, transparency and safety requirements, we focus on the importance of data quality (training/test) in DL. Since data quality dictates the behaviour of ML products, evaluating data quality will play a key part in the regulatory approval of medical ML products. We perform a systematic review following PRISMA guidelines using the databases Web of Science, PubMed and ACM Digital Library. We identify 5408 studies, out of which 120 records fulfil our eligibility criteria. From this literature, we synthesise the existing knowledge on data quality frameworks and combine it with the perspective of ML applications in medicine. As a result, we propose the METRIC-framework, a specialised data quality framework for medical training data comprising 15 awareness dimensions, along which developers of medical ML applications should investigate the content of a dataset. This knowledge helps to reduce biases as a major source of unfairness, increase robustness, facilitate interpretability and thus lays the foundation for trustworthy AI in medicine. The METRIC-framework may serve as a base for systematically assessing training datasets, establishing reference datasets, and designing test datasets which has the potential to accelerate the approval of medical ML products.
Affiliation(s)
- Daniel Schwabe
- Division Medical Physics and Metrological Information Technology, Physikalisch-Technische Bundesanstalt, Berlin, Germany.
| | - Katinka Becker
- Division Medical Physics and Metrological Information Technology, Physikalisch-Technische Bundesanstalt, Berlin, Germany
| | - Martin Seyferth
- Division Medical Physics and Metrological Information Technology, Physikalisch-Technische Bundesanstalt, Berlin, Germany
| | - Andreas Klaß
- Division Medical Physics and Metrological Information Technology, Physikalisch-Technische Bundesanstalt, Berlin, Germany
| | - Tobias Schaeffter
- Division Medical Physics and Metrological Information Technology, Physikalisch-Technische Bundesanstalt, Berlin, Germany
- Department of Medical Engineering, Technical University Berlin, Berlin, Germany
- Einstein Centre for Digital Future, Berlin, Germany
5
Liu Y, Ji Y, Chen J, Zhang Y, Li X, Li X. Pioneering noninvasive colorectal cancer detection with an AI-enhanced breath volatilomics platform. Theranostics 2024; 14:4240-4255. [PMID: 39113791 PMCID: PMC11303087 DOI: 10.7150/thno.94950] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2024] [Accepted: 05/02/2024] [Indexed: 08/10/2024] Open
Abstract
Background: The sensitivity and specificity of current breath biomarkers are often inadequate for effective cancer screening, particularly in colorectal cancer (CRC). While a few exhaled biomarkers in CRC exhibit high specificity, they lack the requisite sensitivity for early-stage detection, thereby limiting improvements in patient survival rates. Methods: In this study, we developed an advanced Mass Spectrometry-based volatilomics platform, complemented by an enhanced breath sampler. The platform integrates artificial intelligence (AI)-assisted algorithms to detect multiple volatile organic compounds (VOCs) biomarkers in human breath. Subsequently, we applied this platform to analyze 364 clinical CRC and normal exhaled samples. Results: The diagnostic signatures, including 2-methyl, octane, and butyric acid, generated by the platform effectively discriminated CRC patients from normal controls with high sensitivity (89.7%), specificity (86.8%), and accuracy (AUC = 0.91). Furthermore, the metastatic signature correctly identified over 50% of metastatic patients who tested negative for carcinoembryonic antigen (CEA). Fecal validation indicated that elevated breath biomarkers correlated with an inflammatory response guided by Bacteroides fragilis in CRC. Conclusion: This study introduces a sophisticated AI-aided Mass Spectrometry-based platform capable of identifying novel and feasible breath biomarkers for early-stage CRC detection. The promising results position the platform as an efficient noninvasive screening test for clinical applications, offering potential advancements in early detection and improved survival rates for CRC patients.
Affiliation(s)
- Yongqian Liu
- Department of Environmental Science & Engineering, Fudan University, Shanghai 200438, P.R. China
| | - Yongyan Ji
- Department of Environmental Science & Engineering, Fudan University, Shanghai 200438, P.R. China
| | - Jian Chen
- Department of Environmental Science & Engineering, Fudan University, Shanghai 200438, P.R. China
| | - Yixuan Zhang
- Department of Gastroenterology, Huadong Hospital, Fudan University, Shanghai 200040, P.R. China
| | - Xiaowen Li
- Department of Gastroenterology, Huadong Hospital, Fudan University, Shanghai 200040, P.R. China
| | - Xiang Li
- Department of Environmental Science & Engineering, Fudan University, Shanghai 200438, P.R. China
6
Makarov V, Chabbert C, Koletou E, Psomopoulos F, Kurbatova N, Ramirez S, Nelson C, Natarajan P, Neupane B. Good machine learning practices: Learnings from the modern pharmaceutical discovery enterprise. Comput Biol Med 2024; 177:108632. [PMID: 38788373 DOI: 10.1016/j.compbiomed.2024.108632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2024] [Revised: 05/07/2024] [Accepted: 05/18/2024] [Indexed: 05/26/2024]
Abstract
Machine Learning (ML) and Artificial Intelligence (AI) have become an integral part of the drug discovery and development value chain. Many teams in the pharmaceutical industry nevertheless report the challenges associated with the timely, cost effective and meaningful delivery of ML and AI powered solutions for their scientists. We sought to better understand what these challenges were and how to overcome them by performing an industry wide assessment of the practices in AI and Machine Learning. Here we report results of the systematic business analysis of the personas in the modern pharmaceutical discovery enterprise in relation to their work with the AI and ML technologies. We identify 23 common business problems that individuals in these roles face when they encounter AI and ML technologies at work, and describe best practices (Good Machine Learning Practices) that address these issues.
Affiliation(s)
- Vladimir Makarov
- The Pistoia Alliance, 401 Edgewater Place, Suite 600, Wakefield, MA, 01880, USA.
7
Cappelletti L, Rekerle L, Fontana T, Hansen P, Casiraghi E, Ravanmehr V, Mungall CJ, Yang JJ, Spranger L, Karlebach G, Caufield JH, Carmody L, Coleman B, Oprea TI, Reese J, Valentini G, Robinson PN. Node-degree aware edge sampling mitigates inflated classification performance in biomedical random walk-based graph representation learning. BIOINFORMATICS ADVANCES 2024; 4:vbae036. [PMID: 38577542 PMCID: PMC10994718 DOI: 10.1093/bioadv/vbae036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Revised: 01/31/2024] [Accepted: 02/29/2024] [Indexed: 04/06/2024]
Abstract
Motivation: Graph representation learning is a family of related approaches that learn low-dimensional vector representations of nodes and other graph elements called embeddings. Embeddings approximate characteristics of the graph and can be used for a variety of machine-learning tasks such as novel edge prediction. For many biomedical applications, partial knowledge exists about positive edges that represent relationships between pairs of entities, but little to no knowledge is available about negative edges that represent the explicit lack of a relationship between two nodes. For this reason, classification procedures are forced to assume that the vast majority of unlabeled edges are negative. Existing approaches to sampling negative edges for training and evaluating classifiers do so by uniformly sampling pairs of nodes. Results: We show here that this sampling strategy typically leads to sets of positive and negative examples with imbalanced node degree distributions. Using representative heterogeneous biomedical knowledge graph and random walk-based graph machine learning, we show that this strategy substantially impacts classification performance. If users of graph machine-learning models apply the models to prioritize examples that are drawn from approximately the same distribution as the positive examples are, then performance of models as estimated in the validation phase may be artificially inflated. We present a degree-aware node sampling approach that mitigates this effect and is simple to implement. Availability and implementation: Our code and data are publicly available at https://github.com/monarch-initiative/negativeExampleSelection.
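The contrast between uniform and degree-aware negative sampling can be sketched in a few lines. The code below is an illustrative assumption, not the authors' implementation (their code is in the repository linked above): it draws each endpoint of a candidate negative edge either uniformly or with probability proportional to node degree, so that degree-aware negatives involve high-degree nodes about as often as positive edges do. The function names and the toy two-hub graph are hypothetical.

```python
import random
from collections import Counter


def node_degrees(edges):
    """Count how many positive edges touch each node."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def sample_negative_edges(edges, n_samples, degree_aware, seed=0):
    """Sample distinct node pairs that are not positive edges.

    degree_aware=False draws endpoints uniformly over nodes;
    degree_aware=True draws them proportionally to node degree.
    Assumes n_samples does not exceed the number of available non-edges,
    otherwise the loop would never terminate.
    """
    rng = random.Random(seed)
    deg = node_degrees(edges)
    nodes = sorted(deg)
    weights = [deg[n] for n in nodes] if degree_aware else None
    existing = {frozenset(e) for e in edges}
    negatives = set()
    while len(negatives) < n_samples:
        u, v = rng.choices(nodes, weights=weights, k=2)
        pair = frozenset((u, v))
        if u != v and pair not in existing:
            negatives.add(pair)
    return [tuple(sorted(p)) for p in negatives]


def mean_endpoint_degree(sample, deg):
    """Average degree over both endpoints of every sampled edge."""
    return sum(deg[u] + deg[v] for u, v in sample) / (2 * len(sample))


# Toy graph: two hubs, each linked to ten leaves of its own.
positive = [("h1", f"a{i}") for i in range(10)] + \
           [("h2", f"b{i}") for i in range(10)]
deg = node_degrees(positive)
uniform_neg = sample_negative_edges(positive, 30, degree_aware=False)
aware_neg = sample_negative_edges(positive, 30, degree_aware=True)
print("positives:", mean_endpoint_degree(positive, deg))
print("uniform negatives:", mean_endpoint_degree(uniform_neg, deg))
print("degree-aware negatives:", mean_endpoint_degree(aware_neg, deg))
```

In this toy graph every positive edge touches a hub, whereas uniformly sampled negatives mostly pair two leaves; a classifier could then separate the classes on node degree alone, which is the kind of inflation the abstract describes.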
Affiliation(s)
- Luca Cappelletti
- AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Milano 20133, Italy
| | - Lauren Rekerle
- The Jackson Laboratory for Genomic Medicine, CT 06032, United States
| | - Tommaso Fontana
- AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Milano 20133, Italy
| | - Peter Hansen
- The Jackson Laboratory for Genomic Medicine, CT 06032, United States
| | - Elena Casiraghi
- AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Milano 20133, Italy
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, United States
| | - Vida Ravanmehr
- The Jackson Laboratory for Genomic Medicine, CT 06032, United States
| | - Christopher J Mungall
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, United States
| | - Jeremy J Yang
- Department of Internal Medicine and UNM Comprehensive Cancer Center, UNM School of Medicine, Albuquerque, NM 87102, United States
| | - Leonard Spranger
- Institute of Bioinformatics, Freie Universität Berlin, Berlin, 14195, Germany
| | - Guy Karlebach
- The Jackson Laboratory for Genomic Medicine, CT 06032, United States
| | - J Harry Caufield
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, United States
| | - Leigh Carmody
- The Jackson Laboratory for Genomic Medicine, CT 06032, United States
| | - Ben Coleman
- The Jackson Laboratory for Genomic Medicine, CT 06032, United States
- Institute for Systems Genomics, University of Connecticut, Farmington, CT 06032, United States
| | - Tudor I Oprea
- Department of Internal Medicine and UNM Comprehensive Cancer Center, UNM School of Medicine, Albuquerque, NM 87102, United States
| | - Justin Reese
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, United States
| | - Giorgio Valentini
- AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Milano 20133, Italy
- ELLIS—European Laboratory for Learning and Intelligent Systems
| | - Peter N Robinson
- The Jackson Laboratory for Genomic Medicine, CT 06032, United States
- Institute for Systems Genomics, University of Connecticut, Farmington, CT 06032, United States
- ELLIS—European Laboratory for Learning and Intelligent Systems
- Berlin Institute of Health, Charité – Universitätsmedizin Berlin, Berlin, 10117, Germany
8
Aguilera-Puga MDC, Cancelarich NL, Marani MM, de la Fuente-Nunez C, Plisson F. Accelerating the Discovery and Design of Antimicrobial Peptides with Artificial Intelligence. Methods Mol Biol 2024; 2714:329-352. [PMID: 37676607 DOI: 10.1007/978-1-0716-3441-7_18] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Indexed: 09/08/2023]
Abstract
Peptides modulate many processes of human physiology targeting ion channels, protein receptors, or enzymes. They represent valuable starting points for the development of new biologics against communicable and non-communicable disorders. However, turning native peptide ligands into druggable materials requires high selectivity and efficacy, predictable metabolism, and good safety profiles. Machine learning models have gradually emerged as cost-effective and time-saving solutions to predict and generate new proteins with optimal properties. In this chapter, we will discuss the evolution and applications of predictive modeling and generative modeling to discover and design safe and effective antimicrobial peptides. We will also present their current limitations and suggest future research directions, applicable to peptide drug design campaigns.
Affiliation(s)
- Mariana D C Aguilera-Puga
  - Centro de Investigación y de Estudios Avanzados del IPN (CINVESTAV-IPN), Unidad de Genómica Avanzada, Laboratorio Nacional de Genómica para la Biodiversidad (Langebio), Irapuato, Guanajuato, Mexico
  - CINVESTAV-IPN, Unidad Irapuato, Departamento de Biotecnología y Bioquímica, Irapuato, Guanajuato, Mexico
- Natalia L Cancelarich
  - Instituto Patagónico para el Estudio de los Ecosistemas Continentales (IPEEC), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Puerto Madryn, Argentina
- Mariela M Marani
  - Instituto Patagónico para el Estudio de los Ecosistemas Continentales (IPEEC), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Puerto Madryn, Argentina
- Cesar de la Fuente-Nunez
  - Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
  - Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
- Fabien Plisson
  - Centro de Investigación y de Estudios Avanzados del IPN (CINVESTAV-IPN), Unidad de Genómica Avanzada, Laboratorio Nacional de Genómica para la Biodiversidad (Langebio), Irapuato, Guanajuato, Mexico
  - CINVESTAV-IPN, Unidad Irapuato, Departamento de Biotecnología y Bioquímica, Irapuato, Guanajuato, Mexico
|
9
|
Fernandez ME, Martinez-Romero J, Aon MA, Bernier M, Price NL, de Cabo R. How is Big Data reshaping preclinical aging research? Lab Anim (NY) 2023; 52:289-314. [PMID: 38017182 DOI: 10.1038/s41684-023-01286-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Accepted: 10/10/2023] [Indexed: 11/30/2023]
Abstract
The exponential scientific and technological progress during the past 30 years has favored the comprehensive characterization of aging processes with their multivariate nature, leading to the advent of Big Data in preclinical aging research. Spanning from molecular omics to organism-level deep phenotyping, Big Data demands large computational resources for storage and analysis, as well as new analytical tools and conceptual frameworks to gain novel insights leading to discovery. Systems biology has emerged as a paradigm that utilizes Big Data to gain insightful information enabling a better understanding of living organisms, visualized as multilayered networks of interacting molecules, cells, tissues and organs at different spatiotemporal scales. In this framework, where aging, health and disease represent emergent states from an evolving dynamic complex system, context given by, for example, strain, sex and feeding times, becomes paramount for defining the biological trajectory of an organism. Using bioinformatics and artificial intelligence, the systems biology approach is leading to remarkable advances in our understanding of the underlying mechanism of aging biology and assisting in creative experimental study designs in animal models. Future in-depth knowledge acquisition will depend on the ability to fully integrate information from different spatiotemporal scales in organisms, which will probably require the adoption of theories and methods from the field of complex systems. Here we review state-of-the-art approaches in preclinical research, with a focus on rodent models, that are leading to conceptual and/or technical advances in leveraging Big Data to understand basic aging biology and its full translational potential.
Affiliation(s)
- Maria Emilia Fernandez
  - Experimental Gerontology Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
- Jorge Martinez-Romero
  - Experimental Gerontology Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
  - Laboratory of Epidemiology and Population Science, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
- Miguel A Aon
  - Experimental Gerontology Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
  - Laboratory of Cardiovascular Science, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
- Michel Bernier
  - Experimental Gerontology Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
- Nathan L Price
  - Experimental Gerontology Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
- Rafael de Cabo
  - Experimental Gerontology Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
|
10
|
Ahlquist KD, Sugden LA, Ramachandran S. Enabling interpretable machine learning for biological data with reliability scores. PLoS Comput Biol 2023; 19:e1011175. [PMID: 37235578 PMCID: PMC10249903 DOI: 10.1371/journal.pcbi.1011175] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/24/2022] [Revised: 06/08/2023] [Accepted: 05/10/2023] [Indexed: 05/28/2023]
Abstract
Machine learning tools have proven useful across biological disciplines, allowing researchers to draw conclusions from large datasets, and opening up new opportunities for interpreting complex and heterogeneous biological data. Alongside the rapid growth of machine learning, there have also been growing pains: some models that appear to perform well have later been revealed to rely on features of the data that are artifactual or biased; this feeds into the general criticism that machine learning models are designed to optimize model performance over the creation of new biological insights. A natural question arises: how do we develop machine learning models that are inherently interpretable or explainable? In this manuscript, we describe the SWIF(r) reliability score (SRS), a method building on the SWIF(r) generative framework that reflects the trustworthiness of the classification of a specific instance. The concept of the reliability score has the potential to generalize to other machine learning methods. We demonstrate the utility of the SRS when faced with common challenges in machine learning including: 1) an unknown class present in testing data that was not present in training data, 2) systemic mismatch between training and testing data, and 3) instances of testing data that have missing values for some attributes. We explore these applications of the SRS using a range of biological datasets, from agricultural data on seed morphology, to 22 quantitative traits in the UK Biobank, and population genetic simulations and 1000 Genomes Project data. With each of these examples, we demonstrate how the SRS can allow researchers to interrogate their data and training approach thoroughly, and to pair their domain-specific knowledge with powerful machine-learning frameworks. We also compare the SRS to related tools for outlier and novelty detection, and find that it has comparable performance, with the advantage of being able to operate when some data are missing. 
The SRS, and the broader discussion of interpretable scientific machine learning, will aid researchers in the biological machine learning space as they seek to harness the power of machine learning without sacrificing rigor and biological insight.
Affiliation(s)
- K. D. Ahlquist
  - Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America
  - Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, Rhode Island, United States of America
- Lauren A. Sugden
  - Department of Mathematics and Computer Science, Duquesne University, Pittsburgh, Pennsylvania, United States of America
- Sohini Ramachandran
  - Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America
  - Department of Ecology, Evolution and Organismal Biology, Brown University, Providence, Rhode Island, United States of America
  - Data Science Initiative, Brown University, Providence, Rhode Island, United States of America
|
11
|
Couckuyt A, Seurinck R, Emmaneel A, Quintelier K, Novak D, Van Gassen S, Saeys Y. Challenges in translational machine learning. Hum Genet 2022; 141:1451-1466. [PMID: 35246744 PMCID: PMC8896412 DOI: 10.1007/s00439-022-02439-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/09/2021] [Accepted: 02/08/2022] [Indexed: 11/25/2022]
Abstract
Machine learning (ML) algorithms are increasingly being used to help implement clinical decision support systems. In this new field, which we define as "translational machine learning", joint efforts and strong communication between data scientists and clinicians help to span the gap between ML and its adoption in the clinic. These collaborations also improve interpretability and trust in translational ML methods and ultimately aim to result in generalizable and reproducible models. To help clinicians and bioinformaticians refine their translational ML pipelines, we review the steps from model building to the use of ML in the clinic. We discuss experimental setup, computational analysis, interpretability and reproducibility, and emphasize the challenges involved. We strongly advise collaboration and data sharing between consortia and institutes to build multi-centric cohorts that facilitate ML methodologies that generalize across centers. In the end, we hope that this review provides a way to streamline translational ML and helps to tackle the challenges that come with it.
Affiliation(s)
- Artuur Couckuyt
  - Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium
  - Data Mining and Modeling for Biomedicine, VIB-UGent Center for Inflammation Research, Gent, Belgium
- Ruth Seurinck
  - Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium
  - Data Mining and Modeling for Biomedicine, VIB-UGent Center for Inflammation Research, Gent, Belgium
- Annelies Emmaneel
  - Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium
  - Data Mining and Modeling for Biomedicine, VIB-UGent Center for Inflammation Research, Gent, Belgium
- Katrien Quintelier
  - Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium
  - Data Mining and Modeling for Biomedicine, VIB-UGent Center for Inflammation Research, Gent, Belgium
  - Department of Pulmonary Diseases, Erasmus MC, Rotterdam, The Netherlands
- David Novak
  - Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium
  - Data Mining and Modeling for Biomedicine, VIB-UGent Center for Inflammation Research, Gent, Belgium
- Sofie Van Gassen
  - Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium
  - Data Mining and Modeling for Biomedicine, VIB-UGent Center for Inflammation Research, Gent, Belgium
- Yvan Saeys
  - Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium
  - Data Mining and Modeling for Biomedicine, VIB-UGent Center for Inflammation Research, Gent, Belgium
|
12
|
Das S, Taylor K, Beaulah S, Gardner S. Systematic indication extension for drugs using patient stratification insights generated by combinatorial analytics. PATTERNS (NEW YORK, N.Y.) 2022; 3:100496. [PMID: 35755863 PMCID: PMC9214305 DOI: 10.1016/j.patter.2022.100496] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
Abstract
Indication extension or repositioning of drugs can, if done well, provide a faster, cheaper, and derisked route to the approval of new therapies, creating new options to address pockets of unmet medical need for patients and offering the potential for significant commercial and clinical benefits. We look at the promises and challenges of different repositioning strategies and the disease insights and scalability that new high-resolution patient stratification methodologies can bring. This is exemplified by a systematic analysis of all development candidates and on-market drugs, which identified 477 indication extension opportunities across 30 chronic disease areas, each supported by patient stratification biomarkers. This illustrates the potential that new artificial intelligence (AI) and combinatorial analytics methods have to enhance the rate and cost of innovation across the drug discovery industry.
Affiliation(s)
- Sayoni Das
  - PrecisionLife, Unit 8b Bankside, Hanborough Business Park, Long Hanborough OX29 8LJ, UK
- Krystyna Taylor
  - PrecisionLife, Unit 8b Bankside, Hanborough Business Park, Long Hanborough OX29 8LJ, UK
- Simon Beaulah
  - PrecisionLife, Unit 8b Bankside, Hanborough Business Park, Long Hanborough OX29 8LJ, UK
- Steve Gardner
  - PrecisionLife, Unit 8b Bankside, Hanborough Business Park, Long Hanborough OX29 8LJ, UK
|
13
|
Abstract
The scale of genetic, epigenomic, transcriptomic, cheminformatic and proteomic data available today, coupled with easy-to-use machine learning (ML) toolkits, has propelled the application of supervised learning in genomics research. However, the assumptions behind the statistical models and performance evaluations in ML software frequently are not met in biological systems. In this Review, we illustrate the impact of several common pitfalls encountered when applying supervised ML in genomics. We explore how the structure of genomics data can bias performance evaluations and predictions. To address the challenges associated with applying cutting-edge ML methods to genomics, we describe solutions and appropriate use cases where ML modelling shows great potential.
|
14
|
Barsi S, Szalai B. Modeling in systems biology: Causal understanding before prediction? PATTERNS (NEW YORK, N.Y.) 2021; 2:100280. [PMID: 34179849 PMCID: PMC8212131 DOI: 10.1016/j.patter.2021.100280] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/28/2022]
Abstract
Babur et al. (2021) developed the CausalPath tool to infer causal signaling interactions in high-throughput proteomics data, which may foster mechanistic understanding from large-scale biological datasets.
Affiliation(s)
- Szilvia Barsi
  - Department of Physiology, Faculty of Medicine, Semmelweis University, Budapest, Hungary
- Bence Szalai
  - Department of Physiology, Faculty of Medicine, Semmelweis University, Budapest, Hungary
|
15
|
Wu Z, Johnston KE, Arnold FH, Yang KK. Protein sequence design with deep generative models. Curr Opin Chem Biol 2021; 65:18-27. [PMID: 34051682 DOI: 10.1016/j.cbpa.2021.04.004] [Citation(s) in RCA: 72] [Impact Index Per Article: 18.0] [Received: 02/24/2021] [Revised: 04/02/2021] [Accepted: 04/07/2021] [Indexed: 12/20/2022]
Abstract
Protein engineering seeks to identify protein sequences with optimized properties. When guided by machine learning, protein sequence generation methods can draw on prior knowledge and experimental efforts to improve this process. In this review, we highlight recent applications of machine learning to generate protein sequences, focusing on the emerging field of deep generative methods.
Affiliation(s)
- Zachary Wu
  - Division of Chemistry and Chemical Engineering, California Institute of Technology, 1200 E California Blvd, Pasadena, 91125, CA, USA
- Kadina E Johnston
  - Division of Biology and Biological Engineering, California Institute of Technology, 1200 E California Blvd, Pasadena, 91125, CA, USA
- Frances H Arnold
  - Division of Chemistry and Chemical Engineering, California Institute of Technology, 1200 E California Blvd, Pasadena, 91125, CA, USA
  - Division of Biology and Biological Engineering, California Institute of Technology, 1200 E California Blvd, Pasadena, 91125, CA, USA
- Kevin K Yang
  - Microsoft Research New England, 1 Memorial Drive, Cambridge, 02142, MA, USA
|
16
|
Deep Automation Bias: How to Tackle a Wicked Problem of AI? BIG DATA AND COGNITIVE COMPUTING 2021. [DOI: 10.3390/bdcc5020018] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 01/08/2023]
Abstract
The increasing use of AI in different societal contexts intensified the debate on risks, ethical problems and bias. Accordingly, promising research activities focus on debiasing to strengthen fairness, accountability and transparency in machine learning. There is, though, a tendency to fix societal and ethical issues with technical solutions that may cause additional, wicked problems. Alternative analytical approaches are thus needed to avoid this and to comprehend how societal and ethical issues occur in AI systems. Despite various forms of bias, ultimately, risks result from eventual rule conflicts between the AI system behavior due to feature complexity and user practices with limited options for scrutiny. Hence, although different forms of bias can occur, automation is their common ground. The paper highlights the role of automation and explains why deep automation bias (DAB) is a metarisk of AI. Based on former work it elaborates the main influencing factors and develops a heuristic model for assessing DAB-related risks in AI systems. This model aims at raising problem awareness and training on the sociotechnical risks resulting from AI-based automation and contributes to improving the general explicability of AI systems beyond technical issues.
|