Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Dickinson Q, Meyer JG. Positional SHAP (PoSHAP) for Interpretation of machine learning models trained from biological sequences. PLoS Comput Biol 2022;18:e1009736. [PMID: 35089914 PMCID: PMC8797255 DOI: 10.1371/journal.pcbi.1009736] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 09/16/2021] [Accepted: 12/09/2021] [Indexed: 11/29/2022] Open

Cited by Other Article(s)

Zhang X, Tseo Y, Bai Y, Chen F, Uhler C. Prediction of protein subcellular localization in single cells. Nat Methods 2025;22:1265-1275. [PMID: 40360932 DOI: 10.1038/s41592-025-02696-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2024] [Accepted: 04/09/2025] [Indexed: 05/15/2025]

Zheng L, Wu X, Gu W, Wang R, Wang J, He H, Wang Z, Yi B, Zhang Y. Development and validation of a hypoxemia prediction model in middle-aged and elderly outpatients undergoing painless gastroscopy. Sci Rep 2025;15:17965. [PMID: 40410303 PMCID: PMC12102271 DOI: 10.1038/s41598-025-02540-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2024] [Accepted: 05/14/2025] [Indexed: 05/25/2025] Open

Abstract

Hypoxemia is a common complication associated with anesthesia in painless gastroscopy. With the aging of the social population, the number of cases of hypoxemia among middle-aged and elderly patients is increasing. However, tools for predicting hypoxemia in middle-aged and elderly patients are lacking. In this study, we investigated the risk factors for hypoxemia in middle-aged and elderly outpatients undergoing painless gastroscopy based on machine learning and constructed a risk prediction model. In this retrospective study, we included data on 1,348 outpatients undergoing painless gastroscopy. In total, 26 characteristic variables, including demographic information, past medical history, and clinical data of the patients, were included, and BorutaShap was used for feature selection. Five machine learning algorithm models, including logistic regression (LR), support vector machine (SVM), random forest (RF), extreme gradient boosting (XGB), and light gradient boosting machine (LightGBM), were selected. The best models were selected based on the area under the receiver operating characteristic curve (AUROC). Model feature importance was explained and analyzed using Shapley Additive Explanations (SHAP). The endpoint event of this study was considered to be hypoxemia during the procedure, defined as at least one occurrence of pulse oxygen saturation below 90% without probe misalignment or interference from the beginning of anesthesia induction to the end of painless gastroscopy. In the final cohort of 984 patients, 11% of patients (108/984) experienced hypoxemia during the painless gastroscopy procedure. The AUROCs of the five models were as follows: Logistic Regression (AUROC = 0.893, 95% CI: 0.881-0.899), SVM (AUROC = 0.855, 95% CI: 0.812-0.884), Random Forest (AUROC = 0.914, 95% CI: 0.889-0.924), XGB (AUROC = 0.902, 95% CI: 0.865-0.919), and LightGBM (AUROC = 0.891, 95% CI: 0.847-0.917).
Regarding the explanation of the importance of SHAP features, preoperative variables (baseline SpO2, body mass index, and micrognathia) and intraoperative variables (operating time of gastroscopy, induction dose of etomidate and propofol mixture, append anesthetic, cough, and repeated pharyngeal irritation) significantly contributed to the model. We identified eight potential risk factors related to the occurrence of hypoxemia in middle-aged and elderly patients undergoing painless gastroscopy, based on machine learning feature engineering. Among the five machine learning algorithms, RF exhibited the best predictive performance in the internal test set and had a certain degree of generalization ability in the external validation set, which indicated that the RF model was more suitable for the data framework of this study. This model was more likely to enhance the accuracy of hypoxemia prediction in middle-aged and elderly patients undergoing painless gastroscopy, and thus, it is suitable for assisting anesthesiologists in clinical decision-making.
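Several abstracts in this listing attribute feature importance with SHAP, which is built on Shapley values. As a minimal sketch of that underlying idea only (not the SHAP library and not any cited study's pipeline), the Python below computes exact Shapley values for a tiny made-up scoring function by averaging each feature's marginal contribution over all orderings. The feature names `x1`, `x2`, `x3`, the function `toy_score`, and every number in it are hypothetical and chosen purely for illustration.

```python
from itertools import permutations


def shapley_values(features, value):
    """Exact Shapley values: average marginal contribution over all orderings.

    `value` maps a frozenset of present features to a number.
    Cost grows factorially, so this is only practical for a handful of features.
    """
    names = list(features)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        present = set()
        for name in order:
            before = value(frozenset(present))
            present.add(name)
            after = value(frozenset(present))
            totals[name] += after - before
    return {name: totals[name] / len(orderings) for name in names}


def toy_score(present):
    """Hypothetical score: two additive effects plus one interaction (illustrative numbers)."""
    score = 0.0
    if "x1" in present:
        score += 0.30
    if "x2" in present:
        score += 0.10
    if "x1" in present and "x2" in present:
        score += 0.06  # interaction term, split evenly between x1 and x2
    return score  # x3 never changes the score


phi = shapley_values(["x1", "x2", "x3"], toy_score)
print(phi)
```

In this toy case the attributions add up to the difference between the score with all features present and the score with none, which is the additivity property that SHAP-style explanations rely on. Practical SHAP implementations approximate these averages rather than enumerating every ordering.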

Huang RN, Luo SY, Huang T, Li XS, Zhou FC, Yin WH, Chen ZR, Yuan SZ, Li LY, Tang B, Qiao JD. The interaction of UBR4, LRP1, and OPHN1 in refractory epilepsy: Drosophila model to investigate the oligogenic effect on epilepsy. Neurobiol Dis 2025;212:106955. [PMID: 40374006 DOI: 10.1016/j.nbd.2025.106955] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2025] [Revised: 05/11/2025] [Accepted: 05/12/2025] [Indexed: 05/17/2025] Open

Abstract

Refractory epilepsy is an intractable neurological disorder that can be associated with oligogenic/polygenic etiologies. Through trio-based whole-exome sequencing analysis, we identified a clinical case of refractory epilepsy with three candidate gene variants: UBR4, LRP1, and OPHN1. Utilizing the Gal4-UAS system and double-balancer tool, we generated single, double, and triple knockdown Drosophila models to investigate the interactions of the three candidate genes. Seizure behavioral experiments combined with logistic regression analysis revealed the individual epileptogenicity and significant synergistic epileptogenic effects of the three mutations. By constructing a SHAP-XGBoost machine learning model integrating seizure behavior data with knockdown efficiency metrics, we discovered that LRP1 mutation served as the primary effector in the oligogenic system. Based on transcriptome analysis, main related processes of oxidative stress and metabolic imbalance together with expressional dysregulation separately of 48, 52, and 43 epilepsy-associated genes were discovered to confirm the epileptogenicity of OPHN1 knockdown, UBR4-LRP1 knockdown, and UBR4-LRP1-OPHN1 knockdown. Up-regulation of COX7AL and ND-B8 enriched in metabolic pathways and down-regulation of Diedel enriched in extracellular space component were indicated to be responsible for the significant epileptogenicity of the oligogenic knockdown. For this clinical instance, epileptic pharmacoresistance was considered to be triggered by a combination of KIF gene family, SLC gene family, and ASIC gene family. This study established a novel framework to clarify the multiple genetic structure of epileptogenicity in refractory epilepsy with oligogenic background, which could be critical to translational medicine and precision therapy development.

Affiliation(s)

Rui-Na Huang Department of Neurology, Institute of Neuroscience, Key Laboratory of Neurogenetics and Channelopathies of Guangdong Province and the Ministry of Education of China, The Second Affiliated Hospital of Guangzhou Medical University, Changgang Dong Road, Guangzhou 510000, China; The Second Clinical Medicine School, Guangzhou Medical University, Guangzhou 510000, China
Si-Yuan Luo The Second Clinical Medicine School, Guangzhou Medical University, Guangzhou 510000, China
Tao Huang The Second Clinical Medicine School, Guangzhou Medical University, Guangzhou 510000, China
Xiong-Sheng Li The Second Clinical Medicine School, Guangzhou Medical University, Guangzhou 510000, China
Fan-Chao Zhou The Second Clinical Medicine School, Guangzhou Medical University, Guangzhou 510000, China
Wei-Hao Yin The Second Clinical Medicine School, Guangzhou Medical University, Guangzhou 510000, China
Ze-Ru Chen The Second Clinical Medicine School, Guangzhou Medical University, Guangzhou 510000, China
Shi-Zhan Yuan Department of Neurology, Institute of Neuroscience, Key Laboratory of Neurogenetics and Channelopathies of Guangdong Province and the Ministry of Education of China, The Second Affiliated Hospital of Guangzhou Medical University, Changgang Dong Road, Guangzhou 510000, China
Ling-Ying Li Department of Neurology, Institute of Neuroscience, Key Laboratory of Neurogenetics and Channelopathies of Guangdong Province and the Ministry of Education of China, The Second Affiliated Hospital of Guangzhou Medical University, Changgang Dong Road, Guangzhou 510000, China
Bin Tang Department of Neurology, Institute of Neuroscience, Key Laboratory of Neurogenetics and Channelopathies of Guangdong Province and the Ministry of Education of China, The Second Affiliated Hospital of Guangzhou Medical University, Changgang Dong Road, Guangzhou 510000, China.
Jing-Da Qiao Department of Neurology, Institute of Neuroscience, Key Laboratory of Neurogenetics and Channelopathies of Guangdong Province and the Ministry of Education of China, The Second Affiliated Hospital of Guangzhou Medical University, Changgang Dong Road, Guangzhou 510000, China.

Angelis J, Schröder EA, Xiao Z, Gabriel W, Wilhelm M. Peptide Property Prediction for Mass Spectrometry Using AI: An Introduction to State of the Art Models. Proteomics 2025;25:e202400398. [PMID: 40211610 PMCID: PMC12076536 DOI: 10.1002/pmic.202400398] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2024] [Revised: 03/14/2025] [Accepted: 03/17/2025] [Indexed: 05/15/2025]

Arina P, Ferrari D, Kaczorek MR, Tetlow N, Dewar A, Stephens R, Martin D, Moonesinghe R, Singer M, Whittle J, Mazomenos EB. Assessing perioperative risks in a mixed elderly surgical population using machine learning: A multi-objective symbolic regression approach to cardiorespiratory fitness derived from cardiopulmonary exercise testing. PLOS Digit Health 2025;4:e0000851. [PMID: 40378351 DOI: 10.1371/journal.pdig.0000851] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2024] [Accepted: 04/05/2025] [Indexed: 05/18/2025]

Abstract

Accurate preoperative risk assessment is of great value to both patients and clinical teams. Several risk scores have been developed but are often not calibrated to the local institution, limited in terms of data input into the underlying models, and/or lack individual precision. Machine Learning (ML) models have the potential to address limitations in existing scoring systems. A database of 1190 elderly patients who underwent major elective surgery was analyzed retrospectively. Preoperative cardiorespiratory fitness data from cardiopulmonary exercise testing (CPET), demographic and clinical data were extracted and integrated into advanced machine learning (ML) algorithms. Multi-Objective-Symbolic-Regression (MOSR), a novel algorithm utilizing Genetic Programming to generate mathematical formulae for learning tasks, was employed to predict patient morbidity at Postoperative Day 3, as defined by the PostOperative Morbidity Survey (POMS). Shapley-Additive-exPlanations (SHAP) was subsequently used to analyze feature contributions. Model performance was benchmarked against existing risk prediction scores, namely the Portsmouth-Physiological-and-Operative-Severity-Score-for-the-Enumeration-of-Mortality-and-Morbidity (PPOSSUM) and the Duke-Activity-Status-Index, as well as linear regression using CPET features. A model was also developed for the same task using data directly extracted from the CPET time-series. The incorporation of cardiorespiratory fitness data enhanced the performance of all models for predicting postoperative morbidity by 20% compared to sole reliance on clinical data. Cardiorespiratory fitness features demonstrated greater importance than clinical features in the SHAP analysis. Models utilizing data taken directly from the CPET time-series demonstrated a 12% improvement over the cardiorespiratory fitness models. MOSR model surpassed all other models in every experiment, demonstrating excellent robustness and generalization capabilities. 
Integrating cardiorespiratory fitness data with ML models enables improved preoperative prediction of postoperative morbidity in elective surgical patients. The MOSR model stands out for its capacity to pinpoint essential features and build models that are both simple and accurate, showing excellent generalizability.

Affiliation(s)

Pietro Arina Bloomsbury Institute of Intensive Care Medicine, University College London, London, United Kingdom; Human Physiology and Performance Laboratory, Centre for Perioperative Medicine, Department of Targeted Intervention, University College London, London, United Kingdom
Davide Ferrari Department of Population Health Sciences, King's College London, London, United Kingdom
Maciej R Kaczorek Wellcome/EPSRC Centre of Interventional and Surgical Sciences and Department of Medical Physics and Biomedical Engineering, University College London, London, United Kingdom
Nicholas Tetlow Human Physiology and Performance Laboratory, Centre for Perioperative Medicine, Department of Targeted Intervention, University College London, London, United Kingdom
Amy Dewar Human Physiology and Performance Laboratory, Centre for Perioperative Medicine, Department of Targeted Intervention, University College London, London, United Kingdom
Robert Stephens Human Physiology and Performance Laboratory, Centre for Perioperative Medicine, Department of Targeted Intervention, University College London, London, United Kingdom
Daniel Martin Peninsula Medical School, University of Plymouth, Plymouth, Devon, United Kingdom
Ramani Moonesinghe Human Physiology and Performance Laboratory, Centre for Perioperative Medicine, Department of Targeted Intervention, University College London, London, United Kingdom
Mervyn Singer Bloomsbury Institute of Intensive Care Medicine, University College London, London, United Kingdom
John Whittle Human Physiology and Performance Laboratory, Centre for Perioperative Medicine, Department of Targeted Intervention, University College London, London, United Kingdom
Evangelos B Mazomenos Wellcome/EPSRC Centre of Interventional and Surgical Sciences and Department of Medical Physics and Biomedical Engineering, University College London, London, United Kingdom

Yu X, Wang W, Wu R, Gong X, Ji Y, Feng Z. Construction of a machine learning-based interpretable prediction model for acute kidney injury in hospitalized patients. Sci Rep 2025;15:9313. [PMID: 40102467 PMCID: PMC11920398 DOI: 10.1038/s41598-025-90459-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2024] [Accepted: 02/13/2025] [Indexed: 03/20/2025] Open

Affiliation(s)

Xiang Yu First Medical Center of Chinese PLA General Hospital, Department of Nephrology, First Medical Center of Chinese PLA General Hospital, State Key Laboratory of Kidney Diseases, National Clinical Research Center for Kidney Diseases, Beijing Key Laboratory of Medical Devices and Integrated Traditional Chinese and Western Drug Development for Severe Kidney Diseases, Beijing Key Laboratory of Digital Intelligent TCM for the Prevention and Treatment of Pan-vascular Diseases, Key Disciplines of National Administration of Traditional Chinese Medicine (zyyzdxk-2023310), Beijing, 100853, China
WanLing Wang Medical Innovation Research Division, Chinese PLA General Hospital, Beijing, 100853, China
RiLiGe Wu Medical Innovation Research Division, Chinese PLA General Hospital, Beijing, 100853, China
XinYan Gong First Medical Center of Chinese PLA General Hospital, Department of Nephrology, First Medical Center of Chinese PLA General Hospital, State Key Laboratory of Kidney Diseases, National Clinical Research Center for Kidney Diseases, Beijing Key Laboratory of Medical Devices and Integrated Traditional Chinese and Western Drug Development for Severe Kidney Diseases, Beijing Key Laboratory of Digital Intelligent TCM for the Prevention and Treatment of Pan-vascular Diseases, Key Disciplines of National Administration of Traditional Chinese Medicine (zyyzdxk-2023310), Beijing, 100853, China
YuWei Ji First Medical Center of Chinese PLA General Hospital, Department of Nephrology, First Medical Center of Chinese PLA General Hospital, State Key Laboratory of Kidney Diseases, National Clinical Research Center for Kidney Diseases, Beijing Key Laboratory of Medical Devices and Integrated Traditional Chinese and Western Drug Development for Severe Kidney Diseases, Beijing Key Laboratory of Digital Intelligent TCM for the Prevention and Treatment of Pan-vascular Diseases, Key Disciplines of National Administration of Traditional Chinese Medicine (zyyzdxk-2023310), Beijing, 100853, China
Zhe Feng First Medical Center of Chinese PLA General Hospital, Department of Nephrology, First Medical Center of Chinese PLA General Hospital, State Key Laboratory of Kidney Diseases, National Clinical Research Center for Kidney Diseases, Beijing Key Laboratory of Medical Devices and Integrated Traditional Chinese and Western Drug Development for Severe Kidney Diseases, Beijing Key Laboratory of Digital Intelligent TCM for the Prevention and Treatment of Pan-vascular Diseases, Key Disciplines of National Administration of Traditional Chinese Medicine (zyyzdxk-2023310), Beijing, 100853, China

Charoenkwan P, Chumnanpuen P, Schaduangrat N, Shoombuatong W. Stack-AVP: A Stacked Ensemble Predictor Based on Multi-view Information for Fast and Accurate Discovery of Antiviral Peptides. J Mol Biol 2025;437:168853. [PMID: 39510347 DOI: 10.1016/j.jmb.2024.168853] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 07/11/2024] [Revised: 10/22/2024] [Accepted: 10/31/2024] [Indexed: 11/15/2024]

Zhang SY, Zhang YD, Li H, Wang QY, Ye QF, Wang XM, Xia TH, He YE, Rong X, Wu TT, Wu RZ. Explainable machine learning model for predicting decline in platelet count after interventional closure in children with patent ductus arteriosus. Front Pediatr 2025;13:1519002. [PMID: 39981204 PMCID: PMC11839778 DOI: 10.3389/fped.2025.1519002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2024] [Accepted: 01/20/2025] [Indexed: 02/22/2025] Open

Lin TC, Chiueh PT, Hsiao TC. Challenges in Observation of Ultrafine Particles: Addressing Estimation Miscalculations and the Necessity of Temporal Trends. Environ Sci Technol 2025;59:565-577. [PMID: 39670560 PMCID: PMC11741106 DOI: 10.1021/acs.est.4c07460] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2024] [Revised: 11/29/2024] [Accepted: 12/02/2024] [Indexed: 12/14/2024]

Obeidat R, Alsmadi I, Baker QB, Al-Njadat A, Srinivasan S. Researching public health datasets in the era of deep learning: a systematic literature review. Health Informatics J 2025;31:14604582241307839. [PMID: 39794941 DOI: 10.1177/14604582241307839] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/13/2025]

Wang J, Xi R, Wang Y, Gao H, Gao M, Zhang X, Zhang L, Zhang Y. Toward molecular diagnosis of major depressive disorder by plasma peptides using a deep learning approach. Brief Bioinform 2024;26:bbae554. [PMID: 39592240 PMCID: PMC11596692 DOI: 10.1093/bib/bbae554] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2024] [Revised: 09/30/2024] [Accepted: 10/01/2024] [Indexed: 11/28/2024] Open

Yuan Z, Peng J, Shu Z, Qin X, Zhong J. Interpretable multitemporal liver function indicator model for prediction and risk factor analysis of drug induced liver injury. Sci Rep 2024;14:21285. [PMID: 39261535 PMCID: PMC11390907 DOI: 10.1038/s41598-024-66952-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2023] [Accepted: 07/05/2024] [Indexed: 09/13/2024] Open

Dhibar S, Jana B. Optimized Collective Variable for Collapse Transition in Linear Hydrophobic Polymers: Importance of Hydration Water and End-to-End Distance. J Chem Theory Comput 2024;20:7404-7415. [PMID: 39252562 DOI: 10.1021/acs.jctc.4c00753] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/11/2024]

Abstract

Choosing an appropriate collective variable (CV) for any biomolecular process is a challenging task. Researchers are developing methods to solve this issue using a variety of methodologies, most recently using machine learning (ML) methods. In this work, we investigate the mechanism of collapse transition across various lengths of polymer systems through adaptively sampled multiple short trajectories utilizing the Time Lagged Independent Component Analysis (TICA) framework. From TICA analysis, it is revealed that the radius of gyration (Rg) and end-to-end distance serve as good order parameters (OPs) for these systems describing overall energy landscapes. Markov state model (MSM) and mean first passage time (MFPT) analysis suggest that hydration water (Nw) plays a determining role in dictating the time scale and barrier for the collapsed transition for the C40 system. P-fold analysis on identifying transition state ensembles (TSE) identified by committor analysis also strengthens the role of Nw in such a transition. TICA, MSM, and committor analyses on the collapse transition for C45 reveal similarities with C40 systems in different aspects. Furthermore, we propose a pipeline integrating XGBoost regression along with an interpretable ML model, Shapley Additive exPlanation (SHAP) to precisely elucidate the contribution of each OP locally at the TSE. Through this approach, we observe that the collapse transition is primarily driven by Nw for both polymer systems. A carefully designed protocol for the collapsed transition of C60 systems indirectly reiterates the above result. Overall, our results suggest that while the end-to-end distance should be considered for better resolution of metastable states in the landscape, Nw is the crucial coordinate to be used in enhanced sampling for the exploration of actual collapse transitions for linear hydrophobic polymer systems. 
The Python code for analyzing the contribution of different OPs in the TSE using an ML-aided protocol is available on GitHub (https://github.com/saikat-ai/linear_polymer_project).

Zhang X, Tseo Y, Bai Y, Chen F, Uhler C. Prediction of protein subcellular localization in single cells. bioRxiv 2024:2024.07.25.605178. [PMID: 39091825 PMCID: PMC11291118 DOI: 10.1101/2024.07.25.605178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/04/2024]

Borole P, Rajan A. Building trust in deep learning-based immune response predictors with interpretable explanations. Commun Biol 2024;7:279. [PMID: 38448546 PMCID: PMC10917751 DOI: 10.1038/s42003-024-05968-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Accepted: 02/23/2024] [Indexed: 03/08/2024] Open

Kawamura N, Sato W, Shimokawa K, Fujita T, Kawanishi Y. Machine Learning-Based Interpretable Modeling for Subjective Emotional Dynamics Sensing Using Facial EMG. Sensors (Basel) 2024;24:1536. [PMID: 38475072 DOI: 10.3390/s24051536] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/25/2023] [Revised: 02/03/2024] [Accepted: 02/26/2024] [Indexed: 03/14/2024]

Will A, Oliinyk D, Bleiholder C, Meier F. Peptide collision cross sections of 22 post-translational modifications. Anal Bioanal Chem 2023;415:6633-6645. [PMID: 37758903 PMCID: PMC10598134 DOI: 10.1007/s00216-023-04957-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/23/2022] [Revised: 07/13/2023] [Accepted: 08/23/2023] [Indexed: 09/29/2023]

Karim MR, Islam T, Shajalal M, Beyan O, Lange C, Cochez M, Rebholz-Schuhmann D, Decker S. Explainable AI for Bioinformatics: Methods, Tools and Applications. Brief Bioinform 2023;24:bbad236. [PMID: 37478371 DOI: 10.1093/bib/bbad236] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 02/10/2023] [Revised: 05/10/2023] [Accepted: 05/26/2023] [Indexed: 07/23/2023] Open

Abstract

Artificial intelligence (AI) systems utilizing deep neural networks and machine learning (ML) algorithms are widely used for solving critical problems in bioinformatics, biomedical informatics and precision medicine. However, complex ML models that are often perceived as opaque and black-box methods make it difficult to understand the reasoning behind their decisions. This lack of transparency can be a challenge for both end-users and decision-makers, as well as AI developers. In sensitive areas such as healthcare, explainability and accountability are not only desirable properties but also legally required for AI systems that can have a significant impact on human lives. Fairness is another growing concern, as algorithmic decisions should not show bias or discrimination towards certain groups or individuals based on sensitive attributes. Explainable AI (XAI) aims to overcome the opaqueness of black-box models and to provide transparency in how AI systems make decisions. Interpretable ML models can explain how they make predictions and identify factors that influence their outcomes. However, the majority of the state-of-the-art interpretable ML methods are domain-agnostic and have evolved from fields such as computer vision, automated reasoning or statistics, making direct application to bioinformatics problems challenging without customization and domain adaptation. In this paper, we discuss the importance of explainability and algorithmic transparency in the context of bioinformatics. We provide an overview of model-specific and model-agnostic interpretable ML methods and tools and outline their potential limitations. We discuss how existing interpretable ML methods can be customized and fit to bioinformatics research problems. Further, through case studies in bioimaging, cancer genomics and text mining, we demonstrate how XAI methods can improve transparency and decision fairness. 
Our review aims at providing valuable insights and serving as a starting point for researchers wanting to enhance explainability and decision transparency while solving bioinformatics problems. GitHub: https://github.com/rezacsedu/XAI-for-bioinformatics.

Hartout P, Počuča B, Méndez-García C, Schleberger C. Investigating the human and nonobese diabetic mouse MHC class II immunopeptidome using protein language modeling. Bioinformatics 2023;39:btad469. [PMID: 37527005 PMCID: PMC10421966 DOI: 10.1093/bioinformatics/btad469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2022] [Revised: 06/17/2023] [Accepted: 07/31/2023] [Indexed: 08/03/2023] Open

Peng J, Wang L, Wang P, Pei Y. Density Functional Theory Computation and Machine Learning Studies of Interaction between Au₃ Clusters and 20 Natural Amino Acid Molecules. ACS Omega 2023;8:23024-23031. [PMID: 37396243 PMCID: PMC10308543 DOI: 10.1021/acsomega.3c02195] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/01/2023] [Accepted: 05/17/2023] [Indexed: 07/04/2023]

Ye W, Chen X, Li P, Tao Y, Wang Z, Gao C, Cheng J, Li F, Yi D, Wei Z, Yi D, Wu Y. OEDL: an optimized ensemble deep learning method for the prediction of acute ischemic stroke prognoses using union features. Front Neurol 2023;14:1158555. [PMID: 37416306 PMCID: PMC10321134 DOI: 10.3389/fneur.2023.1158555] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/04/2023] [Accepted: 05/22/2023] [Indexed: 07/08/2023] Open

Abstract

Background

Early stroke prognosis assessments are critical for decision-making regarding therapeutic intervention. We introduced the concepts of data combination, method integration, and algorithm parallelization, aiming to build an integrated deep learning model based on a combination of clinical and radiomics features and analyze its application value in prognosis prediction.

Methods

The research steps in this study include data source and feature extraction, data processing and feature fusion, model building and optimization, model training, and so on. Using data from 441 stroke patients, clinical and radiomics features were extracted, and feature selection was performed. Clinical, radiomics, and combined features were included to construct predictive models. We applied the concept of deep integration to the joint analysis of multiple deep learning methods, used a metaheuristic algorithm to improve the parameter search efficiency, and finally, developed an acute ischemic stroke (AIS) prognosis prediction method, namely, the optimized ensemble of deep learning (OEDL) method.

Results

Among the clinical features, 17 features passed the correlation check. Among the radiomics features, 19 features were selected. In the comparison of prediction performance across methods, the OEDL method, based on the concept of ensemble optimization, had the best classification performance. In the comparison across feature sets, the combined features gave better classification performance than the clinical or radiomics features alone. In the comparison across class-balancing methods, SMOTEENN, which is based on a hybrid sampling method, achieved better classification performance than the unbalanced, oversampled, and undersampled methods. The OEDL method with combined features and mixed sampling achieved the best classification performance, with 97.89, 95.74, 94.75, 94.03, and 94.35% for Macro-AUC, ACC, Macro-R, Macro-P, and Macro-F1, respectively, and compared favorably with methods in previous studies.

Conclusion

The OEDL approach proposed herein could effectively improve stroke prognosis prediction performance. Modeling with combined data performed significantly better than models based on clinical or radiomics features alone, and the proposed method had better intervention guidance value. Our approach is beneficial for optimizing the early clinical intervention process and providing the necessary clinical decision support for personalized treatment.

Steyaert S, Pizurica M, Nagaraj D, Khandelwal P, Hernandez-Boussard T, Gentles AJ, Gevaert O. Multimodal data fusion for cancer biomarker discovery with deep learning. Nat Mach Intell 2023;5:351-362. [PMID: 37693852 PMCID: PMC10484010 DOI: 10.1038/s42256-023-00633-5] [Citation(s) in RCA: 79] [Impact Index Per Article: 39.5] [Received: 07/28/2022] [Accepted: 02/17/2023] [Indexed: 09/12/2023]

Huang AA, Huang SY. Increasing transparency in machine learning through bootstrap simulation and shapely additive explanations. PLoS One 2023;18:e0281922. [PMID: 36821544 PMCID: PMC9949629 DOI: 10.1371/journal.pone.0281922] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 11/23/2022] [Accepted: 02/05/2023] [Indexed: 02/24/2023] Open

Abstract

Machine learning methods are widely used within the medical field. However, the reliability and efficacy of these models are difficult to assess, making it difficult for researchers to identify which machine-learning model to apply to their dataset. We assessed whether variance calculations of model metrics (e.g., AUROC, sensitivity, specificity) through bootstrap simulation and SHapley Additive exPlanations (SHAP) could increase model transparency and improve model selection. Data from the England National Health Services Heart Disease Prediction Cohort were used. After comparison of model metrics for XGBoost, Random Forest, Artificial Neural Network, and Adaptive Boosting, XGBoost was used as the machine-learning model of choice in this study. Bootstrap simulation (N = 10,000) was used to empirically derive the distribution of model metrics and covariate Gain statistics. SHAP was used to provide explanations of machine-learning output, and simulation was used to evaluate the variance of model accuracy metrics. For the XGBoost modeling method, we observed (through 10,000 completed simulations) that the AUROC ranged from 0.771 to 0.947, a difference of 0.176; the balanced accuracy ranged from 0.688 to 0.894, a difference of 0.205; the sensitivity ranged from 0.632 to 0.939, a difference of 0.307; and the specificity ranged from 0.595 to 0.944, a difference of 0.394. Among the 10,000 simulations completed, we observed that the gain for Angina ranged from 0.225 to 0.456, a difference of 0.231; for Cholesterol, from 0.148 to 0.326, a difference of 0.178; for maximum heart rate (MaxHR), from 0.081 to 0.200, a difference of 0.119; and for Age, from 0.059 to 0.157, a difference of 0.098. Using simulations to empirically evaluate the variability of model metrics, and explanatory algorithms to check whether covariates match the literature, is necessary for increased transparency, reliability, and utility of machine learning methods. These variance statistics, combined with model accuracy statistics, can help researchers identify the best model for a given dataset.
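The idea of deriving a metric's empirical range by bootstrap simulation can be sketched as follows. This is a simplified illustration, not the authors' procedure: it resamples a fixed set of labels and predicted scores with replacement and recomputes AUROC each time, whereas the study also derived Gain statistics, which would require refitting the model on each resample. The names `auroc` and `bootstrap_range` and the toy data are hypothetical.

```python
import random


def auroc(labels, scores):
    """AUROC as the probability that a positive outscores a negative.

    Ties count as half a win. Returns None if a class is absent.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_range(labels, scores, metric, n_boot=1000, seed=0):
    """Return (min, max) of `metric` over bootstrap resamples.

    Each resample draws len(labels) indices with replacement.
    Resamples for which the metric is undefined are skipped.
    """
    rng = random.Random(seed)
    n = len(labels)
    values = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        value = metric([labels[i] for i in idx], [scores[i] for i in idx])
        if value is not None:
            values.append(value)
    return min(values), max(values)


if __name__ == "__main__":
    # Hypothetical labels and scores, not taken from the study.
    demo_rng = random.Random(42)
    labels = [0] * 30 + [1] * 30
    scores = [demo_rng.gauss(0.4, 0.2) for _ in range(30)] + [
        demo_rng.gauss(0.6, 0.2) for _ in range(30)
    ]
    lo, hi = bootstrap_range(labels, scores, auroc, n_boot=2000, seed=1)
    print(f"AUROC range over resamples: {lo:.3f} to {hi:.3f}")
```

The width of the resulting range gives a rough sense of how much a single reported metric could vary with the sample, which is the kind of variability the abstract argues should accompany point estimates.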

Kokkotis C, Giarmatzis G, Giannakou E, Moustakidis S, Tsatalas T, Tsiptsios D, Vadikolias K, Aggelousis N. An Explainable Machine Learning Pipeline for Stroke Prediction on Imbalanced Data. Diagnostics (Basel) 2022;12:2392. [PMID: 36292081 PMCID: PMC9600473 DOI: 10.3390/diagnostics12102392] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 08/02/2022] [Revised: 09/26/2022] [Accepted: 09/27/2022] [Indexed: 11/16/2022] Open