1
Li R, Wang Q, Gao R, Shen R, Wang Q, Cui X, Jiang Z, Zhang L, Fang J. Sepsis Important Genes Identification Through Biologically Informed Deep Learning and Transcriptomic Analysis. Clin Exp Pharmacol Physiol 2025; 52:e70031. [PMID: 40356040 DOI: 10.1111/1440-1681.70031] [Received: 12/26/2024] [Revised: 01/24/2025] [Accepted: 02/03/2025] [Indexed: 05/15/2025]
Abstract
Sepsis is a life-threatening disease caused by the dysregulation of the immune response. It is important to identify influential genes modulating the immune response in sepsis. In this study, we used P-NET, a biologically informed explainable artificial intelligence model, to evaluate gene importance for sepsis. About 688 important genes were identified, and these genes were enriched in pathways involved in inflammation and immune regulation, such as the PI3K-Akt signalling pathway, necroptosis and the NF-κB signalling pathway. We further selected differentially expressed genes at both bulk and single-cell levels and found that TIMP1, GSTO1 and MYL6 exhibited significantly different expression in multiple cell types. Moreover, the expression levels of these three genes were correlated with the abundance of important immune cells, such as M-MDSCs. Further analysis demonstrated that these three genes were highly expressed in sepsis patients with worse outcomes, such as patients with severe sepsis, non-survivors and patients with septic shock. Using a drug repositioning strategy, we found that navitoclax, curcumin and rotenone could down-regulate and bind to these genes. In conclusion, TIMP1, GSTO1 and MYL6 may serve as promising biomarkers and targets for sepsis treatment.
Affiliation(s)
- Ruichen Li
  - University of Shanghai for Science and Technology, Shanghai, China
  - Naval Medical Center, Naval Medical University, Shanghai, China
- Qiushi Wang
  - Department of Critical Care Medicine, The First Affiliated Hospital of Shandong First Medical University, Shandong, China
- Ru Gao
  - University of Shanghai for Science and Technology, Shanghai, China
  - Naval Medical Center, Naval Medical University, Shanghai, China
- Rutao Shen
  - The National Center for Liver Cancer, Naval Medical University, Shanghai, China
- Qihao Wang
  - University of Shanghai for Science and Technology, Shanghai, China
- Xiuliang Cui
  - The National Center for Liver Cancer, Naval Medical University, Shanghai, China
- Zhiming Jiang
  - Department of Critical Care Medicine, The First Affiliated Hospital of Shandong First Medical University, Shandong, China
- Lijie Zhang
  - Department of Information, Changhai Hospital, Naval Medical University, Shanghai, China
- Jingjing Fang
  - Naval Medical Center, Naval Medical University, Shanghai, China
2
Madhan S, Kalaiselvan A. Omics data classification using constitutive artificial neural network optimized with single candidate optimizer. NETWORK (BRISTOL, ENGLAND) 2025; 36:343-367. [PMID: 38736309 DOI: 10.1080/0954898x.2024.2348726] [Received: 01/23/2024] [Revised: 04/18/2024] [Accepted: 04/23/2024] [Indexed: 05/14/2024]
Abstract
Recent technical advancements enable omics-based biological study of molecules, such as genomics, proteomics, and microbionics, with very high throughput and low cost. To overcome this drawback, Omics Data Classification using Constitutive Artificial Neural Network Optimized with Single Candidate Optimizer (ODC-ZOA-CANN-SCO) is proposed in this manuscript. The input data are pre-processed using adaptive variational Bayesian filtering (AVBF) to replace missing values. The pre-processed data are fed to the Zebra Optimization Algorithm (ZOA) for dimensionality reduction. Then, the Constitutive Artificial Neural Network (CANN) is employed to classify the omics data. Its weight parameters are optimized by the Single Candidate Optimizer (SCO). The proposed ODC-ZOA-CANN-SCO method attains 25.36%, 21.04%, 22.18%, 26.90%, and 28.12% higher accuracy when compared with the following existing methods, respectively: multi-omics data integration utilizing adaptive graph learning and attention mode for patient categorization with biomarker identification (MOD-AGL-AM-PABI); a deep learning method depending upon multi-omics data integration to create a risk stratification prediction mode for skin cutaneous melanoma (DL-MODI-RSP-SCM); a deep belief network-based model for identifying Alzheimer's disease utilizing multi-omics data (DDN-DAD-MOD); hybrid cancer prediction depending upon multi-omics data and reinforcement learning state action reward state action (HCP-MOD-RL-SARSA); and a machine learning-based method under omics data including a biological knowledge database for cancer clinical endpoint prediction (ML-ODBKD-CCEP).
Affiliation(s)
- Subramaniam Madhan
  - Department of Computer Science and Engineering, University College of Engineering, Thirukkuvalai (A Constituent College of Anna University Chennai), Nagapattinam, Tamilnadu, India
- Anbarasan Kalaiselvan
  - Department of Science and Humanities, University College of Engineering, Thirukkuvalai (A Constituent College of Anna University Chennai), Nagapattinam, Tamilnadu, India
3
Bedi P, Rani S, Gupta B, Bhasin V, Gole P. EpiBrCan-Lite: A lightweight deep learning model for breast cancer subtype classification using epigenomic data. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2025; 260:108553. [PMID: 39667144 DOI: 10.1016/j.cmpb.2024.108553] [Received: 07/04/2024] [Revised: 11/14/2024] [Accepted: 12/03/2024] [Indexed: 12/14/2024]
Abstract
BACKGROUND AND OBJECTIVES: Early classification of breast cancer subtypes improves the survival rate, as it facilitates patient prognosis. In the literature, this problem has mainly been addressed with various Machine Learning and Deep Learning techniques. However, these studies have three major shortcomings: a huge number of Trainable Weight Parameters (TWP), low performance, and the class imbalance problem. METHODS: This paper proposes a lightweight model named EpiBrCan-Lite for classifying breast cancer subtypes using DNA methylation data. The model comprises three blocks, namely the Data Encoding, TransGRU, and Classification blocks. In the Data Encoding block, the input features are encoded into equal-sized chunks and then passed to the TransGRU block, which is a modified version of the traditional Transformer Encoder (TE). In the TransGRU block, the MLP module of the traditional TE is replaced by a GRU module consisting of two GRU layers, to reduce the TWP and capture the long-range dependencies of the input feature data. The output of the TransGRU block is then passed to the Classification block, which classifies breast cancer into its subtypes. RESULTS: The proposed model is validated using Accuracy, Precision, Recall, F1-score, FPR, and FNR metrics on the TCGA breast cancer dataset. This dataset suffers from the class imbalance problem, which is mitigated using the Synthetic Minority Oversampling Technique (SMOTE). Experimental results demonstrate that the EpiBrCan-Lite model attained 95.85 % accuracy, 95.96 % recall, 95.85 % precision, 95.90 % F1-score, 1.03 % FPR, and 4.12 % FNR despite using only 1/1500 of the TWP of other state-of-the-art models. CONCLUSION: The EpiBrCan-Lite model classifies breast cancer subtypes efficiently and, being lightweight, is suitable for deployment on devices with low computational power.
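The Data Encoding step in this abstract, splitting a feature vector into equal-sized chunks, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `encode_into_chunks`, the chunk size, and the zero-padding of the final chunk are all assumptions, since the abstract does not say how a feature count that is not a multiple of the chunk size is handled.

```python
from typing import List


def encode_into_chunks(features: List[float], chunk_size: int,
                       pad_value: float = 0.0) -> List[List[float]]:
    """Split a flat feature vector into equal-sized chunks.

    Assumption: the last chunk is right-padded with `pad_value` when the
    number of features is not a multiple of `chunk_size`.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    remainder = len(features) % chunk_size
    padding = [] if remainder == 0 else [pad_value] * (chunk_size - remainder)
    padded = list(features) + padding
    return [padded[i:i + chunk_size] for i in range(0, len(padded), chunk_size)]


# Hypothetical methylation beta values for seven CpG sites.
chunks = encode_into_chunks([0.1, 0.9, 0.4, 0.7, 0.2, 0.5, 0.8], chunk_size=3)
print(chunks)
```

Each chunk would then play the role of one token in the sequence passed to the TransGRU block; the padding choice here is only one of several plausible options.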
Affiliation(s)
- Punam Bedi
  - Department of Computer Science, University of Delhi, Delhi, India.
- Surbhi Rani
  - Department of Computer Science, University of Delhi, Delhi, India.
- Bhavna Gupta
  - Keshav Mahavidyalaya, University of Delhi, New Delhi, India.
- Veenu Bhasin
  - PGDAV College, University of Delhi, New Delhi, India.
- Pushkar Gole
  - Department of Computer Science, University of Delhi, Delhi, India.
4
Ghosh S, Zhao X, Alim M, Brudno M, Bhat M. Artificial intelligence applied to 'omics data in liver disease: towards a personalised approach for diagnosis, prognosis and treatment. Gut 2025; 74:295-311. [PMID: 39174307 PMCID: PMC11874365 DOI: 10.1136/gutjnl-2023-331740] [Received: 04/15/2024] [Accepted: 07/24/2024] [Indexed: 08/24/2024]
Abstract
Advancements in omics technologies and artificial intelligence (AI) methodologies are fuelling our progress towards personalised diagnosis, prognosis and treatment strategies in hepatology. This review provides a comprehensive overview of the current landscape of AI methods used for analysis of omics data in liver diseases. We present an overview of the prevalence of different omics levels across various liver diseases, as well as categorise the AI methodology used across the studies. Specifically, we highlight the predominance of transcriptomic and genomic profiling and the relatively sparse exploration of other levels such as the proteome and methylome, which represent untapped potential for novel insights. Publicly available database initiatives such as The Cancer Genome Atlas and The International Cancer Genome Consortium have paved the way for advancements in the diagnosis and treatment of hepatocellular carcinoma. However, the same availability of large omics datasets remains limited for other liver diseases. Furthermore, the application of sophisticated AI methods to handle the complexities of multiomics datasets requires substantial data to train and validate the models and faces challenges in achieving bias-free results with clinical utility. Strategies to address the paucity of data and capitalise on opportunities are discussed. Given the substantial global burden of chronic liver diseases, it is imperative that multicentre collaborations be established to generate large-scale omics data for early disease recognition and intervention. Exploring advanced AI methods is also necessary to maximise the potential of these datasets and improve early detection and personalised treatment strategies.
Affiliation(s)
- Soumita Ghosh
  - Transplant AI Initiative, Ajmera Transplant Program, University Health Network, Toronto, Ontario, Canada
  - Department of Medicine, University of Toronto, Toronto, Ontario, Canada
- Xun Zhao
  - Transplant AI Initiative, Ajmera Transplant Program, University Health Network, Toronto, Ontario, Canada
- Mouaid Alim
  - Transplant AI Initiative, Ajmera Transplant Program, University Health Network, Toronto, Ontario, Canada
  - Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Michael Brudno
  - Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
  - Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
  - Vector Institute of Artificial Intelligence, Toronto, Ontario, Canada
- Mamatha Bhat
  - Transplant AI Initiative, Ajmera Transplant Program, University Health Network, Toronto, Ontario, Canada
  - Department of Medicine, University of Toronto, Toronto, Ontario, Canada
  - Division of Gastroenterology, University of Toronto Faculty of Medicine, Toronto, Ontario, Canada
  - Toronto General Hospital Research Institute, University Health Network, Toronto, Ontario, Canada
5
Yadalam PK, Natarajan PM, Ardila CM. Variational graph autoencoder for reconstructed transcriptomic data associated with NLRP3 mediated pyroptosis in periodontitis. Sci Rep 2025; 15:1962. [PMID: 39809940 PMCID: PMC11733260 DOI: 10.1038/s41598-025-86455-4] [Received: 10/16/2024] [Accepted: 01/10/2025] [Indexed: 01/16/2025] Open
Abstract
The NLRP3 inflammasome, regulated by TLR4, plays a pivotal role in periodontitis by mediating inflammatory cytokine release and bone loss induced by Porphyromonas gingivalis. Periodontal disease creates a hypoxic environment, favoring the survival of anaerobic bacteria and exacerbating inflammation. The NLRP3 inflammasome triggers pyroptosis, a form of programmed cell death that amplifies inflammation and tissue damage. This study evaluates the efficacy of Variational Graph Autoencoders (VGAEs) in reconstructing gene data related to NLRP3-mediated pyroptosis in periodontitis. The NCBI GEO dataset GSE262663, containing three samples with and without hypoxia exposure, was analyzed using unsupervised K-means clustering. This method identifies natural groupings within biological data without prior labels. VGAE, a deep learning model, captures complex graph relationships for tasks like link prediction and edge detection. The VGAE model demonstrated exceptional performance with an accuracy of 99.42% and perfect precision. While it identified 5,820 false negatives, indicating a conservative approach, it accurately predicted 4,080 out of 9,900 positive samples. The model's latent space distribution differed significantly from the original data, suggesting a tightly clustered representation of the gene expression patterns. K-means clustering and VGAE show promise in gene expression analysis and graph structure reconstruction for periodontitis research.
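The figures reported in this abstract can be related through standard confusion-matrix arithmetic. The sketch below assumes that "perfect precision" means zero false positives and that the 5,820 false negatives and 4,080 true positives together make up the 9,900 positive samples. The total number of scored samples is not given in the abstract, so the value derived from the 99.42% accuracy is an inference, not a reported result.

```python
# Counts taken from the abstract.
true_positives = 4_080
false_negatives = 5_820
reported_accuracy = 0.9942

# Assumption: "perfect precision" means there are no false positives.
false_positives = 0

positives = true_positives + false_negatives  # 9,900 positive samples
recall = true_positives / positives           # fraction of positives recovered
precision = true_positives / (true_positives + false_positives)

# With no false positives, every error is a false negative, so
# accuracy = 1 - false_negatives / total, which can be solved for total.
implied_total = false_negatives / (1 - reported_accuracy)
implied_true_negatives = implied_total - positives

print(f"recall = {recall:.3f}")
print(f"precision = {precision:.3f}")
print(f"implied total samples ~ {implied_total:,.0f}")
print(f"implied true negatives ~ {implied_true_negatives:,.0f}")
```

Under these assumptions recall is about 41%, and the implied total is roughly one million scored samples, so the high accuracy would be driven largely by true negatives rather than by recovery of the positive class.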
Affiliation(s)
- Pradeep K Yadalam
  - Department of Periodontics, Saveetha Dental College, Saveetha Institute of Medical and Technology Sciences, SIMATS, Saveetha University, Chennai, 600077, Tamil Nadu, India
- Prabhu Manickam Natarajan
  - Department of Clinical Sciences, Center of Medical and Bio-allied Health Sciences and Research, College of Dentistry, Ajman University, Ajman, 346, United Arab Emirates.
- Carlos M Ardila
  - Department of Basic Sciences, Faculty of Dentistry, Universidad de Antioquia U de A, Medellín, 050010, Colombia.
6
Budhkar A, Song Q, Su J, Zhang X. Demystifying the black box: A survey on explainable artificial intelligence (XAI) in bioinformatics. Comput Struct Biotechnol J 2025; 27:346-359. [PMID: 39897059 PMCID: PMC11782883 DOI: 10.1016/j.csbj.2024.12.027] [Received: 11/13/2024] [Revised: 12/21/2024] [Accepted: 12/23/2024] [Indexed: 02/04/2025] Open
Abstract
The widespread adoption of Artificial Intelligence (AI) and machine learning (ML) tools across various domains has showcased their remarkable capabilities and performance. Black-box AI models raise concerns about decision transparency and user confidence. Therefore, explainable AI (XAI) and explainability techniques have rapidly emerged in recent years. This paper aims to review existing works on explainability techniques in bioinformatics, with a particular focus on omics and imaging. We seek to analyze the growing demand for XAI in bioinformatics, identify current XAI approaches, and highlight their limitations. Our survey emphasizes the specific needs of both bioinformatics applications and users when developing XAI methods, and we particularly focus on omics and imaging data. Our analysis reveals a significant demand for XAI in bioinformatics, driven by the need for transparency and user confidence in decision-making processes. At the end of the survey, we provide practical guidelines for system developers.
Affiliation(s)
- Aishwarya Budhkar
  - Department of Computer Science, Luddy School of Informatics, Computing, and Engineering, Indiana University Bloomington, 700 N Woodlawn Ave, Bloomington, IN 47408, USA
- Qianqian Song
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, 1889 Museum Rd, Suite 7000, Gainesville, FL 32611, USA
- Jing Su
  - Department of Biostatistics and Health Data Science, School of Medicine, Indiana University, Indianapolis, HITS 3000 BSAT, Indianapolis, IN 46202, USA
- Xuhong Zhang
  - Department of Computer Science, Luddy School of Informatics, Computing, and Engineering, Indiana University Bloomington, 700 N Woodlawn Ave, Bloomington, IN 47408, USA
7
Abdelaziz EH, Ismail R, Mabrouk MS, Amin E. Multi-omics data integration and analysis pipeline for precision medicine: Systematic review. Comput Biol Chem 2024; 113:108254. [PMID: 39447405 DOI: 10.1016/j.compbiolchem.2024.108254] [Received: 06/21/2024] [Revised: 09/05/2024] [Accepted: 10/14/2024] [Indexed: 10/26/2024]
Abstract
Precision medicine has gained considerable popularity since the "one-size-fits-all" approach did not seem very effective or reflective of the complexity of the human body. Similarly, since single-omics does not reflect the complexity of the human body's inner workings, it did not result in the expected advancement in the medical field. Therefore, the multi-omics approach has emerged. The multi-omics approach involves integrating data from different omics technologies, such as DNA sequencing, RNA sequencing, mass spectrometry, and others, using computational methods and then analyzing the integrated result for different downstream analysis applications such as survival analysis, cancer classification, or biomarker identification. Most of the recent reviews were constrained to discussing one aspect of the multi-omics analysis pipeline, such as the dimensionality reduction step, the integration methods, or the interpretability aspect; however, very few provide a comprehensive review of every step of the analysis. This study gives an overview of the multi-omics analysis pipeline: it starts with the most popular multi-omics databases used in recent literature and dimensionality reduction techniques, details the different types of data integration techniques and their downstream analysis applications, describes the most commonly used evaluation metrics, highlights the importance of model interpretability, and lastly discusses the challenges and potential future work for multi-omics data integration in precision medicine.
Affiliation(s)
- Rasha Ismail
  - Faculty of Computer and Information Sciences, Ainshams University, Cairo, Egypt.
- Mai S Mabrouk
  - Information Technology and Computer Science School, Nile University, Cairo, Egypt.
- Eman Amin
  - Faculty of Computer and Information Sciences, Ainshams University, Cairo, Egypt.
8
Wang FA, Li Y, Zeng T. Deep Learning of radiology-genomics integration for computational oncology: A mini review. Comput Struct Biotechnol J 2024; 23:2708-2716. [PMID: 39035833 PMCID: PMC11260400 DOI: 10.1016/j.csbj.2024.06.019] [Received: 03/06/2024] [Revised: 06/18/2024] [Accepted: 06/18/2024] [Indexed: 07/23/2024] Open
Abstract
In the field of computational oncology, patient status is often assessed using radiology-genomics, which combines two key technologies and data types: radiology and genomics. Recent advances in deep learning have facilitated the integration of radiology-genomics data, and even new omics data, significantly improving the robustness and accuracy of clinical predictions. These factors are driving artificial intelligence (AI) closer to practical clinical applications. In particular, deep learning models are crucial in identifying new radiology-genomics biomarkers and therapeutic targets, supported by explainable AI (xAI) methods. This review focuses on recent developments in deep learning for radiology-genomics integration, highlights current challenges, and outlines some urgently needed research directions in computational oncology for multimodal integration and biomarker discovery in radiology-genomics or radiology-omics.
Affiliation(s)
- Feng-ao Wang
  - Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
- Yixue Li
  - Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
  - Guangzhou National Laboratory, Guangzhou, China
  - GMU-GIBH Joint School of Life Sciences, The Guangdong-Hong Kong-Macau Joint Laboratory for Cell Fate Regulation and Diseases, Guangzhou Laboratory, Guangzhou Medical University, Guangzhou, China
- Tao Zeng
  - Guangzhou National Laboratory, Guangzhou, China
  - GMU-GIBH Joint School of Life Sciences, The Guangdong-Hong Kong-Macau Joint Laboratory for Cell Fate Regulation and Diseases, Guangzhou Laboratory, Guangzhou Medical University, Guangzhou, China
9
Nikafshan Rad H, Su Z, Trinh A, Hakim Newton M, Shamsani J, NYGC ALS Consortium, Karim A, Sattar A. Amyotrophic lateral sclerosis diagnosis using machine learning and multi-omic data integration. Heliyon 2024; 10:e38583. [PMID: 39640633 PMCID: PMC11619964 DOI: 10.1016/j.heliyon.2024.e38583] [Received: 04/22/2024] [Revised: 09/25/2024] [Accepted: 09/26/2024] [Indexed: 12/07/2024] Open
Abstract
Amyotrophic Lateral Sclerosis (ALS) is a complex and rare neurodegenerative disorder characterized by significant genetic, molecular, and clinical heterogeneity. Despite numerous endeavors to discover the genetic factors underlying ALS, a significant number of these factors remain unknown. This knowledge gap highlights the necessity for personalized medicine approaches that can provide more comprehensive information for the purposes of diagnosis, prognosis, and treatment of ALS. This work utilizes an innovative approach by employing a machine learning-facilitated, multi-omic model to develop a more comprehensive knowledge of ALS. Through unsupervised clustering on gene expression profiles, 9,847 genes associated with ALS pathways are isolated and integrated with 7,699 genes containing rare, presumed pathogenic genomic variants, leading to a comprehensive amalgamation of 17,546 genes. Subsequently, a Variational Autoencoder is applied to distil complex biomedical information from these genes, culminating in the creation of the proposed Multi-Omics for ALS (MOALS) model, which has been designed to expose intricate genotype-phenotype interconnections within the dataset. Our meticulous investigation elucidates several pivotal ALS signaling pathways and demonstrates that MOALS is a superior model, outclassing other machine learning models based on single omic approaches such as SNV and RNA expression, enhancing accuracy by 1.7 percent and 6.2 percent, respectively. The findings of this study suggest that analyzing the relationships within biological systems can provide heuristic insights into the biological mechanisms that help to make highly accurate ALS diagnosis tools and achieve more interpretable results.
Affiliation(s)
- Hima Nikafshan Rad
  - School of Information and Communication Technology, Griffith University, 170 Kessels Rd, Nathan, Brisbane, 4111, QLD, Australia
- Zheng Su
  - GenieUs Genomics Pty Ltd, Sydney, 2000, NSW, Australia
  - School of Biotechnology and Biomolecular Sciences, Faculty of Science, The University of New South Wales, Sydney, 2052, NSW, Australia
- Anne Trinh
  - GenieUs Genomics Pty Ltd, Sydney, 2000, NSW, Australia
- M.A. Hakim Newton
  - School of Information and Physical Sciences, The University of Newcastle, University Drive, Callaghan, Newcastle, 2308, NSW, Australia
- NYGC ALS Consortium
  - The New York Genome Center, 101 Avenue of the Americas, New York, 10013, NY, USA
- Abdul Karim
  - School of Information and Communication Technology, Griffith University, 170 Kessels Rd, Nathan, Brisbane, 4111, QLD, Australia
- Abdul Sattar
  - School of Information and Communication Technology, Griffith University, 170 Kessels Rd, Nathan, Brisbane, 4111, QLD, Australia
  - Institute of Integrated and Intelligent Systems, Griffith University, 170 Kessels Rd, Nathan, Brisbane, 4111, QLD, Australia
10
Wang X, Wang X, Cheng Y, Luo C, Xia W, Gao Z, Bu W, Jiang Y, Fei Y, Shi W, Tang J, Liu L, Zhu J, Zhao X. Construction of metal interpretable scoring system and identification of tungsten as a novel risk factor in COPD. ECOTOXICOLOGY AND ENVIRONMENTAL SAFETY 2024; 283:116842. [PMID: 39106568 DOI: 10.1016/j.ecoenv.2024.116842] [Received: 01/29/2024] [Revised: 07/24/2024] [Accepted: 08/02/2024] [Indexed: 08/09/2024]
Abstract
Numerous studies have highlighted the correlation between metal intake and deteriorated pulmonary function, emphasizing its pivotal role in the progression of Chronic Obstructive Pulmonary Disease (COPD). However, the efficacy of traditional models is often compromised by overfitting and high bias in datasets with low-level exposure, rendering them ineffective in delineating the contemporary risk trends associated with pulmonary diseases. To address these limitations, we developed advanced, interpretable models, which are crucial for elucidating the intricate mechanisms of metal toxicity and enriching the domain knowledge embedded in toxicity models. We scrutinized extensive, long-term metal exposure datasets from NHANES to explore the interplay between metals and pulmonary function. After employing a variety of machine-learning approaches, we opted for the "Mixer of Experts" model for its proficiency in identifying a wide range of toxicological trends and sensitivities. We conceptualized and illustrated the TSAP (Toxicity Score at Population-level), a metal interpretable scoring system offering performance nearly equivalent to the combination of standard interpretable methods addressing the "black box" conundrum. This streamlined, two-part procedural analysis proved instrumental in discerning established risk factors and uncovered tungsten as a novel contributor to COPD risk. SYNOPSIS: TSAP achieved satisfactory performance with transparent interpretability, suggesting that tungsten intake needs further action for COPD prevention.
Affiliation(s)
- Xuehai Wang
  - Department of Occupational Medicine and Environmental Toxicology, Nantong Key Laboratory of Environmental Toxicology, School of Public Health, Nantong University, Nantong 226019, China
- Xiangdong Wang
  - Department of Occupational Medicine and Environmental Toxicology, Nantong Key Laboratory of Environmental Toxicology, School of Public Health, Nantong University, Nantong 226019, China
- Yulan Cheng
  - Department of Occupational Medicine and Environmental Toxicology, Nantong Key Laboratory of Environmental Toxicology, School of Public Health, Nantong University, Nantong 226019, China
- Chao Luo
  - Department of Occupational Medicine and Environmental Toxicology, Nantong Key Laboratory of Environmental Toxicology, School of Public Health, Nantong University, Nantong 226019, China
- Weiyi Xia
  - Department of Occupational Medicine and Environmental Toxicology, Nantong Key Laboratory of Environmental Toxicology, School of Public Health, Nantong University, Nantong 226019, China
- Zhengnan Gao
  - Department of Occupational Medicine and Environmental Toxicology, Nantong Key Laboratory of Environmental Toxicology, School of Public Health, Nantong University, Nantong 226019, China
- Wenxia Bu
  - Department of Occupational Medicine and Environmental Toxicology, Nantong Key Laboratory of Environmental Toxicology, School of Public Health, Nantong University, Nantong 226019, China
- Yichen Jiang
  - Department of Occupational Medicine and Environmental Toxicology, Nantong Key Laboratory of Environmental Toxicology, School of Public Health, Nantong University, Nantong 226019, China
- Yue Fei
  - Department of Occupational Medicine and Environmental Toxicology, Nantong Key Laboratory of Environmental Toxicology, School of Public Health, Nantong University, Nantong 226019, China
- Weiwei Shi
  - Nantong Hospital to Nanjing University of Chinese Medicine, China
- Juan Tang
  - Department of Occupational Medicine and Environmental Toxicology, Nantong Key Laboratory of Environmental Toxicology, School of Public Health, Nantong University, Nantong 226019, China
- Lei Liu
  - Department of Occupational Medicine and Environmental Toxicology, Nantong Key Laboratory of Environmental Toxicology, School of Public Health, Nantong University, Nantong 226019, China
  - Department of Pathology, Affiliated Hospital of Nantong University, Nantong 226001, China.
- Jinfeng Zhu
  - Nantong Hospital to Nanjing University of Chinese Medicine, China.
- Xinyuan Zhao
  - Department of Occupational Medicine and Environmental Toxicology, Nantong Key Laboratory of Environmental Toxicology, School of Public Health, Nantong University, Nantong 226019, China.
11
Li M, Cai Y, Zhang M, Deng S, Wang L. NNBGWO-BRCA marker: Neural Network and binary grey wolf optimization based Breast cancer biomarker discovery framework using multi-omics dataset. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2024; 254:108291. [PMID: 38909399 DOI: 10.1016/j.cmpb.2024.108291] [Received: 11/18/2023] [Revised: 05/09/2024] [Accepted: 06/16/2024] [Indexed: 06/25/2024]
Abstract
BACKGROUND AND OBJECTIVE: Breast cancer is a multifaceted condition characterized by diverse features and a substantial mortality rate, underscoring the imperative for timely detection and intervention. The utilization of multi-omics data has gained significant traction in recent years to identify biomarkers and classify subtypes in breast cancer. This part-to-whole research approach will also be an inevitable trend in future life science research. Deep learning can integrate and analyze multi-omics data to predict cancer subtypes, which can further drive targeted therapies. However, few articles leverage the nature of deep learning for feature selection. Therefore, this paper proposes a Neural Network and Binary grey Wolf Optimization based BReast CAncer bioMarker (NNBGWO-BRCAMarker) discovery framework using multi-omics data to obtain a series of biomarkers for precise classification of breast cancer subtypes. METHODS: NNBGWO-BRCAMarker consists of two phases: in the first phase, relevant genes are selected using the weights obtained from a trained feedforward neural network; in the second phase, the binary grey wolf optimization algorithm is leveraged to further screen the selected genes, resulting in a set of potential breast cancer biomarkers. RESULTS: In the experiments, the SVM classifier with RBF kernel achieved a classification accuracy of 0.9242 ± 0.03 when trained using the 80 biomarkers identified by NNBGWO-BRCAMarker. We conducted a comprehensive gene set analysis, prognostic analysis, and druggability analysis, unveiling 25 druggable genes, 16 enriched pathways strongly linked to specific subtypes of breast cancer, and 8 genes linked to prognostic outcomes. CONCLUSIONS: The proposed framework successfully identified 80 biomarkers from the multi-omics data, enabling accurate classification of breast cancer subtypes. This discovery may offer novel insights for clinicians to pursue in further studies.
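The first phase described in this abstract, selecting genes from the weights of a trained feedforward network, might look roughly like the sketch below. The abstract does not say how weights are turned into a per-gene score, so the scoring rule used here (sum of absolute input-layer weights), the function name `rank_genes_by_weight`, and the gene names and weight values are all hypothetical.

```python
from typing import Dict, List


def rank_genes_by_weight(first_layer_weights: Dict[str, List[float]],
                         top_k: int) -> List[str]:
    """Return the top_k genes ranked by summed absolute input-layer weight.

    Assumption: a gene's relevance is the sum of |w| over the weights that
    connect its input node to the first hidden layer.
    """
    scores = {gene: sum(abs(w) for w in weights)
              for gene, weights in first_layer_weights.items()}
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda gene: (-scores[gene], gene))
    return ranked[:top_k]


# Hypothetical weights from three input genes to two hidden units.
weights = {
    "GENE_A": [0.10, -0.05],
    "GENE_B": [-0.80, 0.60],
    "GENE_C": [0.30, 0.40],
}
selected = rank_genes_by_weight(weights, top_k=2)
print(selected)
```

The second phase (binary grey wolf optimization) would then search over subsets of the genes kept by this step; it is omitted from the sketch.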
Affiliation(s)
- Min Li
- School of Information Engineering, Nanchang Institute of Technology, No. 289 Tianxiang Road, Nanchang Jiangxi, PR China
- Yuheng Cai
- School of Information Engineering, Nanchang Institute of Technology, No. 289 Tianxiang Road, Nanchang Jiangxi, PR China
- Mingzhuang Zhang
- School of Information Engineering, Nanchang Institute of Technology, No. 289 Tianxiang Road, Nanchang Jiangxi, PR China
- Shaobo Deng
- School of Information Engineering, Nanchang Institute of Technology, No. 289 Tianxiang Road, Nanchang Jiangxi, PR China
- Lei Wang
- School of Information Engineering, Nanchang Institute of Technology, No. 289 Tianxiang Road, Nanchang Jiangxi, PR China
12
van Hilten A, Katz S, Saccenti E, Niessen WJ, Roshchupkin GV. Designing interpretable deep learning applications for functional genomics: a quantitative analysis. Brief Bioinform 2024; 25:bbae449. [PMID: 39293804 PMCID: PMC11410376 DOI: 10.1093/bib/bbae449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Revised: 08/07/2024] [Accepted: 08/28/2024] [Indexed: 09/20/2024]
Abstract
Deep learning applications have had a profound impact on many scientific fields, including functional genomics. Deep learning models can learn complex interactions between and within omics data; however, interpreting and explaining these models can be challenging. Interpretability is essential not only to help progress our understanding of the biological mechanisms underlying traits and diseases but also for establishing trust in these models' efficacy for healthcare applications. Recognizing this importance, recent years have seen the development of numerous diverse interpretability strategies, making it increasingly difficult to navigate the field. In this review, we present a quantitative analysis of the challenges arising when designing interpretable deep learning solutions in functional genomics. We explore design choices related to the characteristics of genomics data, the neural network architectures applied, and strategies for interpretation. By quantifying the current state of the field with a predefined set of criteria, we find the most frequent solutions, highlight exceptional examples, and identify unexplored opportunities for developing interpretable deep learning models in genomics.
Affiliation(s)
- Arno van Hilten
- Department of Radiology and Nuclear Medicine, Erasmus MC, 3015 GD Rotterdam, The Netherlands
- Sonja Katz
- Department of Radiology and Nuclear Medicine, Erasmus MC, 3015 GD Rotterdam, The Netherlands
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, 6700 HB Wageningen WE, The Netherlands
- Edoardo Saccenti
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, 6700 HB Wageningen WE, The Netherlands
- Wiro J Niessen
- Department of Imaging Physics, Delft University of Technology, 2628 CD Delft, The Netherlands
- Gennady V Roshchupkin
- Department of Radiology and Nuclear Medicine, Erasmus MC, 3015 GD Rotterdam, The Netherlands
- Department of Epidemiology, Erasmus MC, 3015 GD Rotterdam, The Netherlands
13
Gross B, Dauvin A, Cabeli V, Kmetzsch V, El Khoury J, Dissez G, Ouardini K, Grouard S, Davi A, Loeb R, Esposito C, Hulot L, Ghermi R, Blum M, Darhi Y, Durand EY, Romagnoni A. Robust evaluation of deep learning-based representation methods for survival and gene essentiality prediction on bulk RNA-seq data. Sci Rep 2024; 14:17064. [PMID: 39048590 PMCID: PMC11269749 DOI: 10.1038/s41598-024-67023-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2024] [Accepted: 07/08/2024] [Indexed: 07/27/2024]
Abstract
Deep learning (DL) has shown potential to provide powerful representations of bulk RNA-seq data in cancer research. However, there is no consensus regarding the impact of design choices of DL approaches on the performance of the learned representation, including the model architecture, the training methodology and the various hyperparameters. To address this problem, we evaluate the performance of various design choices of DL representation learning methods using TCGA and DepMap pan-cancer datasets and assess their predictive power for survival and gene essentiality predictions. We demonstrate that baseline methods achieve comparable or superior performance compared to more complex models on survival predictions tasks. DL representation methods, however, are the most efficient to predict the gene essentiality of cell lines. We show that auto-encoders (AE) are consistently improved by techniques such as masking and multi-head training. Our results suggest that the impact of DL representations and of pretraining are highly task- and architecture-dependent, highlighting the need for adopting rigorous evaluation guidelines. These guidelines for robust evaluation are implemented in a pipeline made available to the research community.
14
Li M, Guo H, Wang K, Kang C, Yin Y, Zhang H. AVBAE-MODFR: A novel deep learning framework of embedding and feature selection on multi-omics data for pan-cancer classification. Comput Biol Med 2024; 177:108614. [PMID: 38796884 DOI: 10.1016/j.compbiomed.2024.108614] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2023] [Revised: 02/27/2024] [Accepted: 05/11/2024] [Indexed: 05/29/2024]
Abstract
Integration analysis of cancer multi-omics data for pan-cancer classification has the potential for clinical applications in various aspects such as tumor diagnosis, analyzing clinically significant features, and providing precision medicine. In these applications, embedding and feature selection on high-dimensional multi-omics data are clinically necessary. Recently, deep learning algorithms have become the most promising methods for cancer multi-omics integration analysis, owing to their powerful capability of capturing nonlinear relationships. Developing effective deep learning architectures for cancer multi-omics embedding and feature selection remains a challenge for researchers in view of high dimensionality and heterogeneity. In this paper, we propose a novel two-phase deep learning model named AVBAE-MODFR for pan-cancer classification. AVBAE-MODFR achieves embedding by a multi2multi autoencoder based on the adversarial variational Bayes method and further performs feature selection utilizing a dual-net-based feature ranking method. AVBAE-MODFR utilizes AVBAE to pre-train the network parameters, which improves the classification performance and enhances feature ranking stability in MODFR. Firstly, AVBAE learns high-quality representation among multiple omics features for unsupervised pan-cancer classification. We design an efficient discriminator architecture to distinguish the latent distributions for updating forward variational parameters. Secondly, we propose MODFR to simultaneously evaluate multi-omics feature importance for feature selection by training a designed multi2one selector network, where the efficient evaluation approach based on the average gradient of random mask subsets can avoid bias caused by input feature drift. We conduct experiments on the TCGA pan-cancer dataset and compare it with four state-of-the-art methods for each phase. The results show the superiority of AVBAE-MODFR over SOTA methods.
Affiliation(s)
- Minghe Li
- National Key Laboratory of Intelligent Tracking and Forecasting for Infectious Diseases, Engineering Research Center of Trusted Behavior Intelligence, Ministry of Education, College of Artificial Intelligence, Nankai University, Tongyan Road, Tianjin, China
- Huike Guo
- National Key Laboratory of Intelligent Tracking and Forecasting for Infectious Diseases, Engineering Research Center of Trusted Behavior Intelligence, Ministry of Education, College of Artificial Intelligence, Nankai University, Tongyan Road, Tianjin, China
- Keao Wang
- National Key Laboratory of Intelligent Tracking and Forecasting for Infectious Diseases, Engineering Research Center of Trusted Behavior Intelligence, Ministry of Education, College of Artificial Intelligence, Nankai University, Tongyan Road, Tianjin, China
- Chuanze Kang
- National Key Laboratory of Intelligent Tracking and Forecasting for Infectious Diseases, Engineering Research Center of Trusted Behavior Intelligence, Ministry of Education, College of Artificial Intelligence, Nankai University, Tongyan Road, Tianjin, China
- Yanbin Yin
- Department of Food Science and Technology, University of Nebraska - Lincoln, NE, USA
- Han Zhang
- National Key Laboratory of Intelligent Tracking and Forecasting for Infectious Diseases, Engineering Research Center of Trusted Behavior Intelligence, Ministry of Education, College of Artificial Intelligence, Nankai University, Tongyan Road, Tianjin, China
15
Wang FA, Zhuang Z, Gao F, He R, Zhang S, Wang L, Liu J, Li Y. TMO-Net: an explainable pretrained multi-omics model for multi-task learning in oncology. Genome Biol 2024; 25:149. [PMID: 38845006 PMCID: PMC11157742 DOI: 10.1186/s13059-024-03293-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Accepted: 05/29/2024] [Indexed: 06/09/2024]
Abstract
Cancer is a complex disease comprising systemic alterations at multiple scales. In this study, we develop the Tumor Multi-Omics pre-trained Network (TMO-Net) that integrates multi-omics pan-cancer datasets for model pre-training, facilitating cross-omics interactions and enabling joint representation learning and incomplete omics inference. This model enhances multi-omics sample representation and empowers various downstream oncology tasks with incomplete multi-omics datasets. By employing interpretable learning, we characterize the contributions of distinct omics features to clinical outcomes. The TMO-Net model serves as a versatile framework for cross-modal multi-omics learning in oncology, paving the way for tumor omics-specific foundation models.
Affiliation(s)
- Feng-Ao Wang
- Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, 310024, China
- Guangzhou National Laboratory, Guangzhou, 510005, China
- Zhenfeng Zhuang
- Department of Computer Science at the School of Informatics, Xiamen University, Xiamen, 361005, China
- Feng Gao
- Department of Colorectal Surgery, The Sixth Affiliated Hospital, Sun Yat-Sen University, Guangzhou, 510655, China
- Shanghai Artificial Intelligence Laboratory, Shanghai, 200433, China
- Biomedical Innovation Center, The Sixth Affiliated Hospital, Sun Yat-Sen University, Guangzhou, 510655, China
- Ruikun He
- BYHEALTH Institute of Nutrition & Health, Guangzhou, 510000, China
- Shaoting Zhang
- Shanghai Artificial Intelligence Laboratory, Shanghai, 200433, China
- Liansheng Wang
- Department of Computer Science at the School of Informatics, Xiamen University, Xiamen, 361005, China
- Junwei Liu
- Guangzhou National Laboratory, Guangzhou, 510005, China
- Yixue Li
- Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, 310024, China
- Guangzhou National Laboratory, Guangzhou, 510005, China
- Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai, 200030, China
- GZMU-GIBH Joint School of Life Sciences, The Guangdong-Hong Kong-Macau Joint Laboratory for Cell Fate Regulation and Diseases, Guangzhou Medical University, Guangzhou, 511436, China
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Collaborative Innovation Center for Genetics and Development, Fudan University, Shanghai, 200433, China
- Shanghai Institute for Biomedical and Pharmaceutical Technologies, Shanghai, 200032, China
16
Tran TO, Vo TH, Le NQK. Omics-based deep learning approaches for lung cancer decision-making and therapeutics development. Brief Funct Genomics 2024; 23:181-192. [PMID: 37519050 DOI: 10.1093/bfgp/elad031] [Citation(s) in RCA: 24] [Impact Index Per Article: 24.0] [Received: 05/16/2023] [Revised: 07/04/2023] [Accepted: 07/13/2023] [Indexed: 08/01/2023]
Abstract
Lung cancer is the most common cancer and the leading cause of cancer deaths globally. Besides clinicopathological observations and traditional molecular tests, the advent of robust and scalable techniques for nucleic acid analysis has revolutionized biological research and medicinal practice in lung cancer treatment. In response to the demands for minimally invasive procedures and technology development over the past decade, many types of multi-omics data at various genome levels have been generated. As omics data grow, artificial intelligence models, particularly deep learning, are prominent in developing more rapid and effective methods to potentially improve lung cancer patient diagnosis, prognosis and treatment strategy. This decade has seen genome-based deep learning models thriving in various lung cancer tasks, including cancer prediction, subtype classification, prognosis estimation, cancer molecular signatures identification, treatment response prediction and biomarker development. In this study, we summarized available data sources for deep-learning-based lung cancer mining and provided an update on recent deep learning models in lung cancer genomics. Subsequently, we reviewed the current issues and discussed future research directions of deep-learning-based lung cancer genomics research.
Affiliation(s)
- Thi-Oanh Tran
- International Ph.D. Program in Cell Therapy and Regenerative Medicine, College of Medicine, Taipei Medical University, No 250 Wuxing Street, 110, Taipei, Taiwan
- AIBioMed Research Group, Taipei Medical University, No 250 Wuxing Street, 110, Taipei, Taiwan
- Hematology and Blood Transfusion Center, Bach Mai Hospital, No 78 Giai Phong Street, Hanoi, Viet Nam
- Thanh Hoa Vo
- Department of Science, School of Science and Computing, South East Technological University, Waterford X91 K0EK, Ireland
- Pharmaceutical and Molecular Biotechnology Research Center (PMBRC), South East Technological University, Waterford X91 K0EK, Ireland
- Nguyen Quoc Khanh Le
- Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, 250 Wuxing Street, 110, Taipei, Taiwan
- AIBioMed Research Group, Taipei Medical University, No 250 Wuxing Street, 110, Taipei, Taiwan
- Research Center for Artificial Intelligence in Medicine, Taipei Medical University, 250 Wuxing Street, 110, Taipei, Taiwan
- Translational Imaging Research Center, Taipei Medical University Hospital, 252 Wuxing Street, 110, Taipei, Taiwan
17
Jiang L, Xu C, Bai Y, Liu A, Gong Y, Wang YP, Deng HW. Autosurv: interpretable deep learning framework for cancer survival analysis incorporating clinical and multi-omics data. NPJ Precis Oncol 2024; 8:4. [PMID: 38182734 PMCID: PMC10770412 DOI: 10.1038/s41698-023-00494-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2023] [Accepted: 12/05/2023] [Indexed: 01/07/2024]
Abstract
Accurate prognosis for cancer patients can provide critical information for optimizing treatment plans and improving life quality. Combining omics data and demographic/clinical information can offer a more comprehensive view of cancer prognosis than using omics or clinical data alone and can also reveal the underlying disease mechanisms at the molecular level. In this study, we developed and validated a deep learning framework to extract information from high-dimensional gene expression and miRNA expression data and conduct prognosis prediction for breast cancer and ovarian-cancer patients using multiple independent multi-omics datasets. Our model achieved significantly better prognosis prediction than the current machine learning and deep learning approaches in various settings. Moreover, an interpretation method was applied to tackle the "black-box" nature of deep neural networks and we identified features (i.e., genes, miRNA, demographic/clinical variables) that were important to distinguish predicted high- and low-risk patients. The significance of the identified features was partially supported by previous studies.
Affiliation(s)
- Lindong Jiang
- Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, 70112, USA
- Chao Xu
- Department of Biostatistics and Epidemiology, University of Oklahoma Health Sciences Center, Oklahoma City, OK, 73104, USA
- Yuntong Bai
- Department of Biomedical Engineering, School of Science and Engineering, Tulane University, New Orleans, LA, 70118, USA
- Anqi Liu
- Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, 70112, USA
- Yun Gong
- Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, 70112, USA
- Yu-Ping Wang
- Department of Biomedical Engineering, School of Science and Engineering, Tulane University, New Orleans, LA, 70118, USA
- Hong-Wen Deng
- Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, 70112, USA
18
Wani NA, Kumar R, Bedi J. DeepXplainer: An interpretable deep learning based approach for lung cancer detection using explainable artificial intelligence. Comput Methods Programs Biomed 2024; 243:107879. [PMID: 37897989 DOI: 10.1016/j.cmpb.2023.107879] [Citation(s) in RCA: 27] [Impact Index Per Article: 27.0] [Received: 06/01/2023] [Revised: 10/17/2023] [Accepted: 10/20/2023] [Indexed: 10/30/2023]
Abstract
BACKGROUND AND OBJECTIVE: Artificial intelligence (AI) has several uses in the healthcare industry, including healthcare management, medical forecasting, practical decision-making, and diagnosis. AI technologies have reached human-like performance, but their use is limited since they are still largely viewed as opaque black boxes. This distrust remains the primary factor for their limited real application, particularly in healthcare. As a result, there is a need for interpretable predictors that provide better predictions and also explain their predictions. METHODS: This study introduces "DeepXplainer", a new interpretable hybrid deep learning-based technique for detecting lung cancer and providing explanations of the predictions. This technique is based on a convolutional neural network and XGBoost. XGBoost is used for class label prediction after "DeepXplainer" has automatically learned the features of the input using its many convolutional layers. To explain the predictions, an explainable artificial intelligence method known as "SHAP" is implemented. RESULTS: The open-source "Survey Lung Cancer" dataset was processed using this method. On multiple parameters, including accuracy, sensitivity, F1-score, etc., the proposed method outperformed the existing methods. The proposed method obtained an accuracy of 97.43%, a sensitivity of 98.71%, and an F1-score of 98.08. After the model has made predictions with this high degree of accuracy, each prediction is explained by implementing an explainable artificial intelligence method at both the local and global levels. CONCLUSIONS: A deep learning-based classification model for lung cancer is proposed with three primary components: one for feature learning, another for classification, and a third for providing explanations for the predictions made by the proposed hybrid (ConvXGB) model. The proposed "DeepXplainer" has been evaluated using a variety of metrics, and the results demonstrate that it outperforms the current benchmarks. By providing explanations for the predictions, the proposed approach may help doctors in detecting and treating lung cancer patients more effectively.
Affiliation(s)
- Niyaz Ahmad Wani
- Department of Computer Science and Engineering, Thapar Institute of Engineering and Technology, Patiala (PIN: 147004), Punjab, India
- Ravinder Kumar
- Department of Computer Science and Engineering, Thapar Institute of Engineering and Technology, Patiala (PIN: 147004), Punjab, India
- Jatin Bedi
- Department of Computer Science and Engineering, Thapar Institute of Engineering and Technology, Patiala (PIN: 147004), Punjab, India
19
Jurenaite N, León-Periñán D, Donath V, Torge S, Jäkel R. SetQuence & SetOmic: Deep set transformers for whole genome and exome tumour analysis. Biosystems 2024; 235:105095. [PMID: 38065399 DOI: 10.1016/j.biosystems.2023.105095] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/15/2023] [Revised: 10/17/2023] [Accepted: 11/28/2023] [Indexed: 12/21/2023]
Abstract
In oncology, Deep Learning has shown great potential to personalise tasks such as tumour type classification, based on per-patient omics data-sets. Being high dimensional, incorporation of such data in one model is a challenge, often leading to one-dimensional studies and, therefore, information loss. Instead, we first propose relying on non-fixed sets of whole genome or whole exome variant-associated sequences, which can be used for supervised learning of oncology-relevant tasks by our Set Transformer based Deep Neural Network, SetQuence. We optimise this architecture to improve its efficiency. This allows for exploration of not just coding but also non-coding variants, from large datasets. Second, we extend the model to incorporate these representations together with multiple other sources of omics data in a flexible way with SetOmic. Evaluation, using these representations, shows improved robustness and reduced information loss compared to previous approaches, while still being computationally tractable. By means of Explainable Artificial Intelligence methods, our models are able to recapitulate the biological contribution of highly attributed features in the tumours studied. This validation opens the door to novel directions in multi-faceted genome and exome wide biomarker discovery and personalised treatment among other presently clinically relevant tasks.
Affiliation(s)
- Neringa Jurenaite
- Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI), TU Dresden, Chemnitzer Str 46b, Dresden, 01187, Saxony, Germany
- Daniel León-Periñán
- Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI), TU Dresden, Chemnitzer Str 46b, Dresden, 01187, Saxony, Germany; Max-Delbrück-Centrum für Molekulare Medizin, Hannoversche Str. 28, Berlin, 10115, Germany
- Veronika Donath
- Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI), TU Dresden, Chemnitzer Str 46b, Dresden, 01187, Saxony, Germany
- Sunna Torge
- Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI), TU Dresden, Chemnitzer Str 46b, Dresden, 01187, Saxony, Germany
- René Jäkel
- Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI), TU Dresden, Chemnitzer Str 46b, Dresden, 01187, Saxony, Germany
20
Li C, Wang T, Lin X. Analyzing omics data by feature combinations based on kernel functions. J Bioinform Comput Biol 2023; 21:2350021. [PMID: 37852788 DOI: 10.1142/s021972002350021x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2023]
Abstract
Defining meaningful feature (molecule) combinations can enhance the study of disease diagnosis and prognosis. However, feature combinations in biosystems are complex and varied, and the existing methods examine feature cooperation in a single, fixed pattern for all feature pairs, such as linear combination. To identify the appropriate combination between two features and evaluate feature combination more comprehensively, this paper adopts kernel functions to study feature relationships and proposes a new omics data analysis method, KF-k-TSP. Besides linear combination, KF-k-TSP also explores the nonlinear combination of features, and allows hybridizing multiple kernel functions to evaluate feature interaction from multiple views. KF-k-TSP selects k > 0 top-scoring pairs to build an ensemble classifier. Experimental results show that KF-k-TSP with multiple kernel functions, which evaluates feature combinations from multiple views, is better than that with only one kernel function. Meanwhile, KF-k-TSP performs better than TSP family algorithms and the previous methods based on conversion strategy in most cases. It performs similarly to the popular machine learning methods in omics data analysis, but involves fewer feature pairs. In the course of physiological and pathological changes, molecular interactions can be both linear and nonlinear. Hence, KF-k-TSP, which can measure molecular combination from multiple perspectives, can help to mine information closely related to physiological and pathological changes and study disease mechanisms.
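For orientation, the classic top-scoring pair (TSP) baseline that this abstract compares against can be sketched as below. This is not the kernel-based method the paper proposes: it only scores pairs by the simple ordering comparison X_i < X_j, using the standard TSP score |P(X_i < X_j | class 0) - P(X_i < X_j | class 1)|. The function names, the toy data, and the tie-breaking rule are assumptions made for illustration.

```python
import numpy as np


def tsp_scores(X, y):
    """Score each feature pair (i, j), i < j, with the classic TSP rule.

    X: array of shape (n_samples, n_features); y: binary labels (0 or 1).
    Score = |P(X_i < X_j | y = 0) - P(X_i < X_j | y = 1)|.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n_features = X.shape[1]
    scores = {}
    for i in range(n_features):
        for j in range(i + 1, n_features):
            less = X[:, i] < X[:, j]      # per-sample ordering of the pair
            p0 = less[y == 0].mean()      # frequency of the ordering in class 0
            p1 = less[y == 1].mean()      # frequency of the ordering in class 1
            scores[(i, j)] = abs(p0 - p1)
    return scores


def top_k_pairs(X, y, k):
    """Return the k highest-scoring pairs; ties broken by pair index."""
    scores = tsp_scores(X, y)
    return sorted(scores, key=lambda pair: (-scores[pair], pair))[:k]


# Hypothetical toy data: 4 samples, 3 features, two classes.
toy_X = [[1, 2, 5],
         [1, 3, 4],
         [2, 1, 6],
         [3, 1, 7]]
toy_y = [0, 0, 1, 1]

best = top_k_pairs(toy_X, toy_y, k=1)
print(best)
```

A kernel-based variant would replace the ordering test with a kernel-derived comparison of the two features; the abstract does not give that formula, so it is not attempted here.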
Affiliation(s)
- Chao Li
- School of Computer Science and Technology, Dalian University of Technology, No. 2 Linggong Road, Dalian, Liaoning 116024, P. R. China
- Tianxiang Wang
- School of Computer Science and Technology, Dalian University of Technology, No. 2 Linggong Road, Dalian, Liaoning 116024, P. R. China
- Xiaohui Lin
- School of Computer Science and Technology, Dalian University of Technology, No. 2 Linggong Road, Dalian, Liaoning 116024, P. R. China
21
Yang H, Liu Y, Yang Y, Li D, Wang Z. InDEP: an interpretable machine learning approach to predict cancer driver genes from multi-omics data. Brief Bioinform 2023; 24:bbad318. [PMID: 37649392 DOI: 10.1093/bib/bbad318] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2023] [Revised: 06/14/2023] [Accepted: 08/16/2023] [Indexed: 09/01/2023]
Abstract
Cancer driver genes are critical in driving tumor cell growth, and precisely identifying these genes is crucial in advancing our understanding of cancer pathogenesis and developing targeted cancer drugs. Although current methods for discovering cancer driver genes mainly rely on integrating multi-omics data, many existing models are overly complex, and their results are difficult to interpret accurately. This study aims to address this issue by introducing InDEP, an interpretable machine learning framework based on cascade forests. InDEP is designed with easy-to-interpret features, cascade forests based on decision trees and a KernelSHAP module that enables fine-grained post-hoc interpretation. Integrating multi-omics data, InDEP can identify essential features of classified driver genes at both the gene and cancer-type levels. The framework accurately identifies driver genes, discovers new patterns that characterize genes as driver genes and refines the cancer driver gene catalog. In comparison with state-of-the-art methods, InDEP proved to be more accurate on the test set and identified reliable candidate driver genes. Mutational features were the primary drivers of InDEP's identification of driver genes, with other omics features also contributing. At the gene level, the framework concluded that substitution-type mutations were the main reason most genes were identified as driver genes. InDEP's ability to identify reliable candidate driver genes opens up new avenues for precision oncology and discovering new biomedical knowledge. This framework can help advance cancer research by providing an interpretable method for identifying cancer driver genes and their contribution to cancer pathogenesis, facilitating the development of targeted cancer drugs.
Affiliation(s)
- Hai Yang
- Department of Computer Science and Engineering, East China University of Science and Technology, 200237, Shanghai, PR China
- Yawen Liu
- Department of Computer Science and Engineering, East China University of Science and Technology, 200237, Shanghai, PR China
- Yijing Yang
- Department of Computer Science, University of Illinois Urbana-Champaign, Champaign, Illinois, United States of America
- Dongdong Li
- Department of Computer Science and Engineering, East China University of Science and Technology, 200237, Shanghai, PR China
- Zhe Wang
- Department of Computer Science and Engineering, East China University of Science and Technology, 200237, Shanghai, PR China
22
Cheng KP, Shen WX, Jiang YY, Chen Y, Chen YZ, Tan Y. Deep learning of 2D-Restructured gene expression representations for improved low-sample therapeutic response prediction. Comput Biol Med 2023; 164:107245. [PMID: 37480677 DOI: 10.1016/j.compbiomed.2023.107245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Revised: 06/27/2023] [Accepted: 07/07/2023] [Indexed: 07/24/2023]
Abstract
Clinical outcome prediction is important for stratified therapeutics. Machine learning (ML) and deep learning (DL) methods facilitate therapeutic response prediction from transcriptomic profiles of cells and clinical samples. Clinical transcriptomic DL is challenged by the low-sample sizes (34-286 subjects), high-dimensionality (up to 21,653 genes) and unordered nature of clinical transcriptomic data. The established methods rely on ML algorithms at accuracy levels of 0.6-0.8 AUC/ACC values. Low-sample DL algorithms are needed for enhanced prediction capability. Here, an unsupervised manifold-guided algorithm was employed for restructuring transcriptomic data into ordered image-like 2D-representations, followed by efficient DL of these 2D-representations with deep ConvNets. Our DL models significantly outperformed the state-of-the-art (SOTA) ML models on 82% of 17 low-sample benchmark datasets (53% with >0.05 AUC/ACC improvement). They are more robust than the SOTA models in cross-cohort prediction tasks, and in identifying robust biomarkers and response-dependent variational patterns consistent with experimental indications.
Collapse
Affiliation(s)
- Kai Ping Cheng
- The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, PR China; Institute of Biomedical Health Technology and Engineering, Shenzhen Bay Laboratory, Shenzhen, 518132, PR China
| | - Wan Xiang Shen
- Bioinformatics and Drug Design Group, Department of Pharmacy, Center for Computational Science and Engineering, National University of Singapore, 117543, Singapore
| | - Yu Yang Jiang
- School of Pharmaceutical Sciences, Tsinghua University, Beijing, 100084, PR China
| | - Yan Chen
- The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, PR China
| | - Yu Zong Chen
- The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, PR China; Institute of Biomedical Health Technology and Engineering, Shenzhen Bay Laboratory, Shenzhen, 518132, PR China.
| | - Ying Tan
- The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, PR China; The Institute of Drug Discovery Technology, Ningbo University, Ningbo, 315211, PR China; Shenzhen Kivita Innovative Drug Discovery Institute, Shenzhen, 518110, PR China.
| |
|
23
|
Jiang L, Xu C, Bai Y, Liu A, Gong Y, Wang YP, Deng HW. AUTOSURV: Interpretable deep learning framework for cancer survival analysis incorporating clinical and multi-omics data. Research Square 2023:rs.3.rs-2486756. [PMID: 37609286 PMCID: PMC10441464 DOI: 10.21203/rs.3.rs-2486756/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/24/2023]
Abstract
Accurate prognosis for cancer patients can provide critical information for optimizing treatment plans and improving life quality. Combining omics data and demographic/clinical information can offer a more comprehensive view of cancer prognosis than using omics or clinical data alone and can reveal the underlying disease mechanisms at the molecular level. In this study, we developed a novel deep learning framework to extract information from high-dimensional gene expression and miRNA expression data and conduct prognosis prediction for breast cancer and ovarian cancer patients. Our model achieved significantly better prognosis prediction than the conventional Cox Proportional Hazard model and other competitive deep learning approaches in various settings. Moreover, an interpretation approach was applied to tackle the "black-box" nature of deep neural networks and we identified features (i.e., genes, miRNA, demographic/clinical variables) that made important contributions to distinguishing predicted high- and low-risk patients. The identified associations were partially supported by previous studies.
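For orientation, the Cox proportional hazards baseline named in this abstract is usually fit by minimizing a negative log partial likelihood. The sketch below is a generic illustration in plain Python, not the AUTOSURV code; the function name, the argument layout, and the Breslow-style handling of tied times are assumptions made for the example.

```python
import math

def cox_neg_log_partial_likelihood(risk_scores, times, events):
    """Negative log partial likelihood of a Cox model (illustrative sketch).

    risk_scores: predicted log-hazard per patient (hypothetical model output)
    times:       observed follow-up time per patient
    events:      1 if the event was observed, 0 if the patient was censored
    """
    total = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # censored patients only contribute through risk sets
        # Risk set: every patient still under observation at time t_i.
        denominator = sum(
            math.exp(r) for r, t in zip(risk_scores, times) if t >= t_i
        )
        total += risk_scores[i] - math.log(denominator)
    return -total

# Three patients with equal risk scores, all events observed.
loss = cox_neg_log_partial_likelihood([0.0, 0.0, 0.0], [1, 2, 3], [1, 1, 1])
```

A deep survival model can reuse the same objective by letting a neural network produce `risk_scores`; whether AUTOSURV does exactly this is not stated in the abstract.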
Affiliation(s)
- Lindong Jiang
- Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, 70112
| | - Chao Xu
- Department of Biostatistics and Epidemiology, University of Oklahoma Health Sciences Center, Oklahoma City, OK, 73104
| | - Yuntong Bai
- Department of Biomedical Engineering, School of Science and Engineering, Tulane University, New Orleans, LA, 70118
| | - Anqi Liu
- Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, 70112
| | - Yun Gong
- Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, 70112
| | - Yu-Ping Wang
- Department of Biomedical Engineering, School of Science and Engineering, Tulane University, New Orleans, LA, 70118
| | - Hong-Wen Deng
- Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, 70112
| |
|
24
|
Zhang S, Zhang S, Yi H, Ma S. Aligned deep neural network for integrative analysis with high-dimensional input. J Biomed Inform 2023; 144:104434. [PMID: 37391115 PMCID: PMC10534141 DOI: 10.1016/j.jbi.2023.104434] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2023] [Revised: 05/20/2023] [Accepted: 06/19/2023] [Indexed: 07/02/2023]
Abstract
OBJECTIVE Deep neural network (DNN) techniques have demonstrated significant advantages over regression and some other techniques. In recent studies, DNN-based analysis has been conducted on data with high-dimensional input such as omics measurements. In such analysis, regularization, in particular penalization, has been applied to regularize estimation and distinguish relevant input variables from irrelevant ones. A unique challenge arises from the "lack of information" attributable to high dimensionality of input and limited size of training data. For many data/studies, there exist other data/studies that may be relevant and can potentially provide additional information to boost performance. METHODS In this study, we conduct integrative analysis of multiple independent datasets/studies, with the goal of borrowing information across each other and improving overall performance. Significantly different from regression-based integrative analysis (where alignment can be easily achieved based on covariates), alignment across multiple DNNs can be nontrivial. We develop ANNI, an Aligned DNN technique for Integrative analysis with high-dimensional input. Penalization is applied for regularized estimation, selection of important input variables, and, equally importantly, information borrowing across multiple DNNs. An effective computational algorithm is developed. RESULTS Extensive simulations demonstrate competitive performance of the proposed technique. The analysis of cancer omics data further establishes its practical utility.
Affiliation(s)
- Shunqin Zhang
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China; Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, Beijing, China; Department of Biostatistics, Yale University, New Haven, CT, USA
| | - Sanguo Zhang
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China; Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, Beijing, China
| | - Huangdi Yi
- Department of Biostatistics, Yale University, New Haven, CT, USA
| | - Shuangge Ma
- Department of Biostatistics, Yale University, New Haven, CT, USA.
| |
|
25
|
Nagpal S, Mande SS. Environmental insults and compensative responses: when microbiome meets cancer. Discov Oncol 2023; 14:130. [PMID: 37453005 DOI: 10.1007/s12672-023-00745-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/12/2023] [Accepted: 07/04/2023] [Indexed: 07/18/2023] [Open Access]
Abstract
The tumor microenvironment has recently been ascribed a new hallmark: the polymorphic microbiome. Accumulating evidence regarding the tissue-specific territories of the tumor microbiome has opened new and interesting avenues. A pertinent question concerns the functional consequence of the interface between host-microbiome and cancer. Given that microbial communities have predominantly been explored through an ecological perspective, it is important that the foundational aspects of ecological stress and the fight to 'survive and thrive' are accounted for in the tumor-micro(b)environment as well. Building on existing evidence and classical microbial ecology, here we attempt to characterize the ecological stresses and the compensative responses of the microorganisms inside the tumor microenvironment. What insults would microbes experience inside the cancer jungle? How would they respond to these insults? How would the interplay of stress and the microbial quest for survival influence the fate of the tumor? This work asks these questions and tries to describe this underdiscussed ecological interface of the tumor and its microbiota. It is hoped that broader scientific thought on the importance of microbial competition sensing vis-à-vis the tumor microenvironment will be stimulated.
Affiliation(s)
- Sunil Nagpal
- TCS Research, Tata Consultancy Services Ltd, Pune, 411013, India.
- CSIR-Institute of Genomics and Integrative Biology (CSIR-IGIB), New Delhi, 110025, India.
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India.
| | - Sharmila S Mande
- TCS Research, Tata Consultancy Services Ltd, Pune, 411013, India.
| |
|
26
|
Chen Z, Yang Z, Zhu L, Gao P, Matsubara T, Kanaya S, Altaf-Ul-Amin M. Learning vector quantized representation for cancer subtypes identification. Comput Methods Programs Biomed 2023; 236:107543. [PMID: 37100024 DOI: 10.1016/j.cmpb.2023.107543] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/27/2022] [Revised: 02/13/2023] [Accepted: 04/07/2023] [Indexed: 05/21/2023]
Abstract
BACKGROUND AND OBJECTIVE Defining and separating cancer subtypes is essential for facilitating personalized therapy modality and prognosis of patients. The definition of subtypes has been constantly recalibrated as a result of our deepened understanding. During this recalibration, researchers often rely on clustering of cancer data to provide an intuitive visual reference that could reveal the intrinsic characteristics of subtypes. The data being clustered are often omics data such as transcriptomics that have strong correlations to the underlying biological mechanism. However, while existing studies have shown promising results, they suffer from issues associated with omics data, namely sample scarcity and high dimensionality, and they impose unrealistic assumptions in order to extract useful features from the data while avoiding overfitting to spurious correlations. METHODS This paper proposes to leverage a recent strong generative model, Vector-Quantized Variational AutoEncoder, to tackle the data issues and extract discrete representations that are crucial to the quality of subsequent clustering by retaining only information relevant to reconstructing the input. RESULTS Extensive experiments and medical analysis on multiple datasets comprising 10 distinct cancers demonstrate the proposed clustering results can significantly and robustly improve prognosis over prevalent subtyping systems. CONCLUSION Our proposal does not impose strict assumptions on the data distribution, while its latent features are better representations of the transcriptomic data in different cancer subtypes, capable of yielding superior clustering performance with any mainstream clustering method.
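The discrete representations mentioned in this abstract come from the vector-quantization step of a VQ-VAE, which replaces each encoder output with its nearest codebook entry. The following is a minimal sketch of that lookup only, in plain Python; the function name, the toy codebook, and the use of squared Euclidean distance are assumptions for illustration and are not taken from the paper.

```python
def quantize(vector, codebook):
    """Return (index, code) of the codebook entry nearest to `vector`.

    Nearest is measured by squared Euclidean distance, the usual choice
    in VQ-VAE-style models (assumed here, not confirmed by the abstract).
    """
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    index = min(
        range(len(codebook)),
        key=lambda k: squared_distance(vector, codebook[k]),
    )
    return index, codebook[index]

# Hypothetical 2-D codebook with three entries.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
index, code = quantize([0.9, 1.2], codebook)
```

The resulting indices form a discrete code per sample, which a downstream clustering method could consume; training the encoder, decoder, and codebook is outside the scope of this sketch.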
Affiliation(s)
- Zheng Chen
- Graduate School of Engineering Science, Osaka University, Japan.
| | - Ziwei Yang
- Graduate School of Science and Technology, Nara Institute of Science and Technology, Japan
| | - Lingwei Zhu
- Department of Computing Science, University of Alberta, Canada
| | - Peng Gao
- Institute for Quantitative Biosciences, University of Tokyo, Japan
| | | | - Shigehiko Kanaya
- Graduate School of Science and Technology, Nara Institute of Science and Technology, Japan; Data Science Center, Nara Institute of Science and Technology, Japan
| | - Md Altaf-Ul-Amin
- Graduate School of Science and Technology, Nara Institute of Science and Technology, Japan
| |
|
27
|
Wysocka M, Wysocki O, Zufferey M, Landers D, Freitas A. A systematic review of biologically-informed deep learning models for cancer: fundamental trends for encoding and interpreting oncology data. BMC Bioinformatics 2023; 24:198. [PMID: 37189058 PMCID: PMC10186658 DOI: 10.1186/s12859-023-05262-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 08/17/2022] [Accepted: 03/30/2023] [Indexed: 05/17/2023] [Open Access]
Abstract
BACKGROUND There is an increasing interest in the use of Deep Learning (DL) based methods as a supporting analytical framework in oncology. However, most direct applications of DL will deliver models with limited transparency and explainability, which constrain their deployment in biomedical settings. METHODS This systematic review discusses DL models used to support inference in cancer biology with a particular emphasis on multi-omics analysis. It focuses on how existing models address the need for better dialogue with prior knowledge, biological plausibility and interpretability, fundamental properties in the biomedical domain. For this, we retrieved and analyzed 42 studies focusing on emerging architectural and methodological advances, the encoding of biological domain knowledge and the integration of explainability methods. RESULTS We discuss the recent evolutionary arc of DL models in the direction of integrating prior biological relational and network knowledge to support better generalisation (e.g. pathways or Protein-Protein-Interaction networks) and interpretability. This represents a fundamental functional shift towards models which can integrate mechanistic and statistical inference aspects. We introduce a concept of bio-centric interpretability and according to its taxonomy, we discuss representational methodologies for the integration of domain prior knowledge in such models. CONCLUSIONS The paper provides a critical outlook into contemporary methods for explainability and interpretability used in DL for cancer. The analysis points in the direction of a convergence between encoding prior knowledge and improved interpretability. We introduce bio-centric interpretability which is an important step towards formalisation of biological interpretability of DL models and developing methods that are less problem- or application-specific.
Affiliation(s)
- Magdalena Wysocka
- Digital Experimental Cancer Medicine Team, Cancer Biomarker Centre, CRUK Manchester Institute, University of Manchester, Oxford Rd, Manchester, M13 9 PL UK
- Department of Computer Science, University of Manchester, Oxford Rd, Manchester, M13 9 PL UK
| | - Oskar Wysocki
- Digital Experimental Cancer Medicine Team, Cancer Biomarker Centre, CRUK Manchester Institute, University of Manchester, Oxford Rd, Manchester, M13 9 PL UK
- Department of Computer Science, University of Manchester, Oxford Rd, Manchester, M13 9 PL UK
- Idiap Research Institute, National University of Sciences, Rue Marconi 19, CH - 1920 Martigny, Switzerland
| | - Marie Zufferey
- Idiap Research Institute, National University of Sciences, Rue Marconi 19, CH - 1920 Martigny, Switzerland
| | - Dónal Landers
- DeLondra Oncology Ltd, 38 Carlton Avenue, Wilmslow, SK9 4EP UK
| | - André Freitas
- Digital Experimental Cancer Medicine Team, Cancer Biomarker Centre, CRUK Manchester Institute, University of Manchester, Oxford Rd, Manchester, M13 9 PL UK
- Department of Computer Science, University of Manchester, Oxford Rd, Manchester, M13 9 PL UK
- Idiap Research Institute, National University of Sciences, Rue Marconi 19, CH - 1920 Martigny, Switzerland
| |
|
28
|
Wang H, Yao Z, Luo R, Liu J, Wang Z, Zhang G. LaCOme: Learning the latent convolutional patterns among transcriptomic features to improve classifications. Gene 2023; 862:147246. [PMID: 36736509 DOI: 10.1016/j.gene.2023.147246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2022] [Revised: 12/22/2022] [Accepted: 01/27/2023] [Indexed: 02/04/2023]
Abstract
OMIC is a novel approach that analyses entire genetic or molecular profiles in humans and other organisms. It involves identifying and quantifying biological molecules that contribute to a species' structure, function, and dynamics. Finding the secrets of OMIC is like deciphering the biochemical code, but building data-driven models to mine the hidden phenotypic trait information has been a research hotspot. Transcriptome analysis is a popular biological technology for characterizing living systems' overall health, including cells and tissues. Individual transcript expression levels are known to be correlated with those of other transcripts. Nevertheless, most computational studies do not fully exploit these inter-feature correlations. Differential expression analyses, for example, assume that the expression levels of the transcripts are independent. Thus, we propose extracting these inter-feature correlations using the convolutional neural network (CNN) and transforming the transcriptomic features into a new space of convolutional transcriptomic (LaCOme) features. On most transcriptomic datasets in use, a series of comprehensive experiments have demonstrated that engineered LaCOme features outperform the original transcriptomic features in classification performances. Based on experimental results, OMIC data from biological samples could be further enriched using CNN to enhance computational analysis results. Also, feature rough screening can be used to extract valuable information from OMIC, regardless of the algorithm used to select features. It may always be better to create a novel feature than to keep the original. Furthermore, we investigated the feasibility of the feature construction method through cross-validation and independent verification, hoping to develop a more efficient and effective method.
Affiliation(s)
- Hongyu Wang
- Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning 110016, China; College of Software, Jilin University, Changchun, Jilin 130012, China
| | - Zhaomin Yao
- Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning 110016, China; College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning 110167, China
| | - Renli Luo
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning 110167, China
| | - Jiahao Liu
- School of Mathematical Sciences, Chongqing Normal University, Chongqing 401331, China
| | - Zhiguo Wang
- Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning 110016, China; College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning 110167, China.
| | - Guoxu Zhang
- Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning 110016, China; College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning 110167, China.
| |
|
29
|
Benkirane H, Pradat Y, Michiels S, Cournède PH. CustOmics: A versatile deep-learning based strategy for multi-omics integration. PLoS Comput Biol 2023; 19:e1010921. [PMID: 36877736 PMCID: PMC10019780 DOI: 10.1371/journal.pcbi.1010921] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 09/14/2022] [Revised: 03/16/2023] [Accepted: 02/04/2023] [Indexed: 03/07/2023] [Open Access]
Abstract
The availability of patient cohorts with several types of omics data opens new perspectives for exploring the disease's underlying biological processes and developing predictive models. It also comes with new challenges in computational biology in terms of integrating high-dimensional and heterogeneous data in a fashion that captures the interrelationships between multiple genes and their functions. Deep learning methods offer promising perspectives for integrating multi-omics data. In this paper, we review the existing integration strategies based on autoencoders and propose a new customizable one whose principle relies on a two-phase approach. In the first phase, we adapt the training to each data source independently before learning cross-modality interactions in the second phase. By taking into account each source's singularity, we show that this approach succeeds at taking advantage of all the sources more efficiently than other strategies. Moreover, by adapting our architecture to the computation of Shapley additive explanations, our model can provide interpretable results in a multi-source setting. Using multiple omics sources from different TCGA cohorts, we demonstrate the performance of the proposed method for cancer on test cases for several tasks, such as the classification of tumor types and breast cancer subtypes, as well as survival outcome prediction. We show through our experiments the strong performance of our architecture on seven different datasets of various sizes and provide some interpretations of the results obtained. Our code is available at https://github.com/HakimBenkirane/CustOmics.
Affiliation(s)
- Hakim Benkirane
- Université Paris-Saclay, CentraleSupélec, Lab of Mathematics and Informatics (MICS), Gif-sur-Yvette, France
- Oncostat U1018, Inserm, Université Paris-Saclay, Équipe Labellisée Ligue Contre le Cancer, CESP, Villejuif, France
| | - Yoann Pradat
- Université Paris-Saclay, CentraleSupélec, Lab of Mathematics and Informatics (MICS), Gif-sur-Yvette, France
| | - Stefan Michiels
- Oncostat U1018, Inserm, Université Paris-Saclay, Équipe Labellisée Ligue Contre le Cancer, CESP, Villejuif, France
- Bureau de Biostatistique et d’Épidémiologie, Gustave Roussy, Université Paris-Saclay, Villejuif, France
| | - Paul-Henry Cournède
- Université Paris-Saclay, CentraleSupélec, Lab of Mathematics and Informatics (MICS), Gif-sur-Yvette, France
| |
|
30
|
Hauptmann T, Kramer S. A fair experimental comparison of neural network architectures for latent representations of multi-omics for drug response prediction. BMC Bioinformatics 2023; 24:45. [PMID: 36788531 PMCID: PMC9926634 DOI: 10.1186/s12859-023-05166-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2022] [Accepted: 01/31/2023] [Indexed: 02/16/2023] [Open Access]
Abstract
BACKGROUND Recent years have seen a surge of novel neural network architectures for the integration of multi-omics data for prediction. Most of the architectures include either encoders alone or encoders and decoders, i.e., autoencoders of various sorts, to transform multi-omics data into latent representations. One important parameter is the depth of integration: the point at which the latent representations are computed or merged, which can be either early, intermediate, or late. The literature on integration methods is growing steadily; however, close to nothing is known about the relative performance of these methods under fair experimental conditions and under consideration of different use cases. RESULTS We developed a comparison framework that trains and optimizes multi-omics integration methods under equal conditions. We incorporated early integration, PCA and four recently published deep learning methods: MOLI, Super.FELT, OmiEmbed, and MOMA. Further, we devised a novel method, Omics Stacking, that combines the advantages of intermediate and late integration. Experiments were conducted on a public drug response data set with multiple omics data (somatic point mutations, somatic copy number profiles and gene expression profiles) that was obtained from cell lines, patient-derived xenografts, and patient samples. Our experiments confirmed that early integration has the lowest predictive performance. Overall, architectures that integrate triplet loss achieved the best results. Statistical differences can, overall, rarely be observed; however, in terms of the average ranks of methods, Super.FELT consistently performs best in a cross-validation setting and Omics Stacking best in an external test set setting. CONCLUSIONS We recommend that researchers follow fair comparison protocols, as suggested in the paper. When faced with a new data set, Super.FELT is a good option in the cross-validation setting, and Omics Stacking in the external test set setting. Statistical significances are hardly observable, despite trends in the algorithms' rankings. Future work on refined methods for transfer learning tailored for this domain may improve the situation for external test sets. The source code of all experiments is available under https://github.com/kramerlab/Multi-Omics_analysis.
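The abstract reports that architectures using a triplet loss performed best. As a reminder of what that objective computes, here is a minimal sketch of the standard triplet margin loss in plain Python. It is not taken from the linked repository; the function names, the Euclidean distance, and the default margin of 1.0 are assumptions for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss on latent vectors (illustrative).

    The loss is zero once the anchor is closer to the positive sample
    (same class, e.g. same drug response) than to the negative sample
    by at least `margin`.
    """
    return max(
        0.0,
        euclidean(anchor, positive) - euclidean(anchor, negative) + margin,
    )

# Hypothetical 2-D latent representations.
loss_easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])  # negative is far away
loss_hard = triplet_loss([0.0, 0.0], [0.0, 1.0], [0.0, 1.5])  # negative is too close
```

In the compared methods such a term would be computed on encoder outputs and combined with a classification loss; the exact weighting is not given in the abstract.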
Affiliation(s)
- Tony Hauptmann
- Department of Computer Science, Johannes Gutenberg University Mainz, Mainz, Germany.
| | - Stefan Kramer
- Department of Computer Science, Johannes Gutenberg University Mainz, Mainz, Germany
| |
|
31
|
Loh HW, Ooi CP, Seoni S, Barua PD, Molinari F, Acharya UR. Application of explainable artificial intelligence for healthcare: A systematic review of the last decade (2011-2022). Comput Methods Programs Biomed 2022; 226:107161. [PMID: 36228495 DOI: 10.1016/j.cmpb.2022.107161] [Citation(s) in RCA: 155] [Impact Index Per Article: 51.7] [Received: 08/29/2022] [Revised: 09/16/2022] [Accepted: 09/25/2022] [Indexed: 06/16/2023]
Abstract
BACKGROUND AND OBJECTIVES Artificial intelligence (AI) has branched out to various applications in healthcare, such as health services management, predictive medicine, clinical decision-making, and patient data and diagnostics. Although AI models have achieved human-like performance, their use is still limited because they are seen as a black box. This lack of trust remains the main reason for their low use in practice, especially in healthcare. Hence, explainable artificial intelligence (XAI) has been introduced as a technique that can provide confidence in the model's prediction by explaining how the prediction is derived, thereby encouraging the use of AI systems in healthcare. The primary goal of this review is to provide areas of healthcare that require more attention from the XAI research community. METHODS Multiple journal databases were thoroughly searched using PRISMA guidelines 2020. Studies that do not appear in Q1 journals, which are highly credible, were excluded. RESULTS In this review, we surveyed 99 Q1 articles covering the following XAI techniques: SHAP, LIME, GradCAM, LRP, Fuzzy classifier, EBM, CBR, rule-based systems, and others. CONCLUSION We discovered that detecting abnormalities in 1D biosignals and identifying key text in clinical notes are areas that require more attention from the XAI research community. We hope this review will encourage the development of a holistic cloud system for a smart city.
Affiliation(s)
- Hui Wen Loh
- School of Science and Technology, Singapore University of Social Sciences, Singapore
| | - Chui Ping Ooi
- School of Science and Technology, Singapore University of Social Sciences, Singapore
| | - Silvia Seoni
- Department of Electronics and Telecommunications, Biolab, Politecnico di Torino, Torino 10129, Italy
| | - Prabal Datta Barua
- Faculty of Engineering and Information Technology, University of Technology Sydney, Australia; School of Business (Information Systems), Faculty of Business, Education, Law & Arts, University of Southern Queensland, Australia
| | - Filippo Molinari
- Department of Electronics and Telecommunications, Biolab, Politecnico di Torino, Torino 10129, Italy
| | - U Rajendra Acharya
- School of Science and Technology, Singapore University of Social Sciences, Singapore; School of Business (Information Systems), Faculty of Business, Education, Law & Arts, University of Southern Queensland, Australia; School of Engineering, Ngee Ann Polytechnic, Singapore; Department of Bioinformatics and Medical Engineering, Asia University, Taiwan; Research Organization for Advanced Science and Technology (IROAST), Kumamoto University, Kumamoto, Japan.
| |
|
32
|
Qin R, Mahal LK, Bojar D. Deep learning explains the biology of branched glycans from single-cell sequencing data. iScience 2022; 25:105163. [PMID: 36217547 PMCID: PMC9547197 DOI: 10.1016/j.isci.2022.105163] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/03/2022] [Revised: 09/06/2022] [Accepted: 09/16/2022] [Indexed: 11/03/2022] [Open Access]
Abstract
Glycosylation is ubiquitous and often dysregulated in disease. However, the regulation and functional significance of various types of glycosylation at cellular levels is hard to unravel experimentally. Multi-omics, single-cell measurements such as SUGAR-seq, which quantifies transcriptomes and cell surface glycans, facilitate addressing this issue. Using SUGAR-seq data, we pioneered a deep learning model to predict the glycan phenotypes of cells (mouse T lymphocytes) from transcripts, with the example of predicting β1,6GlcNAc-branching across T cell subtypes (test set F1 score: 0.9351). Model interpretation via SHAP (SHapley Additive exPlanations) identified highly predictive genes, in part known to impact (i) branched glycan levels and (ii) the biology of branched glycans. These genes included physiologically relevant low-abundance genes that were not captured by conventional differential expression analysis. Our work shows that interpretable deep learning models are promising for uncovering novel functions and regulatory mechanisms of glycans from integrated transcriptomic and glycomic datasets.
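SHAP, used here for model interpretation, attributes a prediction to input features through Shapley values. The sketch below computes exact Shapley values for a tiny toy "model" by enumerating feature orderings. It is a conceptual illustration in plain Python and does not use or reproduce the SHAP library; the function names and the toy value function are hypothetical, and real tools rely on approximations because exact enumeration is infeasible for thousands of genes.

```python
from itertools import permutations

def shapley_values(value_fn, n_features):
    """Exact Shapley values by averaging marginal contributions.

    value_fn maps a frozenset of feature indices (the features that are
    'present') to a model output. Only practical for very few features.
    """
    totals = [0.0] * n_features
    orderings = list(permutations(range(n_features)))
    for order in orderings:
        present = set()
        previous = value_fn(frozenset())
        for feature in order:
            present.add(feature)
            current = value_fn(frozenset(present))
            totals[feature] += current - previous  # marginal contribution
            previous = current
    return [total / len(orderings) for total in totals]

# Toy model: output is 10 only when hypothetical genes 0 and 1 are both present.
def toy_model(present):
    return 10.0 if {0, 1} <= present else 0.0

attributions = shapley_values(toy_model, 3)
```

In this toy case the credit is split equally between the two interacting features and the third receives none, which is the kind of ranking the abstract uses to pick out highly predictive genes.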
Affiliation(s)
- Rui Qin
- Department of Chemistry, University of Alberta, Edmonton, AB T6G 2G2, Canada
| | - Lara K. Mahal
- Department of Chemistry, University of Alberta, Edmonton, AB T6G 2G2, Canada
| | - Daniel Bojar
- Department of Chemistry and Molecular Biology, University of Gothenburg, 405 30 Gothenburg, Sweden
- Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, 405 30 Gothenburg, Sweden
| |
|
33
|
Tsimenidis S, Vrochidou E, Papakostas GA. Omics Data and Data Representations for Deep Learning-Based Predictive Modeling. Int J Mol Sci 2022; 23:12272. [PMID: 36293133 PMCID: PMC9603455 DOI: 10.3390/ijms232012272] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 08/23/2022] [Revised: 10/03/2022] [Accepted: 10/12/2022] [Indexed: 11/25/2022] [Open Access]
Abstract
Medical discoveries mainly depend on the capability to process and analyze biological datasets, which inundate the scientific community and are still expanding as the cost of next-generation sequencing technologies is decreasing. Deep learning (DL) is a viable method to exploit this massive data stream since it has advanced quickly through successive innovations. However, an obstacle to scientific progress emerges: the difficulty of applying DL to biology, because both fields are evolving at a breakneck pace, making it hard for an individual to occupy the front lines of both of them. This paper aims to bridge the gap and help computer scientists bring their valuable expertise into the life sciences. This work provides an overview of the most common types of biological data and data representations that are used to train DL models, with additional information on the models themselves and the various tasks that are being tackled. This is the essential information a DL expert with no background in biology needs in order to participate in DL-based research projects in biomedicine, biotechnology, and drug discovery. Alternatively, this study could also be useful to researchers in biology to understand and utilize the power of DL to gain better insights into and extract important information from the omics data.
Affiliation(s)
| | | | - George A. Papakostas
- MLV Research Group, Department of Computer Science, International Hellenic University, 65404 Kavala, Greece
| |
|
34
|
A novel liver cancer diagnosis method based on patient similarity network and DenseGCN. Sci Rep 2022; 12:6797. [PMID: 35474072 PMCID: PMC9043215 DOI: 10.1038/s41598-022-10441-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/14/2021] [Accepted: 04/05/2022] [Indexed: 11/17/2022] [Open Access]
Abstract
Liver cancer is the main malignancy in terms of mortality rate, and accurate diagnosis can improve its treatment outcome. The patient similarity network is an important source of information that helps in cancer diagnosis. However, recent works rarely take patient similarity into consideration. To address this issue, we constructed a patient similarity network using three types of liver cancer omics data, and proposed a novel liver cancer diagnosis method consisting of similarity network fusion, a denoising autoencoder and a dense graph convolutional neural network to capitalize on the patient similarity network and multi-omics data. We compared our proposed method with other state-of-the-art methods and machine learning methods on the TCGA-LIHC dataset to evaluate its performance. The results confirmed that our proposed method surpasses these comparison methods on all the metrics. In particular, our proposed method attained an accuracy of up to 0.9857.
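To make the idea of a patient similarity network concrete, the sketch below builds a k-nearest-neighbour graph from a single table of patient profiles using a Gaussian kernel. It is a simplified illustration in plain Python and is not the paper's method: it does not perform similarity network fusion across three omics types, and the function name, `k`, and `sigma` are assumptions chosen for the example.

```python
import math

def similarity_network(profiles, k=2, sigma=1.0):
    """Build an undirected kNN patient graph with Gaussian-kernel weights.

    profiles: one feature vector per patient (hypothetical omics values)
    Returns a dict mapping (i, j) with i < j to the edge weight.
    """
    n = len(profiles)

    def similarity(a, b):
        squared = sum((x - y) ** 2 for x, y in zip(a, b))
        return math.exp(-squared / (2.0 * sigma ** 2))

    edges = {}
    for i in range(n):
        # Rank every other patient by similarity to patient i.
        ranked = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: similarity(profiles[i], profiles[j]),
            reverse=True,
        )
        for j in ranked[:k]:
            edges[(min(i, j), max(i, j))] = similarity(profiles[i], profiles[j])
    return edges

# Four hypothetical patients forming two obvious pairs.
network = similarity_network([[0, 0], [0, 1], [10, 10], [10, 11]], k=1)
```

A graph convolutional network would then propagate features along these edges; how the paper fuses the per-omics networks and feeds them to DenseGCN is not detailed in the abstract.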
|