1
Fu HY, Liu YY, Zhang MY, Yang HX. Enrichment Analysis and Deep Learning in Biomedical Ontology: Applications and Advancements. Chinese Medical Sciences Journal = Chung-kuo i hsueh k'o hsueh tsa chih 2025; 40:45-56. [PMID: 40164517 DOI: 10.24920/004464] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/02/2025]
Abstract
Biomedical big data, characterized by its massive scale, multi-dimensionality, and heterogeneity, offers novel perspectives for disease research, elucidates biological principles, and simultaneously prompts changes in related research methodologies. Biomedical ontology, as a shared formal conceptual system, not only offers standardized terms for multi-source biomedical data but also provides a solid data foundation and framework for biomedical research. In this review, we summarize enrichment analysis and deep learning for biomedical ontology based on its structure and semantic annotation properties, highlighting how technological advancements are enabling the more comprehensive use of ontology information. Enrichment analysis represents an important application of ontology to elucidate the potential biological significance for a particular molecular list. Deep learning, on the other hand, represents an increasingly powerful analytical tool that can be more widely combined with ontology for analysis and prediction. With the continuous evolution of big data technologies, the integration of these technologies with biomedical ontologies is opening up exciting new possibilities for advancing biomedical research.
Affiliation(s)
- Hong-Yu Fu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Yang-Yang Liu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Mei-Yi Zhang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Hai-Xiu Yang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
2
Creux C, Zehraoui F, Radvanyi F, Tahi F. MMnc: multi-modal interpretable representation for non-coding RNA classification and class annotation. Bioinformatics (Oxford, England) 2025; 41:btaf051. [PMID: 39891346 PMCID: PMC11890286 DOI: 10.1093/bioinformatics/btaf051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2024] [Revised: 01/16/2025] [Accepted: 01/29/2025] [Indexed: 02/03/2025]
Abstract
MOTIVATION: As the biological roles and disease implications of non-coding RNAs continue to emerge, the need to thoroughly characterize previously unexplored non-coding RNAs becomes increasingly urgent. These molecules hold potential as biomarkers and therapeutic targets. However, the vast and complex nature of non-coding RNA data presents a challenge. We introduce MMnc, an interpretable deep-learning approach designed to classify non-coding RNAs into functional groups. MMnc leverages multiple data sources (such as sequence, secondary structure, and expression) using attention-based multi-modal data integration. This ensures the learning of meaningful representations while accounting for missing sources in some samples.
RESULTS: Our findings demonstrate that MMnc achieves high classification accuracy across diverse non-coding RNA classes. The method's modular architecture allows for the consideration of multiple types of modalities, whereas other tools only consider one or two at most. MMnc is resilient to missing data, ensuring that all available information is effectively utilized. Importantly, the generated attention scores offer interpretable insights into the underlying patterns of the different non-coding RNA classes, potentially driving future non-coding RNA research and applications.
AVAILABILITY AND IMPLEMENTATION: Data and source code can be found at EvryRNA.ibisc.univ-evry.fr/EvryRNA/MMnc.
Affiliation(s)
- Constance Creux
  - Université Paris-Saclay, Univ Evry, IBISC, Evry-Courcouronnes 91020, France
  - Molecular Oncology, PSL Research University, CNRS, UMR 144, Institut Curie, Paris 75248, France
- Farida Zehraoui
  - Université Paris-Saclay, Univ Evry, IBISC, Evry-Courcouronnes 91020, France
- François Radvanyi
  - Molecular Oncology, PSL Research University, CNRS, UMR 144, Institut Curie, Paris 75248, France
- Fariza Tahi
  - Université Paris-Saclay, Univ Evry, IBISC, Evry-Courcouronnes 91020, France
3
Xu Z, Liao H, Huang L, Chen Q, Lan W, Li S. IBPGNET: lung adenocarcinoma recurrence prediction based on neural network interpretability. Brief Bioinform 2024; 25:bbae080. [PMID: 38557672 PMCID: PMC10982951 DOI: 10.1093/bib/bbae080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Revised: 01/31/2024] [Accepted: 02/07/2024] [Indexed: 04/04/2024]
Abstract
Lung adenocarcinoma (LUAD) is the most common histologic subtype of lung cancer. Early-stage patients have a 30-50% probability of metastatic recurrence after surgical treatment. Here, we propose a new computational framework, Interpretable Biological Pathway Graph Neural Networks (IBPGNET), based on pathway hierarchy relationships to predict LUAD recurrence and explore the internal regulatory mechanisms of LUAD. IBPGNET can integrate different omics data efficiently and provide global interpretability. In addition, our experimental results show that IBPGNET outperforms other classification methods in 5-fold cross-validation. IBPGNET identified PSMC1 and PSMD11 as genes associated with LUAD recurrence, and their expression levels were significantly higher in LUAD cells than in normal cells. The knockdown of PSMC1 and PSMD11 in LUAD cells increased their sensitivity to afatinib and decreased cell migration, invasion and proliferation. In addition, the cells showed significantly lower EGFR expression, indicating that PSMC1 and PSMD11 may mediate therapeutic sensitivity through EGFR expression.
Affiliation(s)
- Zhanyu Xu
  - Department of Thoracic and Cardiovascular Surgery, The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi Zhuang Autonomous Region 530021, China
- Haibo Liao
  - School of Computer, Electronic and Information, Guangxi University, Nanning, Guangxi Zhuang Autonomous Region 530021, China
- Liuliu Huang
  - Department of Thoracic and Cardiovascular Surgery, The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi Zhuang Autonomous Region 530021, China
- Qingfeng Chen
  - School of Computer, Electronic and Information, Guangxi University, Nanning, Guangxi Zhuang Autonomous Region 530021, China
- Wei Lan
  - School of Computer, Electronic and Information, Guangxi University, Nanning, Guangxi Zhuang Autonomous Region 530021, China
- Shikang Li
  - Department of Thoracic and Cardiovascular Surgery, The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi Zhuang Autonomous Region 530021, China
4
Sousa RT, Silva S, Pesquita C. Explaining protein-protein interactions with knowledge graph-based semantic similarity. Comput Biol Med 2024; 170:108076. [PMID: 38308873 DOI: 10.1016/j.compbiomed.2024.108076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Revised: 12/11/2023] [Accepted: 01/27/2024] [Indexed: 02/05/2024]
Abstract
The application of artificial intelligence and machine learning methods to several biomedical problems, such as protein-protein interaction prediction, has gained significant traction in recent decades. However, explainability is a key aspect of using machine learning as a tool for scientific discovery. Explainable artificial intelligence approaches help clarify algorithmic mechanisms and identify potential bias in the data. Given the complexity of the biomedical domain, explanations should be grounded in domain knowledge, which can be achieved by using ontologies and knowledge graphs. These knowledge graphs express knowledge about a domain by capturing different perspectives of the representation of real-world entities. However, the most popular way to explore knowledge graphs with machine learning is through embeddings, which are not explainable. As an alternative, knowledge graph-based semantic similarity offers the advantage of being explainable. Additionally, similarity can be computed to capture different semantic aspects within the knowledge graph, increasing the explainability of predictive approaches. We propose a novel method to generate explainable vector representations, KGsim2vec, that uses aspect-oriented semantic similarity features to represent pairs of entities in a knowledge graph. Our approach employs a set of machine learning models, including decision trees, genetic programming, random forest and eXtreme gradient boosting, to predict relations between entities. The experiments reveal that considering multiple semantic aspects when representing the similarity between two entities improves explainability and predictive performance. KGsim2vec performs better than black-box methods based on knowledge graph embeddings or graph neural networks. Moreover, KGsim2vec produces global models that can capture biological phenomena and elucidate data biases.
Affiliation(s)
- Rita T Sousa
  - LASIGE, Faculdade de Ciências da Universidade de Lisboa, Lisboa, Portugal
- Sara Silva
  - LASIGE, Faculdade de Ciências da Universidade de Lisboa, Lisboa, Portugal
- Catia Pesquita
  - LASIGE, Faculdade de Ciências da Universidade de Lisboa, Lisboa, Portugal
5
Esser-Skala W, Fortelny N. Reliable interpretability of biology-inspired deep neural networks. NPJ Syst Biol Appl 2023; 9:50. [PMID: 37816807 PMCID: PMC10564878 DOI: 10.1038/s41540-023-00310-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 03/22/2023] [Accepted: 09/15/2023] [Indexed: 10/12/2023]
Abstract
Deep neural networks display impressive performance but suffer from limited interpretability. Biology-inspired deep learning, where the architecture of the computational graph is based on biological knowledge, enables unique interpretability where real-world concepts are encoded in hidden nodes, which can be ranked by importance and thereby interpreted. In such models trained on single-cell transcriptomes, we previously demonstrated that node-level interpretations lack robustness upon repeated training and are influenced by biases in biological knowledge. Similar studies are missing for related models. Here, we test and extend our methodology for reliable interpretability in P-NET, a biology-inspired model trained on patient mutation data. We observe variability of interpretations and susceptibility to knowledge biases, and identify the network properties that drive interpretation biases. We further present an approach to control the robustness and biases of interpretations, which leads to more specific interpretations. In summary, our study reveals the broad importance of methods to ensure robust and bias-aware interpretability in biology-inspired deep learning.
Affiliation(s)
- Wolfgang Esser-Skala
  - Computational Systems Biology Group, Department of Biosciences and Medical Biology, University of Salzburg, Hellbrunner Straße 34, 5020, Salzburg, Austria
- Nikolaus Fortelny
  - Computational Systems Biology Group, Department of Biosciences and Medical Biology, University of Salzburg, Hellbrunner Straße 34, 5020, Salzburg, Austria
6
Cheng KP, Shen WX, Jiang YY, Chen Y, Chen YZ, Tan Y. Deep learning of 2D-Restructured gene expression representations for improved low-sample therapeutic response prediction. Comput Biol Med 2023; 164:107245. [PMID: 37480677 DOI: 10.1016/j.compbiomed.2023.107245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Revised: 06/27/2023] [Accepted: 07/07/2023] [Indexed: 07/24/2023]
Abstract
Clinical outcome prediction is important for stratified therapeutics. Machine learning (ML) and deep learning (DL) methods facilitate therapeutic response prediction from transcriptomic profiles of cells and clinical samples. Clinical transcriptomic DL is challenged by the low-sample sizes (34-286 subjects), high-dimensionality (up to 21,653 genes) and unordered nature of clinical transcriptomic data. The established methods rely on ML algorithms at accuracy levels of 0.6-0.8 AUC/ACC values. Low-sample DL algorithms are needed for enhanced prediction capability. Here, an unsupervised manifold-guided algorithm was employed for restructuring transcriptomic data into ordered image-like 2D-representations, followed by efficient DL of these 2D-representations with deep ConvNets. Our DL models significantly outperformed the state-of-the-art (SOTA) ML models on 82% of 17 low-sample benchmark datasets (53% with >0.05 AUC/ACC improvement). They are more robust than the SOTA models in cross-cohort prediction tasks, and in identifying robust biomarkers and response-dependent variational patterns consistent with experimental indications.
Affiliation(s)
- Kai Ping Cheng
  - The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, PR China; Institute of Biomedical Health Technology and Engineering, Shenzhen Bay Laboratory, Shenzhen, 518132, PR China
- Wan Xiang Shen
  - Bioinformatics and Drug Design Group, Department of Pharmacy, Center for Computational Science and Engineering, National University of Singapore, 117543, Singapore
- Yu Yang Jiang
  - School of Pharmaceutical Sciences, Tsinghua University, Beijing, 100084, PR China
- Yan Chen
  - The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, PR China
- Yu Zong Chen
  - The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, PR China; Institute of Biomedical Health Technology and Engineering, Shenzhen Bay Laboratory, Shenzhen, 518132, PR China
- Ying Tan
  - The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, PR China; The Institute of Drug Discovery Technology, Ningbo University, Ningbo, 315211, PR China; Shenzhen Kivita Innovative Drug Discovery Institute, Shenzhen, 518110, PR China
7
Creux C, Zehraoui F, Hanczar B, Tahi F. A3SOM, abstained explainable semi-supervised neural network based on self-organizing map. PLoS One 2023; 18:e0286137. [PMID: 37228138 DOI: 10.1371/journal.pone.0286137] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Accepted: 05/09/2023] [Indexed: 05/27/2023]
Abstract
In the sea of data generated daily, unlabeled samples greatly outnumber labeled ones. This is due to the fact that, in many application areas, labels are scarce or hard to obtain. In addition, unlabeled samples might belong to new classes that are not available in the label set associated with data. In this context, we propose A3SOM, an abstained explainable semi-supervised neural network that associates a self-organizing map to dense layers in order to classify samples. Abstained classification enables the detection of new classes and class overlaps. The use of a self-organizing map in A3SOM allows integrated visualization and makes the model explainable. Along with describing our approach, this paper shows that the method is competitive with other classifiers and demonstrates the benefits of including abstention rules. A use case is presented on breast cancer subtype classification and discovery to show the relevance of our method in real-world medical problems.
Affiliation(s)
- Constance Creux
  - Univ Evry, IBISC, Université Paris-Saclay, Evry-Courcouronnes, France
- Farida Zehraoui
  - Univ Evry, IBISC, Université Paris-Saclay, Evry-Courcouronnes, France
- Blaise Hanczar
  - Univ Evry, IBISC, Université Paris-Saclay, Evry-Courcouronnes, France
- Fariza Tahi
  - Univ Evry, IBISC, Université Paris-Saclay, Evry-Courcouronnes, France
8
Lisboa P, Saralajew S, Vellido A, Fernández-Domenech R, Villmann T. The Coming of Age of Interpretable and Explainable Machine Learning Models. Neurocomputing 2023. [DOI: 10.1016/j.neucom.2023.02.040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2023]
9
Mohammed MA, Abdulkareem KH, Dinar AM, Zapirain BG. Rise of Deep Learning Clinical Applications and Challenges in Omics Data: A Systematic Review. Diagnostics (Basel) 2023; 13:664. [PMID: 36832152 PMCID: PMC9955380 DOI: 10.3390/diagnostics13040664] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 12/24/2022] [Revised: 02/05/2023] [Accepted: 02/07/2023] [Indexed: 02/12/2023]
Abstract
This research aims to review and evaluate the most relevant scientific studies about deep learning (DL) models in the omics field. It also aims to demonstrate the potential of DL techniques in omics data analysis and to identify the key challenges that must be addressed. Surveying the existing literature reveals several elements that are essential for understanding these studies, such as the clinical applications and datasets they report. The published literature also highlights the difficulties encountered by other researchers. In addition to looking for other studies, such as guidelines, comparative studies, and review papers, a systematic approach is used to search all relevant publications on omics and DL using different keyword variants. From 2018 to 2022, the search procedure was conducted on four online indexes: IEEE Xplore, Web of Science, ScienceDirect, and PubMed. These indexes were chosen because they offer enough coverage and linkages to numerous papers in the biological field. A total of 65 articles were added to the final list. The inclusion and exclusion criteria were specified. Of the 65 publications, 42 are clinical applications of DL in omics data. Furthermore, 16 of the 65 articles were review publications based on single- and multi-omics data, according to the proposed taxonomy. Finally, only a small number of articles (7/65) focused on comparative analysis and guidelines. The use of DL in studying omics data presented several obstacles related to DL itself, preprocessing procedures, datasets, model validation, and testbed applications. Numerous relevant investigations were performed to address these issues. Unlike other review papers, our study presents distinct observations on the use of DL models with omics data. We believe that the results of this study can be a useful guideline for practitioners who look for a comprehensive view of the role of DL in omics data analysis.
Affiliation(s)
- Mazin Abed Mohammed
  - College of Computer Science and Information Technology, University of Anbar, Anbar 31001, Iraq
  - eVIDA Lab, University of Deusto, 48007 Bilbao, Spain
- Karrar Hameed Abdulkareem
  - College of Agriculture, Al-Muthanna University, Samawah 66001, Iraq
  - College of Engineering, University of Warith Al-Anbiyaa, Karbala 56001, Iraq
- Ahmed M. Dinar
  - Computer Engineering Department, University of Technology-Iraq, Baghdad 19006, Iraq