1. Li J, Lu X, Jiang K, Tang D, Ning B, Sun F. TARSL: Triple-Attention Cross-Network Representation Learning to Predict Synthetic Lethality for Anti-Cancer Drug Discovery. IEEE J Biomed Health Inform 2025; 29:1680-1691. [PMID: 37603479 DOI: 10.1109/jbhi.2023.3306768] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/23/2023]
Abstract
Cancer is a multifaceted disease that results from co-mutations of multiple biological molecules. A promising strategy for cancer therapy involves exploiting the phenomenon of Synthetic Lethality (SL) by targeting the SL partner of a cancer gene. Since traditional methods for SL prediction suffer from high cost, long turnaround times, and off-target effects, computational approaches have become an efficient complement to these methods. Most existing approaches treat SL associations as independent of other biological interaction networks and fail to consider information from various biological networks. Although some approaches have integrated different networks to capture multi-modal features of genes for SL prediction, these methods implicitly assume that all sources and levels of information contribute equally to the SL associations. As such, a comprehensive and flexible framework for learning gene cross-network representations for SL prediction is still lacking. In this work, we present a novel Triple-Attention cross-network Representation learning method for SL prediction (TARSL) that captures molecular features from heterogeneous sources. We employ three-level attention modules to account for the different contributions of multi-level information. In particular, feature-level attention can capture the correlations between molecular features and network links, node-level attention can differentiate the importance of various neighbors, and network-level attention can concentrate on important networks and reduce the effects of irrelevant networks. We perform comprehensive experiments on human SL datasets, and the results show that our model is consistently superior to baseline methods and that the predicted SL associations could aid in designing anti-cancer drugs.
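The network-level attention described in this abstract can be illustrated with a minimal sketch. It is not the TARSL implementation: the function names, the dot-product scoring, and the fixed `query` vector (standing in for a learned parameter) are all assumptions made for illustration. The sketch scores each per-network embedding of a gene, turns the scores into softmax weights, and returns the weighted sum, so that less relevant networks contribute less to the fused representation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def network_level_attention(embeddings, query):
    """Fuse one gene's per-network embeddings into a single vector.

    embeddings: list of equal-length vectors, one per biological network.
    query: hypothetical attention vector (learned in a real model,
           fixed here for illustration).
    """
    # Score each network by the dot product of its embedding with the query.
    scores = [sum(q * x for q, x in zip(query, emb)) for emb in embeddings]
    weights = softmax(scores)
    dim = len(embeddings[0])
    # Weighted sum of the per-network embeddings.
    fused = [sum(w * emb[d] for w, emb in zip(weights, embeddings))
             for d in range(dim)]
    return fused, weights


# Three toy networks, each giving a 2-dimensional embedding of the same gene.
gene_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.1, 0.1]]
fused, weights = network_level_attention(gene_embeddings, query=[2.0, 0.0])
```

In this toy setting the first network aligns best with the query and therefore receives the largest weight; feature-level and node-level attention would apply the same weighting idea to features and to graph neighbors, respectively.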
2. Rajan S, Schwarz E. Network-based artificial intelligence approaches for advancing personalized psychiatry. Am J Med Genet B Neuropsychiatr Genet 2024; 195:e32997. [PMID: 39031613 DOI: 10.1002/ajmg.b.32997] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Revised: 05/24/2024] [Accepted: 06/06/2024] [Indexed: 07/22/2024]
Abstract
Psychiatric disorders have a complex biological underpinning likely involving an interplay of genetic and environmental risk contributions. Substantial efforts are being made to use artificial intelligence approaches to integrate features within and across data types to increase our etiological understanding and advance personalized psychiatry. Network science offers a conceptual framework for exploring the often complex relationships across different levels of biological organization, from cellular mechanistic to brain-functional and phenotypic networks. Utilizing such network information effectively as part of artificial intelligence approaches is a promising route toward a more in-depth understanding of illness biology, the deciphering of patient heterogeneity, and the identification of signatures that may be sufficiently predictive to be clinically useful. Here, we present examples of how network information has been used as part of artificial intelligence within psychiatry and beyond and outline future perspectives on how personalized psychiatry approaches may profit from a closer integration of psychiatric research, artificial intelligence development, and network science.
Affiliation(s)
- Sivanesan Rajan
  - Hector Institute for Artificial Intelligence in Psychiatry, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
  - Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
- Emanuel Schwarz
  - Hector Institute for Artificial Intelligence in Psychiatry, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
  - Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
  - German Center for Mental Health (DZPG), partner site Mannheim-Heidelberg-Ulm, Mannheim, Germany
3. Liang H, Luo H, Sang Z, Jia M, Jiang X, Wang Z, Cong S, Yao X. GREMI: An Explainable Multi-Omics Integration Framework for Enhanced Disease Prediction and Module Identification. IEEE J Biomed Health Inform 2024; 28:6983-6996. [PMID: 39110558 DOI: 10.1109/jbhi.2024.3439713] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/10/2024]
Abstract
Multi-omics integration has demonstrated promising performance in complex disease prediction. However, existing research typically focuses on maximizing prediction accuracy, while often neglecting the essential task of discovering meaningful biomarkers. This issue is particularly important in biomedicine, as molecules often interact rather than function individually to influence disease outcomes. To this end, we propose a two-phase framework named GREMI to assist multi-omics classification and explanation. In the prediction phase, we propose to improve prediction performance by employing a graph attention architecture on sample-wise co-functional networks to incorporate biomolecular interaction information for enhanced feature representation, followed by the integration of a joint-late mixed strategy and the true-class-probability block to adaptively evaluate classification confidence at both feature and omics levels. In the interpretation phase, we propose a multi-view approach to explain disease outcomes from the interaction module perspective, providing a more intuitive understanding and biomedical rationale. We incorporate Monte Carlo tree search (MCTS) to explore local-view subgraphs and pinpoint modules that highly contribute to disease characterization from the global-view. Extensive experiments demonstrate that the proposed framework outperforms state-of-the-art methods in seven different classification tasks, and our model effectively addresses data mutual interference when the number of omics types increases. We further illustrate the functional- and disease-relevance of the identified modules, as well as validate the classification performance of discovered modules using an independent cohort.
4. van Hilten A, Katz S, Saccenti E, Niessen WJ, Roshchupkin GV. Designing interpretable deep learning applications for functional genomics: a quantitative analysis. Brief Bioinform 2024; 25:bbae449. [PMID: 39293804 PMCID: PMC11410376 DOI: 10.1093/bib/bbae449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Revised: 08/07/2024] [Accepted: 08/28/2024] [Indexed: 09/20/2024] Open
Abstract
Deep learning applications have had a profound impact on many scientific fields, including functional genomics. Deep learning models can learn complex interactions between and within omics data; however, interpreting and explaining these models can be challenging. Interpretability is essential not only to help progress our understanding of the biological mechanisms underlying traits and diseases but also for establishing trust in these models' efficacy for healthcare applications. Recognizing this importance, recent years have seen the development of numerous diverse interpretability strategies, making it increasingly difficult to navigate the field. In this review, we present a quantitative analysis of the challenges arising when designing interpretable deep learning solutions in functional genomics. We explore design choices related to the characteristics of genomics data, the neural network architectures applied, and strategies for interpretation. By quantifying the current state of the field with a predefined set of criteria, we find the most frequent solutions, highlight exceptional examples, and identify unexplored opportunities for developing interpretable deep learning models in genomics.
Affiliation(s)
- Arno van Hilten
  - Department of Radiology and Nuclear Medicine, Erasmus MC, 3015 GD Rotterdam, The Netherlands
- Sonja Katz
  - Department of Radiology and Nuclear Medicine, Erasmus MC, 3015 GD Rotterdam, The Netherlands
  - Laboratory of Systems and Synthetic Biology, Wageningen University & Research, 6700 HB Wageningen WE, The Netherlands
- Edoardo Saccenti
  - Laboratory of Systems and Synthetic Biology, Wageningen University & Research, 6700 HB Wageningen WE, The Netherlands
- Wiro J Niessen
  - Department of Imaging Physics, Delft University of Technology, 2628 CD Delft, The Netherlands
- Gennady V Roshchupkin
  - Department of Radiology and Nuclear Medicine, Erasmus MC, 3015 GD Rotterdam, The Netherlands
  - Department of Epidemiology, Erasmus MC, 3015 GD Rotterdam, The Netherlands
5. Ouyang W, Peng Q, Lai Z, Huang H, Huang Z, Xie X, Lin R, Wang Z, Yao H, Yu Y. Synergistic role of activated CD4+ memory T cells and CXCL13 in augmenting cancer immunotherapy efficacy. Heliyon 2024; 10:e27151. [PMID: 38495207 PMCID: PMC10943356 DOI: 10.1016/j.heliyon.2024.e27151] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/05/2023] [Revised: 02/13/2024] [Accepted: 02/26/2024] [Indexed: 03/19/2024] Open
Abstract
The development of immune checkpoint inhibitors (ICIs) has significantly advanced cancer treatment. However, their efficacy is not consistent across all patients, underscoring the need for personalized approaches. In this study, we examined the relationship between activated CD4+ memory T cell expression and ICI responsiveness. A notable correlation was observed between increased activated CD4+ memory T cell expression and better patient survival in various cohorts. Additionally, the chemokine CXCL13 was identified as a potential prognostic biomarker, with higher expression levels associated with improved outcomes. Further analysis highlighted CXCL13's role in influencing the Tumor Microenvironment, emphasizing its relevance in tumor immunity. Using these findings, we developed a deep learning model by the Multi-Layer Aggregation Graph Neural Network method. This model exhibited promise in predicting ICI treatment efficacy, suggesting its potential application in clinical practice.
Affiliation(s)
- Wenhao Ouyang
  - Guangdong Provincial Key Laboratory of Malignant Tumor Epigenetics and Gene Regulation, Department of Medicine Oncology, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, China
- Qing Peng
  - Guangdong Provincial Key Laboratory of Malignant Tumor Epigenetics and Gene Regulation, Department of Medicine Oncology, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, China
- Zijia Lai
  - Clinical Medicine College, Guangdong Medical University, Zhanjiang, China
- Hong Huang
  - Clinical Medicine College, Guilin Medical University, Guilin, China
- Zhenjun Huang
  - Guangdong Provincial Key Laboratory of Malignant Tumor Epigenetics and Gene Regulation, Department of Medicine Oncology, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, China
- Xinxin Xie
  - Guangdong Provincial Key Laboratory of Malignant Tumor Epigenetics and Gene Regulation, Department of Medicine Oncology, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, China
- Ruichong Lin
  - Faculty of Medicine, Macau University of Science and Technology, Taipa, Macao, China
- Zehua Wang
  - Faculty of Medicine, Macau University of Science and Technology, Taipa, Macao, China
- Herui Yao
  - Guangdong Provincial Key Laboratory of Malignant Tumor Epigenetics and Gene Regulation, Department of Medicine Oncology, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, China
- Yunfang Yu
  - Guangdong Provincial Key Laboratory of Malignant Tumor Epigenetics and Gene Regulation, Department of Medicine Oncology, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, China
  - Faculty of Medicine, Macau University of Science and Technology, Taipa, Macao, China
6. Luo H, Liang H, Liu H, Fan Z, Wei Y, Yao X, Cong S. TEMINET: A Co-Informative and Trustworthy Multi-Omics Integration Network for Diagnostic Prediction. Int J Mol Sci 2024; 25:1655. [PMID: 38338932 PMCID: PMC10855161 DOI: 10.3390/ijms25031655] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2023] [Revised: 01/20/2024] [Accepted: 01/26/2024] [Indexed: 02/12/2024] Open
Abstract
Advancing the domain of biomedical investigation, integrated multi-omics data have shown exceptional performance in elucidating complex human diseases. However, as the variety of omics information expands, precisely perceiving the informativeness of intra- and inter-omics becomes challenging due to the intricate interrelations, thus presenting significant challenges in the integration of multi-omics data. To address this, we introduce a novel multi-omics integration approach, referred to as TEMINET. This approach enhances diagnostic prediction by leveraging an intra-omics co-informative representation module and a trustworthy learning strategy used to address inter-omics fusion. Considering the multifactorial nature of complex diseases, TEMINET utilizes intra-omics features to construct disease-specific networks; then, it applies graph attention networks and a multi-level framework to capture more collective informativeness than pairwise relations. To perceive the contribution of co-informative representations within intra-omics, we designed a trustworthy learning strategy to identify the reliability of each omics in integration. To integrate inter-omics information, a combined-beliefs fusion approach is deployed to harmonize the trustworthy representations of different omics types effectively. Our experiments across four different diseases using mRNA, methylation, and miRNA data demonstrate that TEMINET achieves advanced performance and robustness in classification tasks.
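The abstract does not specify how the combined-beliefs fusion is computed. As a hedged illustration only, the sketch below uses the reduced Dempster's rule of combination that is common in evidential multi-view classification; whether TEMINET uses exactly this rule is an assumption, and `combine_opinions` and the toy omics opinions are hypothetical. Each omics type supplies per-class belief masses plus an uncertainty mass that together sum to 1.

```python
def combine_opinions(b1, u1, b2, u2):
    """Combine two opinions (belief masses plus uncertainty) with the
    reduced Dempster's rule. Each opinion must satisfy sum(b) + u == 1."""
    n = len(b1)
    # Conflict: mass the two views assign to different classes.
    conflict = sum(b1[i] * b2[j]
                   for i in range(n) for j in range(n) if i != j)
    scale = 1.0 - conflict
    belief = [(b1[k] * b2[k] + b1[k] * u2 + b2[k] * u1) / scale
              for k in range(n)]
    uncertainty = (u1 * u2) / scale
    return belief, uncertainty


# Toy two-class opinions: a confident mRNA view and an uncertain miRNA view.
mrna = ([0.7, 0.1], 0.2)
mirna = ([0.2, 0.2], 0.6)
belief, uncertainty = combine_opinions(*mrna, *mirna)
```

Under this rule the fused opinion is dominated by the more confident view, and the fused uncertainty is lower than that of either input, which is the behavior one would want when a reliable omics type is combined with an unreliable one.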
Affiliation(s)
- Haoran Luo
  - Qingdao Innovation and Development Center, Harbin Engineering University, Qingdao 266000, China
  - College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China
- Hong Liang
  - College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China
- Hongwei Liu
  - College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China
- Zhoujie Fan
  - Qingdao Innovation and Development Center, Harbin Engineering University, Qingdao 266000, China
- Yanhui Wei
  - College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China
- Xiaohui Yao
  - Qingdao Innovation and Development Center, Harbin Engineering University, Qingdao 266000, China
  - College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China
- Shan Cong
  - Qingdao Innovation and Development Center, Harbin Engineering University, Qingdao 266000, China
  - College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China
7. Galluzzo Y. A comprehensive review of the data and knowledge graphs approaches in bioinformatics. Computer Science and Information Systems 2024; 21:1055-1075. [DOI: 10.2298/csis230530027g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2025]
Abstract
The scientific community is currently showing strong interest in constructing knowledge graphs (KGs) from heterogeneous domains (genomic, pharmaceutical, clinical, etc.). The main goal here is to support researchers in gaining an immediate overview of the biomedical and clinical data that can be utilized to construct and extend KGs. An in-depth overview of the available biomedical data and the latest applications of knowledge graphs, from the biological to the clinical context, is provided, showing the most recent methods of representing biomedical knowledge with knowledge graph embeddings (KGEs). Furthermore, this review differentiates biomedical databases based on their construction process (whether manually curated by experts or not), aiming to offer a detailed overview and guide researchers in selecting the appropriate database for their research according to the specific project needs, available resources, and data complexity. In conclusion, the review highlights current challenges: the integration of different knowledge graphs and the interpretability of predictions of new relations.
8. Xing X, Zhu M, Chen Z, Yuan Y. Comprehensive learning and adaptive teaching: Distilling multi-modal knowledge for pathological glioma grading. Med Image Anal 2024; 91:102990. [PMID: 37864912 DOI: 10.1016/j.media.2023.102990] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2023] [Revised: 08/28/2023] [Accepted: 10/02/2023] [Indexed: 10/23/2023]
Abstract
The fusion of multi-modal data, e.g., pathology slides and genomic profiles, can provide complementary information and benefit glioma grading. However, genomic profiles are difficult to obtain due to the high costs and technical challenges, thus limiting the clinical applications of multi-modal diagnosis. In this work, we investigate the realistic problem where paired pathology-genomic data are available during training, while only pathology slides are accessible for inference. To solve this problem, a comprehensive learning and adaptive teaching framework is proposed to improve the performance of pathological grading models by transferring the privileged knowledge from the multi-modal teacher to the pathology student. For comprehensive learning of the multi-modal teacher, we propose a novel Saliency-Aware Masking (SA-Mask) strategy to explore richer disease-related features from both modalities by masking the most salient features. For adaptive teaching of the pathology student, we first devise a Local Topology Preserving and Discrepancy Eliminating Contrastive Distillation (TDC-Distill) module to align the feature distributions of the teacher and student models. Furthermore, considering the multi-modal teacher may include incorrect information, we propose a Gradient-guided Knowledge Refinement (GK-Refine) module that builds a knowledge bank and adaptively absorbs the reliable knowledge according to their agreement in the gradient space. Experiments on the TCGA GBM-LGG dataset show that our proposed distillation framework improves the pathological glioma grading and outperforms other KD methods. Notably, with the sole pathology slides, our method achieves comparable performance with existing multi-modal methods. The code is available at https://github.com/CUHK-AIM-Group/MultiModal-learning.
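For readers unfamiliar with teacher-student knowledge distillation (KD), the following minimal sketch shows the classic temperature-softened logit distillation loss. It is the generic baseline form of KD, not the contrastive TDC-Distill or gradient-guided GK-Refine modules described in this abstract; the function names and the default temperature are assumptions chosen for illustration.

```python
import math


def softened(logits, temperature):
    """Softmax of logits divided by a temperature (higher = softer)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened class distributions,
    scaled by temperature squared as in standard logit distillation."""
    p = softened(teacher_logits, temperature)
    q = softened(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return temperature ** 2 * kl


# A multi-modal teacher and a pathology-only student on a 3-grade problem.
agree = distillation_loss([3.0, 1.0, 0.0], [3.0, 1.0, 0.0])
disagree = distillation_loss([3.0, 1.0, 0.0], [0.0, 1.0, 3.0])
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge, which is what pushes a pathology-only student toward the behavior of a teacher trained on paired pathology-genomic data.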
Affiliation(s)
- Xiaohan Xing
  - Department of Electrical Engineering, City University of Hong Kong, Hong Kong Special Administrative Region, China; Department of Radiation Oncology, Stanford University, USA
- Meilu Zhu
  - Department of Mechanical Engineering, City University of Hong Kong, Hong Kong Special Administrative Region, China
- Zhen Chen
  - Centre for Artificial Intelligence and Robotics (CAIR), Hong Kong Institute of Science & Innovation, Chinese Academy of Sciences, China
- Yixuan Yuan
  - Department of Electrical Engineering, City University of Hong Kong, Hong Kong Special Administrative Region, China; Department of Electronic Engineering, The Chinese University of Hong Kong; CUHK Shenzhen Research institute, China.
9. Jacobson DH, Pan S, Fisher J, Secrier M. Multi-scale characterisation of homologous recombination deficiency in breast cancer. Genome Med 2023; 15:90. [PMID: 37919776 PMCID: PMC10621207 DOI: 10.1186/s13073-023-01239-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/23/2023] [Accepted: 09/26/2023] [Indexed: 11/04/2023] Open
Abstract
BACKGROUND Homologous recombination is a robust, broadly error-free mechanism of double-strand break repair, and deficiencies lead to PARP inhibitor sensitivity. Patients displaying homologous recombination deficiency can be identified using 'mutational signatures'. However, these patterns are difficult to reliably infer from exome sequencing. Additionally, as mutational signatures are a historical record of mutagenic processes, this limits their utility in describing the current status of a tumour. METHODS We apply two methods for characterising homologous recombination deficiency in breast cancer to explore the features and heterogeneity associated with this phenotype. We develop a likelihood-based method which leverages small insertions and deletions for high-confidence classification of homologous recombination deficiency for exome-sequenced breast cancers. We then use multinomial elastic net regression modelling to develop a transcriptional signature of heterogeneous homologous recombination deficiency. This signature is then applied to single-cell RNA-sequenced breast cancer cohorts enabling analysis of homologous recombination deficiency heterogeneity and differential patterns of tumour microenvironment interactivity. RESULTS We demonstrate that the inclusion of indel events, even at low levels, improves homologous recombination deficiency classification. Whilst BRCA-positive homologous recombination deficient samples display strong similarities to those harbouring BRCA1/2 defects, they appear to deviate in microenvironmental features such as hypoxic signalling. We then present a 228-gene transcriptional signature which simultaneously characterises homologous recombination deficiency and BRCA1/2-defect status, and is associated with PARP inhibitor response. 
Finally, we show that this signature is applicable to single-cell transcriptomics data and predict that these cells present a distinct milieu of interactions with their microenvironment compared to their homologous recombination proficient counterparts, typified by a decreased cancer cell response to TNFα signalling. CONCLUSIONS We apply multi-scale approaches to characterise homologous recombination deficiency in breast cancer through the development of mutational and transcriptional signatures. We demonstrate how indels can improve homologous recombination deficiency classification in exome-sequenced breast cancers. Additionally, we demonstrate the heterogeneity of homologous recombination deficiency, especially in relation to BRCA1/2-defect status, and show that indications of this feature can be captured at a single-cell level, enabling further investigations into interactions between DNA repair deficient cells and their tumour microenvironment.
Affiliation(s)
- Daniel H Jacobson
  - UCL Genetics Institute, Department of Genetics, Evolution and Environment, University College London, Gower Street, London, WC1E 6BT, UK
  - UCL Cancer Institute, University College London, Paul O'Gorman Building, 72 Huntley Street, London, WC1E 6BT, UK
- Shi Pan
  - UCL Genetics Institute, Department of Genetics, Evolution and Environment, University College London, Gower Street, London, WC1E 6BT, UK
- Jasmin Fisher
  - UCL Cancer Institute, University College London, Paul O'Gorman Building, 72 Huntley Street, London, WC1E 6BT, UK
- Maria Secrier
  - UCL Genetics Institute, Department of Genetics, Evolution and Environment, University College London, Gower Street, London, WC1E 6BT, UK.
10. Gong H, Zhang Y, Dong C, Wang Y, Chen G, Liang B, Li H, Liu L, Xu J, Li G. Unbiased curriculum learning enhanced global-local graph neural network for protein thermodynamic stability prediction. Bioinformatics 2023; 39:btad589. [PMID: 37740312 PMCID: PMC10918760 DOI: 10.1093/bioinformatics/btad589] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/25/2023] [Revised: 08/04/2023] [Accepted: 09/21/2023] [Indexed: 09/24/2023] Open
Abstract
MOTIVATION Proteins play crucial roles in biological processes, with their functions being closely tied to thermodynamic stability. However, measuring stability changes upon point mutations of amino acid residues using physical methods can be time-consuming. In recent years, several computational methods for protein thermodynamic stability prediction (PTSP) based on deep learning have emerged. Nevertheless, these approaches either overlook the natural topology of protein structures or neglect the inherently noisy samples resulting from theoretical calculation or experimental errors. RESULTS We propose a novel Global-Local Graph Neural Network powered by Unbiased Curriculum Learning for the PTSP task. Our method first builds a Siamese graph neural network to extract protein features before and after mutation. Since the graph's topological changes stem from local node mutations, we design a local feature transformation module to make the model focus on the mutated site. To address model bias caused by noisy samples, which represent unavoidable errors from physical experiments, we introduce an unbiased curriculum learning method. This approach effectively identifies and re-weights noisy samples during the training process. Extensive experiments demonstrate that our proposed method outperforms advanced protein stability prediction methods and surpasses state-of-the-art learning methods for regression prediction tasks. AVAILABILITY AND IMPLEMENTATION All code and data are available at https://github.com/haifangong/UCL-GLGNN.
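The idea of re-weighting noisy samples during regression training can be sketched as follows. This is a generic residual-based weighting scheme (Huber-style weights), not the unbiased curriculum learning method of the paper; the function names and the `threshold` parameter are assumptions for illustration. Samples whose prediction error exceeds the threshold are treated as potentially noisy and contribute less to the loss.

```python
def sample_weights(residuals, threshold):
    """Weight 1 for small residuals; larger residuals are down-weighted
    in proportion to threshold / |residual| (Huber-style weights)."""
    return [1.0 if abs(r) <= threshold else threshold / abs(r)
            for r in residuals]


def weighted_mse(predictions, targets, threshold=1.0):
    """Mean squared error in which suspected noisy samples count less."""
    residuals = [p - t for p, t in zip(predictions, targets)]
    weights = sample_weights(residuals, threshold)
    return sum(w * r * r for w, r in zip(weights, residuals)) / sum(weights)


# Toy stability-change predictions; the last target is a suspected outlier.
preds = [0.5, -1.0, 0.2]
targets = [0.4, -1.2, 4.2]
weights = sample_weights([p - t for p, t in zip(preds, targets)], threshold=1.0)
loss = weighted_mse(preds, targets)
```

In a curriculum-style setup the threshold would typically change over training so that hard or noisy samples are introduced gradually; that schedule is omitted here.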
Affiliation(s)
- Haifan Gong
  - Shanghai Artificial Intelligence Laboratory, Shanghai 200000, China
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
  - SRIBD, Chinese University of Hong Kong (Shenzhen), Shenzhen 518000, China
- Yumeng Zhang
  - Shanghai Jiao Tong University, Shanghai 200000, China
- Chenhe Dong
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- Yue Wang
  - Qilu Hospital, Shandong University, Shandong 250000, China
- Guanqi Chen
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- Bilin Liang
  - Shanghai Artificial Intelligence Laboratory, Shanghai 200000, China
- Haofeng Li
  - SRIBD, Chinese University of Hong Kong (Shenzhen), Shenzhen 518000, China
- Lanxuan Liu
  - Shanghai Artificial Intelligence Laboratory, Shanghai 200000, China
- Jie Xu
  - Shanghai Artificial Intelligence Laboratory, Shanghai 200000, China
- Guanbin Li
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
11. Choi SR, Lee M. Transformer Architecture and Attention Mechanisms in Genome Data Analysis: A Comprehensive Review. Biology 2023; 12:1033. [PMID: 37508462 PMCID: PMC10376273 DOI: 10.3390/biology12071033] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Received: 06/20/2023] [Revised: 07/18/2023] [Accepted: 07/21/2023] [Indexed: 07/30/2023]
Abstract
The emergence and rapid development of deep learning, specifically transformer-based architectures and attention mechanisms, have had transformative implications across several domains, including bioinformatics and genome data analysis. The analogous nature of genome sequences to language texts has enabled the application of techniques that have exhibited success in fields ranging from natural language processing to genomic data. This review provides a comprehensive analysis of the most recent advancements in the application of transformer architectures and attention mechanisms to genome and transcriptome data. The focus of this review is on the critical evaluation of these techniques, discussing their advantages and limitations in the context of genome data analysis. With the swift pace of development in deep learning methodologies, it becomes vital to continually assess and reflect on the current standing and future direction of the research. Therefore, this review aims to serve as a timely resource for both seasoned researchers and newcomers, offering a panoramic view of the recent advancements and elucidating the state-of-the-art applications in the field. Furthermore, this review paper serves to highlight potential areas of future investigation by critically evaluating studies from 2019 to 2023, thereby acting as a stepping-stone for further research endeavors.
Affiliation(s)
- Minhyeok Lee
  - School of Electrical and Electronics Engineering, Chung-Ang University, Seoul 06974, Republic of Korea
12. Duan M, Wang Y, Zhao D, Liu H, Zhang G, Li K, Zhang H, Huang L, Zhang R, Zhou F. Orchestrating information across tissues via a novel multitask GAT framework to improve quantitative gene regulation relation modeling for survival analysis. Brief Bioinform 2023; 24:bbad238. [PMID: 37427963 DOI: 10.1093/bib/bbad238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2023] [Revised: 05/29/2023] [Accepted: 06/08/2023] [Indexed: 07/11/2023] Open
Abstract
Survival analysis is critical to cancer prognosis estimation. High-throughput technologies facilitate the increase in the dimension of genic features, but the number of clinical samples in cohorts is relatively small due to various reasons, including difficulties in participant recruitment and high data-generation costs. The transcriptome is one of the most abundantly available OMIC (referring to the high-throughput data, including genomic, transcriptomic, proteomic and epigenomic) data types. This study introduced a multitask graph attention network (GAT) framework, DQSurv, for the survival analysis task. We first used a large dataset of healthy tissue samples to pretrain the GAT-based HealthModel for the quantitative measurement of the gene regulatory relations. The multitask survival analysis framework DQSurv used the idea of transfer learning to initialize the GAT model with the pretrained HealthModel and further fine-tuned this model using two tasks, i.e., the main task of survival analysis and the auxiliary task of gene expression prediction. This refined GAT was denoted as DiseaseModel. We fused the original transcriptomic features with the difference vector between the latent features encoded by the HealthModel and DiseaseModel for the final task of survival analysis. The proposed DQSurv model stably outperformed the existing models for the survival analysis of 10 benchmark cancer types and an independent dataset. The ablation study also supported the necessity of the main modules. We released the code and the pretrained HealthModel to facilitate the feature encodings and survival analysis of transcriptome-based future studies, especially on small datasets. The model and the code are available at http://www.healthinformaticslab.org/supp/.
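The feature-fusion step described in this abstract can be pictured with a small sketch. The function name, the use of concatenation as the fusion operator, and the direction of the subtraction (DiseaseModel minus HealthModel) are assumptions; the abstract states only that the original transcriptomic features are fused with the difference vector between the two models' latent features.

```python
def fuse_features(expression, health_latent, disease_latent):
    """Concatenate the original transcriptomic features with the
    difference between the two models' latent encodings.

    Concatenation and the subtraction order are illustrative assumptions.
    """
    if len(health_latent) != len(disease_latent):
        raise ValueError("latent vectors must have the same length")
    difference = [d - h for d, h in zip(disease_latent, health_latent)]
    return list(expression) + difference


# Toy sample: 2 expression features and 2-dimensional latent encodings.
fused = fuse_features(
    expression=[1.0, 2.0],
    health_latent=[0.5, 0.5],
    disease_latent=[1.5, 0.0],
)
```

The fused vector would then be passed to the survival analysis head; the difference term is meant to carry how far a sample's encoded gene regulation departs from the healthy-tissue reference.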
Affiliation(s)
- Meiyu Duan
  - College of Computer Science and Technology, Jilin University, Changchun, Jilin, China, 130012
- Yueying Wang
  - College of Computer Science and Technology, Jilin University, Changchun, Jilin, China, 130012
- Dong Zhao
  - School of Biology and Engineering, and Engineering Research Center of Medical Biotechnology, Guizhou Medical University, Guiyang, Guizhou 550025, China
- Hongmei Liu
  - School of Biology and Engineering, and Engineering Research Center of Medical Biotechnology, Guizhou Medical University, Guiyang, Guizhou 550025, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, China, 130012
- Gongyou Zhang
  - School of Biology and Engineering, and Engineering Research Center of Medical Biotechnology, Guizhou Medical University, Guiyang, Guizhou 550025, China
- Kewei Li
  - College of Computer Science and Technology, Jilin University, Changchun, Jilin, China, 130012
- Haotian Zhang
  - College of Computer Science and Technology, Jilin University, Changchun, Jilin, China, 130012
- Lan Huang
  - College of Computer Science and Technology, Jilin University, Changchun, Jilin, China, 130012
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, China, 130012
- Ruochi Zhang
  - School of Artificial Intelligence, Jilin University, Changchun, China, 130012
- Fengfeng Zhou
  - College of Computer Science and Technology, Jilin University, Changchun, Jilin, China, 130012
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, China, 130012
|
13
|
Feldner-Busztin D, Firbas Nisantzis P, Edmunds SJ, Boza G, Racimo F, Gopalakrishnan S, Limborg MT, Lahti L, de Polavieja GG. Dealing with dimensionality: the application of machine learning to multi-omics data. Bioinformatics 2023; 39:6986971. [PMID: 36637211 PMCID: PMC9907220 DOI: 10.1093/bioinformatics/btad021] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Received: 07/20/2022] [Revised: 12/02/2022] [Accepted: 01/11/2023] [Indexed: 01/14/2023] Open
Abstract
MOTIVATION Machine learning (ML) methods are motivated by the need to automate information extraction from large datasets in order to support human users in data-driven tasks. This is an attractive approach for integrative joint analysis of vast amounts of omics data produced in next-generation sequencing and other -omics assays. A systematic assessment of the current literature can help to identify key trends and potential gaps in methodology and applications. We surveyed the literature on ML multi-omic data integration and quantitatively explored the goals, techniques and data involved in this field. We were particularly interested in examining how researchers use ML to deal with the volume and complexity of these datasets.
RESULTS Our main finding is that the methods used are those that address the challenges of datasets with few samples and many features. Dimensionality reduction methods are used to reduce the feature count alongside models that can also appropriately handle relatively few samples. Popular techniques include autoencoders, random forests and support vector machines. We also found that the field is heavily influenced by the use of The Cancer Genome Atlas dataset, which is accessible and contains many diverse experiments.
AVAILABILITY AND IMPLEMENTATION All data and processing scripts are available at this GitLab repository: https://gitlab.com/polavieja_lab/ml_multi-omics_review/ or in Zenodo: https://doi.org/10.5281/zenodo.7361807.
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
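To make the "few samples, many features" setting concrete, the following minimal sketch reduces the feature count with a simple variance filter. This is a deliberately simpler stand-in, not one of the techniques the survey names (autoencoders, random forests, support vector machines); the function name `top_variance_features` and the toy data are hypothetical.

```python
from statistics import pvariance
from typing import List, Sequence, Tuple


def top_variance_features(
    samples: Sequence[Sequence[float]], k: int
) -> Tuple[List[List[float]], List[int]]:
    """Keep the k features with the highest variance across samples.

    samples -- rows are samples, columns are features
    Returns the reduced matrix and the kept column indices (ascending).
    """
    n_features = len(samples[0])
    # Population variance of each column.
    variances = [pvariance([row[j] for row in samples]) for j in range(n_features)]
    # Rank columns by variance, keep the top k, restore original column order.
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    keep = sorted(ranked[:k])
    reduced = [[row[j] for j in keep] for row in samples]
    return reduced, keep


# Toy data: 3 samples, 4 features (far fewer samples than in real omics data).
data = [
    [1.0, 0.0, 5.0, 2.0],
    [1.0, 1.0, -5.0, 2.1],
    [1.0, 0.0, 0.0, 1.9],
]
reduced, kept = top_variance_features(data, k=2)
print(kept, reduced)
```

A variance filter is unsupervised and ignores feature interactions, which is one reason learned reductions such as autoencoders are common in this setting.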
Affiliation(s)
- Dylan Feldner-Busztin
  - Champalimaud Centre for the Unknown, Champalimaud Foundation, 1400-038 Lisbon, Portugal
- Shelley Jane Edmunds
  - Center for Evolutionary Hologenomics, GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, 1353 Copenhagen, Denmark
- Gergely Boza
  - Centre for Ecological Research, 1113 Budapest, Hungary
- Fernando Racimo
  - Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen, Denmark
- Shyam Gopalakrishnan
  - Center for Evolutionary Hologenomics, GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, 1353 Copenhagen, Denmark
- Morten Tønsberg Limborg
  - Center for Evolutionary Hologenomics, GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, 1353 Copenhagen, Denmark
- Leo Lahti
  - Department of Computing, University of Turku, 20014 Turku, Finland
|
14
|
Song X, Li J, Qian X. Diagnosis of Glioblastoma Multiforme Progression via Interpretable Structure-Constrained Graph Neural Networks. IEEE Transactions on Medical Imaging 2023; 42:380-390. [PMID: 36018877 DOI: 10.1109/tmi.2022.3202037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Glioblastoma multiforme (GBM) is the most common type of brain tumor, with high recurrence and mortality rates. After chemotherapy, GBM patients still show a high rate of pseudoprogression (PsP), which is often confused with true tumor progression (TTP) because of their high phenotypic similarity. Thus, it is crucial to construct an automated diagnosis model for differentiating between these two types of glioma progression. However, attaining this goal is impeded by limited data availability and the high demand for interpretability in clinical settings. In this work, we propose an interpretable structure-constrained graph neural network (ISGNN) with enhanced features to automatically discriminate between PsP and TTP. This network employs a metric-based meta-learning strategy to aggregate class-specific graph nodes and focus on meta-tasks associated with various small graphs, thus improving the classification performance on small-scale datasets. Specifically, a node feature enhancement module is proposed to account for the relative importance of node features and enhance their distinguishability through inductive learning. A graph generation constraint module enables learning reasonable graph structures to improve the efficiency of information diffusion while avoiding propagation errors. Furthermore, model interpretability is naturally enhanced by the learned node features and graph structures, which are closely related to the classification results. Comprehensive experimental evaluation of our method demonstrated excellent interpretable results in the diagnosis of glioma progression. In general, our work provides a novel systematic GNN approach for dealing with data scarcity and enhancing decision interpretability. Our source code will be released at https://github.com/SJTUBME-QianLab/GBM-GNN.
|
15
|
Tian Z, Peng X, Fang H, Zhang W, Dai Q, Ye Y. MHADTI: predicting drug-target interactions via multiview heterogeneous information network embedding with hierarchical attention mechanisms. Brief Bioinform 2022; 23:6761042. [PMID: 36242566 DOI: 10.1093/bib/bbac434] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/05/2022] [Revised: 08/19/2022] [Accepted: 09/08/2022] [Indexed: 12/14/2022] Open
Abstract
MOTIVATION Discovering drug-target interactions (DTIs) is a crucial step in drug development tasks such as the identification of drug side effects and drug repositioning. Since identifying DTIs by wet-lab experiments is time-consuming and costly, many computational approaches have been proposed and have become an efficient way to infer potential interactions. Although extensive effort has been invested in this task, the prediction accuracy still needs to be improved. In particular, heterogeneous network-based approaches do not fully consider the complex structure and rich semantic information in these heterogeneous networks. Therefore, predicting DTIs efficiently remains a challenge.
RESULTS In this study, we develop a novel method via Multiview heterogeneous information network embedding with Hierarchical Attention mechanisms to discover potential Drug-Target Interactions (MHADTI). Firstly, MHADTI constructs different similarity networks for drugs and targets by utilizing their multisource information. Combined with the known DTI network, three drug-target heterogeneous information networks (HINs) with different views are established. Secondly, MHADTI learns embeddings of drugs and targets from multiview HINs with hierarchical attention mechanisms, which include node-level, semantic-level and graph-level attention. Lastly, MHADTI employs a multilayer perceptron to predict DTIs with the learned deep feature representations. The hierarchical attention mechanisms fully consider the importance of nodes, meta-paths and graphs in learning the feature representations of drugs and targets, which makes their embeddings more comprehensive. Extensive experimental results demonstrate that MHADTI performs better than other SOTA prediction models. Moreover, analysis of prediction results for selected drugs and targets of interest further indicates that MHADTI has advantages in discovering DTIs.
AVAILABILITY AND IMPLEMENTATION https://github.com/pxystudy/MHADTI.
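The graph-level attention mentioned in this abstract can be illustrated generically as a softmax-weighted sum of per-view embeddings. The sketch below is not the MHADTI implementation: in the paper the attention scores are learned, whereas here they are passed in as plain numbers, and the names `softmax` and `fuse_views` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_views(
    view_embeddings: Sequence[Sequence[float]],
    view_scores: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Combine one node's embeddings from several views.

    view_embeddings -- one embedding per view, all of equal length
    view_scores     -- one unnormalized importance score per view
                       (assumed given; a real model would learn these)
    Returns the fused embedding and the attention weights.
    """
    weights = softmax(view_scores)
    dim = len(view_embeddings[0])
    fused = [
        sum(w * emb[i] for w, emb in zip(weights, view_embeddings))
        for i in range(dim)
    ]
    return fused, weights


# Three views of the same node; equal scores give a plain average.
fused, weights = fuse_views([[3.0, 0.0], [0.0, 3.0], [3.0, 3.0]], [0.0, 0.0, 0.0])
print(fused, weights)
```

Raising one view's score shifts the fused embedding toward that view, which is how an attention layer can down-weight less informative networks.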
Affiliation(s)
- Zhen Tian
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China
- Xiangyu Peng
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China
- Haichuan Fang
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China
- Wenjie Zhang
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China
- Qiguo Dai
  - School of Computer Science and Engineering, Dalian Minzu University, Dalian, 116600, China
- Yangdong Ye
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China
|