1
Alsaedi S, Ogasawara M, Alarawi M, Gao X, Gojobori T. AI-powered precision medicine: utilizing genetic risk factor optimization to revolutionize healthcare. NAR Genom Bioinform 2025; 7:lqaf038. [PMID: 40330081 PMCID: PMC12051108 DOI: 10.1093/nargab/lqaf038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2024] [Revised: 02/11/2025] [Accepted: 04/17/2025] [Indexed: 05/08/2025] Open
Abstract
The convergence of artificial intelligence (AI) and biomedical data is transforming precision medicine by enabling the use of genetic risk factors (GRFs) for customized healthcare services based on individual needs. Although GRFs play an essential role in disease susceptibility, progression, and therapeutic outcomes, a gap exists in exploring their contribution to AI-powered precision medicine. This paper addresses this need by investigating the significance and potential of utilizing GRFs with AI in the medical field. We examine their applications, particularly emphasizing their impact on disease prediction, treatment personalization, and overall healthcare improvement. This review explores the application of AI algorithms to optimize the use of GRFs, aiming to advance precision medicine in disease screening, patient stratification, drug discovery, and understanding disease mechanisms. Through a variety of case studies and examples, we demonstrate the potential of incorporating GRFs facilitated by AI into medical practice, resulting in more precise diagnoses, targeted therapies, and improved patient outcomes. This review underscores the potential of GRFs, empowered by AI, to enhance precision medicine by improving diagnostic accuracy, treatment precision, and individualized healthcare solutions.
Affiliation(s)
- Sakhaa Alsaedi
  - Computer Science, Division of Computer, Electrical and Mathematical Sciences and Engineering (CEMSE), King Abdullah University of Science and Technology (KAUST), 23955-6900 Thuwal, Kingdom of Saudi Arabia
  - Center of Excellence on Smart Health, King Abdullah University of Science and Technology (KAUST), 23955-6900 Thuwal, Kingdom of Saudi Arabia
  - Center of Excellence for Generative AI, King Abdullah University of Science and Technology (KAUST), 23955-6900 Thuwal, Kingdom of Saudi Arabia
  - College of Computer Science and Engineering (CCSE), Taibah University, 42353 Madinah, Kingdom of Saudi Arabia
- Michihiro Ogasawara
  - Department of Internal Medicine and Rheumatology, Juntendo University, 113-8431 Tokyo, Japan
- Mohammed Alarawi
  - Center of Excellence on Smart Health, King Abdullah University of Science and Technology (KAUST), 23955-6900 Thuwal, Kingdom of Saudi Arabia
  - Center of Excellence for Generative AI, King Abdullah University of Science and Technology (KAUST), 23955-6900 Thuwal, Kingdom of Saudi Arabia
  - Biological and Environmental Sciences and Engineering, King Abdullah University of Science and Technology (KAUST), 23955-6900 Thuwal, Kingdom of Saudi Arabia
- Xin Gao
  - Computer Science, Division of Computer, Electrical and Mathematical Sciences and Engineering (CEMSE), King Abdullah University of Science and Technology (KAUST), 23955-6900 Thuwal, Kingdom of Saudi Arabia
  - Center of Excellence on Smart Health, King Abdullah University of Science and Technology (KAUST), 23955-6900 Thuwal, Kingdom of Saudi Arabia
  - Center of Excellence for Generative AI, King Abdullah University of Science and Technology (KAUST), 23955-6900 Thuwal, Kingdom of Saudi Arabia
- Takashi Gojobori
  - Center of Excellence on Smart Health, King Abdullah University of Science and Technology (KAUST), 23955-6900 Thuwal, Kingdom of Saudi Arabia
  - Center of Excellence for Generative AI, King Abdullah University of Science and Technology (KAUST), 23955-6900 Thuwal, Kingdom of Saudi Arabia
  - Biological and Environmental Sciences and Engineering, King Abdullah University of Science and Technology (KAUST), 23955-6900 Thuwal, Kingdom of Saudi Arabia
  - Marine Open Innovation Institute (MaOI), 113-8431 Shizuoka, Japan
2
Ye M, Ren S, Luo H, Wu X, Lian H, Cai X, Ji Y. Integration of graph neural networks and transcriptomics analysis identify key pathways and gene signature for immunotherapy response and prognosis of skin melanoma. BMC Cancer 2025; 25:648. [PMID: 40205338 PMCID: PMC11983817 DOI: 10.1186/s12885-025-13611-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2024] [Accepted: 01/29/2025] [Indexed: 04/11/2025] Open
Abstract
OBJECTIVE The assessment of immunotherapy plays a pivotal role in the clinical management of skin melanoma. Graph neural networks (GNNs), alongside other deep learning algorithms and bioinformatics approaches, have demonstrated substantial promise in advancing cancer diagnosis and treatment strategies.
METHODS GNN models were developed to predict the response to immunotherapy and to pinpoint key pathways. Using the genes from these key pathways, multi-omics bioinformatics methods were employed to refine the construction of a gene signature, termed responseScore, aimed at enhancing the precision of immunotherapy response predictions. Subsequently, responseScore was explored from the perspectives of prognosis, genetic variation, pathway enrichment, and the tumor microenvironment. Concurrently, the association between the 13 genes contributing to responseScore and factors such as immunotherapy response, prognosis, and the tumor microenvironment was investigated. Among these genes, PSMB6 was subjected to an in-depth analysis of its biological effect through experimental approaches such as transfection and co-culture.
RESULTS The finalized GNN model achieved an AUC of 0.854 on the training dataset and 0.824 on the testing set, and pinpointed key pathways such as R-HSA-70268. The indicator, named responseScore, excelled in predicting immunotherapy response and patient prognosis. Investigations into genetic variation, pathway enrichment, and the tumor microenvironment disclosed a profound association between responseScore and the enhancement of immune cell infiltration and anti-tumor immunity. A negative correlation was observed between the expression of PSMB6 and immune genes, with elevated PSMB6 expression correlating with poor prognosis. ELISA detection after co-cultivation experiments revealed significant reductions in the levels of the cytokines IL-6 and IL-1β in specimens from the PCDH-PSMB6 group.
CONCLUSION The GNN prediction model and the responseScore developed in this research effectively indicate the immunotherapy response and prognosis for patients with skin melanoma. Additionally, responseScore provides insights into the tumor microenvironment and the characteristics of tumor immunity in melanoma. The thirteen genes identified in this study show promise as potential tumor markers or therapeutic targets. Notably, PSMB6 emerges as a potential therapeutic target for skin melanoma, where its elevated expression exhibits an inhibitory effect on tumor immunity.
Affiliation(s)
- Maodong Ye
  - Medical Cosmetic Center, First Affiliated Hospital of Shantou University Medical College, Shantou, Guangdong, 515041, P.R. China
- Shuai Ren
  - Medical Cosmetic Center, First Affiliated Hospital of Shantou University Medical College, Shantou, Guangdong, 515041, P.R. China
- Huanjuan Luo
  - Shantou University Medical College, Shantou, Guangdong, 515041, P.R. China
- Xiumin Wu
  - Shantou University Medical College, Shantou, Guangdong, 515041, P.R. China
- Hongwei Lian
  - Shantou University Medical College, Shantou, Guangdong, 515041, P.R. China
- Xiangna Cai
  - Medical Cosmetic Center, First Affiliated Hospital of Shantou University Medical College, Shantou, Guangdong, 515041, P.R. China
- Yingchang Ji
  - Medical Cosmetic Center, First Affiliated Hospital of Shantou University Medical College, Shantou, Guangdong, 515041, P.R. China
3
Tan CY, Ong HF, Lim CH, Tan MS, Ooi EH, Wong K. Amogel: a multi-omics classification framework using associative graph neural networks with prior knowledge for biomarker identification. BMC Bioinformatics 2025; 26:94. [PMID: 40155814 PMCID: PMC11954243 DOI: 10.1186/s12859-025-06111-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2024] [Accepted: 03/10/2025] [Indexed: 04/01/2025] Open
Abstract
The advent of high-throughput sequencing technologies, such as DNA microarray and DNA sequencing, has enabled effective analysis of cancer subtypes and targeted treatment. Furthermore, numerous studies have highlighted the capability of graph neural networks (GNN) to model complex biological systems and capture non-linear interactions in high-throughput data. GNN has proven to be useful in leveraging multiple types of omics data, including prior biological knowledge from various sources, such as transcriptomics, genomics, proteomics, and metabolomics, to improve cancer classification. However, current works do not fully utilize the non-linear learning potential of GNN and lack the ability to analyse high-throughput multi-omics data simultaneously with prior biological knowledge. Moreover, relying on limited prior knowledge when generating gene graphs might lead to less accurate classification because significant gene-gene interactions remain undiscovered, and uncovering them may require expert intervention and can be time-consuming. Hence, this study proposes a graph classification model called associative multi-omics graph embedding learning (AMOGEL) to effectively integrate multi-omics datasets and prior knowledge through GNN coupled with association rule mining (ARM). AMOGEL employs an early fusion technique using ARM to mine intra-omics and inter-omics relationships, forming a multi-omics synthetic information graph before model training. Moreover, AMOGEL introduces multi-dimensional edges, with multi-omics gene associations or edges as the main contributors and prior knowledge edges as auxiliary contributors. Additionally, it uses a gene ranking technique based on attention scores, considering the relationships between neighbouring genes.
Several experiments were performed on BRCA and KIPAN cancer subtypes to demonstrate the integration of multi-omics datasets (miRNA, mRNA, and DNA methylation) with prior biological knowledge of protein-protein interactions, KEGG pathways and Gene Ontology. The experimental results showed that the AMOGEL outperformed the current state-of-the-art models in terms of classification accuracy, F1 score and AUC score. The findings of this study represent a crucial step forward in advancing the effective integration of multi-omics data and prior knowledge to improve cancer subtype classification.
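The abstract above describes using association rule mining (ARM) to find intra-omics and inter-omics relationships that become graph edges. As a rough illustration of the general idea only, the sketch below computes support and confidence for pairwise rules over discretized features and keeps the rules that pass both thresholds. The function name `mine_pair_rules`, the item labels, and the thresholds are all hypothetical; this is not the AMOGEL implementation, whose mining procedure and parameters are not given in the abstract.

```python
from itertools import combinations


def mine_pair_rules(transactions, min_support, min_confidence):
    """Return (lhs, rhs, support, confidence) for pairwise rules lhs -> rhs.

    Each transaction is the set of discretized features observed in one sample.
    """
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    # Number of samples in which each single item occurs.
    count = {item: sum(item in t for t in transactions) for item in items}
    rules = []
    for a, b in combinations(items, 2):
        both = sum(a in t and b in t for t in transactions)
        support = both / n
        if support < min_support:
            continue
        # Confidence is directional, so test both a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = both / count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


# Hypothetical toy samples mixing mRNA, miRNA and methylation items.
samples = [
    {"mRNA:G1_high", "miRNA:M1_low", "meth:G2_high"},
    {"mRNA:G1_high", "miRNA:M1_low"},
    {"mRNA:G1_high", "meth:G2_high"},
    {"miRNA:M1_low"},
]
rules = mine_pair_rules(samples, min_support=0.5, min_confidence=0.6)
```

Each surviving rule could then be treated as a candidate edge between two features, with support or confidence as an edge attribute; how such edges are weighted against prior-knowledge edges is a design choice the abstract leaves open.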
Affiliation(s)
- Chia Yan Tan
  - School of Information Technology, Monash University Malaysia, Jalan Lagoon Selatan, 47500, Petaling Jaya, Selangor, Malaysia
- Huey Fang Ong
  - School of Information Technology, Monash University Malaysia, Jalan Lagoon Selatan, 47500, Petaling Jaya, Selangor, Malaysia
- Chern Hong Lim
  - School of Information Technology, Monash University Malaysia, Jalan Lagoon Selatan, 47500, Petaling Jaya, Selangor, Malaysia
- Mei Sze Tan
  - School of Information Technology, Monash University Malaysia, Jalan Lagoon Selatan, 47500, Petaling Jaya, Selangor, Malaysia
- Ean Hin Ooi
  - School of Engineering, Monash University Malaysia, Jalan Lagoon Selatan, 47500, Petaling Jaya, Selangor, Malaysia
- KokSheik Wong
  - School of Information Technology, Monash University Malaysia, Jalan Lagoon Selatan, 47500, Petaling Jaya, Selangor, Malaysia
4
Thapa K, Kinali M, Pei S, Luna A, Babur Ö. Strategies to include prior knowledge in omics analysis with deep neural networks. Patterns (New York, N.Y.) 2025; 6:101203. [PMID: 40182174 PMCID: PMC11963003 DOI: 10.1016/j.patter.2025.101203] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/05/2025]
Abstract
High-throughput molecular profiling technologies have revolutionized molecular biology research in the past decades. One important use of molecular data is to make predictions of phenotypes and other features of the organisms using machine learning algorithms. Deep learning models have become increasingly popular for this task due to their ability to learn complex non-linear patterns. Applying deep learning to molecular profiles, however, is challenging due to the very high dimensionality of the data and relatively small sample sizes, causing models to overfit. A solution is to incorporate biological prior knowledge to guide the learning algorithm for processing the functionally related input together. This helps regularize the models and improve their generalizability and interpretability. Here, we describe three major strategies proposed to use prior knowledge in deep learning models to make predictions based on molecular profiles. We review the related deep learning architectures, including the major ideas in relatively new graph neural networks.
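One common way to make a network process functionally related inputs together is to restrict each hidden unit to the genes of one pathway. The sketch below illustrates that idea in plain Python under stated assumptions: the function `pathway_layer`, the gene and pathway names, and the weights are hypothetical, and the abstract does not say that this is one of the three strategies the review describes.

```python
def pathway_layer(expression, pathways, weights):
    """Compute one activation per pathway from its member genes only.

    expression: gene -> expression value
    pathways:   pathway -> list of member genes (the prior knowledge)
    weights:    (pathway, gene) -> weight; pairs outside a pathway have no weight
    """
    activations = {}
    for pathway, genes in pathways.items():
        total = sum(weights[(pathway, gene)] * expression[gene] for gene in genes)
        activations[pathway] = max(0.0, total)  # ReLU non-linearity
    return activations


# Hypothetical toy inputs.
expression = {"GENE_A": 1.0, "GENE_B": 2.0, "GENE_C": 3.0}
pathways = {"PATHWAY_1": ["GENE_A", "GENE_B"], "PATHWAY_2": ["GENE_C"]}
weights = {
    ("PATHWAY_1", "GENE_A"): 0.5,
    ("PATHWAY_1", "GENE_B"): 0.25,
    ("PATHWAY_2", "GENE_C"): -1.0,
}
activations = pathway_layer(expression, pathways, weights)
```

Because only gene-pathway pairs present in the prior knowledge carry a weight, the layer has far fewer parameters than a fully connected one, which is the regularizing effect the abstract refers to, and each hidden unit can be read as a pathway-level score.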
Affiliation(s)
- Kisan Thapa
  - Computer Science Department, University of Massachusetts Boston, 100 Morrissey Boulevard, Boston, MA 02125, USA
- Meric Kinali
  - Computer Science Department, University of Massachusetts Boston, 100 Morrissey Boulevard, Boston, MA 02125, USA
- Shichao Pei
  - Computer Science Department, University of Massachusetts Boston, 100 Morrissey Boulevard, Boston, MA 02125, USA
- Augustin Luna
  - Developmental Therapeutics Branch, Center for Cancer Research, National Cancer Institute, NIH, 9000 Rockville Pike, Bethesda, MD 20892, USA
  - Computational Biology Branch, National Library of Medicine, NIH, 9000 Rockville Pike, Bethesda, MD 20892, USA
- Özgün Babur
  - Computer Science Department, University of Massachusetts Boston, 100 Morrissey Boulevard, Boston, MA 02125, USA
5
McNeela D, Sala F, Gitter A. Product Manifold Representations for Learning on Biological Pathways. arXiv 2025:arXiv:2401.15478v2. [PMID: 39975438 PMCID: PMC11838783] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/21/2025]
Abstract
Machine learning models that embed graphs in non-Euclidean spaces have shown substantial benefits in a variety of contexts, but their application has not been studied extensively in the biological domain, particularly with respect to biological pathway graphs. Such graphs exhibit a variety of complex network structures, presenting challenges to existing embedding approaches. Learning high-quality embeddings for biological pathway graphs is important for researchers looking to understand the underpinnings of disease and train high-quality predictive models on these networks. In this work, we investigate the effects of embedding pathway graphs in non-Euclidean mixed-curvature spaces and compare against traditional Euclidean graph representation learning models. We then train a supervised model using the learned node embeddings to predict missing protein-protein interactions in pathway graphs. We find large reductions in distortion and boosts on in-distribution edge prediction performance as a result of using mixed-curvature embeddings and their corresponding graph neural network models. However, we find that mixed-curvature representations underperform existing baselines on out-of-distribution edge prediction performance suggesting that these representations may overfit to the training graph topology. We provide our Mixed-Curvature Product Graph Convolutional Network code at https://github.com/mcneela/Mixed-Curvature-GCN and our pathway analysis code at https://github.com/mcneela/Mixed-Curvature-Pathways.
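A mixed-curvature embedding places each node in a product of component spaces, and the distance in the product is the square root of the sum of squared component distances. The sketch below shows this for one Euclidean, one spherical, and one hyperbolic (Poincaré ball) component using the standard geodesic formulas with curvature fixed at 0, +1, and -1. The function names and the choice of components are assumptions for illustration; this is not the authors' code, and the spaces used in the paper are not stated in the abstract.

```python
import math


def euclidean_dist(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def spherical_dist(x, y):
    # Geodesic distance on the unit sphere; inputs are assumed to have norm 1.
    dot = sum(a * b for a, b in zip(x, y))
    return math.acos(max(-1.0, min(1.0, dot)))  # clamp against rounding error


def poincare_dist(x, y):
    # Geodesic distance in the Poincaré ball; inputs are assumed to have norm < 1.
    def sq_norm(v):
        return sum(a * a for a in v)

    diff = sq_norm([a - b for a, b in zip(x, y)])
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - sq_norm(x)) * (1.0 - sq_norm(y))))


def product_dist(xs, ys, dists):
    # xs and ys hold one coordinate vector per component space.
    return math.sqrt(sum(d(x, y) ** 2 for d, x, y in zip(dists, xs, ys)))


# Two points in a Euclidean x spherical x hyperbolic product space.
point_u = ([3.0, 0.0], [1.0, 0.0], [0.0, 0.0])
point_v = ([0.0, 0.0], [0.0, 1.0], [0.5, 0.0])
distance = product_dist(point_u, point_v, (euclidean_dist, spherical_dist, poincare_dist))
```

Distortion can then be assessed by comparing such embedding distances against shortest-path distances in the pathway graph; the exact distortion measure used by the authors is not given in the abstract.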
Affiliation(s)
- Daniel McNeela
  - Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, USA
  - Morgridge Institute for Research, Madison, WI, USA
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
- Frederic Sala
  - Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, USA
- Anthony Gitter
  - Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, USA
  - Morgridge Institute for Research, Madison, WI, USA
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
6
Huang W, Gong H, Zhang H, Wang Y, Wan X, Li G, Li H, Shen H. BCNet: Bronchus Classification via Structure Guided Representation Learning. IEEE Transactions on Medical Imaging 2025; 44:489-498. [PMID: 39178085 DOI: 10.1109/tmi.2024.3448468] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/25/2024]
Abstract
CT-based bronchial tree analysis is a key step for the diagnosis of lung and airway diseases. However, the topology of bronchial trees varies across individuals, which presents a challenge to the automatic bronchus classification. To solve this issue, we propose the Bronchus Classification Network (BCNet), a structure-guided framework that exploits the segment-level topological information using point clouds to learn the voxel-level features. BCNet has two branches, a Point-Voxel Graph Neural Network (PV-GNN) for segment classification, and a Convolutional Neural Network (CNN) for voxel labeling. The two branches are simultaneously trained to learn topology-aware features for their shared backbone while it is feasible to run only the CNN branch for the inference. Therefore, BCNet maintains the same inference efficiency as its CNN baseline. Experimental results show that BCNet significantly exceeds the state-of-the-art methods by over 8.0% both on F1-score for classifying bronchus. Furthermore, we contribute BronAtlas: an open-access benchmark of bronchus imaging analysis with high-quality voxel-wise annotations of both anatomical and abnormal bronchial segments. The benchmark is available at https://osf.io/pskr9/?viewonly=94fa3d87274b4095ac9a4b88cc9a1341.
7
Ren S, Lu Y, Zhang G, Xie K, Chen D, Cai X, Ye M. Integration of Graph Neural Networks and multi-omics analysis identify the predictive factor and key gene for immunotherapy response and prognosis of bladder cancer. J Transl Med 2024; 22:1141. [PMID: 39716185 DOI: 10.1186/s12967-024-05976-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2024] [Accepted: 12/13/2024] [Indexed: 12/25/2024] Open
Abstract
OBJECTIVE The evaluation of the efficacy of immunotherapy is of great value for the clinical treatment of bladder cancer. Graph Neural Networks (GNNs), pathway analysis and multi-omics analysis have shown great potential in the field of cancer diagnosis and treatment.
METHODS A GNN model was constructed to predict the immunotherapy response and identify key pathways. Based on the genes of the key pathways, bioinformatic methods were used to generate a simple linear scoring model, namely responseScore. The intrinsic mechanism of responseScore was explored through multi-omics analysis. The relationship between each gene involved in responseScore and prognosis was also explored. Transfection experiments with human bladder cancer cells were used to investigate the biological effects of the PSMB9 gene.
RESULTS The final GNN model had an AUC of 0.785 on the training set and an AUC of 0.839 on the validation set. R-HSA-69620 and others were identified as key pathways. ResponseScore performed well in predicting the immunotherapy response and prognosis. Analyses of genetic variation, pathways and the tumor microenvironment showed that responseScore was significantly associated with immune cell infiltration and anti-tumor immunity. The results of single-cell analysis showed that responseScore was closely related to the functional state of natural killer cells. Compared with the PCDH-NC group, cell migration and proliferation were significantly inhibited while cell apoptosis increased in the PCDH-PSMB9 group.
CONCLUSION The GNN predictive model and responseScore constructed in this study can reflect the immunotherapy response and prognosis of bladder cancer patients. ResponseScore can also reflect features such as the tumor microenvironment, antitumor immunity, and natural killer cell functional status in bladder cancer. PSMB9 was identified as a significant gene for prognosis. High expression of PSMB9 can inhibit bladder cancer cell migration and proliferation while increasing cell apoptosis.
Affiliation(s)
- Shuai Ren
  - Medical Cosmetic Center, First Affiliated Hospital of Shantou University Medical College, Shantou, Guangdong, 515041, People's Republic of China
- Yongjian Lu
  - Shantou University Medical College, Shantou, Guangdong, 515041, People's Republic of China
- Guangping Zhang
  - Shantou University Medical College, Shantou, Guangdong, 515041, People's Republic of China
- Ke Xie
  - Medical Cosmetic Center, First Affiliated Hospital of Shantou University Medical College, Shantou, Guangdong, 515041, People's Republic of China
- Danni Chen
  - Shantou University Medical College, Shantou, Guangdong, 515041, People's Republic of China
- Xiangna Cai
  - Medical Cosmetic Center, First Affiliated Hospital of Shantou University Medical College, Shantou, Guangdong, 515041, People's Republic of China
- Maodong Ye
  - Medical Cosmetic Center, First Affiliated Hospital of Shantou University Medical College, Shantou, Guangdong, 515041, People's Republic of China
8
van Hilten A, Katz S, Saccenti E, Niessen WJ, Roshchupkin GV. Designing interpretable deep learning applications for functional genomics: a quantitative analysis. Brief Bioinform 2024; 25:bbae449. [PMID: 39293804 PMCID: PMC11410376 DOI: 10.1093/bib/bbae449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Revised: 08/07/2024] [Accepted: 08/28/2024] [Indexed: 09/20/2024] Open
Abstract
Deep learning applications have had a profound impact on many scientific fields, including functional genomics. Deep learning models can learn complex interactions between and within omics data; however, interpreting and explaining these models can be challenging. Interpretability is essential not only to help progress our understanding of the biological mechanisms underlying traits and diseases but also for establishing trust in these models' efficacy for healthcare applications. Recognizing this importance, recent years have seen the development of numerous diverse interpretability strategies, making it increasingly difficult to navigate the field. In this review, we present a quantitative analysis of the challenges arising when designing interpretable deep learning solutions in functional genomics. We explore design choices related to the characteristics of genomics data, the neural network architectures applied, and strategies for interpretation. By quantifying the current state of the field with a predefined set of criteria, we find the most frequent solutions, highlight exceptional examples, and identify unexplored opportunities for developing interpretable deep learning models in genomics.
Affiliation(s)
- Arno van Hilten
  - Department of Radiology and Nuclear Medicine, Erasmus MC, 3015 GD Rotterdam, The Netherlands
- Sonja Katz
  - Department of Radiology and Nuclear Medicine, Erasmus MC, 3015 GD Rotterdam, The Netherlands
  - Laboratory of Systems and Synthetic Biology, Wageningen University & Research, 6700 HB Wageningen WE, The Netherlands
- Edoardo Saccenti
  - Laboratory of Systems and Synthetic Biology, Wageningen University & Research, 6700 HB Wageningen WE, The Netherlands
- Wiro J Niessen
  - Department of Imaging Physics, Delft University of Technology, 2628 CD Delft, The Netherlands
- Gennady V Roshchupkin
  - Department of Radiology and Nuclear Medicine, Erasmus MC, 3015 GD Rotterdam, The Netherlands
  - Department of Epidemiology, Erasmus MC, 3015 GD Rotterdam, The Netherlands
9
Yan H, Weng D, Li D, Gu Y, Ma W, Liu Q. Prior knowledge-guided multilevel graph neural network for tumor risk prediction and interpretation via multi-omics data integration. Brief Bioinform 2024; 25:bbae184. [PMID: 38670157 PMCID: PMC11052635 DOI: 10.1093/bib/bbae184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Revised: 03/11/2024] [Accepted: 04/06/2024] [Indexed: 04/28/2024] Open
Abstract
The interrelation and complementary nature of multi-omics data can provide valuable insights into the intricate molecular mechanisms underlying diseases. However, challenges such as limited sample size, high data dimensionality and differences in omics modalities pose significant obstacles to fully harnessing the potential of these data. Prior knowledge, such as gene regulatory networks and pathway information, harbors useful gene-gene interaction and gene functional module information. To effectively integrate multi-omics data and make full use of the prior knowledge, here, we propose a Multilevel-graph neural network (GNN): a hierarchically designed deep learning algorithm that sequentially leverages multi-omics data, gene regulatory networks and pathway information to extract features and enhance accuracy in predicting survival risk. Our method achieved better accuracy compared with existing methods. Furthermore, key factors nonlinearly associated with tumor pathogenesis are prioritized by employing two interpretation algorithms for neural networks (i.e. GNN-Explainer and IGscore) at the gene and pathway levels, respectively. The top genes and pathways exhibit strong associations with disease in survival analyses, and many of them, such as SEC61G and CYP27B1, have been previously reported in the literature.
Affiliation(s)
- Hongxi Yan
  - Department of Computer Science, Beihang University, XueYuan Road, 100191, Beijing, China
- Dawei Weng
  - School of Biomedical Engineering, Capital Medical University, 10 You An Men WaiXi Tou Tiao, 100069, Beijing, China
- Dongguo Li
  - School of Biomedical Engineering, Capital Medical University, 10 You An Men WaiXi Tou Tiao, 100069, Beijing, China
- Yu Gu
  - School of Biomedical Engineering, Capital Medical University, 10 You An Men WaiXi Tou Tiao, 100069, Beijing, China
- Wenji Ma
  - Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, 227 South Chongqing Road, 200025, Shanghai, China
- Qingjie Liu
  - Department of Computer Science, Beihang University, XueYuan Road, 100191, Beijing, China
10
Castaneda EU, Baker EJ. KNeXT: a NetworkX-based topologically relevant KEGG parser. Front Genet 2024; 15:1292394. [PMID: 38415058 PMCID: PMC10896898 DOI: 10.3389/fgene.2024.1292394] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Accepted: 01/25/2024] [Indexed: 02/29/2024] Open
Abstract
Automating the recreation of gene and mixed gene-compound networks from Kyoto Encyclopedia of Genes and Genomes (KEGG) Markup Language (KGML) files is challenging because the data structure does not preserve the independent or loosely connected neighborhoods from which they were originally derived, referred to here as its topological environment. Identical accession numbers may overlap, causing neighborhoods to artificially collapse based on duplicated identifiers. This causes current parsers to create misleading or erroneous graphical representations when mixed gene networks are converted to gene-only networks. To overcome these challenges we created a Python-based KEGG NetworkX Topological (KNeXT) parser that allows users to accurately recapitulate genetic networks and mixed networks from KGML map data. The software, archived on the Python Package Index (PyPI) to ensure broad application, is designed to ingest KGML files through built-in APIs and dynamically create high-fidelity topological representations. The use of NetworkX's framework to generate tab-separated files additionally ensures that KNeXT results may be imported into other graph frameworks and maintain programmatic access to the original x-y axis positions of each node in the KEGG pathway. KNeXT is a well-described Python 3 package that allows users to rapidly download and aggregate specific KGML files and recreate KEGG pathways based on a range of user-defined settings. KNeXT is platform-independent, distinctive, and it is not written on top of other Python parsers. Furthermore, KNeXT enables users to parse entire local folders or single files through command line scripts and convert the output into NCBI or UniProt IDs. KNeXT provides an ability for researchers to generate pathway visualizations while preserving the original context of a KEGG pathway. Source code is freely available at https://github.com/everest-castaneda/knext.
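The collapse caused by duplicated identifiers can be reproduced with a few lines of standard-library Python. The sketch below parses a small, made-up KGML fragment in which the same gene accession appears under two entry ids, then builds an edge list twice: once keyed by accession only, and once keyed by accession plus entry id. The KGML element and attribute names (`entry`, `relation`, `id`, `name`, `entry1`, `entry2`) follow the KGML format, but the pathway, the accessions, and the helper functions are hypothetical, and this is not how KNeXT itself is implemented or invoked.

```python
import xml.etree.ElementTree as ET

# Hypothetical KGML fragment: accession hsa:1111 occurs as entries 1 and 3,
# which belong to two separate neighborhoods of the pathway map.
KGML = """<pathway name="path:hsa00000" org="hsa" number="00000">
  <entry id="1" name="hsa:1111" type="gene"/>
  <entry id="2" name="hsa:2222" type="gene"/>
  <entry id="3" name="hsa:1111" type="gene"/>
  <entry id="4" name="hsa:3333" type="gene"/>
  <relation entry1="1" entry2="2" type="PPrel"/>
  <relation entry1="3" entry2="4" type="PPrel"/>
</pathway>"""


def edges(kgml_text, keep_entry_id):
    """Return relation edges, labelling nodes with or without the entry id."""
    root = ET.fromstring(kgml_text)
    names = {entry.get("id"): entry.get("name") for entry in root.iter("entry")}
    result = []
    for relation in root.iter("relation"):
        a, b = relation.get("entry1"), relation.get("entry2")
        if keep_entry_id:
            # Suffixing the entry id keeps repeated accessions distinct.
            result.append((f"{names[a]}-{a}", f"{names[b]}-{b}"))
        else:
            result.append((names[a], names[b]))
    return result


def node_count(edge_list):
    return len({node for edge in edge_list for node in edge})


collapsed = edges(KGML, keep_entry_id=False)
preserved = edges(KGML, keep_entry_id=True)
```

With accession-only labels the two neighborhoods merge through the shared `hsa:1111` node, whereas the id-suffixed labels keep them as two separate components. Either edge list can be passed to a graph library for further analysis.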
Affiliation(s)
- Everest Uriel Castaneda
  - Department of Biology, Baylor University, Waco, TX, United States
  - School of Engineering and Computer Science, Baylor University, Waco, TX, United States
- Erich J Baker
  - Department of Mathematics and Computer Science, Belmont University, Nashville, TN, United States
11
Wang H, Zhang L, Zhao H, Wu R, Sun X, Cen Y, Zhang L. Feature multi-level attention spatio-temporal graph residual network: A novel approach to ammonia nitrogen concentration prediction in water bodies by integrating external influences and spatio-temporal correlations. The Science of the Total Environment 2024; 906:167591. [PMID: 37802332 DOI: 10.1016/j.scitotenv.2023.167591] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Revised: 09/16/2023] [Accepted: 10/03/2023] [Indexed: 10/08/2023]
Abstract
Accurate prediction of ammonia nitrogen concentration in water is of great significance for urban water quality management and pollution early warning. In order to improve the prediction accuracy of ammonia nitrogen concentration in water, this study developed a novel model based on graph neural networks called Feature Multi-level Attention Spatio-Temporal Graph Residual Network (FMA-STGRN). The FMA-STGRN model utilizes external influencing factors such as meteorological factors and point of interest data, as well as the spatio-temporal correlation information of ammonia nitrogen concentration between water quality monitoring stations, to accurately predict the concentration of ammonia nitrogen in water. The model consists of four main components: feature multi-level attention module, spatial graph convolution module, temporal-domain residual decomposition module, and feature fusion and output module. Through the organic combination of these four modules, FMA-STGRN can more effectively explore the complex spatio-temporal correlation relationships between water quality monitoring stations and more accurately integrate and utilize external influencing factors, thereby improving the prediction accuracy of ammonia nitrogen concentration in water. Experimental results show that the FMA-STGRN model outperforms other benchmark models such as RF, MART, MLP, LSTM, GRU, ST-GCN, and ST-GAT in various aspects. In addition, a series of feature ablation experiments were conducted to further reveal the key contributions of meteorological factors and point of interest data to the model performance. Overall, our research provides a powerful and practical tool for water quality monitoring and urban water management, with broad application prospects.
Affiliation(s)
- Hongqing Wang
- Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Lifu Zhang
- Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
- Hongying Zhao
- School of Earth and Space Sciences, Peking University, Beijing 100871, China
- Rong Wu
- Department of Mathematical Sciences, Tsinghua University, Beijing 100084, China
- Xuejian Sun
- Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
- Yi Cen
- Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
- Linshan Zhang
- Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
12
Gogoshin G, Rodin AS. Graph Neural Networks in Cancer and Oncology Research: Emerging and Future Trends. Cancers (Basel) 2023; 15:5858. [PMID: 38136405 PMCID: PMC10742144 DOI: 10.3390/cancers15245858] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 12/09/2023] [Accepted: 12/14/2023] [Indexed: 12/24/2023] Open
Abstract
Next-generation cancer and oncology research needs to take full advantage of the multimodal structured, or graph, information, with the graph data types ranging from molecular structures to spatially resolved imaging and digital pathology, biological networks, and knowledge graphs. Graph Neural Networks (GNNs) efficiently combine the graph structure representations with the high predictive performance of deep learning, especially on large multimodal datasets. In this review article, we survey the landscape of recent (2020-present) GNN applications in the context of cancer and oncology research, and delineate six currently predominant research areas. We then identify the most promising directions for future research. We compare GNNs with graphical models and "non-structured" deep learning, and devise guidelines for cancer and oncology researchers or physician-scientists, asking the question of whether they should adopt the GNN methodology in their research pipelines.
Affiliation(s)
- Grigoriy Gogoshin
- Department of Computational and Quantitative Medicine, Beckman Research Institute, and Diabetes and Metabolism Research Institute, City of Hope National Medical Center, 1500 East Duarte Road, Duarte, CA 91010, USA
- Andrei S. Rodin
- Department of Computational and Quantitative Medicine, Beckman Research Institute, and Diabetes and Metabolism Research Institute, City of Hope National Medical Center, 1500 East Duarte Road, Duarte, CA 91010, USA
13
Gong H, Zhang Y, Dong C, Wang Y, Chen G, Liang B, Li H, Liu L, Xu J, Li G. Unbiased curriculum learning enhanced global-local graph neural network for protein thermodynamic stability prediction. Bioinformatics 2023; 39:btad589. [PMID: 37740312 PMCID: PMC10918760 DOI: 10.1093/bioinformatics/btad589] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2023] [Revised: 08/04/2023] [Accepted: 09/21/2023] [Indexed: 09/24/2023] Open
Abstract
MOTIVATION: Proteins play crucial roles in biological processes, with their functions being closely tied to thermodynamic stability. However, measuring stability changes upon point mutations of amino acid residues using physical methods can be time-consuming. In recent years, several computational methods for protein thermodynamic stability prediction (PTSP) based on deep learning have emerged. Nevertheless, these approaches either overlook the natural topology of protein structures or neglect the inherent noisy samples resulting from theoretical calculation or experimental errors.
RESULTS: We propose a novel Global-Local Graph Neural Network powered by Unbiased Curriculum Learning for the PTSP task. Our method first builds a Siamese graph neural network to extract protein features before and after mutation. Since the graph's topological changes stem from local node mutations, we design a local feature transformation module to make the model focus on the mutated site. To address model bias caused by noisy samples, which represent unavoidable errors from physical experiments, we introduce an unbiased curriculum learning method. This approach effectively identifies and re-weights noisy samples during the training process. Extensive experiments demonstrate that our proposed method outperforms advanced protein stability prediction methods, and surpasses state-of-the-art learning methods for regression prediction tasks.
AVAILABILITY AND IMPLEMENTATION: All code and data is available at https://github.com/haifangong/UCL-GLGNN.
Affiliation(s)
- Haifan Gong
- Shanghai Artificial Intelligence Laboratory, Shanghai 200000, China
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- SRIBD, Chinese University of Hong Kong (Shenzhen), Shenzhen 518000, China
- Yumeng Zhang
- Shanghai Jiao Tong University, Shanghai 200000, China
- Chenhe Dong
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- Yue Wang
- Qilu Hospital, Shandong University, Shandong 250000, China
- Guanqi Chen
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- Bilin Liang
- Shanghai Artificial Intelligence Laboratory, Shanghai 200000, China
- Haofeng Li
- SRIBD, Chinese University of Hong Kong (Shenzhen), Shenzhen 518000, China
- Lanxuan Liu
- Shanghai Artificial Intelligence Laboratory, Shanghai 200000, China
- Jie Xu
- Shanghai Artificial Intelligence Laboratory, Shanghai 200000, China
- Guanbin Li
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
14
Pang J, Liang B, Ding R, Yan Q, Chen R, Xu J. A denoised multi-omics integration framework for cancer subtype classification and survival prediction. Brief Bioinform 2023; 24:bbad304. [PMID: 37594302 DOI: 10.1093/bib/bbad304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2023] [Revised: 07/04/2023] [Accepted: 08/04/2023] [Indexed: 08/19/2023] Open
Abstract
The availability of high-throughput sequencing data creates opportunities to comprehensively understand human diseases as well as challenges to train machine learning models using such high dimensions of data. Here, we propose a denoised multi-omics integration framework, which contains a distribution-based feature denoising algorithm, Feature Selection with Distribution (FSD), for dimension reduction and a multi-omics integration framework, Attention Multi-Omics Integration (AttentionMOI) to predict cancer prognosis and identify cancer subtypes. We demonstrated that FSD improved model performance either using single omic data or multi-omics data in 15 The Cancer Genome Atlas Program (TCGA) cancers for survival prediction and kidney cancer subtype identification. And our integration framework AttentionMOI outperformed machine learning models and current multi-omics integration algorithms with high dimensions of features. Furthermore, FSD identified features that were associated to cancer prognosis and could be considered as biomarkers.
Affiliation(s)
- Jiali Pang
- Shanghai Artificial Intelligence Laboratory, Shanghai, China
- Bilin Liang
- Shanghai Artificial Intelligence Laboratory, Shanghai, China
- Ruifeng Ding
- Department of Anesthesiology, Changzheng Hospital, Second Affiliated Hospital of Naval Medical University, Shanghai, China
- Qiujuan Yan
- Shanghai Artificial Intelligence Laboratory, Shanghai, China
- Ruiyao Chen
- Shanghai Artificial Intelligence Laboratory, Shanghai, China
- Jie Xu
- Shanghai Artificial Intelligence Laboratory, Shanghai, China
15
Kim SY. GNN-surv: Discrete-Time Survival Prediction Using Graph Neural Networks. Bioengineering (Basel) 2023; 10:1046. [PMID: 37760148 PMCID: PMC10525217 DOI: 10.3390/bioengineering10091046] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/02/2023] [Revised: 08/31/2023] [Accepted: 09/04/2023] [Indexed: 09/29/2023] Open
Abstract
Survival prediction models play a key role in patient prognosis and personalized treatment. However, their accuracy can be improved by incorporating patient similarity networks, which uncover complex data patterns. Our study uses Graph Neural Networks (GNNs) to enhance discrete-time survival predictions (GNN-surv) by leveraging relationships in these networks. We build these networks using cancer patients' genomic and clinical data and train various GNN models on them, integrating Logistic Hazard and PMF survival models. GNN-surv models exhibit superior performance in survival prediction across two urologic cancer datasets, outperforming traditional MLP models. They maintain robustness and effectiveness under varying graph construction hyperparameter μ values, with performance boosts of up to 14.6% and 7.9% in the time-dependent concordance index and reductions in the integrated Brier score of 26.7% and 24.1% in the BLCA and KIRC datasets, respectively. Notably, these models also maintain their effectiveness across three different types of GNN models, suggesting potential adaptability to other cancer datasets. The superior performance of our GNN-surv models underscores their wide applicability in the fields of oncology and personalized medicine, providing clinicians with a more accurate tool for patient prognosis and personalized treatment planning. Future studies can further optimize these models by incorporating other survival models or additional data modalities.
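As an illustration of the discrete-time survival setup this abstract refers to, the following is a minimal sketch of a logistic-hazard negative log-likelihood for a single patient. It is not the GNN-surv implementation: the function names, the per-interval logits (standing in for the output of any model, GNN or MLP), and the convention that a censored patient is counted as surviving the interval in which censoring occurs are all assumptions made for this example.

```python
import math


def sigmoid(x):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def logistic_hazard_nll(logits, interval, event):
    """Negative log-likelihood of one patient under a logistic-hazard model.

    logits   -- hypothetical per-interval model outputs, one per time interval
    interval -- 0-based index of the interval with the event or censoring
    event    -- 1 if the event was observed, 0 if the patient was censored
    """
    hazards = [sigmoid(z) for z in logits]
    # The patient survived every interval before `interval`.
    log_lik = sum(math.log(1.0 - h) for h in hazards[:interval])
    h_k = hazards[interval]
    # Assumed convention: an event contributes h_k, censoring contributes 1 - h_k.
    log_lik += math.log(h_k) if event else math.log(1.0 - h_k)
    return -log_lik


def survival_curve(logits):
    """Survival probability at the end of each interval: prod_j (1 - h_j)."""
    curve, surv = [], 1.0
    for z in logits:
        surv *= 1.0 - sigmoid(z)
        curve.append(surv)
    return curve


if __name__ == "__main__":
    scores = [0.0, 0.0, 0.0]  # made-up logits, each giving a hazard of 0.5
    print(logistic_hazard_nll(scores, interval=2, event=1))
    print(survival_curve(scores))
```

With all logits at zero, every interval hazard is 0.5, so the survival curve halves at each step. Other censoring conventions exist (for example, dropping the censoring interval from the likelihood), and the choice depends on how follow-up times are discretized.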
Affiliation(s)
- So Yeon Kim
- Department of Artificial Intelligence, Ajou University, Suwon 16499, Republic of Korea
- Department of Software and Computer Engineering, Ajou University, Suwon 16499, Republic of Korea